Open standards for annotating and indexing networked media

The simplicity of HTML, HTTP, and URIs has enabled the ubiquitous, distributed, world-wide data network we all value so highly. However, the World Wide Web is a predominantly text-based information and community space. Some data is still handled as "dark matter" on the Web: data whose content is easily perceived by humans but poorly handled by computers. One such type is time-continuously sampled data such as audio and video. Open standards and open source tools are being developed to build webs of distributed media, hooking into the existing World Wide Web infrastructure of Web searching, Web proxying, and Web browsing.


The Annodex technology aims to create a self-sustaining community of people based around a compelling mode of interaction with digital media. This Continuous Media Web (CMWeb) is composed of time-continuous data: video such as self-published home movies, webcam streams and news feeds, and audio including music and Internet radio. It is orthogonal to the existing text-based Web, and links cross over easily from either side.

The core technical functions tying time-continuously sampled data together are a standardised method of deep hyperlinking, standardised forms of annotation and indexing, and a standardised format to package and transfer them. The user experience is similar to that of an interactive television with a near-infinite number of contextually linked channels. New services become possible, including accurate search engines, nested discussion sites, and customised video portals and news feeds.

The compelling social activity that underpins this environment is the ability of users to contribute to it and interact with it directly. All these factors combine to create a community that sustains the use of the technology by creating increasing amounts of ever more intricately linked and annotated content.


Technically speaking, the CMWeb technology enables compositions of media that are more loosely linked than current multimedia compositions and that do not share a common timeline in their presentation. A document author may link to any clip of a media stream elsewhere on the Internet. Content-based searching of clips is enabled by textual annotations, which are also created by the document author.
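As a rough illustration of linking to a position inside a media stream, the sketch below extracts a time offset from a URI. The query parameter name `t`, the optional `npt:` prefix, and the example.com address are assumptions made for this example; they are not a statement of what the temporal URI specification requires.

```python
from urllib.parse import urlsplit, parse_qs


def parse_time_offset(uri: str) -> float:
    """Return the offset in seconds carried by a hypothetical `t` query parameter.

    Accepts plain seconds ("15.2") or colon-separated "hh:mm:ss" values,
    optionally prefixed with "npt:". Returns 0.0 when no offset is present.
    """
    query = parse_qs(urlsplit(uri).query)
    values = query.get("t")
    if not values:
        return 0.0
    value = values[0]
    if value.startswith("npt:"):
        value = value[len("npt:"):]
    seconds = 0.0
    # Fold the fields left to right: each colon multiplies the running total by 60.
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Illustrative address only; the host and file name are made up.
offset = parse_time_offset("http://example.com/talk.anx?t=npt:0:01:30")
```

A player or server that understood such a link could then seek to `offset` seconds before starting playback.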

To that end, there are new techniques for:

  1. the identification of a position in a media stream so that it can be referenced,
  2. the attachment of a link to a media clip for linking out of the stream to other Web resources, and
  3. the annotation of media clips for search applications.
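The three techniques listed above can be pictured with a small in-memory model. This is only a sketch: the `Clip` class, its field names, and the rule that a clip lasts until the next one starts are assumptions chosen for illustration, not part of the Annodex specifications.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clip:
    start: float        # position in the stream, in seconds (technique 1)
    href: str           # outgoing link to another Web resource (technique 2)
    description: str    # free-text annotation used for search (technique 3)


def clip_at(clips: List[Clip], offset: float) -> Optional[Clip]:
    """Return the clip active at `offset`, assuming a clip lasts until the next one starts."""
    active = None
    for clip in sorted(clips, key=lambda c: c.start):
        if clip.start <= offset:
            active = clip
        else:
            break
    return active


def search(clips: List[Clip], term: str) -> List[Clip]:
    """Return the clips whose annotation contains `term`, ignoring case."""
    needle = term.lower()
    return [c for c in clips if needle in c.description.lower()]


# Hypothetical sample data.
clips = [
    Clip(0.0, "http://example.com/intro", "Opening remarks"),
    Clip(42.0, "http://example.com/telescope", "Radio telescope footage"),
]
```

With this data, `clip_at(clips, 50.0)` selects the second clip, whose `href` is the link a viewer could follow at that moment, and `search(clips, "telescope")` finds the same clip through its annotation.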

The technology is independent of the encoding format of the media.

The base technology provides a simple way to mark up and distribute time-continuous media files over the Internet. It consists of a markup language for authoring the hyperlinks and annotations in a structured way (CMML), an encapsulation format that integrates the markup with the media data (Annodex), and an extension to the existing URI specification for deep hyperlinking (temporal URI addressing).
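To give a feel for structured markup of this kind, the following sketch generates a small CMML-like XML document with Python's standard library. The element and attribute names used here (`cmml`, `clip`, `a`, `desc`, `id`, `start`, `href`) and the sample values are assumptions for illustration and should be checked against the CMML specification before use.

```python
import xml.etree.ElementTree as ET


def build_markup(clips):
    """Build a CMML-like document from (clip_id, start, href, link_text, description) tuples."""
    root = ET.Element("cmml")
    for clip_id, start, href, link_text, description in clips:
        # One element per clip, carrying its identifier and start time.
        clip = ET.SubElement(root, "clip", {"id": clip_id, "start": start})
        # The hyperlink that leads out of the clip to another Web resource.
        anchor = ET.SubElement(clip, "a", {"href": href})
        anchor.text = link_text
        # The textual annotation that a search engine could index.
        desc = ET.SubElement(clip, "desc")
        desc.text = description
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample clip.
markup = build_markup([
    ("intro", "npt:0", "http://example.com/about", "About this talk", "Opening remarks"),
])
```

Keeping the markup in a separate, text-based document means it can be authored and indexed with ordinary XML tools before it is combined with the media data.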

Proven, standards-based technology

The technology is based on proven file formats: XML, by the W3C, for the continuous media markup language CMML, and Ogg (RFC 3533) for the Annodex format. Together they provide streamable, proxyable encapsulation of media and annotations.


Annodex(TM) is a trademark of CSIRO Australia. All other trademarks are the properties of their respective owners.