Network Working Group                                        S. Pfeiffer
Internet-Draft                                                 C. Parker
Expires: December 7, 2003                                          CSIRO
                                                            June 8, 2003

Specification of the ANNODEX(TM) annotation format for time-continuous bitstreams, Version 1.0
draft-pfeiffer-annodex-00

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 except that the right to produce derivative works is not granted.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on December 7, 2003.

Copyright Notice

Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

This specification defines a file format for annotating and indexing time-continuous bitstreams for the World Wide Web. The format has been named ANNODEX(TM) for annotating and indexing. The ANNODEX(TM) format enables the specification of named anchor points in time-continuous bitstreams together with textual annotations and hyperlinks in URI[4] format. These anchor points are merged time-synchronously with the time-continuous bitstreams when a file in ANNODEX(TM) format is authored. The ultimate aim of the ANNODEX(TM) format is to enable the integration of time-continuous bitstreams into the browsing and searching functionality of the World Wide Web.

At this point in time, the right to produce derivative works is not granted to the IETF as the authors are uncertain about the necessity to create a working group. The specification is not encumbered by patents. The ANNODEX(TM) format is protected by a trademark to prevent the use of the term "annodex" for any related but non-conformant and therefore non-interoperable technology.




1. Introduction

When searching the World Wide Web, continuous media files such as audio and video files are still treated as "dark matter". It is not possible to look inside such files, search for their content through common text-based search engines, and actually directly access points of interest inside them. The file can only be consumed in its entirety until the point of interest is reached. In addition, such files are "dead ends" in that by consuming their content the hyperlinking functionality of the Web is left behind.

This document specifies a file format that interleaves XML markup with time-continuous data, giving ANNODEX(TM) format media. The ANNODEX(TM) format, together with the Continuous Media Markup Language (CMML)[14] and the URI standard[4] extended by temporal URI references[13], forms the base technology that enables searching and surfing of time-continuous data via existing Web infrastructure. The ANNODEX(TM) format can encapsulate any type of streamable time-continuous bitstream format and is thus independent of current or future compression formats. The XML tags were chosen to be very similar to XHTML to enable a simple transfer of knowledge for HTML authors.

The file extension of ANNODEX(TM) files is ".anx". This document also applies for registration of the mime-type "application/annodex" for ANNODEX(TM) format bitstreams.

The structure of this document is as follows: sections 2 and 3 describe the architecture of a Continuous Media Web based on ANNODEX(TM) format media files and give an overview of the ANNODEX(TM) media file format. The XML tags required to create ANNODEX(TM) format media consist of two types of frames: header and anchor frames. They form the annotation bitstream and are described in section 4. The section on media encapsulation then describes the multiplexing format. The handling of the different time constructs in ANNODEX(TM) format media is quite complex and is discussed in detail in section 6. The MIME type application and security considerations form the final sections. Finally, the appendices give the actual specifications of the "head" and "a" DTDs as well as definitions and acronyms.

Please note that this document assumes that the reader has a fluent working knowledge of XML[1], HTML[2], XHTML[3] and the World Wide Web. Deep knowledge of the Ogg encapsulation format version 0[10] is also a prerequisite for understanding this specification. This document is also a sister document to the specification of the Continuous Media Markup Language (CMML Version 1.0)[14] for authoring ANNODEX(TM) format bitstreams.




2. The architecture of a Continuous Media Web

As with Webpages, ANNODEX(TM) format bitstreams first have to be authored and then published on a Server. Authoring includes the creation of the media bitstream plus the creation of annotations (i.e. textual data descriptions), indexes (i.e. anchor points) and hyperlinks (i.e. URIs[4]) for fragments of the media data. Annotations, indexes and hyperlinks are encoded in XML[1] according to the DTDs given in the appendix and interleaved into the media document to create ANNODEX(TM) format bitstreams in a time-synchronous fashion. This procedure can be performed both on files and live streams. The collection of ANNODEX(TM) format bitstreams on the Internet is called the Continuous Media Web as it builds a Web of time-continuous resources.

Distribution of ANNODEX(TM) format bitstreams happens via a network protocol such as HTTP[5] or RTP/RTSP[6]. The basic process is the following: The client dispatches a download or streaming request to the server with a certain URI. The server resolves the URI and starts packetising ANNODEX(TM) format bitstreams, taking into account temporal URI fragment specifications.

In case of packet loss due to an unreliable transport, media data or anchor data may get lost; this may be important to the application or not. Both media and mark-up data are treated with the same importance. If a user doesn't care whether the media data is completely received, then the mark-ups will be regarded the same way. Anchors are typically treated as state changes; if an anchor tag gets lost, the next anchor tag will restore the proper state. We envisage, however, that a client may require the current state information, so there should be a protocol request for sending the current state again. This will be delivered by the server by inserting another copy of the currently active anchor into the ANNODEX(TM) bitstream.

To access the Continuous Media Web, a client such as a conformant Web browser is required. A client can link to an ANNODEX(TM) bitstream via a URI. A URI can point to a temporal offset in the ANNODEX(TM) bitstream using temporal URI fragment identifiers[13] or to a named offset by using the id attribute of an anchor frame as a URI fragment identifier. In this way, direct access to points of interest in the media document is enabled. While playing back ANNODEX(TM) format bitstreams, a user is offered hyperlinks (URIs) to other Web resources which (in the author's view) are related to the currently displayed media content.

A client may be a special player or a browser plugin. This application must split an ANNODEX(TM) format bitstream into its constituent header and anchor frames and the media document. The encapsulated media document is displayed after decoding it with the appropriate media decoder. While playing back the media document, the application displays the hyperlinks and the annotations for the active anchor frames.

Search engines can include published ANNODEX(TM) format files in their search repertoire by finding annotations in the anchor frames in a standard way, independent of the encoding and packetising format of the media document. This allows any media format to be spidered. In addition, the protocol should allow downloading only the CMML mark-up from a published ANNODEX(TM) format file. This will stop spiders from creating extensive network loads as they do not need to download the media bitstream to gain the necessary information. It also reduces the size of search archives, even for large amounts of published ANNODEX(TM) format files, because a CMML file contains all searchable annotations for the media fragments of its ANNODEX(TM) format file.




3. Overview of the ANNODEX(TM) bitstream format

The format of ANNODEX(TM) bitstreams consists of a bitstream of time-continuous data interspersed with structured XML mark-up of an annotation bitstream. It is designed to be used both as a persistent file format and as a streaming format. Any encoding format for time-continuous data can be encapsulated in the ANNODEX(TM) format as long as it is streamable and is based on a regular data sampling rate (called granulerate). XML mark-up is inserted between media packets at the synchronised point in time.

There are two types of XML mark-up that are inserted: a header frame ("head"), and an arbitrary number of anchor frames ("a"). There is only one head at the start of an annotation bitstream. It contains structured and unstructured meta data describing the complete time-continuous data bitstream. In the simple case, an anchor frame contains information on the fragment of media between the current anchor and the next one (or the end of the document if none follows).

The following figure gives an example of such an ANNODEX(TM) format bitstream and the temporal regions during which the "head" and "a" frames are valid. It describes the simple case where anchor frames don't overlap in time and there is only one media bitstream.

	  
  Annodexed media file (default annotation track only)
  _______________________________________________________________________
  |    |   | |           | |             | |                  | |        |
  |head|   |a|           |a|             |a|                  |a|        |
  |    |   | |           | |             | |                  | |        |
  _______________________________________________________________________
           |-------------|
                         |---------------|
                                         |--------------------|
                                                              |----------|
  |----------------------------------------------------------------------|

        

There is also a more complex scheme of authoring anchors for ANNODEX(TM) format bitstreams. In it, anchors are grouped together by giving them a type. Anchors of one type are not allowed to overlap in time, but anchors of different type may overlap. This enables the creation of different tracks of annotation. The advantage is that it gives the author the choice to describe a specific media file from different aspects, e.g. by giving different language tracks. The client application then has the choice to display only the default track or offer all existing tracks to the user.
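As a rough illustration of the track model described above (and not part of the format itself), the following Python sketch determines which anchor is active on each annotation track at a given playback time. The tuple layout `(start_seconds, track, anchor_id)` and the function name are assumptions made for this example. Only the rule that a later anchor on the same track replaces the earlier one is taken from the text.

```python
def active_anchors(anchors, t):
    """Return {track: anchor_id} for the anchors active at time t.

    anchors: iterable of (start_seconds, track, anchor_id) tuples
             (a hypothetical in-memory form of decoded "a" frames).
    """
    active = {}
    # Walk the anchors in order of start time; on each track a later
    # anchor deactivates the one before it.
    for start, track, anchor_id in sorted(anchors):
        if start <= t:
            active[track] = anchor_id
    return active


anchors = [
    (0.0, "default", "intro"),
    (10.0, "default", "scene2"),
    (5.0, "de", "einleitung"),
]
print(active_anchors(anchors, 7.0))
```

A client that only displays the default track would look up a single key in the result, while a client offering all tracks would present every entry.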




4. The annotation bitstream

The annotation bitstream consists of a "head" frame and an arbitrary number of "a" frames. These tags are briefly explained in this section.

4.1 The 'head' frame

A "head" frame is an XML document that contains information about the complete ANNODEX(TM) format bitstream. It is enclosed in "head" tags. The DTD for the "head" frame can be found at http://www.annodex.net/DTD/anxhead_1_0.dtd . It can be used for validation of a "head" frame.

An example for a "head" frame is the following:

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE head SYSTEM "anxhead_1_0.dtd">
<head lang="en" defltlang="de">
  <title lang="en">The Matrix</title>
  <base href="http://www.foo.bar/"/>
  <meta name="Movie"     content="The Matrix"/>
  <meta name="Producer"  content="Joel Silver"/>
  <meta name="Director"  content="Larry Wachowski"/>
  <meta name="Director"  content="Andy Wachowski"/>
  <meta name="Writer"    content="The Wachowski Brothers"/>
</head>
	    

The xml declaration and the reference to the DTD make the "head" frame a proper xml document. The DTD of the "head" frame is given in the appendix and technically fully specifies the "head" frame. The semantic meaning of each of the tags and attributes is the same as in the CMML[14]. Please refer to the CMML specification document for details.
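To illustrate how an indexing tool might read a "head" frame, the following Python sketch extracts the title and the name/content pairs of the "meta" elements using the standard library XML parser. The function name and the shortened example frame are assumptions for this illustration. The parser does not fetch or apply the external DTD, so no validation is performed here.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example "head" frame given above.
EXAMPLE_HEAD = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE head SYSTEM "anxhead_1_0.dtd">
<head lang="en" defltlang="de">
  <title lang="en">The Matrix</title>
  <base href="http://www.foo.bar/"/>
  <meta name="Movie"     content="The Matrix"/>
  <meta name="Producer"  content="Joel Silver"/>
</head>
"""


def head_metadata(head_frame: bytes):
    """Return (title, [(name, content), ...]) for a "head" frame."""
    root = ET.fromstring(head_frame)
    if root.tag != "head":
        raise ValueError("not a head frame")
    title = root.findtext("title")
    meta = [(m.get("name"), m.get("content")) for m in root.findall("meta")]
    return title, meta


title, meta = head_metadata(EXAMPLE_HEAD)
print(title, meta)
```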

4.2 The 'a' frame

An anchor frame is an XML document that contains information about a fragment of the encapsulated time-continuous bitstream. It is active from the time instant in the time-continuous bitstream at which it is inserted until the time instant at which it is deactivated either through another anchor frame (on the same annotation track) or through the end of the file. It is enclosed in "a" tags. The DTD for the "a" frame can be found at http://www.annodex.net/DTD/anxa_1_0.dtd . It can be used for validation of an "a" frame.

An example for an "a" frame is the following:

<a id="no_spoon" 
   lang="en"
   href="http://www.blah.au/spoons.anx#@bent"
   hrefdesc="More images of the bent spoon"
   image="no_spoon.jpg">
  <meta name="Actor"       content="Keanu Reeves"/>
  <meta name="Actor"       content="Rowan Witt"/>
  <meta name="Cast.Reeves" content="Thomas A. Anderson/Neo"/>
  <meta name="Cast.Witt"   content="Spoon boy"/>
  <meta name="Scene"       content="Oracle"/>
  <desc>There is no spoon: Neo is waiting to see the Oracle in a room
  full of children doing seemingly impossible things. One is making
  spoons bend through telekinesis. Neo tries to do it himself, but
  fails. Spoon boy: "Do not try and bend the spoon that's impossible,
  instead only try to realize the truth." Neo: "What truth?" Spoon
  boy: "There is no spoon." Neo: "There is no spoon?" Spoon boy: "Then
  you'll see that it is not the spoon that bends, it is only
  yourself." Neo tries again...
  </desc>
  <desc lang="de">Den L&ouml;ffel gibt es nicht: Neo entdeckt beim Besuch
  des Orakels wie unwirklich seine Welt ist. Beim Versuch, einen
  L&ouml;ffel durch Telekinese zu verbiegen, bekommt er von dem Kind den
  Rat: "Den L&ouml;ffel gibt es nicht."
  </desc>
</a>
	    

Unlike the "head" frame, the anchor frame contained within an ANNODEX(TM) format media file does not contain an XML declaration and a reference to the DTD. This information can be derived from the information stored in the "head" frame and would create unnecessary overhead if included in every anchor frame. It is, however, necessary when extracting the anchor frame into a proper XML document. The XML version in use can be taken from the related head frame; the DTD reference is the same as the one for the head frame with every occurrence of "head" replaced by "a". For example:

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE a SYSTEM "anxa_1_0.dtd">
	    

Although the "a" element can be considered the root element of an anchor frame described as an XML document, it also does not contain an xmlns attribute. The reason is that the namespace of the associated "head" frame is valid for all "a" frames in an ANNODEX(TM) format bitstream; a repetition would only waste bandwidth and be a source of errors. Similarly, the default language and directionality specified in the "head" frame are also valid for the anchor frames.
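The extraction of an anchor frame into a standalone XML document, as described in this section, could be sketched in Python as follows. The function name is hypothetical, and the use of regular expressions to locate the XML declaration and the DOCTYPE in the "head" frame is a simplification chosen for brevity, not a method prescribed by this specification.

```python
import re


def standalone_anchor_document(head_frame: str, anchor_frame: str) -> str:
    """Prefix an embedded "a" frame with the prolog derived from the head frame."""
    decl = re.search(r"<\?xml[^?]*\?>", head_frame)
    doctype = re.search(r"<!DOCTYPE[^>]*>", head_frame)
    if decl is None or doctype is None:
        raise ValueError("head frame lacks an XML declaration or DOCTYPE")
    # The DTD reference of the anchor frame is that of the head frame
    # with every occurrence of "head" replaced by "a".
    anchor_doctype = doctype.group(0).replace("head", "a")
    return "\n".join([decl.group(0), anchor_doctype, anchor_frame])


head = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE head SYSTEM "anxhead_1_0.dtd">\n'
    '<head><title>The Matrix</title></head>'
)
print(standalone_anchor_document(head, '<a id="no_spoon"></a>'))
```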

The DTD of the anchor frame is given in the appendix and technically fully specifies the frame format. The semantic meaning of each of the tags and attributes is the same as in the CMML[14]. Please refer to the CMML specification document for details.




5. Media encapsulation format

An ANNODEX(TM) format bitstream consists of XML markup in the annotation bitstream interleaved with the related media frames of the media bitstreams into a single bitstream.

It is not possible to use straight XML as encapsulation because XML cannot enclose binary data except encoded as Unicode. The use of Unicode would introduce too much overhead. Therefore, an encapsulation format that could handle binary bitstreams and textual frames was required.

The following list gives a summary of the requirements for the ANNODEX(TM) format bitstream:

The Ogg encapsulation format version 0[10] was chosen as the encapsulation format for ANNODEX(TM) format bitstreams as it provides for all the requirements and has proven reliable and stable.

5.1 Media mapping for Ogg encapsulation

This section specifies the way the Ogg media encapsulation framework is used for creating ANNODEX(TM) format bitstreams. As such, knowledge of the Ogg bitstream format as specified in the Ogg RFC[10] is presumed. Please also refer to that document for descriptions of the terms used in this document. This section describes the specific media mapping that is used for ANNODEX(TM) format bitstreams.

ANNODEX(TM) format bitstreams consist of one or more time-continuous media bitstreams and an XML annotation bitstream concurrently interleaved (in Ogg terms: multiplexed) into an Ogg bitstream. Sequential multiplexing is allowed, but can only happen with complete ANNODEX(TM) format bitstreams.

Every ANNODEX(TM) format bitstream consists of at least two logical bitstreams: the ANNODEX(TM) media mapping bitstream and the annotation bitstream that contains the "head" and the "a" tags. The bos pages of these two (in order) are followed by the bos pages of any number of media bitstreams. Then all the secondary header pages of all the media bitstreams follow, including a packet of the annotation bitstream containing the "head" tag as secondary header for the annotation bitstream. Then, the bitstream data is multiplexed in time-synchronous fashion.

5.2 The format of the ANNODEX(TM) media mapping bos

The ANNODEX(TM) media mapping bitstream consists only of one bos page which contains information for the complete physical bitstream. The bos page has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Identifier 'Annodex\0'                                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Version major                 | Version minor                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Timebase numerator                                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Timebase denominator                                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | UTC                                                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    

The LSb (least significant bit) comes first in the Bytes. Fields with more than one byte length are encoded LSB (least significant byte) first.

The fields in the ANNODEX(TM) bos page have the following meaning:

  1. Identifier: an 8 Byte field that identifies this file to be of ANNODEX(TM) format. It contains the magic numbers:
    0x41 'A'
    0x6e 'n'
    0x6e 'n'
    0x6f 'o'
    0x64 'd'
    0x65 'e'
    0x78 'x'
    0x00 '\0'

  2. Version major: 2 Byte short integer number signifying the major version number of the ANNODEX(TM) format bitstream. This document specifies the major version 1.
  3. Version minor: 2 Byte short integer number signifying the minor version number of the ANNODEX(TM) format bitstream. This document specifies the minor version 0.
  4. Timebase numerator & denominator: 8 Byte integer number each. They represent together the timebase of the ANNODEX(TM) format bitstream given as a rational number. The denominator represents the temporal resolution at which the timebase is given. E.g. 5 on 1000 results in a timebase of 0.005 sec. This enables a very high temporal resolution without having to store floating point numbers.
  5. UTC: a 20 Byte string containing a UTC time in the form of YYYYMMDDTHHMMSS.sssZ. It associates a calendar date and a wall-clock time with the timebase. It is a zero length string if not in use.

Please note: The possible temporal resolution of the timebase is on the order of 2^-64. However, the time formats in use for media that are described in this document range from 1/24 to 1/60 for the different smpte formats and to 1/1000 for npt. Thus, this resolution is sufficient for any one of them. Moreover, this resolution is expected to accommodate any future needs of time resolution for any other time format (and time-continuous sampled data).
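The field layout above can be read with a fixed-size little-endian structure. The following Python sketch is one possible reading under stated assumptions: the input is the packet payload of the bos page (Ogg page header already removed), the version fields are unsigned, the two 8 Byte timebase integers are signed, and an unused UTC field is filled with NUL bytes. None of these details are fixed by the text, and the function name is hypothetical.

```python
import struct
from fractions import Fraction

# identifier (8), version major (2), version minor (2),
# timebase numerator (8), timebase denominator (8), UTC (20) = 48 bytes
ANNODEX_BOS = struct.Struct("<8sHHqq20s")


def parse_annodex_bos(payload: bytes) -> dict:
    """Decode the ANNODEX(TM) media mapping bos packet payload."""
    if len(payload) < ANNODEX_BOS.size:
        raise ValueError("payload too short for an Annodex bos packet")
    ident, major, minor, num, den, utc = ANNODEX_BOS.unpack_from(payload)
    if ident != b"Annodex\x00":
        raise ValueError("missing 'Annodex' identifier")
    if den == 0:
        raise ValueError("timebase denominator must not be zero")
    return {
        "version": (major, minor),
        "timebase": Fraction(num, den),  # exact rational, in seconds
        "utc": utc.split(b"\x00", 1)[0].decode("ascii"),
    }


example = ANNODEX_BOS.pack(b"Annodex\x00", 1, 0, 5, 1000, b"20030608T120000.000Z")
print(parse_annodex_bos(example))
```

Keeping the timebase as a `Fraction` preserves the rational value exactly, which matches the stated intent of avoiding floating point numbers.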

5.3 The format of the media and annotation bitstream media mapping bos

The media and annotation bitstreams start each with one bos page containing information required for the decoding of the bitstream. After that, secondary header pages follow that contain information to set up the decoder for the bitstream and other stream-specific information. Then, the pages that contain the actual data follow. For the annotation bitstream, the "head" frame is encapsulated into one (or more) secondary header pages. The anchor frames represent the actual data of the annotation bitstream.

The bos page of a media or annotation bitstream has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Identifier 'AnxData\0'                                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Granule rate numerator                                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Granule rate denominator                                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Number of secondary header pages                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Number of bytes used for mime type string                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Mime type string                                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...                                                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Number  of bytes used for identifier string                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Identifier string                                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...                                                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    

The LSb (least significant bit) comes first in the Bytes. Fields with more than one byte length are encoded LSB (least significant byte) first.

The fields in the bos page of a media or annotation bitstream have the following meaning:

  1. Identifier: an 8 Byte field that identifies this logical bitstream as an input bitstream with encoded information. It contains the magic numbers:
    0x41 'A'
    0x6e 'n'
    0x78 'x'
    0x44 'D'
    0x61 'a'
    0x74 't'
    0x61 'a'
    0x00 '\0'

  2. Granule rate numerator & denominator: 8 Byte integer number each. They represent the temporal resolution of the logical bitstream in Hz given as a rational number in the same way as the timebase attribute above.
  3. Number of secondary header pages: a 4 Byte integer number that contains the number of secondary header pages of that particular logical bitstream following after this bos page.
  4. Number of bytes used for mime type string: a 4 Byte integer number giving the length of the Mime type field following directly after this field and including the final NUL character.
  5. Mime type string: a character sequence containing the mime[8] type of the data encoded in this logical bitstream. E.g. for the annotation bitstream it will be "text/cmml" as upon extraction of the annotation bitstream from the ANNODEX(TM) format bitstream a CMML document[14] results.
  6. Number of bytes used for identifier string: a 4 Byte integer number giving the length of the identifier string following directly after this field, including the final NUL character.
  7. Identifier string: a character sequence containing an XML ID identifier text for this logical bitstream.




6. Handling time in the ANNODEX(TM) format bitstream

ANNODEX(TM) format bitstreams inherently represent one timeline only, where the different media and the annotation bitstream can be thought of as content tracks on that timeline. All of these tracks relate to the same timeline which starts at a certain time point and ends when the last bitstream ends. An example bitstream can be seen in the following figure. It consists of an ANNODEX(TM) format bitstream that contains 4 media bitstreams and the annotation bitstream. In the flat representation these will be multiplexed such that the data frames of each of these bitstreams occur at the correct time.

The following bitstream is a conceptual representation of the time intervals covered by the different logical bitstreams:

t0                                                                   tn
|------------------------------------------------------------------->|
----------------------------------------------------------------------
| a1    | a2     | a3                   | a4                         |
----------------------------------------------------------------------
annotation bitstream

---------------------------------------------
| audio bitstream 1                         |
---------------------------------------------
        --------------------------------------------------------------
        | video bitstream 1                                          |
        --------------------------------------------------------------
                 -----------------------------------------------------
                 | audio bitstream 2                                 |
                 -----------------------------------------------------
                        ------------------------------
                        | video bitstream 2          |
                        ------------------------------
	    

The time point at which the ANNODEX(TM) format bitstream starts (t0 in the above example) is called the "timebase" and represents the playback time in seconds associated with the beginning of the ANNODEX(TM) format bitstream. This start time may but does not have to be 0 - it can be any positive time offset. This time offset is stored in the ANNODEX(TM) bitstream bos page.

Each of the encapsulated media bitstreams and the annotation bitstream has its own temporal resolution at which it can provide data to cover the given timeline. This temporal resolution is usually given through the sampling rate of the particular bitstream. For example, a raw audio bitstream at CD quality is sampled with a sampling rate of 44100 Hz. A video bitstream may be sampled with a frame rate of 25 frames per second. This temporal resolution is stored in the "granulerate" field of the bos page of the bitstream.

The "granulerate" is used for the calculation of the time position for which a data packet of the media bitstreams contains data. The "granulepos" field in an Ogg page, when divided by the "granulerate" of that page's logical bitstream, provides the time position that is reached in that bitstream after decoding all data packets finished on this page. E.g. if an audio bitstream has a granulerate of 44100 and starts at 0, then a granulepos of 88200 signifies that the bitstream has reached the time position of 2 seconds (88200 / 44100) after the end of decoding this page's packets.
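The division described above can be written down directly. In the following Python sketch, the function name and the optional `start` parameter (the time at which the logical bitstream begins) are assumptions for illustration. Exact rational arithmetic is used so that non-integer granulerates do not introduce rounding errors.

```python
from fractions import Fraction


def granule_time(granulepos: int, rate_num: int, rate_den: int = 1,
                 start: Fraction = Fraction(0)) -> Fraction:
    """Time position in seconds reached after decoding up to granulepos."""
    granulerate = Fraction(rate_num, rate_den)  # in Hz
    return start + Fraction(granulepos) / granulerate


print(granule_time(88200, 44100))  # audio example from the text
print(granule_time(50, 25))        # 50 video frames at 25 frames per second
```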

The annotation bitstream's "granulerate" can be chosen arbitrarily by the bitstream multiplexer. One option is to choose the least common multiple of the granulerates of all the media bitstreams to gain at least the resolution of the bitstreams. However, that resolution may not be enough compared to the one that the author of anchors is asking for at insertion time. One solution is to accommodate all possible time schemes of the anchors. Thus, a time resolution of the least common multiple of the resolution of all the npt and smpte time schemes is another option.

The possible time schemes with their respective resolutions are:

To get to integer values, it is necessary to multiply all resolutions by 1000 and then take the least common multiple: lcm(1000000, 24000, 23976, 25000, 30000, 29970, 50000, 60000, 59940) = 2997000000. The "granulerate" would therefore be 2997000. This provides for a temporal resolution on the order of 10^-6, accommodating for a mixed use of all the above given time schemes.
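The least common multiple quoted above can be checked with a few lines of Python. The list holds the scaled resolutions exactly as they appear in the calculation above. The helper function and variable names are illustrative only.

```python
from functools import reduce
from math import gcd

# Resolutions in Hz multiplied by 1000, as listed in the calculation above.
SCALED_RESOLUTIONS = [1000000, 24000, 23976, 25000, 30000,
                      29970, 50000, 60000, 59940]


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


scaled_lcm = reduce(lcm, SCALED_RESOLUTIONS)
granulerate = scaled_lcm // 1000  # undo the scaling by 1000

print(scaled_lcm, granulerate)
```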

The "granulepos" of the (set of) page(s) holding an anchor frame of the annotation stream has to signify the start time of that anchor frame. E.g. if the "granulerate" of the annotation bitstream is 1000, the "timebase" is 0, and an anchor is to be inserted at npt=12.020, its "granulepos" will be 12020. Anchors can be repeated in the ANNODEX(TM) format bitstream, which will be signified by having the same "track" attribute and the same page_sequence_number as the previous anchor frame.
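The worked example above corresponds to the following Python sketch. Subtracting the "timebase" before scaling is an assumption based on the timebase being the time associated with the start of the bitstream; the text only gives the case where the timebase is 0. The function name is hypothetical, and times are passed as decimal strings so that they convert to exact fractions.

```python
from fractions import Fraction


def anchor_granulepos(npt_seconds: str, granulerate: int,
                      timebase: str = "0") -> int:
    """Granulepos for an anchor starting at the given npt time."""
    offset = Fraction(npt_seconds) - Fraction(timebase)
    if offset < 0:
        raise ValueError("anchor lies before the start of the bitstream")
    pos = offset * granulerate
    if pos.denominator != 1:
        raise ValueError("time not representable at this granulerate")
    return int(pos)


print(anchor_granulepos("12.020", 1000))
```

A multiplexer that meets the second error case would need a finer granulerate, which is the motivation for the least common multiple discussed earlier in this section.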




7. MIME media type registration for 'application/annodex'

This section contains the registration information for the 'application/annodex' media type. While this media type is not approved by the IANA, 'application/x-annodex' may be used.

To: ietf-types@iana.org

Subject: Registration of MIME media type application/annodex

MIME media type name: application

MIME subtype name: annodex

Required parameters: none

Optional parameters: none

Encoding Considerations: the ANNODEX(TM) format enables encapsulation of any type of encoding format. The authoring software has to provide the encoders, while the client software has to provide the matching decoders.

Security considerations: see next section.

Interoperability considerations: the ANNODEX(TM) bitstream format is a free specification that is independent of any media encoding format. It is designed to provide interoperability with the existing World Wide Web. Its specification is not patented and can be implemented by third parties without patent considerations.

Additional information:

  • Magic numbers: "OggS" identifies an Ogg page, "Annodex" identifies an Ogg page with an ANNODEX(TM) format bitstream, and "AnxData" signifies an Ogg page with a media or annotation bitstream
  • File extension: .anx
  • Macintosh File Type Code: "ANDX"
  • Intended usage: COMMON
  • Fragment identifiers: Any named element, i.e. any element that contains an "id" attribute, may be referenced through the fragment identifier of a URI. The most important of these are the "id" values of the anchor tags, which address the corresponding "a" tags in the ANNODEX(TM) format bitstream. In addition, the generic temporal URI fragment addressing scheme[13] can be used as a fragment identifier on ANNODEX(TM) format bitstreams; it then refers to the given time offset in the bitstream, calculated with respect to the "timebase" of the ANNODEX(TM) bos page.

    An example for a URI to a named media fragment is the following:

    	      http://www.foo.bar/matrix.anx#no_spoon
    	      

    Examples for URIs to temporal fragments are the following:

    	      http://www.foo.bar/matrix.anx#@npt=21.4
    	      http://www.foo.bar/matrix.anx#@smpte-25=01:00:21:10
    	      http://www.foo.bar/matrix.anx#@utc=20030601T240000Z
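A client might tell named fragments from temporal fragments as sketched below. This Python example is a hedged illustration, not a normative parser: the function names are hypothetical, only the value forms that appear in the examples above are handled, and the full syntax is defined by the temporal fragment specification[13]. Applying the "timebase" of the bos page is left out.

```python
from fractions import Fraction
from urllib.parse import urlsplit


def parse_temporal_fragment(uri: str):
    """Return (scheme, value) for a '#@scheme=value' fragment, else None."""
    fragment = urlsplit(uri).fragment
    if not fragment.startswith("@") or "=" not in fragment:
        return None  # a named fragment such as '#no_spoon', or no fragment
    scheme, value = fragment[1:].split("=", 1)
    return scheme, value


def offset_seconds(scheme: str, value: str):
    """Convert the forms used in the examples to seconds, else None."""
    if scheme == "npt":
        return Fraction(value)  # seconds given as a decimal number
    if scheme.startswith("smpte-"):
        rate = scheme[len("smpte-"):]
        if not rate.isdigit():
            return None  # other SMPTE variants are not handled here
        hours, minutes, seconds, frames = (int(p) for p in value.split(":"))
        return Fraction(hours * 3600 + minutes * 60 + seconds) + Fraction(frames, int(rate))
    return None  # e.g. 'utc' denotes a wall-clock time, not an offset


scheme, value = parse_temporal_fragment(
    "http://www.foo.bar/matrix.anx#@smpte-25=01:00:21:10")
print(scheme, float(offset_seconds(scheme, value)))  # smpte-25 3621.4
```

At 25 frames per second, frame 10 contributes 10/25 = 0.4 seconds, so the SMPTE example resolves to 3621.4 seconds.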
    	      




    8. Security considerations

    ANNODEX(TM) format bitstreams contain several multiplexed binary media bitstreams and one XML annotation bitstream. There is no generic encryption or signing mechanism provided for the complete bitstream or any one of its parts. As the format of the encapsulated media bitstreams is not prescribed and is identified through the "mimetype" field in that bitstream's bos page, it is possible to encrypt or sign a media bitstream and then mark it accordingly with a MIME type that signifies the encryption. It is up to the applications that use this bitstream to provide an appropriate codec to handle such bitstreams.

    As ANNODEX(TM) format bitstreams contain binary media bitstreams, it is possible to include executable content in them. This can be an issue for applications that decode these bitstreams, especially when they are used in a network scenario. Such applications have to handle manipulated bitstreams correctly and guard against buffer overflows and similar faults.




    References

    [1] World Wide Web Consortium, "Extensible Markup Language (XML) 1.0", W3C XML, October 2000.
    [2] World Wide Web Consortium, "HTML 4.01 Specification", W3C HTML, December 1999.
    [3] World Wide Web Consortium, "XHTML(TM) 1.0 The Extensible Hyper Text Markup Language", W3C XHTML, January 2000.
    [4] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifiers (URI): Generic Syntax", RFC 2396, August 1998.
    [5] Fielding, R., Gettys, J., Mogul, J., Nielsen, H., Masinter, L., Leach, P. and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.
    [6] Schulzrinne, H., Rao, A. and R. Lanphier, "Real Time Streaming Protocol (RTSP)", RFC 2326, April 1998.
    [7] Alvestrand, H., "Tags for the Identification of Languages", RFC 1766, March 1995.
    [8] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996.
    [9] Whitehead, E. and M. Murata, "XML Media Types", RFC 2376, July 1998.
    [10] Pfeiffer, S., "The Ogg encapsulation format version 0", RFC 3533, May 2003.
    [11] The Society of Motion Picture and Television Engineers, "SMPTE STANDARD for Television, Audio and Film - Time and Control Code", ANSI 12M-1999, September 1999.
    [12] ISO, TC154., "Data elements and interchange formats -- Information interchange -- Representation of dates and times", ISO 8601, 2000.
    [13] Pfeiffer, S. and C. Parker, "Syntax of temporal URI fragment specifications (work in progress)", I-D draft-pfeiffer-temporal-fragments-00.txt, February 2003.
    [14] Pfeiffer, S. and C. Parker, "Specification of the Continuous Media Markup Language (CMML), Version 1.0 (work in progress)", I-D draft-pfeiffer-cmml-00.txt, June 2003.



    Authors' Addresses

      Silvia Pfeiffer
      Commonwealth Scientific and Industrial Research Organisation CSIRO, Australia
      Locked Bag 17
      North Ryde, NSW 2113
      Australia
    Phone:  +61 2 9325 3141
    EMail:  Silvia.Pfeiffer@csiro.au
    URI:  http://www.cmis.csiro.au/Silvia.Pfeiffer/
      
      Conrad D. Parker
      Commonwealth Scientific and Industrial Research Organisation CSIRO, Australia
      Locked Bag 17
      North Ryde, NSW 2113
      Australia
    Phone:  +61 2 9325 3133
    EMail:  Conrad.Parker@csiro.au
    URI:  http://www.cmis.csiro.au/Conrad.Parker/



    Appendix A. Head frame DTD

    <?xml version="1.0" encoding="UTF-8" ?>
    
    <!--
    
       Markup of an ANNODEX(TM) format head frame DTD.
       Derived from the
       Continuous Media Markup Language (CMML), version 1.0
    
       Namespace = http://www.annodex.net/cmml
    
       Copyright (c) 2001 
       Commonwealth Scientific and Industrial Research Organisation
       (CSIRO), Australia.
       All Rights Reserved. 
    
       This DTD module is identified by the PUBLIC and SYSTEM identifiers:
    
       PUBLIC "-//CSIRO//DTD ANXHEAD 1.0//EN"
       SYSTEM "http://www.annodex.net/DTD/anxhead_1_0.dtd"
    
       $Revision: 1.0 $
       $Date: 2003/06/01 24:00:00 $
    -->
    
    <!-- **************************** -->
    <!-- Definition of Imported Names -->
    <!-- **************************** -->
    
    <!-- a Uniform Resource Identifier, see [RFC2396] -->
    <!ENTITY % URI "CDATA">
    
    <!-- a language code, as per [RFC1766] -->
    <!ENTITY % LanguageCode "NMTOKEN">
    
    <!-- internationalization attributes
      xml:lang    language code (as per XML 1.0 spec)
      dir          direction for weak/neutral text
    -->
    <!ENTITY % i18n
     "lang    %LanguageCode; #IMPLIED
      dir     (ltr|rtl)      #IMPLIED"
      >
    
    <!-- **************************** -->
    <!-- Document Structure           -->
    <!-- **************************** -->
    
    <!-- ROOT ELEMENT: -->
    <!-- head tag containing description of a specific media document -->
    <!-- ============================================================ -->
    <!-- xmlns   = namespace of the head tags -->
    <!-- profile = space-separated list of URIs to locate meta tag schemes -->
    <!-- i18n    = the base language of the head's attribute values and text 
                   content -->
    <!-- defltlang & defltdir  = the default language and text direction for 
                   the whole document -->
    <!ELEMENT head (meta*,
                    ((title, meta*, (base, meta*)?) |
                     (base, meta*, (title, meta*)?)))>
    <!ATTLIST head
      id          ID             #IMPLIED
      xmlns       %URI;          #FIXED 'http://www.annodex.net/cmml'
      %i18n;
      defltlang   %LanguageCode; #IMPLIED
      defltdir    (ltr|rtl)      #IMPLIED
      profile     %URI;          #IMPLIED
      >
    
    <!-- TITLE tag giving descriptive title of the media document  -->
    <!-- ========================================================= -->
    <!-- i18n  = the base language of the title -->
    <!ELEMENT title (#PCDATA)>
    <!ATTLIST title 
      id          ID             #IMPLIED
      %i18n;
      >
    
    <!-- BASE URI of the document (empty content) --> 
    <!-- ======================================== -->
    <!-- href = URI associated with the document; all relative URI references
                get interpreted relative to this base -->
    <!ELEMENT base EMPTY>
    <!ATTLIST base
      id          ID             #IMPLIED
      href        %URI;          #REQUIRED
      >
    
    <!-- META description tags of the document (empty content) -->
    <!-- ===================================================== -->
    <!-- i18n    = the default language for the meta attribute and content text -->
    <!-- name    = identifies a property name; does not list legal values for this 
                   attribute --> 
    <!-- content = specifies a property's value; does not list legal values for 
                   this attribute -->
    <!-- scheme  = names a scheme to be used to interpret the property's value 
                   (see the profile attribute of the head element for locating 
                   these) -->
    <!ELEMENT meta EMPTY>
    <!ATTLIST meta
      id          ID             #IMPLIED
      %i18n;
      name        NMTOKEN        #IMPLIED
      content     CDATA          #REQUIRED
      scheme      CDATA          #IMPLIED
      >
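As an illustration of the content model declared above, the following Python sketch parses a head frame and checks a few of the DTD's constraints by hand. It is not a DTD validator, and it is only a sketch under stated assumptions: the sample document, its attribute values, and the names SAMPLE_HEAD, local_name and check_head are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical head frame, written for this illustration only.
SAMPLE_HEAD = """\
<head xmlns="http://www.annodex.net/cmml" lang="en">
  <meta name="author" content="Example Author"/>
  <title>Example media document</title>
  <base href="http://www.example.com/media/"/>
</head>
"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def check_head(xml_text: str) -> list:
    """Return a list of problems found; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "head":
        return ["root element is not 'head'"]
    problems = []
    names = [local_name(child.tag) for child in root]
    for name in names:
        if name not in ("meta", "title", "base"):
            problems.append(f"unexpected child element '{name}'")
    # The content model allows 'title' and 'base' at most once each
    # and requires at least one of them; 'meta' may appear anywhere.
    for name in ("title", "base"):
        if names.count(name) > 1:
            problems.append(f"more than one '{name}' element")
    if "title" not in names and "base" not in names:
        problems.append("neither 'title' nor 'base' is present")
    # Attributes declared #REQUIRED in the DTD.
    for child in root:
        name = local_name(child.tag)
        if name == "meta" and child.get("content") is None:
            problems.append("'meta' element without required 'content'")
        if name == "base" and child.get("href") is None:
            problems.append("'base' element without required 'href'")
    return problems


print(check_head(SAMPLE_HEAD))  # []
```

A validating XML parser that loads the DTD itself would be the more thorough choice; the manual checks here only make the content model easier to read.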
    
    
    
      



    Appendix B. Anchor frame DTD

    <?xml version="1.0" encoding="UTF-8" ?>
    
    <!--
    
       Markup of an ANNODEX(TM) format anchor frame DTD.
       Derived from the
       Continuous Media Markup Language (CMML), version 1.0
    
       Namespace = http://www.annodex.net/cmml
    
       Copyright (c) 2001 
       Commonwealth Scientific and Industrial Research Organisation
       (CSIRO), Australia.
       All Rights Reserved. 
    
       This DTD module is identified by the PUBLIC and SYSTEM identifiers:
    
       PUBLIC "-//CSIRO//DTD ANXA 1.0//EN"
       SYSTEM "http://www.annodex.net/DTD/anxa_1_0.dtd"
    
       $Revision: 1.0 $
       $Date: 2003/06/01 24:00:00 $
    -->
    
    <!-- **************************** -->
    <!-- Definition of Imported Names -->
    <!-- **************************** -->
    
    <!-- a Uniform Resource Identifier, see [RFC2396] -->
    <!ENTITY % URI "CDATA">
    
    <!-- a language code, as per [RFC1766] -->
    <!ENTITY % LanguageCode "NMTOKEN">
    
    <!-- internationalization attributes
      xml:lang    language code (as per XML 1.0 spec)
      dir          direction for weak/neutral text
    -->
    <!ENTITY % i18n
     "lang    %LanguageCode; #IMPLIED
      dir     (ltr|rtl)      #IMPLIED"
      >
    
    <!-- **************************** -->
    <!-- Document Structure           -->
    <!-- **************************** -->
    
    <!-- ROOT ELEMENT -->
    <!-- A tag containing information for a specific fragment -->
    <!-- ==================================================== -->
    <!-- xmlns    = namespace of the anchor tags -->
    <!-- i18n     = default language for all the desc tags in the anchor -->
    <!-- track    = defines different sets of anchor tags; anchor tags of the 
                    same track cannot overlap temporally -->
    <!-- href     = specifies the location of a Web resource, thus defining a 
                    link between the current element (the source anchor) and the 
                    destination anchor given by this attribute -->
    <!-- hrefdesc = textual description of the link between the current element 
                    (the source anchor) and the destination anchor given by the 
                    href attribute -->
    <!-- image    = link to an image that is representative for this fragment -->
    <!ELEMENT a (meta*, desc*)>
    <!ATTLIST a
      id          ID             #IMPLIED
      xmlns       %URI;          #FIXED 'http://www.annodex.net/cmml'
      %i18n;
      track       CDATA          "default"
      href        %URI;          #IMPLIED
      hrefdesc    CDATA          #IMPLIED
      image       %URI;          #IMPLIED
      >
    
    <!-- META description tags of the document (empty content) -->
    <!-- ===================================================== -->
    <!-- i18n    = the default language for the meta attribute and content text -->
    <!-- name    = identifies a property name; does not list legal values for 
                   this attribute --> 
    <!-- content = specifies a property's value; does not list legal values for 
                   this attribute -->
    <!-- scheme  = names a scheme to be used to interpret the property's value 
                   (see the profile attribute of the head element for locating 
                   these) -->
    <!ELEMENT meta EMPTY>
    <!ATTLIST meta
      id          ID             #IMPLIED
      %i18n;
      name        NMTOKEN        #IMPLIED
      content     CDATA          #REQUIRED
      scheme      CDATA          #IMPLIED
      >
    
    <!-- DESC human-readable, textual description of the anchor (annotation) -->
    <!-- =================================================================== -->
    <!-- i18n = language of the data in the description, as per [RFC1766] -->
    <!ELEMENT desc (#PCDATA)>
    <!ATTLIST desc
      id          ID             #IMPLIED
      %i18n;
      >
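The anchor frame structure declared above can be read as in the following Python sketch. It is a hedged illustration, not a reference implementation: the sample anchor, its URLs and text, and the names SAMPLE_ANCHOR, local_name and read_anchor are invented for this example. Because ElementTree does not load the DTD, the default value of the "track" attribute is applied by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical anchor frame, written for this illustration only.
SAMPLE_ANCHOR = """\
<a xmlns="http://www.annodex.net/cmml" id="no_spoon" lang="en"
   href="http://www.example.com/target.html" hrefdesc="Example link">
  <meta name="keywords" content="example"/>
  <desc>Example description of this fragment.</desc>
</a>
"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_anchor(xml_text: str) -> dict:
    """Extract the main fields of an anchor frame, checking (meta*, desc*)."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "a":
        raise ValueError("root element is not 'a'")
    names = [local_name(child.tag) for child in root]
    if any(name not in ("meta", "desc") for name in names):
        raise ValueError("unexpected child element")
    # (meta*, desc*) means that no 'meta' may follow the first 'desc'.
    if "desc" in names and "meta" in names[names.index("desc"):]:
        raise ValueError("'meta' must not follow 'desc'")
    return {
        "id": root.get("id"),
        # The DTD declares "default" as the default track.
        "track": root.get("track", "default"),
        "href": root.get("href"),
        "descriptions": [child.text or "" for child in root
                         if local_name(child.tag) == "desc"],
    }


anchor = read_anchor(SAMPLE_ANCHOR)
print(anchor["id"], anchor["track"])  # no_spoon default
```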
    	  



    Appendix C. Definitions of terms and abbreviations

    Anchor frame:
    XML data containing information on a fragment of a time-continuous bitstream.
    Fragment:
    a subpart of a media document covering some temporal interval.
    Mark-up:
    XML tags and their content used to describe a media document.
    ANNODEX(TM) media:
    encapsulated time-continuous bitstream with Head and Anchor frames.
    Annotating:
    the task of giving textual descriptions to fragments of media documents.
    Indexing:
    the task of identifying index points for media documents or fragments thereof.
    Hyperlinking:
    the task of linking from one Web resource to another. If a link has an offset into the resource, this is sometimes called deep hyperlinking.
    Head frame:
    XML data containing information on a media file in ANNODEX(TM) format.
    Media packet:
    a block of digital data that represents a temporal subpart of a stream of continuous media. Media packets of one continuous media file do not overlap in time.
    Bitstream:
    a sequence of time-continuous data.




    Appendix D. Glossary of acronyms

    CMML:
    Continuous Media Markup Language.
    DTD:
    Document Type Definition.
    XML:
    eXtensible Markup Language.
    CMWeb:
    Continuous Media Web.
    Web:
    World Wide Web.
    URI:
    Uniform Resource Identifier.




    Appendix E. Acknowledgements

    The authors gratefully acknowledge the contributions of Andre Pang, Andrew Nesbit, and Simon Lai in developing this standard.




    Intellectual Property Statement

    Full Copyright Statement

    Acknowledgement