Subtitle format comparison

The table below compares the syntax of the existing subtitle formats for which Chris Chiu wrote scripts that convert them to CMML.

 

Subtitle Name      | Extension    | Subtitle Type | Character Encoding     | Line Break | Text Styling  | Metadata Info | Timings         | Timing Precision
-------------------+--------------+---------------+------------------------+------------+---------------+---------------+-----------------+------------------------------
AQTitle            | *.aqt        | Text-based    | None                   | Yes        | No            | No            | Framings        | Dependent on Frames
JACOSub            | *.usf        | Text-based    | None                   | Yes        | Yes           | No            | Elapsed Time    | 10 Milliseconds (1/100th sec)
MicroDVD           | *.sub        | Text-based    | None                   | Yes        | No            | No            | Framings        | Dependent on Frames
MPSub              | *.sub        | Text-based    | None                   | Yes        | No            | Yes           | Sequential Time | 10 Milliseconds (1/100th sec)
Phoenix Subtitle   | *.pjs        | Text-based    | None                   | No         | No            | No            | Framings        | Dependent on Frames
RealText           | *.rt         | HTML-based    | Unicode (UTF-8)        | Yes        | Yes (SMIL)    | No            | Elapsed Time    | 10 Milliseconds (1/100th sec)
SubRip             | *.srt        | Text-based    | Informally Unicode     | Yes        | No            | No            | Elapsed Time    | 1 Millisecond (1/1000th sec)
SSA/ASS            | *.ssa        | Text-based    | None                   | No         | Yes           | Yes           | Elapsed Time    | 10 Milliseconds (1/100th sec)
SubViewer          | *.sub        | Text-based    | None                   | Yes        | No            | Yes           | Elapsed Time    | 10 Milliseconds (1/100th sec)
SAMI               | *.smi        | HTML-based    | Unicode (Windows-1252) | Yes        | Yes (CSS)     | Yes           | Framings        | Dependent on Frames
Universal Subtitle | *.usf        | XML           | Unicode (UTF-8)        | No         | Yes (XML DTD) | No            | Elapsed Time    | 1 Millisecond (1/1000th sec)
VOBSub             | *.sub, *.idx | Image-based   | N/A                    | N/A        | N/A           | N/A           | Elapsed Time    | 1 Millisecond (1/1000th sec)
VPlayer            | *.txt        | Text-based    | None                   | Yes        | No            | No            | Framing or Time | 10 Milliseconds (1/100th sec)


Structural Syntax Information:

  • File Extensions:
    • File extensions are not standardised across subtitle formats. Hence, the
      subtitle type must be determined by checking the syntax and structure of
      the file, rather than from the file extension alone.
    • As an example, MicroDVD, MPSub, SubViewer and VOBSub all use the same
      extension.
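
A minimal sketch of identifying a format from its content rather than its
extension. The signature patterns below are assumptions based on the commonly
seen timing-line layouts of each format, not rules taken from the conversion
scripts, and the names SIGNATURES and guess_format are hypothetical.

```python
import re

# Hypothetical signatures: each pattern matches one characteristic line.
SIGNATURES = [
    # MicroDVD: {start_frame}{end_frame}text (end frame may be empty)
    ("MicroDVD", re.compile(r"^\{\d+\}\{\d*\}")),
    # SubRip: 00:00:01,000 --> 00:00:04,000
    ("SubRip", re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")),
    # SubViewer: 00:00:01.00,00:00:04.00
    ("SubViewer", re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{2},\d{2}:\d{2}:\d{2}\.\d{2}")),
    # MPSub: a FORMAT=TIME or FORMAT=<fps> header line
    ("MPSub", re.compile(r"^FORMAT=(TIME|\d+)")),
]


def guess_format(text):
    """Return the name of the first format whose signature matches a line."""
    for line in text.splitlines():
        line = line.strip()
        for name, pattern in SIGNATURES:
            if pattern.match(line):
                return name
    return None  # unknown: fall back to other checks or ask the user
```

With a check like this, three files that all end in *.sub can still be routed
to different converters.
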
  • Subtitle Type:
    • Subtitle type refers to the kind of notation (plain text, HTML, XML or
      images) that the subtitle format employs.
    • Text-based subtitles use a simplified ASCII text notation to define the
      start and end times/frames of the subtitles, and to define the text
      positioning and styles if applicable.
    • HTML-based subtitles use HTML-style tags to define timings and text
      formatting. However, these formats do not strictly adhere to XML
      standards: attribute values may be unquoted, and closing tags may be
      missing.
    • XML-based subtitles adhere to XML standards, with DTDs defining the
      structure and syntax of the subtitles. Parsing is being implemented
      using the XML parser libraries for Python.
    • Image-based subtitles use raster images (typically compressed JPEG
      images) to display subtitles. These formats typically encapsulate the
      subtitle images in one layer and store the synchronisation data for
      those images in a separate layer. Hence, extracting image-based
      subtitles can, depending on the format, produce more than one file.
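
Because HTML-based formats may have unquoted attributes and missing closing
tags, a strict XML parser will reject them. The sketch below uses Python's
lenient html.parser module instead. It assumes SAMI-style markup in which each
cue begins with a <SYNC Start=...> tag. The class and function names are
hypothetical, and the Start value is passed through as a plain integer
without interpreting its unit.

```python
from html.parser import HTMLParser


class SyncCollector(HTMLParser):
    """Collect [start, text] pairs from SAMI-style <SYNC Start=...> markup."""

    def __init__(self):
        super().__init__()
        self.cues = []  # each entry is [start_value, accumulated_text]

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; values may be unquoted.
        if tag == "sync":
            start = dict(attrs).get("start")
            if start is not None and start.isdigit():
                self.cues.append([int(start), ""])

    def handle_data(self, data):
        # Text belongs to the most recent <SYNC>, closed or not.
        if self.cues:
            self.cues[-1][1] += data


def parse_sync(markup):
    parser = SyncCollector()
    parser.feed(markup)
    parser.close()
    # Collapse runs of whitespace left over from the markup layout.
    return [(start, " ".join(text.split())) for start, text in parser.cues]
```
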
  • Character Encoding:
    • Knowing the character encoding is important because it allows the
      generated CMML to be strictly validated.
    • While some subtitle formats define Unicode as their character encoding,
      older formats tend to specify no encoding at all (and so are assumed to
      be plain ASCII).
    • The scripts do not currently convert non-Unicode characters to their
      Unicode equivalents.
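
One possible way to add the missing conversion is sketched below. It is not
part of the existing scripts. The function name to_unicode and the choice of
Windows-1252 (cp1252) as the fallback code page are assumptions; a different
legacy encoding may be appropriate for a given subtitle file.

```python
def to_unicode(raw, fallback="cp1252"):
    """Decode subtitle bytes as UTF-8, falling back to a legacy code page."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in the fallback
        # code page; undecodable bytes become U+FFFD instead of raising.
        return raw.decode(fallback, errors="replace")
```

The resulting string can then be written out as UTF-8 so that the CMML file
declares a single, known encoding.
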
  • Line Breaks:
    • It is important to determine whether line breaks are used in text, HTML
      or XML subtitle formats, because CMML does not currently support line
      breaks. There are two ways to handle a subtitle line break: remove it
      and concatenate the lines, or recreate it with indentation. The second
      method makes the CMML file more readable, but the line breaks may be
      ignored when the Ogg Theora file and CMML file are transcoded to
      Annodex.
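
The two line-break strategies described above can be sketched as follows. The
function names are hypothetical, and joining the lines with a single space
(rather than with nothing) is an assumption made so that words do not run
together.

```python
def join_lines(text):
    """Method 1: drop the line breaks and join the lines with single spaces."""
    parts = (part.strip() for part in text.splitlines())
    return " ".join(part for part in parts if part)


def indent_lines(text, indent="    "):
    """Method 2: keep the line breaks, indenting every line for readability."""
    parts = (part.strip() for part in text.splitlines())
    return "\n".join(indent + part for part in parts if part)
```
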
  • Text Style Type:
    • Some formats support text styles that control the positioning and
      formatting of subtitles on screen. Where the styling is defined by the
      original creators of the format, the degree of text manipulation varies
      from format to format, whereas the HTML subtitle formats typically
      employ standards-based text formatting such as SMIL and CSS. XML
      subtitle formats can either specify the styling in the DTD or use
      standards-based formatting.
    • Text styles not in CSS are not currently supported by the scripts. A
      future enhancement would be to convert these styles into CSS, but it
      must be noted that the text styles used by some subtitle formats may be
      poorly documented, or their documentation may be difficult to obtain.
  • Metadata Info:
    • Some subtitle formats support metadata that records the video title, the
      name and authors of the subtitles, and a variety of attributes for
      storing other information. Some formats also include technical data,
      such as the file hash and size, and copyright information. The scripts
      parse some of this data into CMML and omit the rest.
  • Timings:
    • Timings are of particular importance to subtitles because they ensure
      that the subtitles are correctly synchronised with the video and audio
      playback.
    • As CMML timings are based on an elapsed time format (Hours, Minutes,
      Seconds), subtitles that employ the same timing method are the easiest
      to convert into CMML.
    • Sequential timing formats (used by MPSub) differ in that each subtitle
      is timed relative to when the previous subtitle was displayed. Adding
      these times together converts them to an elapsed time format.
    • The older formats use the video/audio frame to synchronise the subtitles.
      The main issue that arises is that we need to know the frames per second
      (fps) value of the media before the timings can be converted into an elapsed
      time format.
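
The two conversions described above can be sketched as follows. The function
names are hypothetical. The sequential example assumes each cue is given as a
(wait, duration) pair in seconds, where the wait is measured from the end of
the previous cue. The frame example assumes the fps value is supplied by the
caller, since frame-based files do not reliably carry it.

```python
def sequential_to_elapsed(pairs):
    """Turn (wait, duration) pairs in seconds into absolute (start, end)."""
    cues = []
    clock = 0.0  # end time of the previous cue
    for wait, duration in pairs:
        start = clock + wait
        end = start + duration
        cues.append((start, end))
        clock = end
    return cues


def frames_to_elapsed(start_frame, end_frame, fps):
    """Turn a pair of frame numbers into (start, end) seconds."""
    return (start_frame / fps, end_frame / fps)
```

For instance, two cues that each wait a few seconds after the previous one
accumulate into absolute times, and the same frame numbers map to different
times at 25 fps than at 30 fps.
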
  • Timing Precision:
    • The timing precision of most subtitle formats is 10 milliseconds
      (1x10^-2 sec), but newer formats offer finer precision, down to
      1 millisecond (1x10^-3 sec).
    • Note for frame-timed subtitles converted into elapsed times:
      • They are converted to 1 millisecond precision to limit the rounding
        errors that can arise from the fractional conversion between timing
        values.
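
A sketch of a frame-to-time conversion at 1 millisecond precision. It uses
exact fractions so that rounding happens only once, at the final step. The
function names are hypothetical, and the HH:MM:SS.mmm output layout is an
illustrative assumption rather than the exact CMML timestamp syntax.

```python
from fractions import Fraction


def frame_to_ms(frame, fps):
    """Convert a frame number to whole milliseconds, rounded to nearest."""
    # fps may be an int, a Fraction, or a string such as "24000/1001".
    return round(Fraction(frame) * 1000 / Fraction(fps))


def format_elapsed(ms):
    """Format a millisecond count as HH:MM:SS.mmm."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"
```

At 25 fps one frame lasts exactly 40 ms, so a 10 ms grid would already lose
information for rates such as 24000/1001 fps, where one frame is roughly
41.7 ms.
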