What is AMT (Aiconix Metadata Transport)?

The Aiconix Metadata Transport (AMT) standard is an innovative low-latency protocol for real-time transcription, subtitling, and AI-powered metadata enrichment—such as sentiment analysis, speaker ID, and multilingual translation—designed for modern streaming platforms, live events, and broadcast workflows.

1. Introduction

    The Aiconix Metadata Transport (AMT) standard is a protocol developed by Aiconix GmbH for live transcription, subtitling and translation in real-time media applications. Designed for low-latency environments, AMT ensures smooth integration of automatic speech recognition (ASR), natural language processing (NLP) and AI-powered metadata processing for both live and recorded media.  

    Unlike traditional closed-captioning formats that rely on static, pre-populated text files, AMT enables real-time processing and metadata enrichment. This includes speaker identification, sentiment analysis and other time-coded events. Rather than simply displaying subtitles, AMT provides instant speech-to-text conversion, automatic translation, word-level synchronization and AI-enhanced metadata. These features make AMT particularly useful for live events, streaming platforms and multilingual broadcasts where speed, accuracy and adaptability are required.
     

    | COMPARISON | AMT | WebVTT | SRT | TTML | EBU STL | CEA-708 |
    | --- | --- | --- | --- | --- | --- | --- |
    | Live Support | ✅ Yes | ⚠️ Limited | ❌ No | ❌ No | ❌ No | ✅ Yes |
    | Enabling Real-Time Translation | ✅ Yes | ❌ No | ❌ No | ❌ No | ❌ No | ❌ No |
    | Additional AI Metadata (Sentiment, Speaker ID, etc.) | ✅ Yes | ❌ No | ❌ No | ❌ No | ❌ No | ⚠️ Limited |
    | Streaming-Optimized | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No | ❌ No |
    | Low Latency | ✅ Yes | ✅ Yes | ❌ No | ❌ No | ❌ No | ❌ No |

    AMT bridges the gap between traditional subtitle formats and AI-powered metadata transport, making it ideal for live events, streaming and real-time multilingual workflows where additional metadata is critical. 

    2. Overview of AMT

    AMT is a real-time metadata transport protocol that structures speech-to-text, translation, sentiment-analysis and object-recognition results and delivers them through a webhook-like HTTPS transport in near real time, with minimal latency relative to the actual event.

    KEY FEATURES:
    • Automatic transcription of live speech from streaming input
    • Real-time multilingual translation
    • Advanced metadata support, including:
        ◦ Sentiment analysis (to classify emotional tone)
        ◦ Object, face and scene recognition
        ◦ Speaker identification
    • Export to a structured format for seamless workflow integration

    AMT's flexible structure allows additional metadata layers to be added, making it adaptable for lip-syncing, user interface enhancements and improved accessibility. 

    2.1 AMT Usage 

    AMT is currently used to deliver partial transcriptions to a live editor for real-time subtitle adjustments. 

    • Partials are single-word elements that allow an editor to modify the transcript before publishing. 
    • The protocol distinguishes between partial and full transcriptions, so that sentence structure can be updated cleanly (a minimal consumer-side sketch follows this list).
    • Translations are also processed using AMT, allowing all metadata enhancements to be made prior to final subtitle creation. 
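
    To make the partial/complete distinction concrete, here is a minimal consumer-side sketch in Python. It assumes JSON payloads shaped as in section 3.5; the buffer layout and function name are illustrative, not part of the AMT specification.

    # Hypothetical consumer-side buffer: a partial sequence may be revised
    # several times until the complete sequence with the same sequenceId arrives.
    live_buffer: dict[int, dict] = {}   # sequenceId -> latest partial payload
    published: dict[int, dict] = {}     # sequenceId -> final payload

    def handle_sequence(payload: dict) -> None:
        seq_id = payload["sequenceId"]
        if payload.get("isPartial", False):
            # Show the partial to the live editor; a later update may replace it.
            live_buffer[seq_id] = payload
        if payload.get("isComplete", False):
            # The sequence is final: publish it and drop the working copy.
            published[seq_id] = payload
            live_buffer.pop(seq_id, None)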

    Aiconix plans to expand the use of AMT both internally and for its users. 

    3. Data Transport Mechanism

    AMT uses HTTP POST to transport metadata and subtitles. 

    • Authentication: Each session is identified by an authentication token and a session ID in the URL. 
    • Encoding: UTF-8 is the default format. 
    • Response expectation: The endpoint must return HTTP 201 on successful transmission (a minimal receiver sketch follows this list).
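
    As an illustration of this transport contract, the following sketch implements a receiving endpoint with only the Python standard library. The URL layout /amt/<token>/<sessionId> and the token store are assumptions made for the example; AMT only requires that the authentication token and session ID appear in the URL.

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    VALID_TOKENS = {"example-token"}  # hypothetical token store

    class AMTHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Assumed path layout for this sketch: /amt/<token>/<sessionId>
            parts = self.path.strip("/").split("/")
            if len(parts) != 3 or parts[0] != "amt" or parts[1] not in VALID_TOKENS:
                self.send_response(401)
                self.end_headers()
                return
            session_id = parts[2]
            length = int(self.headers.get("Content-Length", 0))
            # UTF-8 is the AMT default encoding.
            payload = json.loads(self.rfile.read(length).decode("utf-8"))
            print(session_id, payload.get("sequenceId"), payload.get("content"))
            self.send_response(201)  # AMT expects 201 on successful transmission
            self.end_headers()

    if __name__ == "__main__":
        HTTPServer(("", 8080), AMTHandler).serve_forever()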

    3.1 Main Object Structure 

    Each transmitted metadata object contains the following fields: 

    | Fieldname | Type | Example | Description | Optional |
    | --- | --- | --- | --- | --- |
    | sequenceId | integer |  | Subtitle sequence identifier | No |
    | startTime | float | 1.123 | Start time in milliseconds | No |
    | endTime | float | 2.121 | End time in milliseconds | No |
    | content | string | "Example" | Subtitle text | No |
    | contentLength | integer |  | Length of content | No |
    | locale | string | "en" | Language code (ISO 639-1) | No |
    | translations | array |  | Array of translations | Yes |
    | elements | array |  | Lip-sync elements | No |
    | sentiment | object |  | Sentiment analysis metadata | Yes |
    | speaker | string | "s1" | Speaker ID | Yes |
    | channel | string | "Agent" | Channel metadata | Yes |
    | isComplete | bool | true | Whether the sequence is final | No |
    | isPartial | bool | false | Whether the sequence is partial | No |
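
    For readers integrating in typed code, one possible rendering of this structure as a Python dataclass is sketched below; the class name and the loose typing of the nested arrays are illustrative, not prescribed by AMT.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class AMTSequence:
        # Field names and optionality follow the table above.
        sequenceId: int
        startTime: float              # milliseconds
        endTime: float                # milliseconds
        content: str
        contentLength: int
        locale: str                   # ISO 639-1, e.g. "en"
        elements: list                # element objects, see section 3.3
        isComplete: bool
        isPartial: bool
        translations: Optional[list] = None   # translation objects, see section 3.2
        sentiment: Optional[dict] = None      # sentiment object, see section 3.4
        speaker: Optional[str] = None         # e.g. "s1"
        channel: Optional[str] = None         # e.g. "Agent"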

    3.2 Translation Object 

    AMT supports multi-language translation of subtitles: 

    | Fieldname | Type | Example | Description | Optional |
    | --- | --- | --- | --- | --- |
    | locale | string | "de" | Target language code (ISO 639-1) | No |
    | content | string | "Dies ist ein Beispiel." | Translated text | No |
    | contentLength | integer | 22 | Length of translated content | No |
    | elements | array |  | Lip-sync aligned elements | No |
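
    A small illustrative helper, assuming the payload shape from section 3.5, for selecting the translation in a given target locale:

    from typing import Optional

    def translation_for(payload: dict, locale: str) -> Optional[dict]:
        # Returns the first translation matching the ISO 639-1 locale, if any.
        return next(
            (t for t in payload.get("translations") or [] if t["locale"] == locale),
            None,
        )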

    3.3 Element Object 

    Each subtitle can be broken down into elements for precise time synchronization: 

    | Fieldname | Type | Example | Description | Optional |
    | --- | --- | --- | --- | --- |
    | startTime | float | 1.01 | Element start time | No |
    | endTime | float | 1.50 | Element end time | No |
    | content | string | "Example" | Text content | No |
    | contentLength | integer |  | Length of content | No |
    | isPunctuation | bool | false | Whether the element is punctuation | Yes |
    | isLineBreak | bool | false | Whether the element is a line break | Yes |
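
    As an illustration of how a renderer might use these flags, the sketch below joins words with spaces, attaches punctuation to the preceding word, and starts a new line at line-break elements. These joining rules are an assumption for the example, not part of the specification.

    def render_elements(elements: list) -> str:
        # Rebuild display text from word-level elements.
        out = ""
        for el in elements:
            if el.get("isLineBreak"):
                out += "\n"
            elif el.get("isPunctuation"):
                out += el["content"]  # no space before punctuation
            else:
                sep = " " if out and not out.endswith("\n") else ""
                out += sep + el["content"]
        return out

    # e.g. render_elements([{"content": "This"}, {"content": "is"},
    #                       {"content": ".", "isPunctuation": True}])  -> "This is."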

    3.4 Sentiment Object 

    AMT incorporates sentiment analysis to classify the emotional tone of speech. 

    | Fieldname | Type | Example | Description | Optional |
    | --- | --- | --- | --- | --- |
    | score | float | 0.757 | Confidence score (0-1) | No |
    | label | string | "Neutral" | Sentiment category | No |
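
    A sketch of how a consumer might gate sentiment display on the confidence score; the 0.5 threshold is an assumption for illustration.

    from typing import Optional

    def display_sentiment(sentiment: Optional[dict], threshold: float = 0.5) -> str:
        # Surface the label only when the confidence score clears the threshold.
        if sentiment and sentiment["score"] >= threshold:
            return sentiment["label"]   # e.g. "Neutral"
        return ""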

    3.5 Example JSON Payload 


      "sequenceId": 1, 
      "startTime": "1.001", 
      "endTime": "2.00", 
      "content": "This is an example.", 
      "contentLength": 19, 
      "locale": "en", 
      "isComplete": true, 
      "translations": [ 
        { 
          "locale": "de", 
          "content": "Dies ist ein Beispiel.", 
          "contentLength": 21, 
          "elements": [ 
            { "content": "Dies", "startTime": "1.001", "endTime": "1.002", "isPunctuation": false }, 
            { "content": "ist", "startTime": "1.002", "endTime": "1.200", "isPunctuation": false } 
          ] 
        } 
      ], 
      "elements": [ 
        { "content": "This", "startTime": "1.001", "endTime": "1.002", "isPunctuation": false } 
      ], 
      "sentiment": { 
        "score": 0.757, 
        "label": "Neutral" 
      }, 
      "speaker": "s1", 
      "channel": "Agent", 
      "isComplete": true, 
      "isPartial": false 

     

    4. Advantages of AMT

    The AMT standard is an innovative real-time metadata transport system designed to enhance live transcription, subtitling and translation with AI-powered metadata. It outperforms traditional subtitle formats by providing low latency, high accuracy and multilingual capabilities that integrate seamlessly into modern streaming environments, while adding a versatile metadata layer.

    📩 Questions? Ideas? 💡

    Feel free to contact our team; we are looking for partners who are interested in exploring the capabilities of AMT, and we are happy to help!