What is AMT (Aiconix Metadata Transport)?
The Aiconix Metadata Transport (AMT) standard is an innovative low-latency protocol for real-time transcription, subtitling, and AI-powered metadata enrichment—such as sentiment analysis, speaker ID, and multilingual translation—designed for modern streaming platforms, live events, and broadcast workflows.
1. Introduction
The Aiconix Metadata Transport (AMT) standard is a protocol developed by Aiconix GmbH for live transcription, subtitling and translation in real-time media applications. Designed for low-latency environments, AMT ensures smooth integration of automatic speech recognition (ASR), natural language processing (NLP) and AI-powered metadata processing for both live and recorded media.
Unlike traditional closed captioning formats that rely on static, pre-populated text files, AMT enables real-time processing and metadata enrichment. This includes speaker identification, sentiment analysis and other time-coded events. Rather than simply displaying subtitles, AMT provides instant speech-to-text conversion, automatic translation, word-level synchronisation and AI-enhanced metadata. These features make AMT particularly useful for live events, streaming platforms and multilingual broadcasts where speed, accuracy and adaptability are required.
| Comparison | AMT | WebVTT | SRT | TTML | EBU STL | CEA-708 |
| --- | --- | --- | --- | --- | --- | --- |
| Live Support | ✅ Yes | ⚠️ Limited | ❌ No | ❌ No | ❌ No | ✅ Yes |
| Real-Time Translation | ✅ Yes | ❌ No | ❌ No | ❌ No | ❌ No | ❌ No |
| Additional AI Metadata (Sentiment, Speaker ID, etc.) | ✅ Yes | ❌ No | ❌ No | ❌ No | ❌ No | ⚠️ Limited |
| Streaming-Optimized | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No | ❌ No |
| Low Latency | ✅ Yes | ✅ Yes | ❌ No | ❌ No | ❌ No | ❌ No |
AMT bridges the gap between traditional subtitle formats and AI-powered metadata transport, making it ideal for live events, streaming and real-time multilingual workflows where additional metadata is critical.
2. Overview of AMT
AMT is a real-time metadata transport protocol that structures speech-to-text, translation, sentiment analysis and object recognition results and delivers them via webhook-style HTTPS POST requests in near real time, with minimal latency relative to the actual event.
Key features:
- Automatic transcription of live speech from streaming input
- Real-time multilingual translation
- Advanced metadata support, including:
  - Sentiment analysis (to classify emotional tone)
  - Object, face and scene recognition
  - Speaker identification
- Export to a structured format for seamless workflow integration
AMT's flexible structure allows additional metadata layers to be added, making it adaptable for lip-syncing, user interface enhancements and improved accessibility.
2.1 AMT Usage
AMT is currently used to deliver partial transcriptions to a live editor for real-time subtitle adjustments.
- Partials are single-word elements that allow an editor to modify the transcript before publishing.
- The protocol distinguishes between partial and full transcriptions, so sentence structure can be updated as more context arrives.
- Translations are also processed using AMT, allowing all metadata enhancements to be made prior to final subtitle creation.
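The partial/complete distinction above implies that a consumer must reconcile sequences as they arrive. The following Python sketch shows one plausible buffering strategy; the buffer layout and function names are illustrative, not part of the AMT specification.

```python
# Sketch: reconcile partial and complete AMT sequences before publishing.
# A complete sequence replaces any earlier partial with the same sequenceId;
# a late-arriving partial never overwrites a complete sequence.

def apply_sequence(buffer: dict, payload: dict) -> dict:
    """Store or overwrite a sequence keyed by sequenceId."""
    seq_id = payload["sequenceId"]
    existing = buffer.get(seq_id)
    if existing is None or payload["isComplete"] or not existing["isComplete"]:
        buffer[seq_id] = payload
    return buffer

buf = {}
apply_sequence(buf, {"sequenceId": 1, "content": "Hel", "isComplete": False})
apply_sequence(buf, {"sequenceId": 1, "content": "Hello world.", "isComplete": True})
# A stale partial for the same sequence is ignored:
apply_sequence(buf, {"sequenceId": 1, "content": "Hello wor", "isComplete": False})
```

In practice an editor UI would render partials immediately and lock a line once the complete sequence arrives.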
Aiconix plans to expand the use of AMT both internally and for its users.
3. Data Transport Mechanism
AMT uses HTTP POST to transport metadata and subtitles.
- Authentication: each session is identified by an authentication token and a session ID in the URL.
- Encoding: UTF-8 is the default text encoding.
- Response expectation: the endpoint must return HTTP status 201 (Created) on successful transmission.
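A minimal sender can be sketched with Python's standard library. Note that the exact URL layout is deployment-specific: placing the session ID in the path and the token in a query parameter, as below, is an assumption for illustration only.

```python
import json
import urllib.request

def build_amt_post(endpoint: str, session_id: str, token: str, payload: dict):
    """Build (but do not send) the HTTP POST request for one AMT payload.

    The URL scheme (session ID as path segment, token as query parameter)
    is hypothetical; consult your deployment for the actual layout.
    """
    url = f"{endpoint}/sessions/{session_id}?token={token}"
    body = json.dumps(payload).encode("utf-8")  # UTF-8 is the AMT default
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )

req = build_amt_post(
    "https://example.invalid/amt", "abc123", "TOKEN",
    {"sequenceId": 1, "content": "Hi", "isComplete": True},
)
# Sending would be urllib.request.urlopen(req); a real receiver is expected
# to answer with HTTP 201 on success.
```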
3.1 Main Object Structure
Each transmitted metadata object contains the following fields:
| Fieldname | Type | Example | Description | Optional |
| --- | --- | --- | --- | --- |
| sequenceId | integer | 1 | Subtitle sequence identifier | No |
| startTime | float | 1.123 | Start time in milliseconds | No |
| endTime | float | 2.121 | End time in milliseconds | No |
| content | string | "Example" | Subtitle text | No |
| contentLength | integer | 7 | Length of content in characters | No |
| locale | string | "en" | Language code (ISO 639-1) | No |
| translations | array | - | Array of translation objects | Yes |
| elements | array | - | Lip-sync elements | No |
| sentiment | object | - | Sentiment analysis metadata | Yes |
| speaker | string | "s1" | Speaker ID | Yes |
| channel | string | "Agent" | Channel metadata | Yes |
| isComplete | bool | true | Whether the sequence is final | No |
| isPartial | bool | false | Whether the sequence is partial | No |
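A receiver will typically want to reject malformed payloads before processing them. The following sketch validates the required fields of the main object against the table above; the helper itself is not part of the specification.

```python
# Sketch: validate the required fields of an AMT main object.
# Field names and types follow the main object table; optional fields
# (translations, sentiment, speaker, channel) are not checked here.

REQUIRED_FIELDS = {
    "sequenceId": int,
    "startTime": float,
    "endTime": float,
    "content": str,
    "contentLength": int,
    "locale": str,
    "elements": list,
    "isComplete": bool,
    "isPartial": bool,
}

def validate_main_object(obj: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in obj:
            problems.append(f"missing required field: {name}")
        elif not isinstance(obj[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    # contentLength is redundant with content, so the two must agree.
    if not problems and obj["contentLength"] != len(obj["content"]):
        problems.append("contentLength does not match content")
    return problems
```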
3.2 Translation Object
AMT supports multi-language translation of subtitles:
| Fieldname | Type | Example | Description | Optional |
| --- | --- | --- | --- | --- |
| locale | string | "de" | Target language code (ISO 639-1) | No |
| content | string | "Dies ist ein Beispiel." | Translated text | No |
| contentLength | integer | 22 | Length of translated content in characters | No |
| elements | array | - | Lip-sync-aligned elements | No |
3.3 Element Object
Each subtitle can be broken down into elements for precise time synchronization:
| Fieldname | Type | Example | Description | Optional |
| --- | --- | --- | --- | --- |
| startTime | float | 1.01 | Element start time | No |
| endTime | float | 1.50 | Element end time | No |
| content | string | "Example" | Text content | No |
| contentLength | integer | 7 | Length of content in characters | No |
| isPunctuation | bool | false | Whether the element is punctuation | Yes |
| isLineBreak | bool | false | Whether the element is a line break | Yes |
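Word-level elements enable effects such as karaoke-style highlighting of the currently spoken word. The sketch below looks up the element active at a given playback time; it assumes elements are ordered and non-overlapping, and uses the same time unit as the payload.

```python
# Sketch: find the element "active" at playback time t, e.g. to highlight
# the word currently being spoken. Returns None between or outside elements.

def active_element(elements, t):
    for el in elements:
        if el["startTime"] <= t < el["endTime"]:
            return el
    return None

words = [
    {"content": "This", "startTime": 1.001, "endTime": 1.2},
    {"content": "is", "startTime": 1.2, "endTime": 1.4},
]
```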
3.4 Sentiment Object
AMT incorporates sentiment analysis to classify the emotional tone of speech.
| Fieldname | Type | Example | Description | Optional |
| --- | --- | --- | --- | --- |
| score | float | 0.757 | Confidence score (0–1) | No |
| label | string | "Neutral" | Sentiment category | No |
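Since the score expresses confidence, a consumer may want to suppress low-confidence labels before surfacing them in a UI. The threshold below is an arbitrary example, not a value mandated by AMT.

```python
# Sketch: only surface a sentiment label when its confidence score meets a
# consumer-chosen threshold (0.5 here is illustrative, not specified by AMT).

def usable_sentiment(sentiment, threshold=0.5):
    if sentiment is None or sentiment["score"] < threshold:
        return None
    return sentiment["label"]
```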
3.5 Example JSON Payload
```json
{
  "sequenceId": 1,
  "startTime": 1.001,
  "endTime": 2.0,
  "content": "This is an example.",
  "contentLength": 19,
  "locale": "en",
  "translations": [
    {
      "locale": "de",
      "content": "Dies ist ein Beispiel.",
      "contentLength": 22,
      "elements": [
        { "content": "Dies", "startTime": 1.001, "endTime": 1.002, "isPunctuation": false },
        { "content": "ist", "startTime": 1.002, "endTime": 1.2, "isPunctuation": false }
      ]
    }
  ],
  "elements": [
    { "content": "This", "startTime": 1.001, "endTime": 1.002, "isPunctuation": false }
  ],
  "sentiment": {
    "score": 0.757,
    "label": "Neutral"
  },
  "speaker": "s1",
  "channel": "Agent",
  "isComplete": true,
  "isPartial": false
}
```
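A payload of this shape can be parsed with any standard JSON library and sanity-checked against its redundant length fields. The embedded document below mirrors the section 3.5 example (with numeric timestamps, matching the float types in section 3.1).

```python
import json

# Sketch: parse an AMT payload and cross-check contentLength fields against
# the actual text. The payload mirrors the example in section 3.5.

raw = """
{
  "sequenceId": 1,
  "startTime": 1.001,
  "endTime": 2.0,
  "content": "This is an example.",
  "contentLength": 19,
  "locale": "en",
  "translations": [
    {"locale": "de", "content": "Dies ist ein Beispiel.", "contentLength": 22, "elements": []}
  ],
  "elements": [],
  "sentiment": {"score": 0.757, "label": "Neutral"},
  "speaker": "s1",
  "channel": "Agent",
  "isComplete": true,
  "isPartial": false
}
"""

payload = json.loads(raw)
assert payload["contentLength"] == len(payload["content"])
for tr in payload["translations"]:
    assert tr["contentLength"] == len(tr["content"])
```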
4. Advantages of AMT
The AMT standard is an innovative real-time metadata transport system designed to enhance live transcription, subtitling and translation with AI-powered metadata. It outperforms traditional subtitling formats by providing low latency, high accuracy and multilingual capabilities that integrate seamlessly into modern streaming environments, while adding a versatile metadata layer.
📩 Questions? Ideas? 💡
Feel free to contact our team. We are looking for partners who are interested in exploring the capabilities of AMT, and we are happy to help!