What is AMT (Aiconix Metadata Transport)?
The Aiconix Metadata Transport (AMT) standard is an innovative low-latency protocol for real-time transcription, subtitling, and AI-powered metadata enrichment—such as sentiment analysis, speaker ID, and multilingual translation—designed for modern streaming platforms, live events, and broadcast workflows.
1. Introduction
The Aiconix Metadata Transport (AMT) standard is a protocol developed by Aiconix GmbH for live transcription, subtitling and translation in real-time media applications. Designed for low-latency environments, AMT ensures smooth integration of automatic speech recognition (ASR), natural language processing (NLP) and AI-powered metadata processing for both live and recorded media.
Unlike traditional closed captioning formats that rely on static, pre-populated text files, AMT enables real-time processing and metadata enrichment. This includes speaker identification, sentiment analysis and other time-coded events. Rather than simply displaying subtitles, AMT provides instant speech-to-text conversion, automatic translation, word-level synchronisation and AI-enhanced metadata. These features make AMT particularly useful for live events, streaming platforms and multilingual broadcasts where speed, accuracy and adaptability are required.
| Comparison | AMT | WebVTT | SRT | TTML | EBU STL | CEA-708 |
| --- | --- | --- | --- | --- | --- | --- |
| Live Support | ✅ Yes | ⚠️ Limited | ❌ No | ❌ No | ❌ No | ✅ Yes |
| Real-Time Translation | ✅ Yes | ❌ No | ❌ No | ❌ No | ❌ No | ❌ No |
| Additional AI Metadata (Sentiment, Speaker ID, etc.) | ✅ Yes | ❌ No | ❌ No | ❌ No | ❌ No | ⚠️ Limited |
| Streaming-Optimized | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No | ❌ No |
| Low Latency | ✅ Yes | ✅ Yes | ❌ No | ❌ No | ❌ No | ❌ No |
AMT bridges the gap between traditional subtitle formats and AI-powered metadata transport, making it ideal for live events, streaming and real-time multilingual workflows where additional metadata is critical.
2. Overview of AMT
AMT is a real-time metadata transport protocol that structures speech-to-text, translation, sentiment analysis and object recognition results and delivers them via webhook-style HTTPS POST requests in near real time, with minimal latency relative to the actual event.
Key features:
- Automatic transcription of live speech from streaming input
- Real-time multilingual translation
- Advanced metadata support, including:
  - Sentiment analysis (to classify emotional tone)
  - Object, face and scene recognition
  - Speaker identification
- Export to a structured format for seamless workflow integration
AMT's flexible structure allows additional metadata layers to be added, making it adaptable for lip-syncing, user interface enhancements and improved accessibility.
2.1 AMT Usage
AMT is currently used to deliver partial transcriptions to a live editor for real-time subtitle adjustments.
- Partials are single-word elements that allow an editor to modify the transcript before publishing.
- The protocol distinguishes between partial and full transcriptions, so sentence structure can be updated as more context arrives.
- Translations are also processed using AMT, allowing all metadata enhancements to be made prior to final subtitle creation.
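The partial/complete distinction above implies that a consumer must reconcile sequences as they arrive. The following Python sketch shows one plausible buffering strategy; the buffer layout and function names are illustrative, not part of the AMT specification.

```python
# Sketch: reconcile partial and complete AMT sequences before publishing.
# A complete sequence replaces any earlier partial with the same sequenceId;
# a late-arriving partial never overwrites a complete sequence.

def apply_sequence(buffer: dict, payload: dict) -> dict:
    """Store or overwrite a sequence keyed by sequenceId."""
    seq_id = payload["sequenceId"]
    existing = buffer.get(seq_id)
    if existing is None or payload["isComplete"] or not existing["isComplete"]:
        buffer[seq_id] = payload
    return buffer

buf = {}
apply_sequence(buf, {"sequenceId": 1, "content": "Hel", "isComplete": False})
apply_sequence(buf, {"sequenceId": 1, "content": "Hello world.", "isComplete": True})
# A stale partial for the same sequence is ignored:
apply_sequence(buf, {"sequenceId": 1, "content": "Hello wor", "isComplete": False})
```

In practice an editor UI would render partials immediately and lock a line once the complete sequence arrives.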
Aiconix plans to expand the use of AMT both internally and for its users.
3. Data Transport Mechanism
AMT uses HTTP POST to transport metadata and subtitles.
- Authentication: each session is identified by an authentication token and a session ID in the URL.
- Encoding: UTF-8 is the default text encoding.
- Response expectation: the endpoint must return HTTP status 201 (Created) on successful transmission.
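A minimal sender can be sketched with Python's standard library. Note that the exact URL layout is deployment-specific: placing the session ID in the path and the token in a query parameter, as below, is an assumption for illustration only.

```python
import json
import urllib.request

def build_amt_post(endpoint: str, session_id: str, token: str, payload: dict):
    """Build (but do not send) the HTTP POST request for one AMT payload.

    The URL scheme (session ID as path segment, token as query parameter)
    is hypothetical; consult your deployment for the actual layout.
    """
    url = f"{endpoint}/sessions/{session_id}?token={token}"
    body = json.dumps(payload).encode("utf-8")  # UTF-8 is the AMT default
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )

req = build_amt_post(
    "https://example.invalid/amt", "abc123", "TOKEN",
    {"sequenceId": 1, "content": "Hi", "isComplete": True},
)
# Sending would be urllib.request.urlopen(req); a real receiver is expected
# to answer with HTTP 201 on success.
```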
3.1 Main Object Structure
Each transmitted metadata object contains the following fields:
| Fieldname | Type | Example | Description | Optional |
| --- | --- | --- | --- | --- |
| sequenceId | integer | 1 | Subtitle sequence identifier | No |
| startTime | float | 1.123 | Start time in milliseconds | No |
| endTime | float | 2.121 | End time in milliseconds | No |
| content | string | "Example" | Subtitle text | No |
| contentLength | integer | 7 | Length of content in characters | No |
| locale | string | "en" | Language code (ISO 639-1) | No |
| translations | array | - | Array of translation objects | Yes |
| elements | array | - | Lip-sync elements | No |
| sentiment | object | - | Sentiment analysis metadata | Yes |
| speaker | string | "s1" | Speaker ID | Yes |
| channel | string | "Agent" | Channel metadata | Yes |
| isComplete | bool | true | Whether the sequence is final | No |
| isPartial | bool | false | Whether the sequence is partial | No |
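A receiver will typically want to reject malformed payloads before processing them. The following sketch validates the required fields of the main object against the table above; the helper itself is not part of the specification.

```python
# Sketch: validate the required fields of an AMT main object.
# Field names and types follow the main object table; optional fields
# (translations, sentiment, speaker, channel) are not checked here.

REQUIRED_FIELDS = {
    "sequenceId": int,
    "startTime": float,
    "endTime": float,
    "content": str,
    "contentLength": int,
    "locale": str,
    "elements": list,
    "isComplete": bool,
    "isPartial": bool,
}

def validate_main_object(obj: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in obj:
            problems.append(f"missing required field: {name}")
        elif not isinstance(obj[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    # contentLength is redundant with content, so the two must agree.
    if not problems and obj["contentLength"] != len(obj["content"]):
        problems.append("contentLength does not match content")
    return problems
```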
3.2 Translation Object
AMT supports multi-language translation of subtitles:
| Fieldname | Type | Example | Description | Optional |
| --- | --- | --- | --- | --- |
| locale | string | "de" | Target language code (ISO 639-1) | No |
| content | string | "Dies ist ein Beispiel." | Translated text | No |
| contentLength | integer | 22 | Length of translated content in characters | No |
| elements | array | - | Lip-sync-aligned elements | No |
3.3 Element Object
Each subtitle can be broken down into elements for precise time synchronization:
| Fieldname | Type | Example | Description | Optional |
| --- | --- | --- | --- | --- |
| startTime | float | 1.01 | Element start time | No |
| endTime | float | 1.50 | Element end time | No |
| content | string | "Example" | Text content | No |
| contentLength | integer | 7 | Length of content in characters | No |
| isPunctuation | bool | false | Whether the element is punctuation | Yes |
| isLineBreak | bool | false | Whether the element is a line break | Yes |
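Word-level elements enable effects such as karaoke-style highlighting of the currently spoken word. The sketch below looks up the element active at a given playback time; it assumes elements are ordered and non-overlapping, and uses the same time unit as the payload.

```python
# Sketch: find the element "active" at playback time t, e.g. to highlight
# the word currently being spoken. Returns None between or outside elements.

def active_element(elements, t):
    for el in elements:
        if el["startTime"] <= t < el["endTime"]:
            return el
    return None

words = [
    {"content": "This", "startTime": 1.001, "endTime": 1.2},
    {"content": "is", "startTime": 1.2, "endTime": 1.4},
]
```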
3.4 Sentiment Object
AMT incorporates sentiment analysis to classify the emotional tone of speech.
| Fieldname | Type | Example | Description | Optional |
| --- | --- | --- | --- | --- |
| score | float | 0.757 | Confidence score (0–1) | No |
| label | string | "Neutral" | Sentiment category | No |
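Since the score expresses confidence, a consumer may want to suppress low-confidence labels before surfacing them in a UI. The threshold below is an arbitrary example, not a value mandated by AMT.

```python
# Sketch: only surface a sentiment label when its confidence score meets a
# consumer-chosen threshold (0.5 here is illustrative, not specified by AMT).

def usable_sentiment(sentiment, threshold=0.5):
    if sentiment is None or sentiment["score"] < threshold:
        return None
    return sentiment["label"]
```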
3.5 Example JSON Payload
```json
{
  "sequenceId": 1,
  "startTime": 1.001,
  "endTime": 2.0,
  "content": "This is an example.",
  "contentLength": 19,
  "locale": "en",
  "translations": [
    {
      "locale": "de",
      "content": "Dies ist ein Beispiel.",
      "contentLength": 22,
      "elements": [
        { "content": "Dies", "startTime": 1.001, "endTime": 1.002, "isPunctuation": false },
        { "content": "ist", "startTime": 1.002, "endTime": 1.2, "isPunctuation": false }
      ]
    }
  ],
  "elements": [
    { "content": "This", "startTime": 1.001, "endTime": 1.002, "isPunctuation": false }
  ],
  "sentiment": {
    "score": 0.757,
    "label": "Neutral"
  },
  "speaker": "s1",
  "channel": "Agent",
  "isComplete": true,
  "isPartial": false
}
```
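A payload of this shape can be parsed with any standard JSON library and sanity-checked against its redundant length fields. The embedded document below mirrors the section 3.5 example (with numeric timestamps, matching the float types in section 3.1).

```python
import json

# Sketch: parse an AMT payload and cross-check contentLength fields against
# the actual text. The payload mirrors the example in section 3.5.

raw = """
{
  "sequenceId": 1,
  "startTime": 1.001,
  "endTime": 2.0,
  "content": "This is an example.",
  "contentLength": 19,
  "locale": "en",
  "translations": [
    {"locale": "de", "content": "Dies ist ein Beispiel.", "contentLength": 22, "elements": []}
  ],
  "elements": [],
  "sentiment": {"score": 0.757, "label": "Neutral"},
  "speaker": "s1",
  "channel": "Agent",
  "isComplete": true,
  "isPartial": false
}
"""

payload = json.loads(raw)
assert payload["contentLength"] == len(payload["content"])
for tr in payload["translations"]:
    assert tr["contentLength"] == len(tr["content"])
```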
4. Advantages of AMT
The AMT standard is an innovative real-time metadata transport system designed to enhance live transcription, subtitling and translation with AI-powered metadata. It outperforms traditional subtitling formats by providing low latency, high accuracy and multilingual capabilities that integrate seamlessly into modern streaming environments, while adding a versatile metadata layer.
📩 Questions? Ideas? 💡
Feel free to contact our team. We are looking for partners who are interested in exploring the capabilities of AMT, and we are happy to help!