What is Speaker Training?

Easily train custom speakers and voices.

Module Description

The Speaker Training module of our Deep Model Customizer simplifies the process of training new voices for speaker identification. It requires only a few audio training samples and no negative training data, and our algorithms have been designed to train as efficiently as possible.

The feature is simple and intuitive to use; no expertise is required. Samples can either be uploaded directly or extracted automatically from videos using the Speaker Dataset Creation feature. Once training is complete, the resulting Model becomes available in the Speaker Identification module.


How does it work?

  1. Select the "Dataset" module in the vertical menu on the left - it shows an icon with a box on it.
  2. Create a new Audio Dataset. A dataset can be all persons of a project or a parliament.
  3. Create Classes in the Dataset. Each class is one specific speaker.
  4. Upload audio training samples.
  5. Evaluate the finished Dataset. Giving you feedback on the quality.
  6. Train the Model. This makes it available for Recognition
  7. Use the trained Model. In the Speaker Identification Module of the Deep Media Analyzer.
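
To make the workflow more concrete, here is a minimal, purely illustrative Python sketch of steps 2 to 5. The product itself is operated through its UI; the classes, method names, and the simple evaluation check below are invented stand-ins, not part of any real API.

    from dataclasses import dataclass, field

    @dataclass
    class SpeakerClass:
        name: str
        samples: list[str] = field(default_factory=list)

        def upload_samples(self, files: list[str]) -> None:
            # Each uploaded audio file becomes one sample of this class
            self.samples.extend(files)

    @dataclass
    class AudioDataset:
        name: str
        classes: dict[str, SpeakerClass] = field(default_factory=dict)

        def create_class(self, name: str) -> SpeakerClass:
            cls = SpeakerClass(name)
            self.classes[name] = cls
            return cls

        def evaluate(self) -> str:
            # Stand-in check: "good" (green) means no errors or warnings were found
            return "good" if all(c.samples for c in self.classes.values()) else "warning"

    # Steps 2-5 condensed: create a dataset, add a class, upload samples, evaluate
    dataset = AudioDataset("Parliament Speakers")
    speaker = dataset.create_class("Jane Doe")
    speaker.upload_samples(["jane_01.wav", "jane_02.wav"])
    print(dataset.evaluate())  # -> "good"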

Custom Dataset Types:
The following dataset types can be custom trained in the Deep Model Customizer:

  • Faces, for building regional face recognition models
  • Speaker, for building regional voice identification models
  • Landmark, for recognizing regional buildings
  • Logo, for identifying custom logos

How to create a new class?

Once you have opened an Audio Training dataset, you can create new classes or edit existing ones.
When you create a new class, you will need to add certain information:

  • Class name: Label of the person you want to recognize, e.g. a name or an ID.
  • Note: User-defined text, e.g. for versioning or for distinguishing several age versions of the same person.
  • Priority Level: Defines the priority level of the class.
  • Expiration Date: When the class should be automatically removed from the dataset.
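
Purely as an illustration, this information could be pictured as a small record, sketched here in Python; every key name is hypothetical and only mirrors the UI inputs listed above:

    from datetime import date

    # Hypothetical field names mirroring the class creation form
    new_class = {
        "class_name": "Jane Doe",                 # label returned by recognition
        "note": "Recordings from 2023 sessions",  # free-form user text
        "priority_level": 2,                      # priority of this class
        "expiration_date": date(2026, 12, 31),    # auto-removal date
    }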

How to add training samples to a class?

Our speaker training is based on few-shot learning, which allows us to work with only a few training audio samples, compared to generic models that require thousands of training samples.
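
As a rough illustration of the idea behind few-shot speaker identification (not necessarily the product's actual algorithm): a common approach embeds each audio sample into a fixed-size vector, averages the few samples of a class into a centroid, and matches new audio against the nearest centroid. The Python sketch below assumes such an embedding already exists; the embedding size, names, and threshold are invented.

    import numpy as np

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine similarity between two embedding vectors
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def identify(query: np.ndarray, centroids: dict[str, np.ndarray], threshold: float = 0.7):
        # Match a query embedding against each class centroid; reject weak matches
        best = max(centroids, key=lambda name: cosine(query, centroids[name]))
        score = cosine(query, centroids[best])
        return (best, score) if score >= threshold else (None, score)

    # Toy stand-ins: three 192-dimensional embeddings per speaker represent the
    # handful of uploaded samples; a class is summarized by its mean embedding.
    rng = np.random.default_rng(0)
    samples = {"Jane Doe": rng.normal(size=(3, 192)), "John Roe": rng.normal(size=(3, 192))}
    centroids = {name: embs.mean(axis=0) for name, embs in samples.items()}
    print(identify(rng.normal(size=192), centroids))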

  1. To upload audio samples, open the class and click "Upload Sample".
  2. Drag and drop, or select, one or more audio samples for this person.
  3. The audio snippets now appear in the class as samples; each audio file is a single sample.

Each sample can be activated or deactivated using the toggle switch, or edited or deleted via the three-dots button (...).


How to evaluate a class?

Once your classes are ready and contain enough training samples, you should evaluate the quality of your dataset before triggering the training. To start the evaluation, go to the Dataset Overview and click "Evaluate". Depending on the size of the dataset, the evaluation may take some time.

A dataset is considered "good" (shown in green) if the evaluation finds no errors or warnings in it. Such a dataset is suitable for training an AI model.

Can I train an AI model despite errors and warnings in the dataset?

Yes. An AI model can be trained from a dataset despite errors and warnings; the evaluation results do not block training. It is still recommended to exclude erroneous samples from the training and to take a closer look at warnings.


How to train a custom AI model?

There are two ways to start speaker training on a custom model:

  • If you are in the "Dataset" view: Select the dataset you want to train, click "Train", and provide the necessary information.
  • If you are in the "AI Models" view: Click on "Start Training", enter the necessary information and select a dataset to use as training examples for this model.

When creating a new Model, you are asked for the following information:

  • Model Name: Name of the Model.
  • Model Type: Audio, Face Recognition, Logo Recognition, or Landmark Recognition.
  • Model Description: User-defined text describing the model.
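
As with classes, this information can be pictured as a small record; the key names in this Python sketch are hypothetical and only mirror the form fields above:

    # Hypothetical field names mirroring the model creation form
    new_model = {
        "model_name": "Parliament Speakers v1",
        "model_type": "Audio",  # or Face Recognition, Logo Recognition, Landmark Recognition
        "model_description": "Speaker identification for the 2024 plenary sessions",
    }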

Versioning of AI models:
Versioning of AI Models is done automatically in the "AI Models" view. The version and history of each model can be found in a grey box with the version number in the top right-hand corner of its model card.
This way you don't need to use the model description field for versioning.


How to use the custom AI Model?

Once the model has been successfully trained, it will appear as an option in the "Model" drop-down menu when configuring speaker identification in the Deep Media Analyzer.