What is Speaker Dataset Creation?

Module Description

The Speaker Dataset Creation module allows you to automatically read names from on-screen text inserts (such as lower thirds) and associate them with the corresponding voice in the video. This helps you create a custom speaker identification dataset from your own media files.

The module is designed to simplify and accelerate the collection of training data for personalized speaker identification. Once your dataset is created, it can be trained further using the Deep Model Customizer and then used for recognition in the Deep Media Analyzer.

How does it work?

Create a new Audio Dataset:

Note:
To use Speaker Dataset Creation, you first need a target Audio Dataset to which the extracted classes will be written. You can find the explanation of how to create a new custom dataset here. Then proceed with the following steps:

Select the Media File: Choose the media file you want to analyze.
Activate the Speaker Dataset Creation Module: In the left column, select the "Speaker Dataset Creation" module.
Define the Model & Parameters: Choose the model for analysis from the available options, set the parameters, and click the yellow "Add Module" button.
Start the Analysis: You can either add more modules or begin the analysis immediately by clicking "Start Analysis".

What Parameters are available?

Apply generic method (Checkbox)
Use the generic method for name recognition (first name and family name combined).
Dataset (Dropdown)
Select the target Dataset for the extracted classes. All extracted classes will appear in this Dataset, once the module is finished.
Single Name Detection (Checkbox)
Enable single name recognition so that names consisting of a first name only are recognised as well.
Min. Face Size (50-112)
Minimum length of the shortest side of the detected face box. Decrease to allow extraction of smaller faces.
Sharpness Threshold (0-500)
Minimum sharpness value of an extracted face. Decrease to extract more blurred faces.
Offset Start
Offset for the start of the audio extraction in seconds.
Offset End
Offset for the end of the audio extraction in seconds.
Segment Merging Threshold
Segments will be merged if the distance between segments is lower than this threshold in seconds.
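
The following Python snippet is a minimal sketch of how the Segment Merging Threshold and the two offsets could behave. It is not the module's actual implementation. The function names, the tuple representation of a segment, and the assumption that the offsets are simply added to the segment boundaries are all illustrative.

```python
from typing import List, Tuple

# A segment is (start_seconds, end_seconds). This representation is an assumption.
Segment = Tuple[float, float]


def merge_segments(segments: List[Segment], threshold: float) -> List[Segment]:
    """Merge neighbouring segments whose gap is lower than `threshold` seconds."""
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < threshold:
            # Gap to the previous segment is below the threshold: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def apply_offsets(segment: Segment, offset_start: float, offset_end: float) -> Segment:
    """Shift the boundaries of a segment (assumed: offsets are added, start clamped at 0)."""
    start, end = segment
    return (max(0.0, start + offset_start), end + offset_end)


segments = [(0.0, 2.0), (2.5, 4.0), (10.0, 12.0)]
print(merge_segments(segments, threshold=1.0))  # [(0.0, 4.0), (10.0, 12.0)]
```

With a threshold of 1 second, the first two segments are merged because the gap between them is only 0.5 seconds, while the third segment stays separate.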

Voice Index:
The second way of learning custom voices is to use the Voice Indexer. It offers the easiest way to manage unknown voices. Each speaker is automatically assigned a unique ID, which you can rename instantly. With the Voice Index, every voice becomes recognizable right away without the need for training data, but it requires manual labeling.

Displaying and Editing the Results:

Go to the "Datasets" section and open the Dataset you selected as the target for the Speaker Dataset Creation.

Classes overview:

The first level shows all the classes contained in this dataset.
Each card of such a recognition class contains the following information:

Class name: Name extracted during Speaker Dataset Creation.
Number of samples: Number of training samples contained in this class.
Evaluation: Shows whether the dataset has already been evaluated and, if so, whether the result was good or there were any problems.
Click on the three dots (...) to access the following functions:
- Edit: To edit a class. See below for more details.
- Copy ID: Copies the ID of this class to the clipboard.
- Delete: Delete this class.

Dataset Summary Card

On the right you find all information about this dataset in the Dataset summary card.
This card contains the following information:

Dataset ID: Unique ID of this dataset.
Dataset Description: User-edited text describing the dataset, e.g. for versioning.
Created: Date when the dataset was created.
Last Modified: Date when the dataset was last modified.
Samples (Active/Total): Number of samples in this dataset (across all classes), split into active and inactive ones.
Classes (Active/Total): Number of classes in this dataset, split into active and inactive ones.
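
As an illustration of the Active/Total figures, the sketch below counts them for a small in-memory dataset. The `Sample` and `SpeakerClass` types and the `summarize` function are hypothetical names introduced for this example. It also assumes that a sample counts as active based on its own flag only, regardless of whether its class is active.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    active: bool = True


@dataclass
class SpeakerClass:
    label: str
    active: bool = True
    samples: List[Sample] = field(default_factory=list)


def summarize(classes: List[SpeakerClass]) -> Dict[str, str]:
    """Return the Active/Total counts for samples and classes as strings."""
    total_classes = len(classes)
    active_classes = sum(1 for c in classes if c.active)
    total_samples = sum(len(c.samples) for c in classes)
    # Assumption: a sample is counted as active based on its own flag only.
    active_samples = sum(1 for c in classes for s in c.samples if s.active)
    return {
        "samples": f"{active_samples}/{total_samples}",
        "classes": f"{active_classes}/{total_classes}",
    }


classes = [
    SpeakerClass("Speaker A", True, [Sample(True), Sample(False)]),
    SpeakerClass("Speaker B", False, [Sample(True)]),
]
print(summarize(classes))  # {'samples': '2/3', 'classes': '1/2'}
```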

Edit the entire Dataset:

In the Dataset view you can use your cursor and the keyboard to select classes and perform an action.
At the top you find the buttons for editing the dataset and for setting up search and filtering.

The button with the pencil brings you into the Dataset Editing mode.
- Dataset name: Rename the dataset.
- Dataset type: Change the type of the dataset.
- Dataset Description: Additional information for identifying the correct dataset or version.
The button with the page and the arrow brings you to the Dataset export function.
This function exports all classes of this dataset together with all their training samples. You need to select the export standard from a dropdown. These standards are available: Basic Dataset Export, IAIS Audio Dataset Export, CSV Dataset Summary.
The button with the X will delete all classes of this dataset at once.
To avoid accidentally deleting all classes, you need to enter the dataset name again to confirm this action.
The button with the list icon brings you to the list view, which displays more items but with smaller thumbnails. Once you are in the list view, you can get back to the grid view by clicking the grid icon.
The button with the checkbox icon selects all classes so that you can perform an action from the menu.

When you have selected one or several classes, you can perform the following actions by using the buttons in the header bar:

Move to a dataset: Move these classes to an existing dataset. They will disappear from this dataset.
Copy to a dataset: Copy these classes to an existing dataset. They will also remain in this dataset.
Remove selected classes: Delete only these classes.
Activate/Deactivate selected: Activate or Deactivate all selected classes.
Merge into a class: Merge selected classes into one class.
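
The difference between moving, copying, and merging classes can be sketched with plain dictionaries. This is only an illustration of the behaviour described above, not the product's implementation. The `Dataset` type, the function names, and the sample file names are hypothetical, and an existing class with the same label in the target is simply overwritten here.

```python
from typing import Dict, List

# Hypothetical model: a dataset maps a class label to its list of sample names.
Dataset = Dict[str, List[str]]


def move_classes(source: Dataset, target: Dataset, labels: List[str]) -> None:
    """Move classes to the target; they disappear from the source."""
    for label in labels:
        target[label] = source.pop(label)


def copy_classes(source: Dataset, target: Dataset, labels: List[str]) -> None:
    """Copy classes to the target; they also remain in the source."""
    for label in labels:
        target[label] = list(source[label])


def merge_classes(dataset: Dataset, labels: List[str], merged_label: str) -> None:
    """Merge the selected classes into one class holding all their samples."""
    merged: List[str] = []
    for label in labels:
        merged.extend(dataset.pop(label))
    dataset[merged_label] = merged


source: Dataset = {"Alice": ["a1.wav"], "Bob": ["b1.wav"]}
target: Dataset = {}
copy_classes(source, target, ["Alice"])  # "Alice" now exists in both datasets
move_classes(source, target, ["Bob"])    # "Bob" now exists only in target
print(source, target)
```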

Class Summary Card

On the right you find all information about this class in the Class summary card.
This card contains the following information:

Class ID: Unique ID of this class.
Label: Name of this class.
Active (Yes / No): Is this class active?
Reference: Reference number of this class
Class Description: User-edited text describing the class, e.g. for versioning.
Priority Level: User set priority level of this class.
Created: Date when the class was created.
Last Modified: Date when the class was last modified.
Samples: Number of audio samples in this class.

Edit a Class:

Once you click Edit on a dataset class, the editing fields of this class appear in a pop-up window.
Here you have the following options:

Name field: Rename the class. This affects all past results in which this class has previously been recognized.
Note: Free field for entering additional information such as versions.
Priority Level: Select the priority level of this class.
Expiration date: Set the date on which this class will automatically disappear.
Enter a date in yyyy/mm/dd format or use the date picker (opens popup).
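
If you prepare expiration dates outside the application, the yyyy/mm/dd format can be checked with Python's standard library before you type it in. This is a small sketch. The function names are hypothetical, and the assumption that a class counts as expired from the given date onwards is ours, not confirmed behaviour.

```python
from datetime import date, datetime


def parse_expiration(text: str) -> date:
    """Parse a date given in yyyy/mm/dd format; raises ValueError otherwise."""
    return datetime.strptime(text.strip(), "%Y/%m/%d").date()


def is_expired(expiration: date, today: date) -> bool:
    """Assumption: the class disappears on the expiration date itself."""
    return today >= expiration


expiration = parse_expiration("2025/01/31")
print(expiration)                                # 2025-01-31
print(is_expired(expiration, date(2025, 1, 30)))  # False
print(is_expired(expiration, date(2025, 1, 31)))  # True
```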

Edit the training samples of a class:

When you are in a class, you see all training samples contained in this class.
As in the classes menu, you are able to:

Activate / Deactivate training samples: Decide whether these training samples should be used for training a model.
Edit Sample: Add a custom note to a sample to provide more context or for version control.
Manually add samples: Upload training samples manually from your storage or device.
Copy Sample to a class: Copy the selected sample to another class or another dataset.
The sample will also remain in its current class.
Move Sample to a class: Move the selected sample to another class or another dataset.
The sample will be removed from its current class.
Remove selected Sample: Delete this audio sample.

Evaluation of a Dataset:

Once you have edited all classes, you need to run the evaluation in the dataset overview.
The evaluation runs all samples of a class through quality-checking algorithms in order to get information on the quality of the dataset and on possible errors.
After the evaluation has been completed, the dataset can be trained.

Training of a Dataset:

Training becomes available once the evaluation has been completed.
Clicking Train opens a popup where you select whether a new model should be trained or an existing model should be updated.

You can name the model and give additional information in the model description field.
Once the training has completed successfully, the model becomes available in the AI models module.