What is the Object and Scene Recognition Module?
Better understand your content and search for specific objects and scenes.
Module Description
Object and scene recognition detects and labels a wide range of objects and scenes, from general to highly specific ones. With this module you can quickly summarize the content of pictures or videos. It can be used to conveniently and reliably categorize and archive visual data with more than 1,500 object classes, and it can also be used with custom dictionaries to suit whatever purpose you have in mind.
Customized Object and Scene Recognition
To use a custom Object and Scene Recognition model, you need to upload a dictionary in the Dictionaries section (Deep Model Customizer).
How does it work?
- Select the Media File: Choose the media file you want to analyze.
- Activate the Object and Scene Recognition Module: In the left column, select the "Object and Scene Recognition" module.
- Define the Model & Parameters: Choose the model for analysis from the available options, set the parameters, and click the yellow "Add Module" button.
- Start the Analysis: You can either add more modules or begin the analysis immediately by clicking "Start Analysis".
What Parameters are available?
- Model (Dropdown): Choose one of our pre-trained models or your custom-trained models. Two models are currently available:
- Zero Shot
A large set of pre-trained labels with powerful zero-shot generalization capabilities that can be customized. Recommended for most applications.
- General-C
Various objects and scenes, from general to more specific, limited to broader, more general categories.
Custom Models:
To use custom models, they must be uploaded as a dictionary and then applied when using the Zero Shot model.
- Minimum Confidence Threshold (0-1): The confidence level required for a detection to be returned. Only predictions with a confidence above this threshold are included in the results. A higher value yields fewer but more reliable results; a lower value yields more results but may include more false positives.
Note:
The minimum confidence can also be applied later by filtering the search results, in which case it is recommended to use a lower value.
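Filtering after the fact can be sketched as follows. This is a minimal illustration assuming the API returns predictions as a list of objects with `label` and `confidence` fields; the actual field names in the API response may differ:

```python
# Hypothetical result structure; the real API schema may use different field names.
results = [
    {"label": "dog", "confidence": 0.92},
    {"label": "skateboard", "confidence": 0.41},
    {"label": "tree", "confidence": 0.18},
]

def filter_by_confidence(predictions, threshold):
    """Keep only predictions at or above the given confidence threshold."""
    return [p for p in predictions if p["confidence"] >= threshold]

# Running the analysis with a low threshold and filtering afterwards
# lets you tighten the cutoff later without re-running the analysis.
print(filter_by_confidence(results, 0.4))
```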
- Dictionary (Dropdown)
This is an object that represents a text-based dictionary of words and phrases.
In the case of Zero Shot models, they are typically used to provide additional labels for recognition. We have several default dictionaries built into Object and Scene Recognition, but you can also add your own.
You can use more than one dictionary in an analysis job, just click "Add another dictionary" and all labels of all applied dictionaries will be included in the analysis.
These default dictionaries are currently available in addition to custom dictionaries:
- IAB Content Taxonomy 3.0
The Interactive Advertising Bureau created this content tagging standard to provide a consistent taxonomy for contextual advertising. This list includes all 700 categories of the content taxonomy.
Examples include: Pets, Family and Relationships, Skateboarding, Health, ...
- GARM Brand Safety
The Global Alliance for Responsible Media (GARM) has developed common definitions to ensure that the advertising industry categorizes harmful content in the same way across all platforms.
Examples include: Explicit Content, Arms, Ammunition, Drugs, Tobacco
Note:
Read more about GARM Brand Safety here.
GARM ceased all activities in August 2024 after being sued by X, but its brand safety standard is still used in many media houses, so we keep the dictionary.
- Animal Names (mainly for Speech Recognition)
A broad list of animal names. With object and scene recognition it works only in a generalized way and is not suitable for accurate classification of animal species or subspecies.
- European Football Clubs (for Speech Recognition only!)
Names of more than 550 European soccer teams.
Dictionary Types:
Different custom dictionary types provide different behavior when used in the Object and Scene Recognition. Note that the type is defined by the format of the uploaded file.
- Simple Dictionary
Simple dictionaries are used to provide category information for predicted labels. They are defined by a UTF-8 encoded text file (.txt), where each line represents a single entry in the dictionary. Blank lines in the file are ignored.
- Mapping Dictionary
The purpose of mapping dictionaries is to provide label substitution behavior during inference. Each mapping dictionary is defined by a UTF-8 encoded CSV file (.csv) that must contain a header and two columns without missing values: source and target. Each time a label from the source column is predicted by the Object and Scene Recognition, it is replaced by the value provided in the target column.
Read more about creating custom dictionaries here.
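To make the two formats concrete, a simple dictionary file and a mapping dictionary file might look like this (the entries are illustrative examples, not built-in labels):

```
pipeline
compressor station
industrial machinery
```

```
source,target
Industrial machinery with large pipes,Pipeline Compressor Station
Large metal tubes,Pipeline
```

The first is a plain .txt file with one label per line; the second is a .csv file whose header names the source and target columns.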
- Include Preset Labels (Checkbox)
In addition to the dictionary labels, this option also includes the pre-trained standard labels of the Zero Shot model.
- Language (Dropdown)
Selects the language in which the labels are displayed. Currently only English and German are supported; others can be added on demand.
- Enable Captioning (Checkbox)
Enables prediction of a scene caption (a description of the scene).
Scene Captioning:
The result is only included in the API and currently not displayed in the frontend!
Displaying the Results:
Timeline:
The timeline, located below the player, displays the entire video runtime and the results from each module as gray bars.
- By clicking on any of the gray result bars, you will see details such as:
- Label Name
- Confidence Value (0%-100%)
- Timecode (TC)
- Exact frame numbers
- Runtime/Duration
- Additional information may also be displayed depending on the module. Clicking on a result moves the playhead to the beginning of that result.
- These results are identical to those provided by the API, but in a more user-friendly, graphical format. If there are multiple results, use your mouse wheel to scroll through the timeline.
Search Field:
Located in the top bar, the search field includes filter settings for refining your results.
- Label field: Enter a name to view results that either match or don't match the entered name.
- Sorting: Results can be sorted alphabetically, by similarity, or by duration. You can toggle between ascending and descending order.
- Confidence: The confidence slider filters results based on confidence levels, displaying only results above a certain threshold.
After adjusting filters, click "Apply" to apply them. Active filters appear in a black box beneath the search field and can be cleared by clicking the X symbol.
Module Section
On the right side of the player, you’ll see a section with detailed results for each module used in the analysis. Clicking on the module name opens a dropdown with specific parameters, useful for troubleshooting or viewing metadata.
Result Cards
Results are displayed as cards in chronological order. Each card provides key information, such as:
- Label of the result: Which labels have been identified and with what level of confidence.
- Knowledge Graph Link: If the object is listed in the Deep Explorer Knowledge Graph, you’ll see a Knowledge Graph icon (a ball of dots and lines). Clicking it opens a pop-up showing all related information.
Troubleshooting
To investigate problems with your Object and Scene Recognition results, follow these steps:
In the Module section on the right side of the Results Viewer, you can access detailed parameters for each module. These parameters can provide additional information that can help you troubleshoot by allowing you to verify settings or fine-tune module performance.
Adjust confidence levels: Increase or decrease the confidence level to get more matches or fewer false positives.
Apply a custom dictionary: If the labels don't match your expectations, try uploading a custom dictionary with the labels appropriate for your use case.
Varying Labels in User-Defined Dictionaries: In some cases, it helps to vary the labels in the Zero Shot dictionary to obtain the most accurate result. You will most likely need a mapping dictionary to pair the source label the model detects with the target label you want displayed.
Mapping Dictionary Example:
Source Label: Industrial machinery with large pipes
Target Label: Pipeline Compressor Station
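Conceptually, the substitution a mapping dictionary performs can be sketched as follows. This is a minimal illustration of the source-to-target replacement described above, not the actual inference code:

```python
import csv
import io

# A mapping dictionary as it would appear in the uploaded CSV file.
MAPPING_CSV = """source,target
Industrial machinery with large pipes,Pipeline Compressor Station
"""

def load_mapping(csv_text):
    """Read the source -> target pairs from a mapping dictionary CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["source"]: row["target"] for row in reader}

def substitute(labels, mapping):
    """Replace each predicted label that appears in the source column;
    labels not listed in the mapping pass through unchanged."""
    return [mapping.get(label, label) for label in labels]

mapping = load_mapping(MAPPING_CSV)
print(substitute(["Industrial machinery with large pipes", "tree"], mapping))
# -> ['Pipeline Compressor Station', 'tree']
```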