Generate AI Training Data
Generate synthetic training data for AI vision tasks
Creating large, diverse datasets is a crucial step in training AI models, especially for tasks like image recognition, object detection, and classification. However, collecting and labeling real-world data can be time-consuming, costly, and sometimes impractical. This is where synthetic AI training data comes in as a powerful alternative.
Synthetic AI training data involves generating artificial data that mimics real-world conditions. Using simulation environments like Unity and tools such as realvirtual.io AI Builder, developers can create vast amounts of synthetic data by varying parameters such as object position, lighting, textures, and camera angles. This process produces robust datasets that improve AI model performance without the need for large, manually collected datasets.
Benefits of Synthetic AI Training Data
Cost-effective: Eliminates the need for expensive data collection processes and reduces manual labeling time.
Control over variables: Allows developers to adjust conditions like lighting, camera perspective, and object configurations to cover a wide range of scenarios.
Faster iterations: Enables rapid creation of diverse training sets, leading to quicker AI model training cycles.
Enhanced robustness: By simulating edge cases and varying environments, AI models trained on synthetic data tend to generalize better to real-world conditions.
Training pipeline flexibility: Generated datasets are exported in standard formats, making them compatible with various AI training frameworks beyond the built-in YOLO integration (see the annotation example below).
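Given the built-in YOLO integration, the exported annotations presumably follow the standard YOLO text format: one .txt file per image, with one line per object consisting of a class index followed by a normalized, axis-aligned bounding box. This is a convention-based assumption about the exporter, shown here for orientation:

```
# <class_id> <x_center> <y_center> <width> <height>, all normalized to 0..1
0 0.512 0.430 0.120 0.085
```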
In realvirtual.io AI Builder, the Variators are responsible for dynamically adjusting various parameters during data recording, helping generate a wide variety of synthetic data automatically. This exposes the AI model to a broad range of inputs and helps ensure it performs well under different conditions.
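As a mental model, a variator can be thought of as a component with two hooks: one that fires when recording starts and one that fires before each captured sample. The following C# sketch is illustrative only, not the actual AI Builder API; all names here are assumptions:

```csharp
using UnityEngine;

// Illustrative sketch only - the real variator components ship with
// realvirtual.io AI Builder; names and structure here are assumptions.
public abstract class VariatorSketch : MonoBehaviour
{
    public bool ApplyOnInit = true;      // vary once when recording starts
    public bool ApplyOnSnapshot = true;  // vary before every captured sample

    void Start()
    {
        if (ApplyOnInit) Vary();
    }

    // Assumed hook, called by the recorder before each snapshot.
    public void OnSnapshot()
    {
        if (ApplyOnSnapshot) Vary();
    }

    // Each concrete variator randomizes one aspect of the scene.
    protected abstract void Vary();
}
```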
Components for Training
The AI Builder Demo scene demonstrates how to create synthetic AI training data and train object detection models using the realvirtual.io AI Builder framework. This demo is designed to showcase key functionalities like camera placement, data recording, and AI model training, using a conveyor belt and sorting system with different colored Lego bricks.
AI Camera
The AI Camera is a vital component in the synthetic training process. It can be positioned anywhere in the scene to capture images from different angles, lighting conditions, and perspectives.
The camera's primary role is to record synthetic data, which is then used to train the AI model.
Once set up, the AI Camera continuously captures images of the objects (e.g., bricks) on the conveyor belt, which are used to train the AI to recognize and sort these objects.
The AI Camera is straightforward to use and requires no additional configuration: simply place it at the position from which synthetic images should be generated.

AIBuilder
The AIBuilder component is designed to manage the entire process of AI training and data recording.

The AI Builder needs to be set to the desired mode, depending on the current stage of your workflow:
Training Mode: In this mode, synthetic training data can be generated using the AI Camera. Once enough data has been recorded, you can start the training process based on the generated data. This mode is used to prepare and train the AI model for object detection.
Detection Mode: In detection mode, the AI performs object detection based on the previously trained model. The detection still uses the AI Camera (not a real one), making it ideal for testing and developing the entire AI-based process fully within the digital twin. This allows you to test the detection capabilities and refine the AI integration with PLC and robotics interfaces before deploying it to a physical system. See section Testing AI in a Digital Twin.
Additionally, you can activate the Use OBB flag. When activated, the system uses Oriented Bounding Boxes (OBBs) for both data generation and object detection, so rotated objects are annotated with their actual orientation rather than an axis-aligned box (see the example below).
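If the export follows the YOLO OBB convention (an assumption based on that format, not verified against the exporter), each label line then lists the four corner points of the rotated box instead of a single center/size pair:

```
# <class_id> <x1> <y1> <x2> <y2> <x3> <y3> <x4> <y4>, four corners, normalized
1 0.45 0.30 0.60 0.35 0.57 0.48 0.42 0.43
```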

Training Data Recorder
The Data Recorder, located under "Step 1 - Data Recorder" in the AIBuilderDemo scene, is responsible for capturing synthetic data that will later be used for training AI models. This component collects labeled data from the scene and saves it into a designated folder.
In the scene, the Data Recorder works in combination with the AI Camera and other variators (such as Sky and Light Variators) to create a wide variety of training data. You can configure key parameters like timing, sample count, and validation ratio to control how the synthetic data is generated and stored.

In the Inspector panel of the Training Data Recorder, you can:
Set Labels for the objects being captured (e.g., Lego brick types).
Adjust the Timing settings, including Time Scale and Delay between captures.
Specify the number of Samples to collect and the Validation Ratio for splitting the dataset (e.g., a ratio of 0.2 reserves 20% of the samples for validation and uses the remaining 80% for training).
Choose the folder to Export the data to, ensuring the synthetic data is saved for later training use.
Additionally, variators such as Sky and Light Variator are also present to introduce variation in lighting conditions, ensuring the AI becomes robust across different environments.
Starting the Training Process
To begin the AI training process:
Open the scene in Unity and ensure that the Data Recorder is properly configured with the labels, timing, and samples as needed.
Start the scene by entering Play Mode.
In the Inspector panel under the Data Recorder, click on Start Recording.

Once recording starts, the Data Recorder captures synthetic data according to the defined settings. The images and their corresponding labels are saved in the specified folder for later use in training the AI model.
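YOLO-style datasets are conventionally organized into separate image and label folders with train/validation splits. Assuming AI Builder follows this common layout (an assumption, not verified against the exporter), the export folder would look roughly like:

```
export-folder/
├── images/
│   ├── train/   # training images
│   └── val/     # validation images (share set by the Validation Ratio)
└── labels/
    ├── train/   # one .txt annotation file per training image
    └── val/
```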
Variators
Sky Variator
The Sky Variator is responsible for varying the sky conditions during the synthetic data recording process. By changing the sky settings, the AI model is exposed to different lighting conditions, making it more robust to real-world variability. In the Inspector, you can configure the following options:
Apply On Init: Enables the sky variation when the recording starts.
Apply On Snapshot: Applies the sky variation each time a snapshot of synthetic data is recorded.
Step: Defines the number of different sky conditions that will be applied during the recording.
Light Variator
The Light Variator is responsible for adjusting the lighting conditions during data recording, exposing the AI to various lighting intensities and color temperatures. In the Inspector, you can configure the following options (a short sketch of this behavior follows the list):
Apply On Init: Enables light variation at the start of the recording.
Apply On Snapshot: Applies light variation during each snapshot.
Min/Max Intensity: Defines the range of light intensity that will be applied.
Min/Max Temperature: Defines the range of light temperature (from cooler to warmer tones) to be used.
Light: Specifies the light source to be varied, such as the sun in the scene.
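To illustrate what a single variation step could do, here is a minimal C# sketch using only standard Unity Light properties (intensity, useColorTemperature, colorTemperature); the actual Light Variator implementation ships with the package and may differ:

```csharp
using UnityEngine;

// Sketch of a light variation step; illustrative only, not the
// AI Builder implementation.
public class LightVariatorSketch : MonoBehaviour
{
    public Light Light;                  // e.g. the sun in the scene
    public float MinIntensity = 0.5f;
    public float MaxIntensity = 2.0f;
    public float MinTemperature = 4000f; // warmer tones
    public float MaxTemperature = 8000f; // cooler tones

    public void Vary()
    {
        // Pick a random intensity and color temperature within the ranges.
        Light.intensity = Random.Range(MinIntensity, MaxIntensity);
        Light.useColorTemperature = true;
        Light.colorTemperature = Random.Range(MinTemperature, MaxTemperature);
    }
}
```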

Object Variator
The Object Variator varies which object appears in the scene. In the demo, it is configured as follows:
Apply On Init: This option is unchecked, meaning the variation of objects will not happen immediately when the scene starts.
Apply On Snapshot: This option is unchecked, indicating that variations won’t automatically occur when snapshots are taken.
Objects: There are four types of objects configured for variation:
2x2 Brick
3x2 Brick
4x2 Brick
6x2 Brick
Weights: Each object has an equal weight of 0.25, meaning each object has an equal chance of being selected during the variation process (see the selection sketch below).
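Weighted selection of this kind typically works by sampling a random number against the cumulative weights. A minimal C# sketch of that logic (illustrative, not the package code):

```csharp
using UnityEngine;

// Weighted random selection: returns the index of the chosen entry.
public static class WeightedPick
{
    public static int PickIndex(float[] weights)
    {
        float total = 0f;
        foreach (var w in weights) total += w;

        float r = Random.Range(0f, total);
        for (int i = 0; i < weights.Length; i++)
        {
            if (r < weights[i]) return i; // landed in this object's slice
            r -= weights[i];
        }
        return weights.Length - 1;        // guard against rounding
    }
}

// With weights { 0.25f, 0.25f, 0.25f, 0.25f }, each of the four
// brick types is selected with equal probability.
```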
Transform Variator
The Transform Variator randomizes the position, rotation, and scale of an object. In the demo, it is configured as follows (a sketch of the variation step follows the list):
Apply On Init: Checked, meaning that variations in position, rotation, and scale will be applied when the scene is initialized.
Apply On Snapshot: Unchecked, so variations will not happen automatically with each snapshot.
Position Variation: The position of the object can vary between -0.5 and 0.5 units on each of the X, Y, and Z axes.
Rotation Variation: The object’s rotation can vary by up to 360 degrees along the X, Y, and Z axes.
Scale Variation: The object’s scale remains fixed (no scaling variation), as indicated by the zero values for scale variation.
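A variation step matching these demo settings could look like the following C# sketch (illustrative only; the actual Transform Variator ships with the package):

```csharp
using UnityEngine;

// Sketch of the transform variation described above: +/-0.5 units of
// positional jitter, free rotation on all axes, and no scaling.
public class TransformVariatorSketch : MonoBehaviour
{
    Vector3 basePosition;

    void Awake() { basePosition = transform.localPosition; }

    public void Vary()
    {
        // Offset the position by up to 0.5 units on each axis.
        transform.localPosition = basePosition + new Vector3(
            Random.Range(-0.5f, 0.5f),
            Random.Range(-0.5f, 0.5f),
            Random.Range(-0.5f, 0.5f));

        // Rotate freely, up to 360 degrees per axis.
        transform.localRotation = Quaternion.Euler(
            Random.Range(0f, 360f),
            Random.Range(0f, 360f),
            Random.Range(0f, 360f));

        // Scale is left untouched (zero scale variation).
    }
}
```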
Making objects detectable
When AI Builder is recording training data, it needs to know which object belongs to which class, and it uses the objects' bounding boxes to generate the appropriate annotations for the training data.
Detectables
In order to make an object detectable (e.g., a brick), it requires the Detectable component. You can add the Detectable component to any 3D object.

Once the Detectable is added, you can compute the 3D bounds of the object by clicking the respective button in the Inspector of the Detectable component. The computed bounds are then shown as a yellow gizmo box encapsulating the object. Later in the process, these 3D bounds are projected into the 2D space of the AI Camera and used as the bounding box for data labeling when recording training data.
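Conceptually, projecting 3D bounds into a 2D bounding box means transforming the eight corners of the bounding volume into the camera's image space and taking the minimum and maximum of the results. A sketch using standard Unity calls (not the AI Builder implementation):

```csharp
using UnityEngine;

public static class BoundsProjection
{
    // Project the 8 corners of a world-space Bounds into the camera's
    // viewport and return the enclosing 2D rectangle (normalized 0..1).
    // Assumes the object is in front of the camera.
    public static Rect ProjectToViewport(Camera cam, Bounds bounds)
    {
        Vector2 min = new Vector2(float.MaxValue, float.MaxValue);
        Vector2 max = new Vector2(float.MinValue, float.MinValue);

        for (int i = 0; i < 8; i++)
        {
            // Build each corner from the min/max extents.
            Vector3 corner = new Vector3(
                (i & 1) == 0 ? bounds.min.x : bounds.max.x,
                (i & 2) == 0 ? bounds.min.y : bounds.max.y,
                (i & 4) == 0 ? bounds.min.z : bounds.max.z);

            Vector3 vp = cam.WorldToViewportPoint(corner);
            min = Vector2.Min(min, vp);
            max = Vector2.Max(max, vp);
        }
        return new Rect(min, max - min);
    }
}
```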

In addition to computing the bounds of the target object, the system also needs information on the class label of the object. In order to define such class labels, you first need to generate a LabelDefinition object. In your project window, right-click and navigate to Create > realvirtual > AiBuilder > LabelDefinition. This generates the LabelDefinition object.

In the Inspector of the LabelDefinition object, you can define your classes, name them, and assign each a color.
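As a mental model, an asset of this kind is typically a ScriptableObject holding a list of class entries. A hypothetical sketch of its shape (the real LabelDefinition ships with the package and its fields may differ):

```csharp
using UnityEngine;

// Hypothetical shape of a label definition asset; the actual
// LabelDefinition in AI Builder may declare different fields.
[CreateAssetMenu(menuName = "realvirtual/AiBuilder/LabelDefinition")]
public class LabelDefinitionSketch : ScriptableObject
{
    [System.Serializable]
    public class ClassEntry
    {
        public string Name;   // e.g. "2x2 Brick"
        public Color Color;   // preview/gizmo color for this class
    }

    // Class ids used elsewhere reference the index into this list.
    public ClassEntry[] Classes;
}
```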
Once your LabelDefinition object is set up, you need to add a label condition to your detectable object. The standard label condition is the FixedCondition. This component gives your detectable object a fixed class label.
Other components, such as the OrientationCondition, switch the class label of the object based on its rotation/orientation. This can be useful, for instance, if you want to detect whether the bricks are facing upwards or are flipped.

In the FixedCondition component you can set the class id of your object. This id is an integer representing the index of the respective class in the LabelDefinition object mentioned above (e.g., 0 for the first class in the list).
Following these steps, your detectable object is ready. The system will recognize it and automatically label it during data recording.
In Play Mode, with the system set to Training Mode, you can verify your setup through the Data Recording preview. You should see an encapsulating bounding box in the respective color, labeled with the class name defined in your LabelDefinition.
