# Generate AI Training Data

Creating large, diverse datasets is a crucial step in training AI models, especially for tasks like image recognition, object detection, and classification. However, collecting and labeling real-world data can be time-consuming, costly, and sometimes impractical. This is where **synthetic AI training data** comes in as a powerful alternative.

**Synthetic AI training data** involves generating artificial data that mimics real-world conditions. Using simulation environments like Unity and tools such as **realvirtual.io AI Builder**, developers can create vast amounts of synthetic data by varying parameters such as object position, lighting, textures, and camera angles. This process produces robust datasets that improve AI model performance without the need for large, manually collected datasets.

#### Benefits of Synthetic AI Training Data

* **Cost-effective**: Eliminates the need for expensive data collection processes and reduces manual labeling time.
* **Control over variables**: Allows developers to adjust conditions like lighting, camera perspective, and object configurations to cover a wide range of scenarios.
* **Faster iterations**: Enables rapid creation of diverse training sets, leading to quicker AI model training cycles.
* **Enhanced robustness**: By simulating edge cases and varying environments, AI models trained on synthetic data tend to generalize better to real-world conditions.
* **Training pipeline flexibility**: Generated datasets are exported in standard formats, making them compatible with various AI training frameworks beyond the built-in YOLO integration.
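For orientation, annotations in the YOLO format referenced above are plain text files with one line per object: a class index followed by the normalized box center and size. A minimal sketch of such a writer (the function name and pixel-box convention are illustrative assumptions, not part of the AI Builder API):

```python
def to_yolo_line(class_id, box, img_w, img_h):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) into a YOLO
    annotation line: class index plus normalized center x/y, width, height."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

line = to_yolo_line(0, (100, 200, 300, 400), 640, 480)
# → "0 0.312500 0.625000 0.312500 0.416667"
```

Because all coordinates are normalized to the image size, the same label file works regardless of the resolution the training framework resizes images to.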

In **realvirtual.io AI Builder**, the **Variators** are responsible for dynamically adjusting various parameters during the training process, helping generate a wide variety of synthetic data automatically. This method allows AI models to learn from a range of inputs and ensures the model performs well under different conditions.

## Components for Training

The AI Builder Demo scene demonstrates how to create synthetic AI training data and train object detection models using the realvirtual.io AI Builder framework. This demo is designed to showcase key functionalities like camera placement, data recording, and AI model training, using a conveyor belt and sorting system with different colored Lego bricks.

### AI Camera

The **AI Camera** is a vital component in the synthetic training process. It allows you to position the camera within the scene to capture images from different angles, lighting conditions, and perspectives.

The camera's primary role is to record synthetic data, which is then used for training the AI model.

Once set up, the AI Camera continuously captures images of the objects (e.g., bricks) on the conveyor belt, which will be used to train the AI to recognize and sort these objects.

The AI Camera is straightforward to use, requiring no additional configuration. Simply place it in the desired position where synthetic images should be generated.

<figure><img src="https://260262196-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpYxFg97YnJX96UzNNTSd%2Fuploads%2Fgit-blob-ab49487d065db9f8078024ca6f8969394ad02274%2Fai-camera.png?alt=media" alt=""><figcaption><p>AI Camera in the scene</p></figcaption></figure>

### AIBuilder

The **AIBuilder** component is designed to manage the entire process of AI training and data recording.

<figure><img src="https://260262196-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpYxFg97YnJX96UzNNTSd%2Fuploads%2Fgit-blob-34b860b7f8e65f105403016e5c545f91ce046971%2Fai-builder.png?alt=media" alt=""><figcaption><p>AI Builder main setup (Training, Detection, Deployment)</p></figcaption></figure>

The AI Builder needs to be set to the desired mode, depending on the current stage of your workflow:

**Training Mode:** In this mode, synthetic training data can be generated using the AI Camera. Once enough data has been recorded, you can start the training process based on the generated data. This mode is used to prepare and [train the AI model](https://github.com/game4automation/doc/blob/doc/extensions/realvirtual.io-aibuilder/ai-training/README.md) for object detection.

**Detection Mode:** In detection mode, the AI performs object detection based on the previously trained model. The detection is still done using the AI Camera (not a real one), making it ideal for testing and developing the entire AI-based process within a full digital twin. This allows you to test the detection capabilities and refine the AI integration with PLC and robotics interfaces before deploying it to a physical system. See section [Testing AI in a Digital Twin](https://doc.realvirtual.io/extensions/realvirtual.io-aibuilder/testing-ai-in-a-digital-twin).

Additionally, you can activate the **Use OBB** flag. When activated, the system uses Oriented Bounding Boxes (OBB) for both data generation and object detection.
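For context, OBB annotations typically describe a rotated rectangle by its four corner points rather than an axis-aligned center and size. A small sketch of how those corners can be derived from a center, size, and rotation angle (purely illustrative; not the AI Builder implementation):

```python
import math

def obb_corners(cx, cy, w, h, angle_deg):
    """Return the four corner points of an oriented box given its center
    (cx, cy), size (w, h), and rotation angle in degrees (counterclockwise)."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate each local corner offset around the center
        corners.append((cx + dx * cos_a - dy * sin_a,
                        cy + dx * sin_a + dy * cos_a))
    return corners

corners = obb_corners(0.0, 0.0, 2.0, 2.0, 90.0)
```

Unlike an axis-aligned box, the oriented box stays tight around an object no matter how it is rotated, which is why OBB is useful for elongated parts like bricks on a conveyor.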

<figure><img src="https://260262196-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpYxFg97YnJX96UzNNTSd%2Fuploads%2Fgit-blob-4ae0f21e084f9aaf6a36745f0af72eae29e586e1%2FScreenshot%202025-07-17%20230356.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>


### Training Data Recorder

The Data Recorder, located under "Step 1 - Data Recorder" in the AIBuilderDemo scene, is responsible for capturing synthetic data that will later be used for training AI models. This component collects labeled data from the scene and saves it into a designated folder.

In the scene, the Data Recorder works in combination with the AI Camera and other variators (such as Sky and Light Variators) to create a wide variety of training data. You can configure key parameters like timing, sample count, and validation ratio to control how the synthetic data is generated and stored.

<figure><img src="https://260262196-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpYxFg97YnJX96UzNNTSd%2Fuploads%2Fgit-blob-2cb9720b6c0cda40ac139f2181fcbcab48db18d3%2Fai-datarecorder.png?alt=media" alt=""><figcaption><p>Training Data Recorder</p></figcaption></figure>

In the Inspector panel of the Training Data Recorder, you can:

* Set **Labels** for the objects being captured (e.g., Lego brick types).
* Adjust the **Timing** settings, including **Time Scale** and **Delay** between captures.
* Specify the number of **Samples** to collect and the **Validation Ratio** for splitting the dataset.
* Choose the folder to **Export** the data to, ensuring the synthetic data is saved for later training use.
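The **Validation Ratio** can be thought of as a random train/validation split over the recorded samples. A minimal sketch of the idea (the function name and fixed seed are illustrative assumptions, not the recorder's actual code):

```python
import random

def split_dataset(samples, validation_ratio, seed=42):
    """Shuffle samples and split them into (train, validation) lists;
    validation_ratio = 0.2 puts ~20% of samples into validation."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * validation_ratio)
    return shuffled[n_val:], shuffled[:n_val]

train, val = split_dataset(list(range(100)), 0.2)
```

Keeping the validation images out of training lets you check that the model generalizes rather than memorizes the recorded scenes.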

Variators such as the **Sky Variator** and **Light Variator** are also present to introduce variation in lighting conditions, ensuring the AI becomes robust across different environments.

**Starting the Training Process**

To begin the AI training process:

1. Open the scene in Unity and ensure that the **Data Recorder** is properly configured with the labels, timing, and samples as needed.
2. Start the scene by entering **Play Mode**.
3. In the **Inspector** panel under the **Data Recorder**, click on **Start Recording**.

<figure><img src="https://260262196-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpYxFg97YnJX96UzNNTSd%2Fuploads%2Fgit-blob-ae477363d9328bf20d985d634e923031ae3d8709%2Fai-startrecording.png?alt=media" alt=""><figcaption><p>Starting the training in Play mode</p></figcaption></figure>

Once selected, the Data Recorder will begin capturing synthetic data according to the defined settings. The images and their corresponding labels will be saved in the specified folder for later use in training the AI model.

### Variators

#### **Sky Variator**

The **Sky Variator** is responsible for varying the sky conditions during the synthetic data recording process. By changing the sky settings, the AI model is exposed to different lighting conditions, making it more robust to real-world variability. In the **Inspector**, you can configure the following options:

* **Apply On Init**: Enables the sky variation when the recording starts.
* **Apply On Snapshot**: Applies the sky variation each time a snapshot of synthetic data is recorded.
* **Step**: Defines the number of different sky conditions that will be applied during the recording.

#### **Light Variator**

The **Light Variator** is responsible for adjusting the lighting conditions during data recording, helping to expose the AI to various lighting intensities and temperatures. In the **Inspector**, you can configure the following options:

* **Apply On Init**: Enables light variation at the start of the recording.
* **Apply On Snapshot**: Applies light variation during each snapshot.
* **Min/Max Intensity**: Defines the range of light intensity that will be applied.
* **Min/Max Temperature**: Defines the range of light temperature (from cooler to warmer tones) to be used.
* **Light**: Specifies the light source to be varied, such as the sun in the scene.
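Conceptually, the Light Variator draws a random intensity and color temperature within the configured ranges for each variation. A minimal sketch of that sampling (names and units are illustrative, not the component's API):

```python
import random

def sample_light(min_intensity, max_intensity, min_temp, max_temp, rng=random):
    """Sample a light intensity and a color temperature (Kelvin) uniformly
    within the configured min/max ranges, once per variation."""
    intensity = rng.uniform(min_intensity, max_intensity)
    temperature = rng.uniform(min_temp, max_temp)
    return intensity, temperature

intensity, temperature = sample_light(0.5, 2.0, 4500.0, 7500.0)
```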

<figure><img src="https://260262196-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpYxFg97YnJX96UzNNTSd%2Fuploads%2Fgit-blob-66833ea5e8c1521e51cd983f3456df307aab45c7%2FAI-sourcevariator.png?alt=media" alt=""><figcaption><p>Object and Transform Variator on the Source</p></figcaption></figure>

**Object Variator**:

* **Apply On Init**: This option is unchecked, meaning the variation of objects will not happen immediately when the scene starts.
* **Apply On Snapshot**: This option is unchecked, indicating that variations won’t automatically occur when snapshots are taken.
* **Objects**: There are four types of objects configured for variation:
  * **2x2 Brick**
  * **3x2 Brick**
  * **4x2 Brick**
  * **6x2 Brick**
* **Weights**: Each object has an equal weight of 0.25, meaning each object has an equal chance of being selected during the variation process.
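Conceptually, the Object Variator performs a weighted random pick over the configured objects; with equal weights of 0.25, each brick is equally likely to appear. A small illustrative sketch (not the component's actual code):

```python
import random

def pick_object(objects, weights, rng=random):
    """Pick one object with probability proportional to its weight."""
    return rng.choices(objects, weights=weights, k=1)[0]

bricks = ["2x2 Brick", "3x2 Brick", "4x2 Brick", "6x2 Brick"]
chosen = pick_object(bricks, [0.25, 0.25, 0.25, 0.25])
```

Unequal weights would let you oversample a brick type the model struggles with, without changing anything else in the scene.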

**Transform Variator**:

* **Apply On Init**: Checked, meaning that variations in position, rotation, and scale will be applied when the scene is initialized.
* **Apply On Snapshot**: Unchecked, so variations will not happen automatically with each snapshot.
* **Position Variation**: The position of the object can vary between -0.5 and 0.5 units on each of the X, Y, and Z axes.
* **Rotation Variation**: The object’s rotation can vary by up to 360 degrees along the X, Y, and Z axes.
* **Scale Variation**: The object’s scale remains fixed (no scaling variation), as indicated by the zero values for scale variation.
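The Transform Variator settings above amount to sampling a random position offset and rotation per axis while leaving scale fixed. A minimal sketch under those ranges (function and return layout are illustrative assumptions):

```python
import random

def sample_transform(pos_range=0.5, rot_range=360.0, rng=random):
    """Sample a position offset in [-pos_range, pos_range] per axis and a
    rotation in [0, rot_range] degrees per axis; scale stays fixed at 1."""
    position = tuple(rng.uniform(-pos_range, pos_range) for _ in range(3))
    rotation = tuple(rng.uniform(0.0, rot_range) for _ in range(3))
    scale = (1.0, 1.0, 1.0)
    return position, rotation, scale

position, rotation, scale = sample_transform()
```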

### Making objects detectable

When AI Builder records training data, it needs to know which class each object belongs to, and it uses the objects' bounding boxes to generate the appropriate annotations for the training data.

#### Detectables

To make an object detectable (e.g., a brick), it requires the **Detectable** component. You can add the Detectable component to any 3D object.

<figure><img src="https://260262196-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpYxFg97YnJX96UzNNTSd%2Fuploads%2Fgit-blob-e6e418e06fe527939141d2f4820b742621bf42a3%2FScreenshot%202025-07-17%20193036.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

Once the Detectable is added, you can compute the 3D bounds of the object by clicking the respective button in the Inspector of the Detectable component. The computed bounds are then shown as a yellow gizmo box encapsulating the object. Later in the process, these 3D bounds are projected into the 2D space of the AI Camera and used as the bounding box for data labeling when recording training data.
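The projection step can be pictured as follows: each corner of the 3D bounds is mapped into image space, and the enclosing rectangle of the projected points becomes the 2D bounding box. A simplified pinhole-camera sketch (the camera model and values are illustrative, not the AI Camera's internals):

```python
from itertools import product

def project_bounds(corners, focal, img_w, img_h):
    """Project 3D corner points (camera-space x, y, z with z > 0 in front of
    the camera) through a pinhole model and return the enclosing 2D box
    (x_min, y_min, x_max, y_max) in pixels, clamped to the image."""
    us = [focal * x / z + img_w / 2 for x, y, z in corners]
    vs = [focal * y / z + img_h / 2 for x, y, z in corners]
    return (max(min(us), 0), max(min(vs), 0),
            min(max(us), img_w), min(max(vs), img_h))

# 8 corners of a unit cube centered 5 units in front of the camera
cube = list(product((-0.5, 0.5), (-0.5, 0.5), (4.5, 5.5)))
box = project_bounds(cube, focal=500, img_w=640, img_h=480)
```

Clamping to the image keeps the box valid for objects that are only partially inside the camera's field of view.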

<figure><img src="https://260262196-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpYxFg97YnJX96UzNNTSd%2Fuploads%2Fgit-blob-7d05bd9671cc1b84d7b706d78487aeddb09c77b3%2FScreenshot%202025-07-17%20192934.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

In addition to computing the bounds of the target object, the system also needs the class label of the object. To define class labels, you first need to create a **LabelDefinition** object. In your Project window, right-click and navigate to **Create > realvirtual > AiBuilder > LabelDefinition** to generate it.

<figure><img src="https://260262196-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpYxFg97YnJX96UzNNTSd%2Fuploads%2Fgit-blob-4eccda199cd0e0af2cd3ffda53bf9bf08f42b9d8%2FScreenshot%202025-07-17%20193936.png?alt=media" alt="" width="356"><figcaption></figcaption></figure>

In the inspector of the LabelDefinition object, you can define your classes, name them and give them a color.

Once your LabelDefinition object is set up, you need to add a label condition to your detectable object. The standard label condition is the **FixedCondition** component, which gives your detectable object a fixed class label.

Other components, such as the **OrientationCondition**, switch the class label of the object based on its rotation/orientation. This can be useful, for instance, if you want to detect whether the bricks are facing upwards or are flipped.

<figure><img src="https://260262196-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpYxFg97YnJX96UzNNTSd%2Fuploads%2Fgit-blob-b6e117c3fe12a466b08aec31bf656c45036240e7%2FScreenshot%202025-07-17%20194928.png?alt=media" alt="" width="358"><figcaption></figcaption></figure>

In the FixedCondition component you can set the class id of your object. This id is an integer representing the index of the respective class in the LabelDefinition object mentioned above.
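In other words, the class id is just an index into the LabelDefinition's class list. A tiny illustrative sketch of that lookup (the data layout below is a hypothetical stand-in for the actual asset):

```python
# Hypothetical in-memory stand-in for a LabelDefinition asset: the class id
# set on a FixedCondition is simply the index into this list.
label_definition = [
    {"name": "2x2 Brick", "color": "red"},
    {"name": "3x2 Brick", "color": "green"},
    {"name": "4x2 Brick", "color": "blue"},
    {"name": "6x2 Brick", "color": "yellow"},
]

def class_name(class_id, labels):
    """Resolve a FixedCondition class id to the class name it indexes."""
    if not 0 <= class_id < len(labels):
        raise ValueError(f"class id {class_id} is not defined")
    return labels[class_id]["name"]

name = class_name(1, label_definition)  # → "3x2 Brick"
```

An id outside the list raises an error here; in practice, keeping ids and the LabelDefinition in sync is what prevents mislabeled training data.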

Following these steps, your detectable object is ready. The system will recognize it and automatically label it during data recording.

In Play mode, with the system set to Training mode, you can verify your setup through the Data Recording preview. You should see an encapsulating bounding box in the respective color, labeled with the class name defined in your LabelDefinition.

<figure><img src="https://260262196-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpYxFg97YnJX96UzNNTSd%2Fuploads%2Fgit-blob-f0acbbb3f7cd425bf992a3edd55650d40eced72f%2FScreenshot%202025-07-17%20195752.png?alt=media" alt="" width="375"><figcaption></figcaption></figure>

