AI Training
Training the Vision AI
The AI training process in AIBuilder involves using synthetic data generated in Step 1 - Data Recorder to fine-tune a pre-trained AI model. This process utilizes YOLO (You Only Look Once) pre-trained models, which are popular for real-time object detection tasks. The training step enhances the model's capabilities by adapting it to the target environment or use case using the generated data.
To perform AI training:
Select Step 2 - Training under AIBuilder.
Use the synthetic data generated in Step 1 - Data Recorder (Generate AI Training Data) for training. The training fine-tunes a pre-trained YOLO model, allowing the selection of different model sizes suitable for various applications.
The AI Training component has several key parameters and settings that must be configured to initiate the training process:
Data Folder: Specifies the path to the folder where the recorded training data is stored. This folder should contain the images and annotations generated during Step 1. These files are used as input for the training process.
Base Model: Allows for the selection of a YOLO pre-trained model to serve as the starting point for the training. Various model sizes can be chosen based on the application's requirements. The options include:
Nano:
Description: The Nano model is the smallest and most lightweight version of YOLO. It has a minimal number of layers and parameters, making it highly efficient for real-time inference on devices with limited computational power, such as embedded systems or mobile devices.
Use Case: Ideal for applications where speed is critical and hardware resources are constrained. Suitable for low-power devices, real-time applications, or scenarios where only basic object detection is required.
Performance: Fastest inference time but lower accuracy compared to larger models.
Small:
Description: The Small model is slightly larger than the Nano version, with more layers and parameters to improve detection accuracy. It still maintains a relatively low computational footprint, making it suitable for scenarios that require a balance between speed and accuracy.
Use Case: Suitable for edge devices that need real-time detection but can afford a bit more computational load than Nano. It’s a good choice for moderately complex tasks on devices like drones or industrial robots.
Performance: Faster than larger models with moderate accuracy improvements over Nano.
Medium:
Description: The Medium model offers a more significant improvement in accuracy by utilizing a larger network with more parameters. It requires more processing power than Nano and Small but provides a better balance between performance and accuracy for general-purpose tasks.
Use Case: Suitable for desktop GPUs or powerful embedded devices where real-time performance is still required but with a higher accuracy demand. Ideal for applications like automated quality inspection or surveillance.
Performance: Offers a good trade-off between speed and detection accuracy.
Large:
Description: The Large model is a more complex version of YOLO, featuring a significantly higher number of layers and parameters. It is designed for scenarios where detection accuracy is crucial, even if it means sacrificing some real-time performance.
Use Case: Suitable for systems with powerful GPUs where high accuracy is essential, such as in medical imaging, autonomous driving, or detailed industrial inspection tasks.
Performance: Higher accuracy than Medium, but requires more computational resources and has slower inference times.
XLarge:
Description: The XLarge model is the largest and most powerful version of YOLO, with the highest number of layers and parameters. It provides the best detection accuracy at the cost of increased computational demands.
Use Case: Ideal for server-grade hardware or cloud-based AI services where maximum accuracy is the priority, and real-time performance is less of a concern. Suitable for complex tasks like high-resolution video analysis, research, or scenarios requiring fine-grained object detection.
Performance: Most accurate model but also the slowest in terms of inference speed due to its complexity.
| Model  | Relative Size | Inference Speed | Accuracy | Typical Use Case |
|--------|---------------|-----------------|----------|------------------|
| Nano   | Very small    | Fastest         | Lower    | Low-power devices, basic real-time tasks |
| Small  | Small         | Faster          | Moderate | Edge devices, moderately complex tasks |
| Medium | Medium        | Balanced        | Good     | General-purpose, desktop GPUs |
| Large  | Large         | Slower          | High     | High-accuracy tasks, powerful GPUs |
| XLarge | Very large    | Slowest         | Highest  | Maximum accuracy, server or cloud |
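As a rough illustration of the trade-offs in the table above, a small helper could map deployment constraints to a base-model size. The function and its thresholds below are purely illustrative and not part of AIBuilder:

```python
# Hypothetical helper: choose a YOLO base-model size from deployment
# constraints, following the trade-offs in the table above. The mapping
# is illustrative, not an AIBuilder API.
def pick_base_model(realtime_required: bool, hardware: str) -> str:
    if hardware == "embedded":
        # Constrained devices: prioritize speed when real-time is needed.
        return "Nano" if realtime_required else "Small"
    if hardware == "edge":
        return "Small" if realtime_required else "Medium"
    if hardware == "desktop-gpu":
        return "Medium" if realtime_required else "Large"
    # Server or cloud hardware: accuracy comes first.
    return "XLarge"
```

For example, an embedded real-time task would map to Nano, while an offline server-side job would map to XLarge.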
Model Name: The name assigned to the new model being created and trained. It is used to identify the model within the system, particularly when managing multiple training sessions.
Epochs: Specifies the number of training epochs. An epoch is one complete pass through the entire training dataset. More epochs can improve the model's performance but also increase training time; depending on the vision task, a value between 30 and 200 epochs is usually appropriate.
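Taken together, these parameters describe one fine-tuning run. The sketch below shows what such a session might look like, using the open-source ultralytics package purely as a stand-in for AIBuilder's internal pipeline; the paths, file names, and model name are assumptions:

```python
# Illustrative sketch only: fine-tuning a pre-trained YOLO model with
# AIBuilder-style settings. The ultralytics package stands in for
# AIBuilder's internal pipeline; all paths and names are assumptions.

def build_training_config(data_folder, base_model, model_name, epochs):
    """Collect the settings described above into one configuration dict."""
    return {
        "data": data_folder,   # folder with images and annotations from Step 1
        "model": base_model,   # e.g. "yolov8n.pt" for the Nano variant
        "name": model_name,    # identifies the session and its output folder
        "epochs": epochs,      # number of full passes over the dataset
    }

def run_training(config):
    from ultralytics import YOLO  # imported lazily; needs `pip install ultralytics`
    model = YOLO(config["model"])          # load the pre-trained base model
    model.train(data=config["data"],       # fine-tune on the recorded data
                epochs=config["epochs"],
                name=config["name"])
    model.export(format="onnx")            # export for later deployment

config = build_training_config("recordings/session_01", "yolov8n.pt",
                               "my-detector", 100)
```

Note that ultralytics expects its dataset argument in its own YAML format, so a real pipeline would include a conversion step from the recorded data folder.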
Choose Data Folder: Set the path to the folder containing the synthetic data generated in Step 1.
Configure the Model:
Select a Base Model (YOLO variant) that fits your requirements.
Provide a Model Name for easy identification.
Set the desired number of Epochs for the training.
Start Training: Click Start New Session to begin the training process. The AI model will be trained using the specified settings, updating its parameters with each epoch to improve its performance on the given dataset.
When the training process begins, a Windows command line window will open, displaying the progress of the training session.
It may take some time before the first training epoch starts, as the system prepares the data and initializes the model.
For more advanced analysis during training, you can use TensorBoard to monitor various metrics. TensorBoard is a powerful visualization tool that provides insights into the model's performance by displaying:
Loss and Accuracy Graphs: Shows how the model's loss decreases and accuracy improves over time, helping to assess whether the model is learning effectively.
Scalars: Provides numerical data such as training speed, memory usage, and other custom metrics.
Histograms and Distributions: Visualizes the distribution of weights and biases in the model, indicating how these parameters change during training.
Graphs: Displays the computational graph, which helps in understanding the structure and flow of the neural network.
Note that TensorBoard does not automatically refresh its data during the training process. To see the latest statistics, restart TensorBoard periodically.
For more details on how to use TensorBoard and its various features, you can refer to the TensorBoard documentation.
Once training is complete, you will find the results in the "ONNX" folder. Each trained model will be stored in its own subfolder, containing the trained ONNX model and associated files. This ONNX model can now be used in subsequent steps, such as deploying it in a Unity application or exporting it for use in an external inference system.
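As a small sketch, a script could collect the finished models from that output folder. The layout follows the description above (an "ONNX" directory with one subfolder per trained model), but the exact file names inside each subfolder are assumptions:

```python
# Sketch: locate trained models after training finishes. The folder layout
# (an "ONNX" root with one subfolder per model) follows the description
# above; the .onnx file names inside each subfolder are assumptions.
from pathlib import Path

def list_trained_models(onnx_root):
    """Return {model_name: path_to_onnx_file} for every finished session."""
    models = {}
    for sub in Path(onnx_root).iterdir():
        if sub.is_dir():                       # one subfolder per trained model
            for onnx_file in sub.glob("*.onnx"):
                models[sub.name] = onnx_file   # keep the exported ONNX model
    return models
```

A deployment script could then pass the returned path to whatever inference runtime is used in the next step.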
TensorBoard can also be used after training to analyze the results more thoroughly, providing insights into the model's final performance and areas where additional tuning or retraining may be beneficial.