AI Training
Training the Vision AI
The AI training process in AIBuilder involves using synthetic data generated in Step 1 - Data Recorder to fine-tune a pre-trained AI model. This process utilizes YOLO (You Only Look Once) pre-trained models, which are popular for real-time object detection tasks. The training step enhances the model's capabilities by adapting it to the target environment or use case using the generated data.
To perform AI training:
Select Step 2 - Training under AIBuilder.
Use the synthetic data generated in Step 1 - Data Recorder (Generate AI Training Data) for training. The training fine-tunes a pre-trained YOLO model, allowing the selection of different model sizes suitable for various applications.
The AI Training component has several key parameters and settings that must be configured to initiate the training process:
Data Folder: Specifies the path to the folder where the recorded training data is stored. This folder should contain the images and annotations generated during Step 1. These files are used as input for the training process.
Base Model: Allows for the selection of a YOLO pre-trained model to serve as the starting point for the training. Various model sizes can be chosen based on the application's requirements. The options include:
Nano:
Description: The Nano model is the smallest and most lightweight version of YOLO. It has a minimal number of layers and parameters, making it highly efficient for real-time inference on devices with limited computational power, such as embedded systems or mobile devices.
Use Case: Ideal for applications where speed is critical and hardware resources are constrained. Suitable for low-power devices, real-time applications, or scenarios where only basic object detection is required.
Performance: Fastest inference time but lower accuracy compared to larger models.
Small:
Description: The Small model is slightly larger than the Nano version, with more layers and parameters to improve detection accuracy. It still maintains a relatively low computational footprint, making it suitable for scenarios that require a balance between speed and accuracy.
Use Case: Suitable for edge devices that need real-time detection but can afford a bit more computational load than Nano. It’s a good choice for moderately complex tasks on devices like drones or industrial robots.
Performance: Faster than larger models with moderate accuracy improvements over Nano.
Medium:
Description: The Medium model offers a more significant improvement in accuracy by utilizing a larger network with more parameters. It requires more processing power than Nano and Small but provides a better balance between performance and accuracy for general-purpose tasks.
Use Case: Suitable for desktop GPUs or powerful embedded devices where real-time performance is still required but with a higher accuracy demand. Ideal for applications like automated quality inspection or surveillance.
Performance: Offers a good trade-off between speed and detection accuracy.
Large:
Description: The Large model is a more complex version of YOLO, featuring a significantly higher number of layers and parameters. It is designed for scenarios where detection accuracy is crucial, even if it means sacrificing some real-time performance.
Use Case: Suitable for systems with powerful GPUs where high accuracy is essential, such as in medical imaging, autonomous driving, or detailed industrial inspection tasks.
Performance: Higher accuracy than Medium, but requires more computational resources and has slower inference times.
XLarge:
Description: The XLarge model is the largest and most powerful version of YOLO, with the highest number of layers and parameters. It provides the best detection accuracy at the cost of increased computational demands.
Use Case: Ideal for server-grade hardware or cloud-based AI services where maximum accuracy is the priority, and real-time performance is less of a concern. Suitable for complex tasks like high-resolution video analysis, research, or scenarios requiring fine-grained object detection.
Performance: Most accurate model but also the slowest in terms of inference speed due to its complexity.
| Model  | Relative Size | Inference Speed | Accuracy | Typical Use Case |
|--------|---------------|-----------------|----------|------------------|
| Nano   | Very small    | Fastest         | Lower    | Low-power devices, basic real-time tasks |
| Small  | Small         | Faster          | Moderate | Edge devices, moderately complex tasks |
| Medium | Medium        | Balanced        | Good     | General-purpose, desktop GPUs |
| Large  | Large         | Slower          | High     | High-accuracy tasks, powerful GPUs |
| XLarge | Very large    | Slowest         | Highest  | Maximum accuracy, server or cloud |
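As a rough illustration of the trade-offs in the table above, a small helper could map deployment constraints to a base-model size. The function and its thresholds below are purely illustrative and not part of AIBuilder:

```python
# Hypothetical helper: choose a YOLO base-model size from deployment
# constraints, following the trade-offs in the table above. The mapping
# is illustrative, not an AIBuilder API.
def pick_base_model(realtime_required: bool, hardware: str) -> str:
    if hardware == "embedded":
        # Constrained devices: prioritize speed when real-time is needed.
        return "Nano" if realtime_required else "Small"
    if hardware == "edge":
        return "Small" if realtime_required else "Medium"
    if hardware == "desktop-gpu":
        return "Medium" if realtime_required else "Large"
    # Server or cloud hardware: accuracy comes first.
    return "XLarge"
```

For example, an embedded real-time task would map to Nano, while an offline server-side job would map to XLarge.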
Model Name: The name assigned to the new model being created and trained. It is used to identify the model within the system, particularly when managing multiple training sessions.
Epochs: Specifies the number of training epochs. An epoch is one complete pass through the entire training dataset. More epochs can improve the model's performance but also increase training time; depending on the vision task, a value between 30 and 200 epochs is usually appropriate.
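Taken together, these parameters describe one fine-tuning run. The sketch below shows what such a session might look like, using the open-source ultralytics package purely as a stand-in for AIBuilder's internal pipeline; the paths, file names, and model name are assumptions:

```python
# Illustrative sketch only: fine-tuning a pre-trained YOLO model with
# AIBuilder-style settings. The ultralytics package stands in for
# AIBuilder's internal pipeline; all paths and names are assumptions.

def build_training_config(data_folder, base_model, model_name, epochs):
    """Collect the settings described above into one configuration dict."""
    return {
        "data": data_folder,   # folder with images and annotations from Step 1
        "model": base_model,   # e.g. "yolov8n.pt" for the Nano variant
        "name": model_name,    # identifies the session and its output folder
        "epochs": epochs,      # number of full passes over the dataset
    }

def run_training(config):
    from ultralytics import YOLO  # imported lazily; needs `pip install ultralytics`
    model = YOLO(config["model"])          # load the pre-trained base model
    model.train(data=config["data"],       # fine-tune on the recorded data
                epochs=config["epochs"],
                name=config["name"])
    model.export(format="onnx")            # export for later deployment

config = build_training_config("recordings/session_01", "yolov8n.pt",
                               "my-detector", 100)
```

Note that ultralytics expects its dataset argument in its own YAML format, so a real pipeline would include a conversion step from the recorded data folder.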
Choose Data Folder: Set the path to the folder containing the synthetic data generated in Step 1.
Configure the Model:
Select a Base Model (YOLO variant) that fits your requirements.
Provide a Model Name for easy identification.
Set the desired number of Epochs for the training.
Start Training: Click Start New Session to begin the training process. The AI model will be trained using the specified settings, updating its parameters with each epoch to improve its performance on the given dataset.
When the training process begins, a Windows command line window will open, displaying the progress of the training session.
It may take some time before the first training epoch starts, as the system prepares the data and initializes the model.
For more advanced analysis during training, you can use TensorBoard to monitor various metrics. TensorBoard is a powerful visualization tool that provides insights into the model's performance by displaying:
Loss and Accuracy Graphs: Shows how the model's loss decreases and accuracy improves over time, helping to assess whether the model is learning effectively.
Scalars: Provides numerical data such as training speed, memory usage, and other custom metrics.
Histograms and Distributions: Visualizes the distribution of weights and biases in the model, indicating how these parameters change during training.
Graphs: Displays the computational graph, which helps in understanding the structure and flow of the neural network.
Note that TensorBoard does not automatically refresh its data during the training process. To see the latest statistics, restart TensorBoard periodically.
For more details on how to use TensorBoard and its various features, you can refer to the TensorBoard documentation.
Once training is complete, you will find the results in the "ONNX" folder. Each trained model will be stored in its own subfolder, containing the trained ONNX model and associated files. This ONNX model can now be used in subsequent steps, such as deploying it in a Unity application or exporting it for use in an external inference system.
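As a small sketch, a script could collect the finished models from that output folder. The layout follows the description above (an "ONNX" directory with one subfolder per trained model), but the exact file names inside each subfolder are assumptions:

```python
# Sketch: locate trained models after training finishes. The folder layout
# (an "ONNX" root with one subfolder per model) follows the description
# above; the .onnx file names inside each subfolder are assumptions.
from pathlib import Path

def list_trained_models(onnx_root):
    """Return {model_name: path_to_onnx_file} for every finished session."""
    models = {}
    for sub in Path(onnx_root).iterdir():
        if sub.is_dir():                       # one subfolder per trained model
            for onnx_file in sub.glob("*.onnx"):
                models[sub.name] = onnx_file   # keep the exported ONNX model
    return models
```

A deployment script could then pass the returned path to whatever inference runtime is used in the next step.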
TensorBoard can also be used after training to analyze the results more thoroughly, providing insights into the model's final performance and areas where additional tuning or retraining may be beneficial.