I have retrained a YOLOv11 model using my own image dataset, and then I tried to export the model in ONNX format:
retrained_model.export(format="onnx")
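For context, the full export step presumably looked something like the minimal sketch below (the retrained_model.pt filename is an assumption for illustration):

from ultralytics import YOLO

# Load the retrained weights (filename assumed for illustration)
retrained_model = YOLO("retrained_model.pt")

# Export to ONNX; the export image size should follow the training settings unless overridden
retrained_model.export(format="onnx")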
Once I had the model in ONNX format, I tried to use it in two different environments, C# with the .NET Framework and Python. Below is my code and the respective output:
#r "nuget: YoloDotNet"
#r "nuget: SkiaSharp"
using YoloDotNet;
using YoloDotNet.Enums;
using YoloDotNet.Models;
using YoloDotNet.Extensions;
using SkiaSharp;
// Instantiate a new Yolo object
var yolo = new Yolo(new YoloOptions
{
    OnnxModel = "retrained_model.onnx",     // Your YOLO model in ONNX format
    ModelType = ModelType.ObjectDetection,  // Set your model type
    Cuda = false,                           // Use CPU or CUDA for GPU-accelerated inference. Default = true
    GpuId = 0,                              // Select GPU by id. Default = 0
    PrimeGpu = false,                       // Pre-allocate GPU before first inference. Default = false
});
// Load image
var image = SKImage.FromEncodedData("test_images/test_1.jpeg");
// Run inference and get the results
var results = yolo.RunObjectDetection(image, confidence: 0.2, iou: 0.7);
// Draw results and save it
var resultImage = image.Draw(results);
resultImage.Save("result_images/result_1.jpg", SKEncodedImageFormat.Jpeg, 80);
// Print detection results to the console
Console.WriteLine("Detected objects:");
foreach (var detection in results)
{
    Console.WriteLine($" - Label: {detection.Label}, Confidence: {detection.Confidence:F2}, Bounding box: {detection.BoundingBox}");
}
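For comparison, the Python side (not shown above) was presumably a minimal Ultralytics call along these lines, using the same thresholds as the C# code (the exact script is an assumption):

from ultralytics import YOLO

# Ultralytics can load and run the exported ONNX weights directly
model = YOLO("retrained_model.onnx")

# Same confidence and IoU thresholds as the C# side
results = model.predict("test_images/test_1.jpeg", conf=0.2, iou=0.7)

for r in results:
    for box in r.boxes:
        print(box.cls, box.conf, box.xyxy)  # class id, confidence, box corners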
The ONNX model works fine in both environments. However, I noticed that in the Python environment only one object is detected (which is correct), whereas in the C# environment three additional objects are detected besides the correct one. Both environments use the same test image and the same ONNX model, with the same confidence and IoU thresholds, yet the C# environment seems to produce extra, incorrect detections. I saw some sources mention that the image pre-processing steps in Python and C# differ, which causes this issue. Is that true? What should I do to make the C# code produce the same output as the Python code?
The output from Ultralytics (in the Python environment) shows that one object was detected, which is correct. The bounding box and label are also accurate. However, in the C# version, the correct output is shown, but three additional objects are incorrectly detected, even though there is only one object in the image.
I suspect that the way YoloDotNet (used in C#) resizes the test image differs from how Ultralytics resizes it, but I'm not sure how each of them handles the resizing process to fit the image to the retrained model.
Are you running both environments on the same hardware? Specifically, if you're running on a Windows PC (as an example), are you using Python on native Windows alongside the C# environment, or are you running Python inside a WSL environment? Beyond different hardware/environments, it's entirely possible that the float handling in Python and C# is different enough that you end up with varying results.
Sharing the output of both, the tensor/metadata and/or annotated image results, would help in understanding the specific problem you're facing. There are many different causes that could produce additional detections, but without seeing how the extra detections manifest, it is harder to deduce which one applies.
As you can observe from the screenshots, the same model used in the two different environments gives different confidence scores. This is another problem I encountered, in addition to the challenge I mentioned in the question (multiple objects detected in the C# environment).
I understand that when a new test image is passed in, the method YoloDotNet uses to resize the image to 1088x1088 may differ from how Ultralytics resizes the image to the same size. However, I'm not sure how Ultralytics resizes the input image and adjusts it to fit the model before making predictions.
There are two pre-processing steps for image resizing at inference, and from L153 of the predictor source, the pre_transform method is called to apply them.
For ONNX-exported models, the image is resized to (640, 640) by default in Python. First, the longest side of the image is scaled to 640, and the short side is scaled by the same factor to maintain the aspect ratio. Then the short side is padded evenly with the difference so the image becomes a 640 square.
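In rough code, that letterbox step looks like the sketch below (a simplification of the Ultralytics LetterBox transform; the 114 padding value and the exact rounding are assumptions):

import cv2
import numpy as np

def letterbox(img: np.ndarray, new_size: int = 640, pad_value: int = 114) -> np.ndarray:
    # Scale the longest side to new_size, keeping the aspect ratio
    h, w = img.shape[:2]
    scale = new_size / max(h, w)
    resized = cv2.resize(img, (round(w * scale), round(h * scale)))

    # Pad the short side evenly on both sides to reach a new_size x new_size square
    rh, rw = resized.shape[:2]
    top = (new_size - rh) // 2
    left = (new_size - rw) // 2
    return cv2.copyMakeBorder(resized, top, new_size - rh - top, left, new_size - rw - left,
                              cv2.BORDER_CONSTANT, value=(pad_value, pad_value, pad_value))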
If the YoloDotNet library is resizing to (1088, 1088), that would certainly cause a difference in the confidence values. Differences in the method used to calculate IoU for non-max suppression could also contribute to very different results, both in confidence values and in the number of bounding boxes, despite all other settings being equal.
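As a reference point, one common IoU formulation is sketched below; small implementation differences (per-class vs. class-agnostic suppression, or how the intersection is clamped) can change which boxes survive NMS even with identical thresholds. This is illustrative, not the code of either library:

def iou(a, b):
    # a, b are [x1, y1, x2, y2] boxes
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)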
When I retrained the YOLO model with my own dataset, I used the command model.train(imgsz=1088). Now, if I perform inference on a new test image that is already 1088x1088 in size, will the image still be resized, or will it remain unchanged since it’s already in the required size?
Since the metadata shows (1088, 1088), if your image is already at those dimensions then no resizing should take place. Perhaps try exporting with nms=True to see if that helps bring the two results into closer alignment. Beyond that, you may need to get support from the YoloDotNet author to understand the difference in the results.
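If you want to try the nms=True route, a minimal sketch of the re-export would be (retrained_model.pt is the assumed path to the trained weights):

from ultralytics import YOLO

model = YOLO("retrained_model.pt")  # assumed path to the retrained weights

# Bake NMS into the ONNX graph so post-processing depends less on the runtime library
model.export(format="onnx", imgsz=1088, nms=True)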