Manually Calculated Metrics in YOLOv8

Hello everyone, I am using the YOLOv8 model and I want to analyze my results by zone of the image.

To do this I can’t rely on the overall metrics the model reports, so I am calculating them myself.

I run model.predict with conf = 0.25 and IoU = 0.5, then compare the predictions with the ground truth to count TP, FN and FP, and from those compute the overall Precision and Recall.
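For reference, here is a minimal sketch of one common way to count TP, FP and FN for a single image. It assumes boxes in (x1, y1, x2, y2) pixel format, greedy one-to-one matching in order of descending confidence, a match only between boxes of the same class, and a matching IoU threshold of 0.5. The function names box_iou, count_tp_fp_fn and precision_recall are hypothetical helpers, and this is not necessarily identical to how Ultralytics matches predictions internally.

```python
from typing import List, Tuple

# Hypothetical data layout: a box is (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes in xyxy format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_tp_fp_fn(preds: List[Tuple[Box, int, float]],
                   gts: List[Tuple[Box, int]],
                   iou_thr: float = 0.5) -> Tuple[int, int, int]:
    """Greedy one-to-one matching for one image.

    preds: (box, class_id, confidence); gts: (box, class_id).
    Each ground-truth box can be matched by at most one prediction.
    """
    matched = [False] * len(gts)
    tp = fp = 0
    # Highest-confidence predictions get the first chance to claim a ground truth.
    for box, cls, _conf in sorted(preds, key=lambda p: p[2], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, (gt_box, gt_cls) in enumerate(gts):
            if matched[j] or gt_cls != cls:
                continue  # already taken, or wrong class
            iou = box_iou(box, gt_box)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thr:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1  # no unmatched same-class ground truth with enough overlap
    fn = matched.count(False)  # ground truths nobody matched
    return tp, fp, fn


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r
```

Summing tp, fp and fn over all images (or over all images of one zone) before dividing gives pooled, i.e. micro-averaged, Precision and Recall.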

To check that my calculation is correct, I am testing on the val set and comparing against the best-epoch results of my model, i.e. the P and R values printed during training.

However, the Precision and Recall from my manual calculation come out roughly 10% higher than the values printed during training.

I have heard that YOLO’s table reports macro averages (P and R computed per class, then averaged across classes) and that this could explain the difference. Is this true? Does that make sense, or am I probably calculating TP, FN and FP the wrong way? (Visually they look correct.)
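To illustrate why micro and macro averages can differ, here is a small sketch. It takes hypothetical per-class (TP, FP, FN) counts, computes pooled (micro) P and R, and the unweighted mean of per-class P and R (macro). The counts in the usage example are made up purely for illustration: when a rare class performs worse than a frequent one, the macro average is pulled down much more than the micro average.

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]  # (tp, fp, fn) for one class


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0


def micro_macro(counts: Dict[int, Counts]):
    """Return ((micro_p, micro_r), (macro_p, macro_r)) from per-class counts."""
    if not counts:
        return (0.0, 0.0), (0.0, 0.0)

    # Micro: pool the counts of all classes, then compute P and R once.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = (_ratio(tp, tp + fp), _ratio(tp, tp + fn))

    # Macro: compute P and R per class, then take the plain mean over classes.
    per_p = [_ratio(t, t + f) for t, f, _ in counts.values()]
    per_r = [_ratio(t, t + n) for t, _, n in counts.values()]
    macro = (sum(per_p) / len(per_p), sum(per_r) / len(per_r))
    return micro, macro


# Hypothetical counts: class 0 is frequent and good, class 1 is rare and poor.
micro, macro = micro_macro({0: (90, 10, 10), 1: (5, 5, 15)})
print(micro)  # about (0.864, 0.792)
print(macro)  # about (0.700, 0.575)
```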

The P, R and other metrics in the log are reported at the confidence threshold that produces the best F1 score, not at a fixed threshold.
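As a rough sketch of what "the threshold with the best F1 score" means, the function below sweeps the confidence threshold over a list of predictions and keeps the one that maximizes F1. It assumes each prediction already carries a TP/FP flag from matching at a fixed IoU (with predictions made at a low confidence threshold, so that all candidate thresholds are covered) and that the total number of ground-truth boxes is known. The name best_f1_threshold is hypothetical, and the library's own implementation differs in details (for example, it works per class before averaging).

```python
from typing import List, Tuple


def best_f1_threshold(scored: List[Tuple[float, bool]],
                      n_gt: int) -> Tuple[float, float, float, float]:
    """Return (threshold, precision, recall, f1) at the confidence with the best F1.

    scored: (confidence, is_true_positive) for every prediction.
    n_gt: total number of ground-truth boxes.
    """
    best = (0.0, 0.0, 0.0, 0.0)
    tp = fp = 0
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    # Lowering the threshold step by step admits one more prediction each time.
    for i, (conf, is_tp) in enumerate(ranked):
        if is_tp:
            tp += 1
        else:
            fp += 1
        # Only evaluate once every prediction sharing this confidence is counted.
        if i + 1 < len(ranked) and ranked[i + 1][0] == conf:
            continue
        p = tp / (tp + fp)
        r = tp / n_gt if n_gt else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best[3]:
            best = (conf, p, r, f1)
    return best


scored = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.3, False), (0.2, False)]
print(best_f1_threshold(scored, n_gt=4))  # (0.4, 0.75, 0.75, 0.75)
```

With a fixed conf = 0.25 on the same made-up data, the first five predictions are kept, giving P = 0.6 and R = 0.75, so the P and R at a fixed threshold and at the best-F1 threshold are generally not the same numbers.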