Classification vs Detection differential backpropagation in the architecture

Hi, I’ve been reading a lot about the different versions of YOLO and started my tests with YOLOv5. I am looking into the training process and found very interesting information about it in this other thread: How to combine weights.

What I haven’t been able to find out is whether the previous YOLO approach to training has been maintained or changed. The YOLO9000 paper states the following:

During training we mix images from both detection and
classification datasets. When our network sees an image
labelled for detection we can backpropagate based on the
full YOLOv2 loss function. When it sees a classification
image we only backpropagate loss from the classification-
specific parts of the architecture.
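To make sure I am reading the quoted passage correctly, here is a minimal sketch of how I understand it. This is only my own illustration, not code from YOLOv2 or YOLOv5: the `LossTerms` class, the `loss_to_backpropagate` function and the loss values are all hypothetical, and I am assuming the total loss is simply the sum of a box, an objectness and a classification term.

```python
from dataclasses import dataclass


@dataclass
class LossTerms:
    """Hypothetical per-image loss components (names are my own)."""
    box: float  # localization loss
    obj: float  # objectness loss
    cls: float  # classification loss


def loss_to_backpropagate(terms: LossTerms, has_boxes: bool) -> float:
    """Return the scalar loss used for the backward pass on one image."""
    if has_boxes:
        # Detection image: all parts of the loss contribute.
        return terms.box + terms.obj + terms.cls
    # Classification image: only the classification part contributes.
    return terms.cls


# Made-up numbers, just to show the two branches.
terms = LossTerms(box=0.8, obj=0.5, cls=0.3)
detection_loss = loss_to_backpropagate(terms, has_boxes=True)
classification_loss = loss_to_backpropagate(terms, has_boxes=False)
```

In a real training loop the same decision would be made per image in the batch, which is the part I cannot find in the later versions.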

I cannot find whether this feature was discarded in later versions of YOLO (e.g. v5) or not. Maybe I missed important information about it.

My problem comes from the use case I am working on. Just to give a bit of context, I am trying to complete a given dataset (the KAIST dataset, with thermal-visual image pairs), and it would be very useful to be able to complete it with just some classification examples. The dataset has a lot of detection images with people walking or standing on streets, but I would need to extend it to be able to detect humans in different positions and postures. These examples should help the network generalize the ‘concept’ of a person in later detection-classification use cases.

If this feature, differential training based on the type of data, is no longer available (I am not sure later architectures allow it), could a label covering the whole image with the associated class do the trick?
In that case, does anyone know when this approach was changed? I would like to check the details and try to understand it.
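For reference, this is the kind of whole-image label I have in mind. It is only a sketch, assuming the usual YOLOv5 text label format (`class x_center y_center width height`, normalized to the image size); the helper name `full_image_label` and the use of class index 0 for "person" are my own assumptions.

```python
def full_image_label(class_id: int) -> str:
    """Build one label line whose box covers the entire image.

    Coordinates are normalized, so the full image is a box centered
    at (0.5, 0.5) with width 1.0 and height 1.0.
    """
    x_center, y_center, width, height = 0.5, 0.5, 1.0, 1.0
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Assumed class index 0 for "person"; adjust to the dataset's class list.
label_line = full_image_label(0)
```

Each classification-only image would then get a label file containing just this one line. I do not know whether the box and objectness losses computed from such a box would help or hurt, which is really what I am asking.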

Obviously I’ll have fewer images of this type, but I think the image-weights flag could compensate for this imbalance of examples.

Many thanks!

I cannot edit the message (or I don’t know how).
This question on StackOverflow seems to be related, since it says the YOLO documentation stated that, when no anchor box is provided, the image is used for classification only. Is it the same with YOLOv8?

Many thanks