Help with improving YOLOv11's performance on a human detection task

Thanks for the reply; your insight about including the original COCO dataset is really helpful, and I will definitely try it. I would like to ask a few more questions on the topic:

  1. Suppose I want to train YOLOv11 to detect a brand new class. Would including the COCO dataset still be needed?
  2. For the additional data, would the person-only dataset I have collected be enough to overcome the problem of falsely detecting canines, or should I also include images of dogs, cats, etc.? If so, what would be the ideal ratio among the classes? And if most images in the dataset contain only a single class (e.g., an image of dogs only), will they still contribute to the training?
  3. I intend to train on my custom dataset only (it does not include data from sources like Roboflow). This dataset contains ~4,000 images with ~8,000 instances of the person class, and the images were extracted from 50 CCTV videos at a 1 fps rate. However, the videos are not 640x640 px; will this affect the training result? Moreover, despite having lots of images, many of them share the same background setting (surroundings, lighting, etc.) and many contain the same person instances; the videos are from different camera angles. What do you think of this dataset?
  4. I first used a COCO-pretrained YOLOv11 to pre-label all the images and then manually corrected the labels. I omitted all person labels that are too close to/far from the camera, have missing body parts, or have "abnormal" postures (sitting, bending, etc.). Is this a good labelling practice?
  5. Can you share any detailed or reliable sources I can read about preparing a dataset, labeling, and evaluating the "cleanliness" of the data? I am very new to training on a custom dataset and preparing my own data, so I'm kind of at a loss here :joy:
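On question 3, for context: as far as I understand, YOLO-style training pipelines letterbox-resize each frame to the target size on the fly (scaling to fit while preserving aspect ratio, then padding), so the source videos don't need to be 640x640 themselves. A minimal sketch of the scale/padding arithmetic as I understand it (my own illustration, not Ultralytics' actual code):

```python
def letterbox_params(w: int, h: int, target: int = 640):
    """Compute the scale and padding that fit a w x h frame into a
    target x target square while preserving aspect ratio (letterboxing).

    Returns (scale, new_w, new_h, pad_x, pad_y), where pad_x/pad_y are
    the borders added on the left/top after resizing.
    """
    # Scale so the longer side exactly fits the target square.
    scale = min(target / w, target / h)
    new_w, new_h = round(w * scale), round(h * scale)
    # Center the resized frame; the remainder is filled with padding.
    pad_x = (target - new_w) // 2
    pad_y = (target - new_h) // 2
    return scale, new_w, new_h, pad_x, pad_y

# e.g. a 1920x1080 CCTV frame: scaled to 640x360, padded 140 px top/bottom
scale, nw, nh, px, py = letterbox_params(1920, 1080)
```

So non-square footage mostly costs some padded (wasted) pixels and a slightly smaller effective resolution for the person instances, rather than breaking training outright.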

If you find my questions too much or nonsensical, please feel free to skip them :joy:, as your previous reply was already more than helpful to me.

Again, thank you so much for your reply. Have a good day!