The issue is that the sheep are packed closely together and there’s a lot of overlap between them, so trying to define a bounding box for each one has been a nightmare. I’m doing the labeling in Roboflow.
I was thinking of simplifying things by using each sheep’s head as the bounding box and then adding two additional keypoints outside the box to estimate the body position.
Does this approach make sense? Could having keypoints outside the bounding box cause any issues when training a YOLO pose estimation model?
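For reference, here’s roughly what one of my labels would look like under that scheme (just a sketch assuming the standard Ultralytics pose label format, with made-up coordinates):

```python
# One label line per sheep (Ultralytics YOLO pose format, as I understand it):
# class, box center x/y, box width/height, then (x, y, visibility) per keypoint.
# Everything is normalized to the IMAGE, not the box, so the two body
# keypoints below can legitimately sit outside the small head box.
#
#             cls  xc    yc    w     h     kx1   ky1   v1  kx2   ky2   v2
label_line = "0    0.42  0.31  0.06  0.05  0.45  0.40  2   0.48  0.47  2"
```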
Do you think this would be a better approach for this case?
I’m a bit worried that, since I’ll later need quite accurate tracking without losing any sheep, a segmentation-based approach might make more errors than simply detecting the head and body orientation.
The goal of the project is to study the sheep’s entry times and the distribution of their speeds, so accuracy in detection is a key factor.
Given the crowding of the sheep, it’s hard to say for certain; practically speaking, you’ll have to test it out to see how well it does or doesn’t work. If the view will always be a top-down view like you’ve shown, then you can probably use pose estimation. If you run into trouble with reliability, you could try bounding box annotations on the heads and then use any number of keypoints localized to the head (eyes, ears, or just the center), but you’d lose track if a head got covered, like if one sheep jumped onto another (I’m just speculating here, I’m not an expert in sheep behavior).
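If you do try the head-box + keypoints route, getting a first model trained is cheap enough to test. A minimal sketch with the Ultralytics API (the dataset config name, keypoint count, and image filename are placeholders for your setup):

```python
from ultralytics import YOLO

# Fine-tune a pretrained pose checkpoint on head boxes + body keypoints.
# "sheep-pose.yaml" is a hypothetical dataset config; it needs a kpt_shape
# matching your annotations, e.g. [2, 3] for two (x, y, visibility) keypoints.
model = YOLO("yolo11n-pose.pt")
model.train(data="sheep-pose.yaml", epochs=100, imgsz=640)

# Sanity-check on a held-out top-down frame (hypothetical filename)
results = model("pen_frame.jpg")
print(results[0].keypoints.xy)  # per-sheep keypoint coordinates in pixels
```

If the keypoints come back stable across frames, a tracker on top of this should get you toward the entry times and speed distributions you’re after.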
You could quickly try segmentation with models like SAM2 or FastSAM to help you generate segmentation data to train with. The useful thing about having those segments is that they can be used to derive bounding boxes as well.
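Something like this would get you from raw frames to masks and boxes quickly (a rough sketch using the Ultralytics SAM2 wrapper; the checkpoint and frame filenames are assumptions, swap in whatever you actually have):

```python
from ultralytics import SAM

# Auto-segment a frame with SAM2 (checkpoint name is whatever weights you
# pull down), then derive an axis-aligned box from each mask polygon.
model = SAM("sam2_b.pt")
results = model("pen_frame.jpg")  # hypothetical frame from your footage

for poly in results[0].masks.xy:  # one (N, 2) polygon per segmented instance
    xs, ys = poly[:, 0], poly[:, 1]
    print(f"box: ({xs.min():.0f}, {ys.min():.0f}) to ({xs.max():.0f}, {ys.max():.0f})")
```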