Help with Yolo Pose 1.0 Export

Hi,

I am experiencing an issue training a YOLO pose dataset. I have only 5 keypoints (a skeleton), and I'm exporting in YOLO 1.0 Pose format (in CVAT).

Labels format (4,3): 0 0.438824 0.522189 0.463846 0.347398 0.347192 0.348490 2 0.531111 0.695888 2 0.670747 0.454067 2 0.206901 0.591736 2

I have a base of 200 images (160 train, 40 val). Since it's just 4 keypoints, I thought 200 pictures should be enough?

here is my train command:

yolo pose train model=yolo11n-pose.pt data="configs/Keypoints_Anno.yaml" epochs=50 imgsz=640 batch=16 name="board_keypoints_y16" lr0=0.001 patience=30

When the training finishes, the keypoints are completely off, but the bounding box of the skeleton model is not. So far I thought the automatic bounding box should outline the keypoints. In my predictions the bounding box is exactly where the keypoints should be, but the keypoints themselves are pretty much all over the place.

I attached one annotated batch and one predicted batch. My aim is for the keypoints to lock onto the exact edges of the triple fields.

As you can see in the outcomes, they are completely off.

Still, I get a mAP50-95 of 0.995.

I'm really confused. When I use YOLO segmentation, everything works fine for me.

Please help, kind regards,

Chris

The problem is that the bounding box in CVAT is auto-generated from the keypoints. The correct bounding box should cover the outline of the object, not just tightly enclose the keypoints. You need to fix that.

Also, with so few images you need to train for more epochs. Can you post the content of your data.yaml file?
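In case it helps while you check: for pose training, the dataset YAML also needs a `kpt_shape` entry matching your skeleton. A minimal sketch for a 4-keypoint, single-class setup (the paths and class name are placeholders for your project):

```yaml
# Minimal pose data.yaml sketch -- paths and the class name are placeholders
path: datasets/board      # dataset root
train: images/train       # 160 training images
val: images/val           # 40 validation images

kpt_shape: [4, 3]         # 4 keypoints, each stored as (x, y, visibility)
names:
  0: board
```

If `kpt_shape` doesn't match the label files, the loader can't parse the keypoint columns correctly.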

Hey, thanks for your help, appreciate it!

I did not know that I have to put a bounding box around the object. I thought I couldn't export keypoints (I used a skeleton model for that, is that even right?) and bounding boxes in YOLO Pose 1.0 at the same time.

I just tried 100 epochs but it’s not improving :confused:

I attached the .yaml file

kind regards,

Which points are 0, 1, 2, and 3 in your image?

The order is: Top → Bottom → Right → Left

Now I tried to create a bbox around the board using one label and one skeleton structure (4 points). I exported with YOLO Pose 1.0, but it still ignores my bbox and creates the default one. Maybe YOLO Pose isn't working and I have to use a different export format?

I really don't get it.

From your original post:

0 0.438824 0.522189 0.463846 0.347398 0.347192 0.348490 2 0.531111 0.695888 2 0.670747 0.454067 2 0.206901 0.591736 2

split into sections:

0  # class ID
0.438824 0.522189 0.463846 0.347398  # XYWH bbox
0.347192 0.348490 2  # top keypoint (x, y, visibility)
0.531111 0.695888 2  # bottom keypoint 
0.670747 0.454067 2  # right keypoint
0.206901 0.591736 2  # left keypoint
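To sanity-check an exported label, you can split a line into those sections programmatically; a quick sketch using the exact line from your post:

```python
# Parse one YOLO Pose 1.0 label line: class ID, XYWH bbox, then (x, y, vis) triplets
line = ("0 0.438824 0.522189 0.463846 0.347398 "
        "0.347192 0.348490 2 0.531111 0.695888 2 "
        "0.670747 0.454067 2 0.206901 0.591736 2")

vals = line.split()
cls = int(vals[0])
bbox = tuple(float(v) for v in vals[1:5])        # (xc, yc, w, h), normalized
kpts = [tuple(float(v) for v in vals[i:i + 3])   # (x, y, visibility)
        for i in range(5, len(vals), 3)]

print(cls)        # 0
print(len(kpts))  # 4
```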

The bounding box will be the minimum area that encloses the keypoints. I suspect that using keypoints for this use case might be tricky, as their placement will have to be highly consistent across annotations. Since it's an actual intersection that's being annotated, there are few distinct features for the model to detect (as opposed to something like a hand).
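You can verify this directly from the label in the thread: recomputing the tightest box around the four keypoints reproduces the exported bbox exactly (just a quick check, not part of any official tooling):

```python
# Check whether the exported bbox is just the min enclosing box of the keypoints
bbox = (0.438824, 0.522189, 0.463846, 0.347398)   # xc, yc, w, h from the label
kpts = [(0.347192, 0.348490), (0.531111, 0.695888),
        (0.670747, 0.454067), (0.206901, 0.591736)]

xs = [x for x, _ in kpts]
ys = [y for _, y in kpts]
enclosing = ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2,
             max(xs) - min(xs), max(ys) - min(ys))

# Matches to the precision stored in the label -> the box was derived from the keypoints
print(all(abs(a - b) < 1e-6 for a, b in zip(bbox, enclosing)))  # True
```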
You mention that segmentation works fine, but I'm curious: why use keypoints? What specifically did you hope to accomplish with keypoint pose estimation versus segmentation?

Also, just an FYI, the docs are always a good reference for understanding the annotation format:

Hi!

Does that mean I don't have to create another bbox around the object?

Because I just wanted to geometrically unwarp the dartboard so I can overlay a dartboard grid, without having to annotate all the different segments. But of course I can do this with segments as well; I just thought keypoints would be a bit easier (as you just said it's much harder, I'll switch back to segments). Now I used CVAT Images 1.1 (bounding box around the object plus the 4 keypoints), converted it to YOLO format and trained again. Still off :confused:

Can you estimate how many images I need to get accurate results?

Regards, Chris

No clue how many images you'll need to achieve your goal, but generally several hundred or a few thousand is ideal. From the examples you've shared, there's some decent variety, but you might want to try more locations, additional angles, varied lighting, and maybe even find another dartboard or two to image. The more variety in your source images, the better. Including other objects in the photos helps add variety as well. You could even include boards with darts stuck in various locations (I'd recommend it, since that will probably be part of your final use case). You can probably grab some images online to help if you don't have other boards or items normally found around dartboards. Here's an example I found from a quick search:


I would also recommend including negative images, where you place other disc-shaped objects (even better if they somewhat resemble a dartboard), to help the model learn what it should not detect.

If I understand your goal, you want to use the keypoints to deskew the board image, making it planar relative to the image (flat)? I presume you'll also want to know the upright orientation, as the result won't be very useful if the deskewing placed the board upside down. That might make segmentation more difficult, as it doesn't give you an orientation, and because the board is circular, orientation is going to be tricky without distinct points. If I were to attempt this, I think I would annotate all of the numbers on the board. That way you can establish the orientation, since the numbers will (I'm guessing) always have the same layout. There are two ways you could attempt using the numbers.
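For the deskew step itself: once you have four reliable reference points, it's a plain homography to a canonical face-on layout. A minimal NumPy sketch below; the canonical target coordinates are my assumption, and with OpenCV, `cv2.getPerspectiveTransform` plus `cv2.warpPerspective` would do the same on real images:

```python
import numpy as np

def homography(src, dst):
    """Solve for the 3x3 homography mapping 4 src points to 4 dst points (DLT)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * u, -y * u]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -x * v, -y * v]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_h(H, pt):
    """Apply a homography to a single (x, y) point."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return (x / w, y / w)

# The four edge keypoints (top, bottom, right, left) from the label in the thread
src = [(0.347192, 0.348490), (0.531111, 0.695888),
       (0.670747, 0.454067), (0.206901, 0.591736)]
# Where those points should sit on a face-on unit board (assumed canonical layout)
dst = [(0.5, 0.0), (0.5, 1.0), (1.0, 0.5), (0.0, 0.5)]

H = homography(src, dst)
print(apply_h(H, src[0]))  # ~ (0.5, 0.0)
```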
The first would be using keypoints, but instead of only 4, you would include one keypoint for each number. I would place the keypoints in the order of the numbers for simplicity, and at the edge of the board, so that if a line were drawn through all the points it would trace the board outline. Hopefully this gives the model additional features to learn, as the numbers span two dimensions. You'll still need to be fairly consistent with keypoint placement to ensure consistent prediction locations. Assuming that works well, this gives you the orientation and the outer contour of the board, which should help with deskewing.
If you find that including keypoints for all the numbers still doesn't work well, you could instead use bounding boxes (a standard YOLO detect model). Annotating a bounding box for each number again gives you multiple locations to help with deskewing and orientation. It won't give you the outer contour as easily, but that could be a secondary step. The key when annotating with bounding boxes is to ensure that the sides of each box touch the boundary of the number. If you do, the center of the bounding box (x, y) can be used as a keypoint, and all of these should sit at the same radius from the bullseye. Alternatively, you could try to capture three numbers inside a single bounding box for the top, left, right, and bottom locations, but I suspect that would be hard to do consistently.
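To illustrate the "same radius from the bullseye" idea: given the number-box centers, a least-squares circle fit (Kåsa method) recovers the board center and that shared radius, which also gives you the outer contour. A sketch with synthetic centers standing in for real detections:

```python
import math
import numpy as np

def fit_circle(pts):
    """Kåsa least-squares circle fit: returns (cx, cy, r)."""
    pts = np.asarray(pts, float)
    x, y = pts[:, 0], pts[:, 1]
    # Linearized circle equation: 2*cx*x + 2*cy*y + (r^2 - cx^2 - cy^2) = x^2 + y^2
    A = np.column_stack([2 * x, 2 * y, np.ones(len(pts))])
    b = x**2 + y**2
    (cx, cy, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    return cx, cy, math.sqrt(c + cx**2 + cy**2)

# Synthetic stand-in for the 20 number-box centers around a board
true_c, true_r = (0.48, 0.52), 0.35
centers = [(true_c[0] + true_r * math.cos(2 * math.pi * k / 20),
            true_c[1] + true_r * math.sin(2 * math.pi * k / 20))
           for k in range(20)]

cx, cy, r = fit_circle(centers)
print(round(cx, 3), round(cy, 3), round(r, 3))  # 0.48 0.52 0.35
```

With real detections the centers will be noisy, but the least-squares fit averages that out across all 20 numbers.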