Detecting court keypoints from a pickleball video

I am trying to train a YOLOv8 model to detect the keypoints of a pickleball court. I have annotated a dataset of 2000 images (1920 × 1080), which I have split into 1400 train / 400 valid / 200 test.

In addition I applied the following preprocessing and augmentation methods to my dataset:

  • Preprocessing: Auto-Orient, Resize: Fit within 640x640
  • Augmentations: Outputs per training example: 2, Grayscale: Apply to 25% of images, Hue: Between -25° and +25°

I trained a Roboflow 3.0 Keypoint Detection (Accurate) model on the platform itself and got fairly good results. However, when I trained my custom YOLOv8-pose model on the same dataset, it performed terribly, as visible in the screenshot below:

My code for training the model:

from ultralytics import YOLO

model = YOLO('yolov8l-pose.pt')

results = model.train(
    data='/content/PB-3/data.yaml',
    epochs=100,
    imgsz=640,
    device=0,
    batch=16,
)

My code for detecting keypoints in my video:

model = YOLO('models/court_best_2.pt')  # load the custom-trained YOLOv8l-pose weights

model.predict(source=r'file.jpg', save=True, device=0)

Can someone please explain where I am going wrong in my approach and how can I improve on this? Thanks in advance!

Can you post your data.yaml file?

Are all the keypoints consistent? Do they appear in the same position every time?

My data.yaml:

flip_idx:

Yes all the keypoints are consistent, you can check out this embedded link to view my annotated dataset.

The flip_idx is incorrect:

I am unable to understand what’s being discussed. What does it mean when the user says points 5 and 7 are symmetrical?

I just read up on how YOLOv8-pose training flips the images horizontally during augmentation, and this is what the flip_idx is for. Given my annotations, how can I find out which integer index (0, 1, 2, …) corresponds to each court keypoint?

DeepSeek recommended training a model with fliplr=0.0, since all my images were taken from behind the baseline. Is this recommended?

That depends on the labelling tool you used. It is probably based on the order in which you labelled the keypoints: the first keypoint you labelled is 0, the second is 1, and so on.

This is why I asked if your keypoints are consistent. By consistency, I was referring to each keypoint always being used for the same visual point. The order of keypoints is important. You can’t use an inconsistent or arbitrary order, e.g., the first keypoint you label is the top-left corner in one image, but in the next image the first keypoint is the top-right corner. That is inconsistent and wouldn’t work.

flip_idx is used to fix the keypoint labels when horizontal flip augmentation is applied. For the same reason mentioned before (the order of keypoints is important), after flipping, the keypoint that was in the top left visually looks like the top right, so the keypoint indices must also be swapped; otherwise the labels become inconsistent. Each keypoint is visually tied to a dedicated location; they are not arbitrary or random.
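To make the remapping concrete, here is a minimal sketch of that swap. The 4-point square and its flip_idx are made up for illustration; they are not your court layout:

```python
# Toy example: 4 keypoints on a square, labelled
# 0 = top-left, 1 = top-right, 2 = bottom-right, 3 = bottom-left.
# After a horizontal flip, the top-left lands where the top-right was,
# so each left index must be swapped with its right counterpart.
flip_idx = [1, 0, 3, 2]  # new index i takes its value from old index flip_idx[i]

keypoints = [(0.25, 0.25), (0.75, 0.25), (0.75, 0.75), (0.25, 0.75)]  # (x, y) in [0, 1]

# Flip the x coordinates, then reorder the labels with flip_idx so that
# index 0 still refers to the visual "top-left" point.
flipped = [(1.0 - x, y) for x, y in keypoints]
relabeled = [flipped[i] for i in flip_idx]

print(relabeled)  # identical to keypoints: the labels are consistent again
```

Without the flip_idx reordering step, index 0 would point at the top-right corner after the flip, which is exactly the label inconsistency described above.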

You can use fliplr=0, but if your keypoint order isn’t consistent, it will still cause issues.
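For reference, if you do want to disable the horizontal flip, fliplr is a standard Ultralytics augmentation hyperparameter that can be passed straight to train(). A sketch based on the training call from the original post (same paths and settings assumed):

```python
from ultralytics import YOLO

model = YOLO('yolov8l-pose.pt')

# fliplr=0.0 disables horizontal-flip augmentation entirely, so flip_idx is
# never exercised during training. Fixing flip_idx is still the safer
# long-term solution, since it keeps the augmentation available.
results = model.train(
    data='/content/PB-3/data.yaml',
    epochs=100,
    imgsz=640,
    device=0,
    batch=16,
    fliplr=0.0,
)
```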

I used Roboflow itself to annotate these images. And yes, my keypoints are consistent throughout the images: to annotate them, I had to draw the skeleton on each image and drag each point to the corresponding point on the court. I have figured out the order of the points:

0 - background baseline left

1 - background baseline mid

2 - background baseline right

3 - background kitchen-line right

4 - background kitchen-line mid

5 - background kitchen-line left

6 - foreground kitchen-line left

7 - foreground kitchen-line mid

8 - foreground kitchen-line right

9 - foreground baseline right

10 - foreground baseline mid

11 - foreground baseline left

Which indices will get flipped for my use case?
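Based on the ordering listed above, a horizontal flip would swap each left point with its right counterpart, giving the pairs (0, 2), (3, 5), (6, 8), (9, 11), while the mid points 1, 4, 7, 10 map to themselves. A small sketch of how that flip_idx could be derived programmatically; the shorthand names below just restate the ordering from the post:

```python
# Keypoint order restated from the post above (shorthand names).
names = [
    'bg-baseline-left', 'bg-baseline-mid', 'bg-baseline-right',
    'bg-kitchen-right', 'bg-kitchen-mid', 'bg-kitchen-left',
    'fg-kitchen-left', 'fg-kitchen-mid', 'fg-kitchen-right',
    'fg-baseline-right', 'fg-baseline-mid', 'fg-baseline-left',
]

def mirror(name):
    """Name of the point this keypoint lands on after a horizontal flip."""
    if name.endswith('left'):
        return name[:-len('left')] + 'right'
    if name.endswith('right'):
        return name[:-len('right')] + 'left'
    return name  # mid points are their own mirror

flip_idx = [names.index(mirror(n)) for n in names]
print(flip_idx)  # [2, 1, 0, 5, 4, 3, 8, 7, 6, 11, 10, 9]
```

That flip_idx list is what would go into data.yaml, assuming the keypoint order above matches the annotation order exactly.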