Greetings,
I have been working on a custom pose estimation task where I defined a custom skeleton for human keypoints. My goal is to train an auto-annotator model to help label the rest of my dataset more efficiently.
To ensure high-quality annotations, I manually labeled a subset of images using CVAT.ai and followed the standard visibility convention:
- 0 = Out-of-view (keypoint is not visible and not labeled)
- 1 = Occluded (keypoint is present but not visible)
- 2 = Fully visible
During this process, I was very careful to switch the visibility flag to 1 for occluded keypoints: for example, if a person's shoulder, elbow, or hand was blocked by another object or by their own body but was still present in the frame, I marked it as occluded (1) rather than visible (2).
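For context, a line in my YOLO-format pose labels looks roughly like the example below (a hypothetical 3-keypoint skeleton with made-up values, all normalized; the visibility flag is the third value of each keypoint triplet):

```
# class  cx     cy     w      h      x1    y1    v1  x2    y2    v2  x3    y3    v3
0 0.512 0.431 0.220 0.610 0.480 0.210 2 0.455 0.305 1 0.430 0.398 0
```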
After training my YOLOv8-Pose model on this annotated subset, I used it to auto-annotate the rest of the dataset. However, when inspecting the output, I noticed something odd:
- The model does not seem to predict any occluded keypoints (1).
- Instead, it only outputs fully visible keypoints (2) or out-of-view keypoints (0).
- I checked the `results.keypoints.data` object (see the snippet below) and confirmed that every keypoint was either marked as visible with a score of 0.999… or out-of-view (coordinates 0,0 and a score between 0.02 and 0.5), with no predictions of 1 (occluded) at all.
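This is roughly how I inspected the predictions (the weights path and image are placeholders):

```python
from ultralytics import YOLO

model = YOLO("my_pose_model.pt")  # my fine-tuned YOLOv8-Pose weights (placeholder path)
results = model("sample.jpg")     # placeholder image

for r in results:
    # keypoints.data has shape (num_instances, num_keypoints, 3): x, y, confidence
    for instance in r.keypoints.data:
        for x, y, conf in instance.tolist():
            print(f"x={x:.1f}  y={y:.1f}  conf={conf:.3f}")
```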
I did some thorough research on whether the occlusion flag (1) has any effect on training in YOLOv8-Pose, but I found conflicting information:
- Some sources indicate that visibility flags (1 vs. 2) do not affect training, meaning all labeled keypoints are treated equally regardless of visibility.
- Other sources state that occluded keypoints (1) do contribute to training and loss calculations (OKS-based loss).
This leads to my question:
Does setting a keypoint’s visibility to 1 (occluded) have any effect on the training process in YOLOv8-Pose?
Or does the model treat occluded keypoints the same as visible keypoints (2) during optimization?
This is extremely important for me because:
- If occlusion labels matter, I will continue manually switching occluded keypoints from 2 to 1 while fine-tuning my auto-annotated dataset.
- However, if occlusion labels do not impact training, I can skip this tedious manual process, saving a huge amount of time.
Can you confirm how YOLOv8-Pose actually handles occluded keypoints (1) during training? Does it use an OKS-based loss with different weighting for occlusions, or does it simply treat all labeled keypoints equally? In other words, if I skip manually switching the occluded flag on my auto-annotated dataset where needed, would that cause problems?
I am aware that this question has been asked before in the Ultralytics GitHub repo's Discussions section by other users under #6945 and #3409, but the responses from Ultralytics to those inquiries seemed somewhat contradictory, or I misinterpreted them. While one says "no", the other seems to say "yes".
As far as I understand, if the architecture uses the OKS metric during training, then occlusion should come into play. OKS is mentioned in the Ultralytics documentation, but it is still not clear whether the loss actually applies a different weighting to occluded keypoints.
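To make my reading concrete: I assume the keypoint loss reduces to something like the sketch below, where the visibility flag is only used as a binary mask (v > 0), so that 1 and 2 would be weighted identically. This is just my guess at the mechanism, not the actual Ultralytics code; please correct it if it is wrong:

```python
import torch

def oks_style_keypoint_loss(pred_kpts, gt_kpts, sigmas, area):
    """My assumed OKS-style loss. pred_kpts/gt_kpts: (N, K, 3) as (x, y, v),
    sigmas: (K,) per-keypoint scales, area: (N,) object areas."""
    # Assumed binary mask: any labeled keypoint (v == 1 or v == 2) contributes
    # equally; only v == 0 (not labeled) is excluded from the loss.
    kpt_mask = (gt_kpts[..., 2] > 0).float()
    d2 = (pred_kpts[..., 0] - gt_kpts[..., 0]) ** 2 + (pred_kpts[..., 1] - gt_kpts[..., 1]) ** 2
    # OKS-style normalization by per-keypoint sigma and object area
    e = d2 / ((2 * sigmas) ** 2 * (area[:, None] + 1e-9) * 2)
    return ((1 - torch.exp(-e)) * kpt_mask).sum() / (kpt_mask.sum() + 1e-9)
```

If, on top of this, the keypoint confidence is supervised with a binary visible/not-labeled target, that would also explain why I never see an explicit "occluded" state in the output; but again, that is speculation on my part.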
Thank you in advance for the help.