Has anyone succeeded in building a Flutter app using either 1) a single YOLO pose model having multiple classes or 2) multiple YOLO pose single-class models on the same camera (video) stream?
I’ve succeeded in transfer-learning a single-class yolo11n-pose model, exporting it to TFLite (via Google Colab), and basing my app on the examples here, but none of the following approaches has worked for multi-class pose estimation (3 classes with 6, 4, and 2 keypoints in my case):
1. Single model: 3-class, 3D (x, y, visibility) training, with the rows of the annotation files padded to the same maximum number of keypoints (6); updated YOLO.kt, YOLOView.kt, and PoseEstimator.kt for Android.
2. Same as #1, but 2D (x, y) only.
3. Same as #1 and #2, but with annotation files padded to the sum of all keypoints (6 + 4 + 2 = 12).
4. Separately trained models for each class, with annotations padded to the maximum number of keypoints (6) and the same Android file updates; I tried calling 3 instances of YOLOView, which probably causes video-stream conflicts and/or shows only the last instance called.
Multi-class pose is supported in Ultralytics YOLO (including YOLO11), but the key limitation is that a single pose model can only have one fixed kpt_shape for all classes. So “3 classes with 6, 4, and 2 keypoints” can’t be represented natively as three different keypoint schemas inside one model head.
The practical way to do this is to define one global keypoint set (e.g. 12 keypoints total) and train with kpt_shape: [12, 3] (or [12, 2]); for classes that don’t use some keypoints, you pad them as “missing” (typically x=0, y=0, v=0 for the unused keypoints). Your app then renders only the subset that applies to the predicted class.
```yaml
# data.yaml
path: /data
train: images/train
val: images/val
names: [c0, c1, c2]
kpt_shape: [12, 3]  # fixed for ALL classes
```
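To illustrate the padding concretely, here is a minimal sketch of building padded label rows for the global 12-keypoint schema. The per-class slot assignments (`CLASS_SLOTS`) and helper names are hypothetical — how you map each class's keypoints into the global set is up to you, as long as it's consistent between training labels and app-side rendering:

```python
# Pad each class's keypoints out to one global schema (12 keypoints, ndim=3).
# Unused slots get x=0, y=0, v=0 so every label row has the same length.
# Hypothetical slot layout: class 0 -> slots 0-5, class 1 -> 6-9, class 2 -> 10-11.

GLOBAL_NKPT = 12
NDIM = 3
CLASS_SLOTS = {0: range(0, 6), 1: range(6, 10), 2: range(10, 12)}

def pad_keypoints(cls_id, kpts):
    """kpts: list of (x, y, v) tuples for this class, normalized to 0-1."""
    slots = list(CLASS_SLOTS[cls_id])
    assert len(kpts) == len(slots), "keypoint count must match the class's slots"
    padded = [(0.0, 0.0, 0)] * GLOBAL_NKPT
    for slot, kpt in zip(slots, kpts):
        padded[slot] = kpt
    return padded

def label_row(cls_id, box, kpts):
    """Build one YOLO pose label line: class, cx, cy, w, h, then kpt triplets."""
    flat = [v for kpt in pad_keypoints(cls_id, kpts) for v in kpt]
    return " ".join(str(v) for v in [cls_id, *box, *flat])

# A 4-keypoint instance of class 1, padded to the 12-keypoint global schema.
row = label_row(1, (0.5, 0.5, 0.2, 0.3),
                [(0.4, 0.4, 2), (0.5, 0.4, 2), (0.6, 0.5, 2), (0.5, 0.6, 1)])
```

Every row then has 1 + 4 + 12×3 = 41 fields regardless of class, which is what the fixed kpt_shape requires.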
If you instead go with multiple single-class models, the main issue in Flutter is exactly what you suspected: don’t create 3 camera views/streams. Use one camera stream and run 3 interpreters on the same frame buffer (serially is simplest). With the current ultralytics_yolo Flutter examples, that typically means one YOLOView and custom native changes if you want multi-model inference in the same pipeline.
One more important gotcha: many mobile examples hardcode COCO assumptions (such as 17 keypoints). For custom pose, make sure your Android PoseEstimator parsing is derived from the model’s output shape (compute nkpt and ndim) rather than hardcoded. A quick sanity check is to first confirm the exported model behaves correctly on desktop by running inference/validation on the exported .tflite; Ultralytics supports predicting directly on exported models, as shown in the Pose export docs.
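As a sketch of what "derive from the output shape" means: a YOLO11 pose export typically has a single output of shape (1, 4 + nc + nkpt × ndim, num_anchors) — 4 box channels, nc class scores, then the keypoint channels. Assuming that layout (verify it against your own export), nkpt falls out of simple arithmetic:

```python
# Derive the keypoint count from the exported tensor shape instead of
# hardcoding COCO's 17. Assumes the usual YOLO pose head layout of
# (1, 4 + nc + nkpt * ndim, num_anchors) -- check this against your export.

def infer_nkpt(output_shape, nc, ndim):
    """Return nkpt given the output shape, class count, and kpt dims (2 or 3)."""
    channels = output_shape[1]
    kpt_channels = channels - 4 - nc
    assert kpt_channels % ndim == 0, "shape doesn't match the nc/ndim assumption"
    return kpt_channels // ndim

# COCO-style single-class pose: 4 + 1 + 17*3 = 56 channels
nkpt_coco = infer_nkpt((1, 56, 8400), nc=1, ndim=3)

# The 3-class, 12-global-keypoint scheme discussed above: 4 + 3 + 12*3 = 43
nkpt_custom = infer_nkpt((1, 43, 8400), nc=3, ndim=3)
```

The same arithmetic ports directly to the Kotlin parser: read the tensor shape at load time and size your keypoint arrays from it.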
If you paste your data.yaml (especially kpt_shape) and the TFLite output tensor shape(s) you see on Android, I can tell you exactly what the parser should infer for nc, nkpt, and keypoint dims for your model.
Thanks, @pderrenger, I’m now exploring OBB. If I can figure out how to query the oriented box coordinates (x, y, angle) from YOLOView/yolo_result.dart, that will probably meet my needs.
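For anyone following along: Ultralytics OBB results expose oriented boxes in xywhr form (center x, center y, width, height, rotation), with the rotation in radians. A minimal sketch of turning that into four drawable corner points, using only stdlib math (the function name is illustrative, not part of any API):

```python
import math

# Convert an oriented box in xywhr form into its four corner points,
# e.g. for drawing an overlay. Rotation r is assumed to be in radians.

def obb_corners(cx, cy, w, h, r):
    cos_r, sin_r = math.cos(r), math.sin(r)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2),
                   (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the half-extent offset, then translate by the center.
        corners.append((cx + dx * cos_r - dy * sin_r,
                        cy + dx * sin_r + dy * cos_r))
    return corners

# Axis-aligned case (r=0): corners are just center +/- half extents.
pts = obb_corners(100.0, 50.0, 40.0, 20.0, 0.0)
```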