I plan to use an external camera on a Raspberry Pi, typically at 1920×1080, while the model input is 640×640. Does this mean the image will be downsampled from HD to 640×640?
Wouldn’t that imply that if an object is small or occupies only a small part of the frame—and is unlikely to move closer to the camera to appear larger—its detection accuracy could be much lower, or it might not be detected at all?
In this case, is it possible to use a tiling approach (segmenting the main frame into smaller tiles) instead of downscaling the entire frame?
640x640 is sufficient for most use cases. It’s not even a small resolution in the context of deep learning models; classification models, for example, usually run at a 224x224 image size. Unless the objects are really small relative to the frame, like a football seen across a field, you shouldn’t have an issue.
Also, the input size isn’t fixed. You can train a YOLO model at a higher image size, or even predict at a higher image size using an existing model; you just need to pass the imgsz argument.
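For example, a minimal sketch using the Ultralytics Python API (the weights file, dataset yaml, epochs, and the imgsz value of 1280 are just placeholders, not recommendations):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # placeholder pretrained weights

# Train at a larger input size so small objects keep more pixels
model.train(data="coco8.yaml", epochs=50, imgsz=1280)

# Or run inference at a larger size with an existing model
results = model.predict("frame.jpg", imgsz=1280)
```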
You don’t need tiling unless your objects are tiny.
Like I said, tiling isn’t the only thing you can do. You can just train the model with a larger imgsz for small objects; you don’t need tiling unless we’re talking about tiny objects.
And both tiling and increasing the image size will drastically increase your compute requirements and reduce speed, tiling even more so than a larger image size.
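If you do end up needing tiling, here is a rough sketch of the idea, assuming the Ultralytics predict API; the tile size, overlap, and box merging are simplified placeholders (dedicated tools like SAHI handle cross-tile overlap and NMS properly):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # placeholder weights


def tiled_predict(frame, tile=640, overlap=64):
    """Run inference on overlapping tiles of a large frame.

    Returns boxes in full-frame coordinates. A real pipeline would also
    de-duplicate detections across tile borders (e.g. with NMS).
    """
    h, w = frame.shape[:2]
    step = tile - overlap
    boxes = []
    for y in range(0, max(h - overlap, 1), step):
        for x in range(0, max(w - overlap, 1), step):
            crop = frame[y:y + tile, x:x + tile]
            results = model.predict(crop, imgsz=tile, verbose=False)
            for x1, y1, x2, y2 in results[0].boxes.xyxy.tolist():
                # Shift tile-local coordinates back into frame coordinates
                boxes.append([x1 + x, y1 + y, x2 + x, y2 + y])
    return boxes
```

This also shows where the slowdown comes from: a 1920x1080 frame split into 640-pixel tiles means roughly 8 forward passes instead of 1.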