As the title says, I trained on my dataset directly with YOLO weights (images resized to 1280 before training; the original object is only 16×16 pixels within a 1600×1200 image) and got new weights. When I validate a particularly difficult image with these weights alone, the confidence is still around 0.24. But after adding SAHI and tuning the parameters several times, the confidence actually drops to 0.15.
Unless the model was trained on tiles of the same size you’re using for SAHI, you won’t see any real improvement. The model needs exposure to tiled images during training so it learns how to detect objects within that context. You can’t train on full images and expect performance to improve on tiles afterward. The only exception is when tiling makes the objects appear at a scale similar to what the model saw during training. In that case, tiling is effectively just enlarging the objects to a familiar size.
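To make that concrete, here is a quick back-of-envelope check using the numbers from the question above. The 512 tile size and the assumption that each tile gets resized up to the model’s 1280 input are mine, just for illustration:

```python
# Rough scale check using the setup described in the question.
orig_w, orig_h = 1600, 1200   # original image size
obj_px = 16                   # object size in the original image
train_size = 1280             # longest side after resizing for training

# Full-image training: the resize shrinks the object.
scale = train_size / max(orig_w, orig_h)
print(f"object at training scale: {obj_px * scale:.1f} px")            # ~12.8 px

# Tiled (SAHI) inference: a 512x512 crop upscaled to the 1280 input
# enlarges the object well past anything seen during training.
tile = 512
print(f"object at tile scale:     {obj_px * train_size / tile:.1f} px")  # ~40 px
```

Under those assumptions, the model learned on roughly 13 px objects but is shown roughly 40 px objects during tiled inference, a scale it never saw, which is consistent with the confidence dropping rather than improving.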
Thanks for clearing that up! But why is it that in the official examples they don’t need to consider tiling during training, and the results still turn out really well?
It’s the exception I mentioned. The model was trained on normal-sized objects, and inference is being run on images where those objects are smaller than what the model was trained on. SAHI on those images is enlarging the objects back to roughly the size the model was trained on.
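In practice that means choosing the SAHI slice size so that objects inside a tile come out near the training scale, rather than only tuning overlap ratios. A minimal sketch, assuming the standard sahi package with an Ultralytics model; the paths, threshold, and slice sizes below are placeholders, not values from this thread:

```python
from sahi import AutoDetectionModel
from sahi.predict import get_sliced_prediction

# Load the trained weights ("best.pt" is a hypothetical path).
detection_model = AutoDetectionModel.from_pretrained(
    model_type="yolov8",        # or "ultralytics" on newer sahi versions
    model_path="best.pt",
    confidence_threshold=0.1,
)

# Pick slice_height/slice_width so objects inside a tile end up at
# roughly the pixel size the model saw during training.
result = get_sliced_prediction(
    "hard_image.jpg",           # hypothetical test image
    detection_model,
    slice_height=512,
    slice_width=512,
    overlap_height_ratio=0.2,
    overlap_width_ratio=0.2,
)
result.export_visuals(export_dir="out/")
```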