What you describe sounds like it could be done with a simple conditional check during inference. If the first pass returns no detections, or if one or more detections fall below a specified confidence threshold, that triggers the conditional path. On the conditional path you’d call a function with the image and the center points of any low-confidence detections (an empty list/array otherwise) to do a second inference pass. That function could then either use the same model or load a different one, slice the image into the necessary tiles, and perform inference on each tile.
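Here is a rough sketch of that routing logic, just to make the idea concrete. Everything in it is hypothetical: `Detection`, `detect_two_pass`, and the `first_pass` / `second_pass` callables are placeholder names rather than part of YOLO or any library, and the 0.5 threshold is an arbitrary default. You would wrap your actual model calls in those two callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]


@dataclass
class Detection:
    """Hypothetical container for one detection (box corners in pixels)."""
    x1: float
    y1: float
    x2: float
    y2: float
    conf: float
    label: str

    @property
    def center(self) -> Point:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def detect_two_pass(
    image,
    first_pass: Callable[[object], List[Detection]],
    second_pass: Callable[[object, List[Point]], List[Detection]],
    conf_threshold: float = 0.5,  # assumed value, tune for your data
) -> List[Detection]:
    """Run the first pass; fall back to a second pass only when needed."""
    detections = first_pass(image)
    low = [d for d in detections if d.conf < conf_threshold]

    # Fast path: something was detected and every detection is confident.
    if detections and not low:
        return detections

    # Slow path: pass the centers of the low-confidence detections
    # (an empty list when nothing was detected) to the second pass.
    confident = [d for d in detections if d.conf >= conf_threshold]
    centers = [d.center for d in low]
    return confident + second_pass(image, centers)
```

Note that this simply concatenates the confident first-pass detections with the second-pass results. If the second pass can re-detect the same objects, you would probably want a de-duplication step (for example NMS) on the combined list, which is left out here.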
This wouldn’t require any modification to the model and should be relatively straightforward to implement. It would give you a “fast” path when all detections are above the threshold (likely the common route) and a “slow” path that runs a second inference on the tiles. The slow path could even use SAHI for tiled inference with YOLO, which could add more latency but improve the overall accuracy of the second pass.
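If you would rather roll the tiling yourself than pull in SAHI, the slicing step could look something like the sketch below. This is a generic overlapping-grid approach and not SAHI's actual implementation; `tile_boxes`, the 640 px tile size, and the 128 px overlap are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def tile_boxes(width: int, height: int, tile: int = 640, overlap_px: int = 128) -> List[Box]:
    """Return crop boxes that cover the image with overlapping square tiles."""
    if tile <= 0 or not 0 <= overlap_px < tile:
        raise ValueError("need tile > 0 and 0 <= overlap_px < tile")
    stride = tile - overlap_px

    def starts(size: int) -> List[int]:
        # A single tile is enough when the image is no larger than the tile.
        if size <= tile:
            return [0]
        offsets = list(range(0, size - tile, stride))
        offsets.append(size - tile)  # last tile sits flush with the far edge
        return offsets

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]
```

Each box can then be used to crop the image before running the second pass. Remember to shift the resulting detections by the tile's `(x1, y1)` offset so they end up in full-image coordinates. The overlap is there so that objects sitting on a tile boundary still appear whole in at least one tile.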