Context
I am analysing positional sensitivity in object detectors (YOLOv8 and RT-DETR via Ultralytics).
For a fixed image containing a single defect (tested on defects covering roughly 1-4% of the image area), I generate a circular vertical shift sweep:
- Shift the image by dy = 0..H-1 (1 px increments, circular with np.roll).
- Run inference for each shift.
- Track the matched detection (IoU-based matching to the same GT object).
- Record the confidence conf(dy).
Recall in these sweeps is ~1.0 (detection almost never drops).
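A minimal sketch of such a sweep, under stated assumptions: `predict_fn` is a hypothetical callable that wraps the detector and returns a list of `((x1, y1, x2, y2), conf)` tuples, `iou_thr` is an assumed matching threshold, and shifts where no detection matches (for example when the object straddles the wrap-around edge) are left as NaN.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def shift_sweep(image, gt_box, predict_fn, iou_thr=0.5):
    """Return conf(dy) for dy = 0..H-1; NaN where no detection matches the GT."""
    H = image.shape[0]
    x1, y1, x2, y2 = gt_box
    conf = np.full(H, np.nan)
    for dy in range(H):
        shifted = np.roll(image, dy, axis=0)  # circular vertical shift
        # The GT box moves with the image; near the wrap-around edge it may
        # extend past H, in which case a match is unlikely and conf stays NaN.
        top = (y1 + dy) % H
        gt = (x1, top, x2, top + (y2 - y1))
        best_iou, best_conf = 0.0, np.nan
        for box, score in predict_fn(shifted):  # hypothetical detector wrapper
            iou = box_iou(box, gt)
            if iou >= iou_thr and iou > best_iou:
                best_iou, best_conf = iou, score
        conf[dy] = best_conf
    return conf
```

In practice `predict_fn` would wrap the Ultralytics model call and unpack its boxes and confidences into that tuple format.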
What I observe
The confidence signal conf(dy) is approximately periodic.
I compute periodicity using:
- Sliding windows (STEP=1)
- Z-normalization per window, to focus on patterns instead of absolute conf
- Mean window correlation by lag
Clear peaks appear in the lag spectrum.
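One possible reading of that procedure, as a hedged sketch rather than the exact computation used for the plots: every length-`window` slice of conf(dy) is z-normalized, and for each lag the Pearson correlation between window i and window i+lag is averaged over all i. The names `lag_spectrum`, `window` and `max_lag` are illustrative, windows wrap around circularly (reasonable only because the sweep itself is circular), and the signal is assumed to contain no NaNs.

```python
import numpy as np

def lag_spectrum(conf, window=64, max_lag=200):
    """Mean correlation between z-normalized sliding windows, per lag."""
    conf = np.asarray(conf, dtype=float)
    n = conf.size
    # All circular windows with STEP=1: row i holds conf[i : i + window].
    idx = (np.arange(n)[:, None] + np.arange(window)[None, :]) % n
    w = conf[idx]
    w = w - w.mean(axis=1, keepdims=True)
    std = w.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # flat windows then contribute zero correlation
    z = w / std
    lags = np.arange(1, max_lag + 1)
    spec = np.empty(lags.size)
    for j, k in enumerate(lags):
        # Row i of the rolled array is window i + k (circular).
        corr = np.sum(z * np.roll(z, -k, axis=0), axis=1) / window
        spec[j] = corr.mean()
    return lags, spec

# Synthetic check: a clean 32 px oscillation over a 640 px sweep.
dy = np.arange(640)
demo = 0.8 + 0.05 * np.sin(2 * np.pi * dy / 32)
lags, spec = lag_spectrum(demo, window=64, max_lag=100)
print(lags[np.argmax(spec)])  # a multiple of 32
```

On a perfectly periodic signal the peaks at 32, 64 and 96 are all essentially equal, so which multiple wins the argmax comes down to rounding.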
Clean result (warp 640×640, imgsz=640)
Both YOLOv8 and RT-DETR show:
- Dominant lag ≈ 32 px
- Extremely strong harmonics at 64, 96, 128, … (often essentially as high as the 32 px peak, and sometimes marginally higher)
Correlation values are high (≈0.95–0.99).
This is extremely stable across multiple images.
Resolution experiments
I warped the same base image to different square resolutions before inference:
- 320×320
- 640×640
- 1280×1280
Inference is always run with imgsz=640.
Observed dominant lags (some runs peak at a multiple of the fundamental, but the trend is clear):
- 320 → ~16
- 640 → ~32
- 1280 → ~64
After correcting for scale, all of them collapse to ~32 px in input space.
So the fundamental periodicity appears invariant in input coordinates.
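The scale correction is just a rescale from warped-image pixels to model-input pixels. A short sketch, where `lag_in_input_space` is an illustrative helper name and the lags are the approximate values listed above:

```python
def lag_in_input_space(lag_px, source_size, imgsz=640):
    """Convert a lag measured on a source_size square image to imgsz input pixels."""
    return lag_px * imgsz / source_size

for size, lag in [(320, 16), (640, 32), (1280, 64)]:
    print(size, lag_in_input_space(lag, size))  # each comes out at 32.0
```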
Rectangular image case (1700×1200)
Using the original aspect ratio:
- YOLOv8 (default inference uses LetterBox: aspect ratio preserved, plus padding) → dominant period ≈ 85 px (1700/20 = 85)
- RT-DETR (default inference uses scale_fill=True: warp without padding) → dominant period ≈ 60 px (1200/20 = 60)
These differences are explained by the different vertical scaling factors in preprocessing (the divisor 20 is 640/32).
After correcting for scale_y, both are again consistent with ~32 px in model input space.
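A sketch of that scale_y correction, assuming 1700×1200 means width × height and imgsz=640. `vertical_scale` is an illustrative helper, not an Ultralytics function: letterboxing applies one ratio (set by the longer side) to both axes, while a scale-fill warp stretches each axis independently.

```python
def vertical_scale(h, w, imgsz=640, scale_fill=False):
    """Vertical resize factor from source pixels to model-input pixels."""
    if scale_fill:
        return imgsz / h               # stretch: the height maps onto imgsz directly
    return min(imgsz / h, imgsz / w)   # letterbox: one ratio shared by both axes

H, W = 1200, 1700  # assumed orientation: width 1700, height 1200
yolo_period = 85 * vertical_scale(H, W)                     # letterbox
rtdetr_period = 60 * vertical_scale(H, W, scale_fill=True)  # warp, no padding
print(round(yolo_period, 2), round(rtdetr_period, 2))       # both about 32
```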
What I am trying to understand
- What architectural mechanism would generate a ~32 px periodic confidence oscillation in input space?
- Is this most likely tied to:
  - FPN stride structure (e.g., stride 8/16/32 heads)?
  - Effective quantization grid in the detection head?
  - Positional encoding discretization?
  - Assignment / decode / NMS behaviour?
- Why does the fundamental appear so clean and stable across images?
- Why does RT-DETR (transformer-based) exhibit the same periodic behaviour?
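To make the stride hypothesis concrete, here is a toy 1-D illustration. It is not a model of either detector: a blob is mean-pooled over non-overlapping 32 px cells and the strongest cell is taken as the "response". Because a circular shift by exactly one stride only permutes the cells, the response repeats with period 32 by construction. `strided_response` and the blob size are illustrative assumptions.

```python
import numpy as np

def strided_response(signal, stride=32):
    """Max over non-overlapping mean-pooled cells (toy stand-in for a stride-32 grid)."""
    cells = signal.reshape(-1, stride).mean(axis=1)
    return cells.max()

H = 640
signal = np.zeros(H)
signal[100:112] = 1.0  # 12 px blob, roughly 2% of the length

resp = np.array([strided_response(np.roll(signal, dy)) for dy in range(H)])

# Shifting by one full stride leaves the response unchanged; half a stride does not.
print(np.allclose(resp, np.roll(resp, 32)), np.allclose(resp, np.roll(resp, 16)))
```

This shows only that a stride-32 sampling grid is sufficient to produce a 32 px period under circular shifts. It does not establish that the grid is the actual cause in YOLOv8 or RT-DETR.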
Key observation
The phenomenon:
- Is not random
- Is not resolution-dependent (after normalization)
- Persists across architectures and model variants (YOLOv8 in various sizes, and even YOLOv11)
- Has strong harmonics
- Is tied to input pixel space
It behaves like a spatial sampling clock inside the detector.
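One way to probe the "sampling clock" picture, offered as a sketch and not as part of the analysis above: fold conf(dy) by phase dy mod 32 and measure how much of the variance the per-phase mean profile explains. `phase_fold` and the trial periods are illustrative, and dy is assumed to be expressed in model-input pixels.

```python
import numpy as np

def phase_fold(conf, period=32):
    """Per-phase mean profile and fraction of variance it explains."""
    conf = np.asarray(conf, dtype=float)
    phase = np.arange(conf.size) % period
    profile = np.array([np.nanmean(conf[phase == p]) for p in range(period)])
    residual = conf - profile[phase]
    explained = 1.0 - np.nanvar(residual) / np.nanvar(conf)
    return profile, explained

# Synthetic check: a clean 32 px oscillation folds perfectly at 32, poorly at 30.
dy = np.arange(640)
demo = 0.8 + 0.05 * np.sin(2 * np.pi * dy / 32)
_, ev32 = phase_fold(demo, period=32)
_, ev30 = phase_fold(demo, period=30)
print(ev32 > 0.99, ev30 < 0.5)
```

If the real conf(dy) folds this cleanly at 32 and not at nearby periods, that supports a fixed grid in input space over an image-dependent effect.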
I will attach:
- The YOLOv8 pattern on the original 1700×1200 image, for one of the defects I tested (85 px period, first image)
- The YOLOv8 pattern on the 640×640 warp of that same defect (32 px period and its multiples, second image)
Please ask any questions you need. All help is welcome, and thank you very much.


