Best approach to count stacked cardboard boxes using CCTV (2D RGB only)

Hi everyone 👋

I’m working on a real-world counting problem and would love to get some advice or ideas from the community.

I need to count the number of cardboard boxes in a warehouse — similar to this example image:

My setup:

  • Only one CCTV camera (2D RGB) available — no depth or stereo sensors.

  • Boxes are stacked tightly and often partially occluded.

  • The camera is fixed, so the viewing angle doesn’t change.

What I’ve tried / considered:

  • Object detection (YOLOv11 / OBB): struggles with overlapping boxes.

  • Instance segmentation (YOLOv11-seg or SAM): works better, but still has many false positives and under-segmentation (some clusters are merged).

  • Counting by area or volume estimation: not accurate due to perspective distortion.

My questions:

  1. Is instance segmentation still the best approach for this case, or is there a more robust method to handle heavy occlusion?

  2. Are there any recommended post-processing steps (e.g., edge-based mask refinement or geometric heuristics) to split merged boxes? One rough idea I'm considering is sketched after this list.

  3. Would perspective correction or homography calibration help improve segmentation accuracy? (A possible one-time calibration sketch is also below.)

  4. Any best practices for training YOLOv11-seg specifically for stacked box scenarios?
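
For question 2, this is the kind of geometric heuristic I have in mind: an untested distance-transform + watershed sketch (OpenCV), where "mask" is a single merged binary cluster taken from the segmentation output.

import cv2
import numpy as np

def split_merged_mask(mask: np.ndarray, peak_ratio: float = 0.5) -> int:
    """Estimate how many boxes a merged binary mask (uint8, 0/255) contains."""
    # Distance to the nearest background pixel; box centres show up as peaks.
    dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
    # Seed regions: pixels "deep inside" a box.
    _, sure_fg = cv2.threshold(dist, peak_ratio * dist.max(), 255, 0)
    sure_fg = sure_fg.astype(np.uint8)
    _, markers = cv2.connectedComponents(sure_fg)
    # Watershed convention: background = 1, unknown = 0, seeds >= 2.
    markers = markers + 1
    unknown = cv2.subtract(mask, sure_fg)
    markers[unknown == 255] = 0
    markers = cv2.watershed(cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR), markers)
    # Every label above 1 is one separated segment.
    return len(np.unique(markers[markers > 1]))

The peak_ratio of 0.5 would need tuning to the box size, and this assumes boxes show up as roughly convex blobs in the mask, so I'm not sure how well it holds up under heavy occlusion.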
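
For question 3, since the camera is fixed, I assume the calibration could be done once by clicking the corners of a known rectangle (e.g. a standard pallet) in the frame. A minimal sketch; the pixel coordinates and pallet dimensions below are made-up placeholders:

import cv2
import numpy as np

# Corners of a known rectangle in the CCTV frame (hypothetical values,
# picked once by hand since the camera never moves).
src_pts = np.float32([[412, 310], [885, 298], [960, 702], [350, 715]])
# The same corners in a fronto-parallel metric plane: a 120 cm x 100 cm
# pallet at 10 px/cm.
dst_pts = np.float32([[0, 0], [1200, 0], [1200, 1000], [0, 1000]])

H = cv2.getPerspectiveTransform(src_pts, dst_pts)

frame = cv2.imread("frame.jpg")  # hypothetical CCTV still
rectified = cv2.warpPerspective(frame, H, (1200, 1000))
# Segmenting "rectified" instead of "frame" gives box faces a roughly
# uniform scale, so area-based checks (mask area / single-box area)
# stop being skewed by perspective.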

I’m open to any suggestions — pipeline design, dataset tips, or even loss function tweaks that could improve instance separation.

Thanks in advance 🙏

Questions to help get to an answer:

  1. Are the boxes generally all the same size like shown in the image?
  2. I understand the aim is to count the boxes, but what’s the overall goal? Where does the box count data get sent to?
  3. Will there be multiple pallets (like your example image) or a single pallet in the frame? If it’s multiple, can it be changed to be single?

Also, FWIW, using a YOLOE model I was able to get this result:

Here’s the exact code I used:

from pathlib import Path

from ultralytics import YOLOE

p = Path.home() / "Downloads"
f = p / "boxes.jpg"

model = YOLOE("yoloe-11l-seg.pt")  # Large segmentation model
names = [
    "box",
    "bin",
    "handtruck",
    "person",
    "garage door",
    "forklift",
    "pallet",
    "",
]  # other classes that might be in the image to help separate detections
model.set_classes(names, model.get_text_pe(names))

# Low conf keeps faint detections of partially occluded boxes; a low iou
# threshold makes NMS aggressive about suppressing duplicate detections
# of the same tightly packed box.
results = model.predict(f, iou=0.11, conf=0.06)
results[0].show(masks=True, labels=False)  # masks only; labels would clutter the stack
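
Since the end goal is a count, you can read the number of detections straight off the results object. A minimal sketch, filtering to the "box" class from the names list above:

# Count only instances whose class name is "box"; the other prompt
# classes are there to soak up non-box objects.
count = sum(1 for c in results[0].boxes.cls if results[0].names[int(c)] == "box")
print(f"Detected {count} boxes")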
