DICOM file-based computer vision project

I am looking to build an AI-based analyzer tool that can classify and detect diagnoses based on medical imaging data. The tool should support inputs in DICOM format, as well as other formats such as JPG, PNG, and MP4. Has anyone worked on a similar project or can offer insights into how to approach this?
Thank you.

Hi Joel — great use case.

YOLO11 is a solid default for this. For JPG/PNG/MP4 you can use the CLI directly, while DICOM needs a small Python shim to load and normalize the pixel data before inference. Ultralytics does not read .dcm files itself (neither the CLI nor model.predict() with a file path); the supported image and video formats are listed in the formats table of the model prediction guide in the Ultralytics docs.
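Deciding which files need that shim can be sketched as below. The assumption is that a file is DICOM if it has a .dcm/.dicom suffix or carries the "DICM" marker that DICOM Part 10 files place after their 128-byte preamble. That check matters because PACS exports often have no extension at all. is_dicom() is a hypothetical helper, and files written without the preamble will not be detected (pydicom's dcmread(path, force=True) can still attempt to read those).

```python
from pathlib import Path

DICOM_MAGIC_OFFSET = 128  # DICOM Part 10 files start with a 128-byte preamble...
DICOM_MAGIC = b"DICM"     # ...followed by these four marker bytes


def is_dicom(path: str) -> bool:
    """Best-effort check for a DICOM Part 10 file (hypothetical helper).

    Trusts a .dcm/.dicom suffix; otherwise sniffs the 'DICM' marker, since
    exported files are often named like 'IM000001' with no extension.
    Files written without the preamble are not detected by this check.
    """
    p = Path(path)
    if p.suffix.lower() in {".dcm", ".dicom"}:
        return True
    with p.open("rb") as f:
        header = f.read(DICOM_MAGIC_OFFSET + len(DICOM_MAGIC))
    # a short or non-DICOM file simply fails this comparison
    return header[DICOM_MAGIC_OFFSET:] == DICOM_MAGIC
```

Paths for which is_dicom() returns True would go through the pydicom loading code; everything else can be handed to YOLO as a plain path.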

Minimal DICOM → YOLO example (2D slice):

```bash
pip install ultralytics pydicom numpy
```

```python
import numpy as np
import pydicom
from ultralytics import YOLO

# read a single DICOM slice (for a series, read every file and sort by InstanceNumber)
ds = pydicom.dcmread('scan.dcm')
img = ds.pixel_array.astype(np.float32)

# simple min-max normalization to 8-bit; consider modality-specific windowing in practice
img = (img - img.min()) / (img.max() - img.min() + 1e-6)
img8 = (img * 255).astype(np.uint8)
img8 = np.stack([img8] * 3, axis=-1)  # replicate grayscale to 3-channel HWC

# 'yolo11n.pt' is COCO-pretrained, so this only checks the pipeline; fine-tune on labeled scans for real findings
# swap to 'yolo11n-seg.pt' for segmentation or 'yolo11n-cls.pt' for classification
model = YOLO('yolo11n.pt')
results = model.predict(source=img8, imgsz=640, conf=0.25)
results[0].show()  # visualize
```
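For CT, the min-max scaling in the example above can be replaced by a Hounsfield-unit window: stored values are converted with the RescaleSlope/RescaleIntercept attributes, then clipped to a window and scaled to 8-bit. A minimal sketch, assuming a soft-tissue window of centre 40 / width 400 as an illustrative default; window_ct() is a hypothetical helper, and MRI or X-ray would need their own normalization.

```python
import numpy as np


def window_ct(pixels: np.ndarray, slope: float = 1.0, intercept: float = 0.0,
              center: float = 40.0, width: float = 400.0) -> np.ndarray:
    """Map raw CT pixel values to a 3-channel uint8 image via a HU window.

    slope/intercept come from the DICOM RescaleSlope/RescaleIntercept tags;
    center/width default to an illustrative soft-tissue window (assumption).
    """
    hu = pixels.astype(np.float32) * slope + intercept   # stored values -> HU
    lo, hi = center - width / 2, center + width / 2
    img = np.clip(hu, lo, hi)                            # everything outside the window saturates
    img8 = ((img - lo) / (hi - lo) * 255).round().astype(np.uint8)
    return np.stack([img8] * 3, axis=-1)                 # 3-channel HWC, as YOLO expects
```

In the DICOM example it would stand in for the three normalization lines, e.g. img8 = window_ct(ds.pixel_array, float(ds.get("RescaleSlope", 1)), float(ds.get("RescaleIntercept", 0))). Scanner-provided WindowCenter/WindowWidth tags can be used instead of the defaults, but they may hold several values.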

Videos are straightforward:

```bash
yolo predict model=yolo11n.pt source=video.mp4
```

Suggested approach:

  • Start with 2D per-slice models (detect or segment); segmentation is often preferred for tumor boundaries. Aggregate slice-level outputs to study-level decisions as needed.
  • Apply modality-appropriate preprocessing (e.g., CT windowing; MRI sequence-specific normalization) before training/inference.
  • For datasets, use YOLO’s native formats: bounding boxes for detect, polygons for segment, or one folder per class for classify. The Brain Tumor detection dataset tutorial in the Ultralytics docs is a good reference for getting started quickly.
  • For background and practical tips in this domain, the medical image analysis overview and the YOLO11 tumor detection in medical imaging article may help with choices around tasks and preprocessing.
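The per-slice-to-study aggregation in the first bullet can be done in many ways; one simple rule is sketched here under stated assumptions. Each slice's detections are reduced to (class_name, confidence) pairs, slices are ordered by InstanceNumber, and a class counts as a study-level finding only if it clears a confidence threshold on at least min_consecutive adjacent slices. The function name, the threshold of 0.5 and the run length of 2 are hypothetical and would need tuning on a validation set.

```python
from collections import defaultdict


def study_level_findings(slices, conf_thr=0.5, min_consecutive=2):
    """Aggregate per-slice detections into study-level positives (hypothetical rule).

    slices: iterable of (instance_number, [(class_name, confidence), ...]).
    Returns {class_name: best_confidence} for classes seen at or above conf_thr
    on at least `min_consecutive` consecutive slices. "Consecutive" means
    adjacent after sorting by InstanceNumber; gaps in numbering are ignored.
    """
    ordered = sorted(slices, key=lambda s: s[0])
    run = defaultdict(int)      # current consecutive-slice streak per class
    longest = defaultdict(int)  # longest streak seen per class
    best = defaultdict(float)   # highest confidence seen per class
    for _, detections in ordered:
        hits = {}
        for name, conf in detections:
            if conf >= conf_thr:
                hits[name] = max(conf, hits.get(name, 0.0))
        # extend streaks for classes hit on this slice, reset the others
        for name in set(run) | set(hits):
            run[name] = run[name] + 1 if name in hits else 0
            longest[name] = max(longest[name], run[name])
        for name, conf in hits.items():
            best[name] = max(best[name], conf)
    return {n: best[n] for n in longest if longest[n] >= min_consecutive}
```

With Ultralytics detection results, each slice's pairs could be built as [(r.names[int(c)], float(p)) for c, p in zip(r.boxes.cls, r.boxes.conf)]. Requiring a streak rather than a single hit is a cheap way to suppress isolated false positives on one slice.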

If you can share your target modality (CT/MRI/X-ray), labels (boxes vs masks vs study-level classes), and whether you need 3D context immediately, I can suggest a minimal dataset YAML and a training recipe.

Thank you, @pderrenger ma’am, I will work on it. 😃