Ultralytics 8.3.6 with torch==2.6.0 + torchvision==0.21.0 is silently falling back to CPU during model.predict() despite .to('cuda') and device=0

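from ultralytics import YOLO

# Assumption: the original post did not show how the model was loaded; "yolo11n.pt" is a placeholder
model = YOLO("yolo11n.pt")

# stream=True returns a generator, so inference only runs while it is iterated
results = model.predict(
    source="VIDEO.mp4",
    device=0,
    stream=True,
    save=True,
    show=True,
)
for result in results:
    pass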

Please run the CLI command yolo checks with your Python environment active, and post the output here.

ultralytics 8.3.163 :rocket: Python-3.12.3 torch-2.7.1+cu126 CUDA:0 (NVIDIA GeForce RTX 4090, 24202MiB)
Setup complete :white_check_mark: (24 CPUs, 60.4 GB RAM, 610.2/1875.7 GB disk)

OS Linux-6.11.0-29-generic-x86_64-with-glibc2.39
Environment Linux
Python 3.12.3
Install pip
Path /venv/lib/python3.12/site-packages/ultralytics
RAM 60.43 GB
Disk 610.2/1875.7 GB
CPU AMD Ryzen 9 9900X 12-Core Processor
CPU count 24
GPU NVIDIA GeForce RTX 4090, 24202MiB
GPU count 1
CUDA 12.6

numpy :white_check_mark: 1.26.2>=1.23.0
matplotlib :white_check_mark: 3.6.2>=3.3.0
opencv-python :white_check_mark: 4.9.0.80>=4.6.0
pillow :white_check_mark: 10.2.0>=7.1.2
pyyaml :white_check_mark: 6.0.1>=5.3.1
requests :white_check_mark: 2.31.0>=2.23.0
scipy :white_check_mark: 1.11.4>=1.4.1
torch :white_check_mark: 2.7.1>=1.8.0
torch :white_check_mark: 2.7.1!=2.4.0,>=1.8.0; sys_platform == "win32"
torchvision :white_check_mark: 0.22.1>=0.9.0
tqdm :white_check_mark: 4.66.1>=4.64.0
psutil :white_check_mark: 5.9.6
py-cpuinfo :white_check_mark: 9.0.0
pandas :white_check_mark: 2.2.1>=1.1.4
ultralytics-thop :white_check_mark: 2.0.14>=2.0.0

Okay, the packages look correctly configured, and your system is more than capable (and very nice, btw). Could you run this CLI command and check whether everything processes on the GPU?

yolo val model=yolo11n.pt data=coco128.yaml device=0

Running this will automatically download a small dataset to test with. I know it’s not exactly the same as the command you originally posted, but I’m just looking to establish a baseline for GPU usage.
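
If it helps, the same baseline can be checked from Python. This is only a rough sketch, not an official diagnostic: `all_on_gpu` and `check_predict_devices` are hypothetical helper names, `yolo11n.pt` and `VIDEO.mp4` are placeholders for your own weights and source, and it assumes a detection model so that `result.boxes` is populated. It reports which device the output tensors of the first few frames live on.

```python
def all_on_gpu(device_names):
    """Return True only if at least one device was seen and all are CUDA devices."""
    names = [str(d) for d in device_names]
    return bool(names) and all(name.startswith("cuda") for name in names)


def check_predict_devices(weights="yolo11n.pt", source="VIDEO.mp4", max_frames=5):
    """Run a few frames through predict() and collect the device of each result.

    The weights and source defaults are placeholders, not values from the thread.
    """
    # Imported lazily so all_on_gpu() stays usable without torch/ultralytics.
    import torch
    from ultralytics import YOLO

    print("torch.cuda.is_available():", torch.cuda.is_available())
    model = YOLO(weights)
    seen = []
    # stream=True returns a generator; inference only runs while iterating.
    for i, result in enumerate(model.predict(source=source, device=0, stream=True)):
        # Detection results keep their box tensor on the inference device.
        seen.append(str(result.boxes.data.device))
        if i + 1 >= max_frames:
            break
    return seen


# Example usage (requires torch, ultralytics, and a CUDA GPU):
# devices = check_predict_devices()
# print(devices, "-> all on GPU:", all_on_gpu(devices))
```

If the printed devices are all `cuda:0` but things still feel slow, watching `nvidia-smi` while the script runs would show whether the GPU is actually busy or whether time is going to video decoding, `show=True`, or `save=True` on the CPU side.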