Hi Ultralytics community,
I am working on a lightweight model based on YOLO11n. I didn't change the Detect layer, but as expected, the accuracy dropped compared to the original model. To recover the lost performance, I plan to apply knowledge distillation (KD) using YOLO11m as the teacher.
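Conceptually, the training objective I have in mind is the usual weighted combination, where loss_weight is a hyperparameter I would tune:

total_loss = detection_loss + loss_weight * kd_loss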
I created a custom trainer by inheriting from DetectionTrainer:
from ultralytics.models.yolo.detect import DetectionTrainer

class KDDetectionTrainer(DetectionTrainer):
    def __init__(self, cfg=None, overrides=None, distiller="mgd", loss_weight=1.0, _callbacks=None, teacher=None, student=None):
        super().__init__(cfg=cfg, overrides=overrides, _callbacks=_callbacks)
        self.distiller = distiller            # name of the KD method I want to use
        self.kd_loss_weight = loss_weight     # weight of the distillation term in the total loss
        if student is not None:
            self.model = student.model        # underlying nn.Module of the student YOLO wrapper, not the wrapper itself
        self.teacher = teacher
        # Freeze the teacher: eval mode and no gradients, so only the student gets updated
        self.teacher.model.eval()
        for param in self.teacher.model.parameters():
            param.requires_grad = False
And I instantiated it like this:
from ultralytics import YOLO
from ultralytics.models.yolo.detect.KDDetectionTrainer import KDDetectionTrainer  # my own file placed in the detect module
from ultralytics.utils import DEFAULT_CFG

student_model = YOLO("yolo11n.yaml")  # untrained student built from the config
teacher_model = YOLO(r"C:\Users\Hunger\Desktop\ultralytics\Custom_Distiller\best.pt")  # trained YOLO11m checkpoint
args = dict(data="coco8.yaml", epochs=3)
trainer = KDDetectionTrainer(cfg=DEFAULT_CFG, student=student_model, teacher=teacher_model, overrides=args)
trainer.train()
My main question is: how can I access the outputs of the student and the teacher during training so I can compute a custom distillation loss? I want to combine the original YOLO loss with the KD loss, but I'm not sure of the proper way to hook into the DetectionTrainer training loop to get the intermediate outputs.
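For reference, this is roughly what I am hoping to end up with (just a sketch of my intent, not working code; the layer indices, my_distill_loss, and self.kd_loss_weight are my own placeholders, and I don't know which DetectionTrainer method is the right place to put this):

def capture_features(module_list, layer_indices, storage):
    # Register PyTorch forward hooks that stash each chosen layer's output by index
    handles = []
    for i in layer_indices:
        def hook(_module, _inputs, output, idx=i):
            storage[idx] = output
        handles.append(module_list[i].register_forward_hook(hook))
    return handles

# Rough logic I imagine inside the trainer (hooks registered once, then per batch):
#   student_feats, teacher_feats = {}, {}
#   capture_features(self.model.model, [15, 18, 21], student_feats)          # hypothetical layer indices
#   capture_features(self.teacher.model.model, [15, 18, 21], teacher_feats)
#   det_loss, det_loss_items = self.model(batch)        # the normal YOLO loss on the batch dict
#   with torch.no_grad():
#       self.teacher.model(batch["img"])                 # fills teacher_feats via the hooks
#   kd_loss = my_distill_loss(student_feats, teacher_feats)   # my own KD term (e.g. MGD)
#   total_loss = det_loss + self.kd_loss_weight * kd_loss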
Also, if anyone has similar implementations of knowledge distillation with Ultralytics YOLO models, it would be really helpful if you could share them for reference.
Any guidance or example on extending DetectionTrainer for knowledge distillation would be greatly appreciated!
Thanks in advance!