Hello everyone,
I am trying to deploy an object detection model on an MCU with an NPU, using several YOLO models. Right now I am working with YOLOv11n; however, even after applying int8 quantization, the model is still larger than my target size. My MCU only supports int8, so falling back to floating point is not an option either.
So I would like to explore pruning: what are the important steps, and what should I keep in mind while doing it? I am also unsure which layers to prune.
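For illustration, here is a minimal sketch of one common approach, structured (filter-level) pruning by L1 norm, written in plain Python on toy weights. Every name and number in it is hypothetical and not taken from any YOLO or NPU toolchain. On a real model, the next layer's input channels would also have to be trimmed to match, and the network would normally be fine-tuned afterwards.

```python
# Toy sketch of structured (filter-level) pruning by L1 norm.
# All names and values are hypothetical, for illustration only.

def l1_norm(filt):
    """Sum of absolute weights in one flattened filter."""
    return sum(abs(w) for w in filt)


def select_filters_to_keep(filters, keep_ratio):
    """Return the indices (in original order) of the filters with the largest L1 norms."""
    n_keep = max(1, int(len(filters) * keep_ratio))
    ranked = sorted(
        range(len(filters)),
        key=lambda i: l1_norm(filters[i]),
        reverse=True,
    )
    return sorted(ranked[:n_keep])


def prune_layer(filters, keep_ratio):
    """Drop whole filters, so the layer really gets smaller."""
    keep = select_filters_to_keep(filters, keep_ratio)
    return [filters[i] for i in keep], keep


# Toy "conv layer": 4 output filters, each flattened to 3 weights.
toy_filters = [
    [0.9, -0.8, 0.7],      # large weights
    [0.01, 0.02, -0.01],   # near-zero weights
    [0.5, 0.5, -0.5],      # medium weights
    [0.0, 0.1, 0.0],       # near-zero weights
]

pruned, kept = prune_layer(toy_filters, keep_ratio=0.5)
print(kept)         # indices of the surviving filters
print(len(pruned))  # number of filters left in the layer
```

The point of removing whole filters rather than zeroing individual weights is that the stored tensor actually shrinks. Zeroed weights in a dense int8 tensor still take one byte each unless the deployment format compresses them.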
Thank you for your time and help.