Is it impractical to run a quantized YOLO11n model on an ESP32S3?

I'm using ESP-IDF 5.3 and espressif/esp-tflite-micro "^1.3.4" on an ESP32S3 (16 MB flash / 8 MB PSRAM) with no file system. I flashed the tflite file exported from YOLO11n training and quantization directly to the model partition of the partition table, then read the model with esp_partition_read and ran inference on a simulated static image. Model information:

Number of subgraphs: 1
Number of tensors: 817
Number of operators: 413

The results are basically correct, but inference is very slow: the TensorFlow computation took 13365554 us (about 13.4 s). Is my approach wrong, or is this model simply too complex to run on the ESP32S3?

Which file did you use after quantization? Did you use the full_integer_quant file?

I didn't use full_integer_quant. I quantized saved_model.pb with tf.lite.TFLiteConverter.from_saved_model(model_path) and converted it to uint8 format.

Use the full_integer_quant file generated by Ultralytics.

Thanks! I'll give it a try. The main reason I didn't use full_integer_quant.tflite at first was that it couldn't be exported in uint8 format.

Great question. Two key points:

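  1. TFLite Micro prefers int8, not uint8. Ultralytics full_integer_quant is int8 by design (weights, activations, and I/O). You can still feed your uint8 image: apply the input tensor's quantization parameters (scale and zero_point; for an int8 input normalized to [0, 1] these are typically scale ≈ 1/255 and zero_point = -128, which amounts to subtracting 128 from each pixel) and cast to int8 on-device.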

  2. Export the MCU-friendly model directly from Ultralytics with INT8 calibration and a much smaller, fixed input size:

from ultralytics import YOLO
YOLO("yolo11n.pt").export(format="tflite", int8=True, imgsz=160, nms=False, data="your_dataset.yaml")

This produces yolo11n_full_integer_quant.tflite. Details are in the TFLite export guide at the Ultralytics docs.
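As a rough illustration of point 1, the uint8-to-int8 step is the standard affine quantization formula q = round(x / scale) + zero_point. The sketch below shows only the arithmetic in plain Python; on the ESP32S3 you would do the same thing in C while filling the input tensor. The scale and zero_point values here are assumptions (read the real ones from your model's input tensor), and quantize_pixel is a hypothetical helper name, not part of any library.

```python
def quantize_pixel(pixel_u8: int, scale: float, zero_point: int) -> int:
    """Map a 0..255 pixel to the int8 value expected by the model input."""
    # Assumes the model was trained on inputs normalized to [0, 1].
    real = pixel_u8 / 255.0
    # Affine quantization: q = round(x / scale) + zero_point.
    q = round(real / scale) + zero_point
    # Clamp to the int8 range in case of rounding at the edges.
    return max(-128, min(127, q))


# ASSUMED parameters, typical for an int8 input normalized to [0, 1].
# Read the actual values from the input tensor of your own model.
INPUT_SCALE = 1.0 / 255.0
INPUT_ZERO_POINT = -128

row = [0, 1, 128, 255]  # a few sample uint8 pixel values
quantized = [quantize_pixel(p, INPUT_SCALE, INPUT_ZERO_POINT) for p in row]
```

With these assumed parameters the mapping reduces to pixel - 128, so the per-pixel cost on the device is a single subtraction. If your tensor reports different parameters, use the general formula instead.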

That said, YOLO11n is still heavy for the ESP32S3; ~13 s per inference is expected. To squeeze out more speed: keep imgsz at 128–160, build esp-tflite-micro with the esp-nn optimized kernels enabled, and use a performance-optimized build instead of the default debug build (in ESP-IDF this is the "Optimize for performance" compiler setting, CONFIG_COMPILER_OPTIMIZATION_PERF). Real-time inference typically requires an accelerator or a stronger edge device; for example, see our Coral Edge TPU on Raspberry Pi guide for a practical path to real-time.
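To get a feel for why a smaller imgsz helps, here is a back-of-the-envelope sketch. It assumes convolution cost scales roughly with the number of input pixels and ignores fixed overheads and PSRAM effects, so treat the numbers as order-of-magnitude only. The baseline size of 640 is an assumption (the thread does not state the original export size), and both function names are hypothetical helpers.

```python
def num_predictions(imgsz: int, strides=(8, 16, 32)) -> int:
    """Candidate boxes emitted by a three-scale YOLO detect head for a square input."""
    return sum((imgsz // s) ** 2 for s in strides)


def relative_cost(imgsz: int, baseline: int) -> float:
    """Rough compute ratio, ASSUMING cost scales with pixel count."""
    return (imgsz / baseline) ** 2


measured_us = 13_365_554  # timing reported in the question
baseline_imgsz = 640      # ASSUMPTION: original export size is not stated

for size in (640, 320, 160, 128):
    est_s = measured_us / 1e6 * relative_cost(size, baseline_imgsz)
    print(f"imgsz={size}: {num_predictions(size)} boxes, ~{est_s:.2f} s (rough estimate)")
```

Under these assumptions, going from 640 to 160 cuts the pixel count by a factor of 16 and shrinks the output the post-processing has to scan from 8400 to 525 candidate boxes. Measure on the device before relying on any of these estimates.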