Is it impractical for me to run a YOLO11n quantization model on an ESP32S3?

Z.Y_W · November 8, 2025, 4:06pm

I am using ESP-IDF 5.3 with espressif/esp-tflite-micro "^1.3.4" on an ESP32S3 (16MB flash / 8MB PSRAM), with no file system. I flashed the tflite file exported from my trained and quantized YOLO11N model directly to the model partition defined in the partition table. I then read the model back with esp_partition_read and ran inference on a generated static test image. Model information:

Number of subgraphs: 1
Number of tensors: 817
Number of operators: 413

The results are basically correct, but inference is very slow: the TensorFlow computation takes 13365554 us (about 13.4 s). Is my approach wrong, or is this model simply too complex to run on the ESP32S3?

Toxite · November 8, 2025, 4:14pm

Which file did you use after quantization? Did you use the full_integer_quant file?

Z.Y_W · November 8, 2025, 11:58pm

I did not use full_integer_quant. I took saved_model.pb, quantized it with tf.lite.TFLiteConverter.from_saved_model(model_path), and converted it to uint8 format.

Toxite · November 9, 2025, 3:58am

Use the full_integer_quant file generated by Ultralytics.

Z.Y_W · November 9, 2025, 8:10am

Thanks! I'll give it a try. The main reason I didn't use full_integer_quant.tflite at first was that it couldn't be exported in uint8 format.

pderrenger · November 10, 2025, 12:45am

Great question. Two key points:

1. TFLite Micro prefers int8, not uint8. The Ultralytics full_integer_quant model is int8 by design (weights, activations, and I/O). You can still feed your uint8 image: read the input tensor's quant params (scale/zero_point) on-device, apply them, and cast to int8. For an int8 input normalized to [0, 1], zero_point is typically around -128, so the conversion often reduces to subtracting 128 from each pixel.
2. Export the MCU-friendly model directly from Ultralytics with INT8 calibration and a much smaller, fixed input size:

from ultralytics import YOLO
YOLO("yolo11n.pt").export(format="tflite", int8=True, imgsz=160, nms=False, data="your_dataset.yaml")

This produces yolo11n_full_integer_quant.tflite. Details are in the TFLite export guide at the Ultralytics docs.
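The uint8-to-int8 input conversion from the first point can be sketched in plain Python to show the arithmetic. This is only an illustration, not the on-device code: the helper names quantize_pixels and dequantize are hypothetical, and SCALE and ZERO_POINT are assumed example values. In practice, read the real parameters from the model's input and output tensors.

```python
# Sketch of TFLite affine quantization: q = round(real / scale) + zero_point.
# SCALE and ZERO_POINT are ASSUMED example values for an int8 input that was
# calibrated on images normalized to [0, 1]; read the real ones from the model.
SCALE = 1.0 / 255.0
ZERO_POINT = -128


def quantize_pixels(pixels, scale, zero_point):
    """Map raw uint8 pixels (0..255) to int8 values for the input tensor."""
    out = []
    for p in pixels:
        real = p / 255.0                      # normalize to [0, 1]
        q = round(real / scale) + zero_point  # affine quantization
        out.append(max(-128, min(127, q)))    # clamp to the int8 range
    return out


def dequantize(values, scale, zero_point):
    """Map int8 tensor values back to real numbers (e.g. model outputs)."""
    return [(v - zero_point) * scale for v in values]


pixels = [0, 1, 128, 255]
quantized = quantize_pixels(pixels, SCALE, ZERO_POINT)
print(quantized)  # [-128, -127, 0, 127]
```

With these assumed parameters the whole conversion collapses to pixel - 128, which is cheap to do in a loop over the camera buffer. If the model reports a different scale or zero_point, the general formula still applies.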

That said, YOLO11n is still heavy for ESP32S3; ~13 s/inference is expected. To squeeze more speed: keep imgsz at 128–160, build esp-tflite-micro with esp-nn optimized kernels enabled, and use a Release/O3 build. Real-time typically requires an accelerator or a stronger edge device; for example, see our Coral Edge TPU on Raspberry Pi guide for a practical path to realtime.
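To get a rough feel for how much a smaller imgsz might help, here is a back-of-the-envelope sketch. It assumes that convolution cost grows roughly with the number of input pixels, and it assumes the 13365554 us measurement was taken at imgsz=640, which the thread does not state. The function name scaled_latency_s is hypothetical, and actual timings on the ESP32S3 may differ considerably because of memory bandwidth and PSRAM access.

```python
# Rough estimate only: assumes latency scales with the pixel count (imgsz ** 2).
MEASURED_S = 13.365554  # from the thread: 13365554 us
ASSUMED_IMGSZ = 640     # ASSUMPTION: the thread does not say which imgsz was used


def scaled_latency_s(measured_s, measured_imgsz, target_imgsz):
    """Scale a measured latency by the ratio of input pixel counts."""
    return measured_s * (target_imgsz / measured_imgsz) ** 2


for imgsz in (320, 160, 128):
    estimate = scaled_latency_s(MEASURED_S, ASSUMED_IMGSZ, imgsz)
    print(f"imgsz={imgsz}: ~{estimate:.2f} s per inference (estimate)")
```

Under these assumptions, the estimate for imgsz=160 comes out at a bit under one second, still far from real-time. Treat the numbers as an order-of-magnitude guide and measure on the device.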
