About parsing output

I am working with a YOLO11 detection model exported to the tflite format (with INT8 input and output), and I have two questions about parsing its output:

  1. For the INT8 output in the format of [batch, features, bbox], how can I determine the specific composition and order of the “features”?

  2. After obtaining the output and dequantizing it, is it still necessary to normalize the results? I ask because, after parsing the dequantized values, some of the classification scores are greater than 1. (I am assuming that the “features” consist of x, y, w, h, followed by all of the classification scores.)

You can find an example with preprocessing and postprocessing here:
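
Separately from the linked example, here is a minimal sketch of the two steps asked about: dequantizing the INT8 tensor and splitting the "features" axis. It assumes the layout guessed in the question (four box values, then one score per class) and uses the standard affine mapping real = scale * (q - zero_point). The helper names, the synthetic tensor, and the scale and zero-point values are all hypothetical. With a real model you would take the scale and zero point from the interpreter's output details (the `quantization` entry) rather than hard-coding them.

```python
import numpy as np

def dequantize(q, scale, zero_point):
    """Map INT8 values back to float: real = scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

def split_features(output):
    """Split a [batch, 4 + num_classes, num_boxes] tensor into boxes and scores.

    Assumption: the feature axis is ordered x, y, w, h, then one score per class.
    """
    preds = output[0].T            # -> [num_boxes, 4 + num_classes]
    boxes_xywh = preds[:, :4]      # first four rows of the feature axis
    class_scores = preds[:, 4:]    # remaining rows, one per class
    return boxes_xywh, class_scores

# Synthetic INT8 output: 1 image, 4 box values + 3 classes, 5 candidate boxes.
q_out = np.full((1, 7, 5), -128, dtype=np.int8)
q_out[0, 4, 0] = 127   # candidate 0, class 0: highest representable value
q_out[0, 6, 2] = 0     # candidate 2, class 2: mid-range value

# Hypothetical quantization parameters; a real model supplies its own.
scale, zero_point = 1.0 / 255.0, -128

out = dequantize(q_out, scale, zero_point)
boxes, scores = split_features(out)

class_ids = scores.argmax(axis=1)    # best class per candidate
confidences = scores.max(axis=1)     # score of that class
keep = confidences > 0.25            # simple confidence threshold
```

If the scores still exceed 1 after this step, it may be worth checking that the scale and zero point belong to this output tensor, and that the feature axis is being indexed on the correct dimension, before adding any extra normalization.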