Thanks for the link, that looks very helpful
Regarding Jetson throttling, I already set the power mode to MAXN; GPU usage ranges from 20% to 100%, and CPU usage is 20% for each core.
I tried
model = YOLOE(yolo_model)
model.predictor.preprocess([image])
but it says predictor is None.
I also tried
from ultralytics.models.yolo.segment.predict import SegmentationPredictor
predictor = SegmentationPredictor()
predictor.setup_model(model.model)
preprocessed = predictor.preprocess([image])
but it still says preprocess is None.
What's the correct way to use the predictor for YOLOE, or does the predictor only work with YOLO models?
The predictor instance loads automatically when inference is started. If you look at the code that Toxite linked to, you should see
# Run it at least once to create the predictor.
model(save=False, show=False, conf=0.01)
The predictor class isn't really intended to be initialized manually.
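As a rough sketch (the checkpoint name and image path are just examples, substitute whatever you're already loading):

import cv2
from ultralytics import YOLOE

yolo_model = "yoloe-11s-seg.pt"  # example checkpoint; use your own model/engine path
model = YOLOE(yolo_model)
image = cv2.imread("image.jpg")  # example image, an HWC BGR numpy array

# Run it at least once to create the predictor.
model(image, save=False, show=False, conf=0.01)

# model.predictor is populated now, so its preprocess() can be reused.
preprocessed = model.predictor.preprocess([image])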
It's going to be difficult to help troubleshoot what you're seeing, as the problem is you can't share the flame graph. If inference is reported as 14 ms but you say the flame graph shows 200 ms, without seeing the flame graph we can only guess what the issue could be.
That said, if the 14 ms meets your requirements and you're not experiencing any issues, then why bother with the flame graph? There could be lots of reasons why things appear slower in the flame graph, so unless you're still experiencing an issue with slow inference, it's probably best to ignore it.
Thanks for the reply. The Ultralytics log says the inference time is 14 ms, and I am satisfied with that.
My problem at hand is that, based on the flame graph, the warmup and forward functions in engine/model.py take a long time, and for my use case I want to reduce the segmentation time as much as possible; ideally the flame graph should show that inference (warmup + forward) takes ~14 ms, not 200 ms.
I understand that not being able to share the entire flame graph is an issue; maybe I can create a minimal example to get around the privacy issue (see the sketch below).
But my priority for now would be to use the TensorRT C++ API to run the engine model for segmentation; I would assume the C++ API will speed things up compared with Python.
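Something like this is the kind of minimal script I have in mind (the engine path and image are placeholders), recorded with py-spy to produce a shareable flame graph:

# py-spy record -o profile.svg -- python minimal_infer.py
import cv2
from ultralytics import YOLO

model = YOLO("model.engine")  # placeholder: path to the exported TensorRT engine
image = cv2.imread("image.jpg")  # placeholder test image

for _ in range(100):
    model(image, verbose=False)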
I don't think you'll get inference + warmup to be that fast. The warmup cycle is only supposed to run once:
that's because it adds a few quick inference calls on the model.
Since this should only occur once, not on every run, it should only impact the inference time once. Even the official TensorRT dev docs recommend using warmup, and even provide a time/iteration limit for the warmup cycle. Another area you might want to investigate is the use of Dynamic Axes/Dimensions for your model. If you don't need them, then setting fixed dimensions can help, as dynamic values will likely add overhead to each call.
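As a quick sanity check (paths are placeholders, assuming the exported engine loads the same way as in your script), timing the first call against a later call should show the warmup cost appearing only once:

import time
import cv2
from ultralytics import YOLO

model = YOLO("model.engine")  # placeholder engine path
image = cv2.imread("image.jpg")  # placeholder image

start = time.perf_counter()
model(image, verbose=False)  # first call: predictor setup + warmup + inference
print(f"first call: {(time.perf_counter() - start) * 1000:.1f} ms")

start = time.perf_counter()
model(image, verbose=False)  # later calls: inference only
print(f"later call: {(time.perf_counter() - start) * 1000:.1f} ms")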
Thanks for the helpful tips. I would like to clarify about the dynamic axes: I am following Model Export with Ultralytics YOLO - Ultralytics YOLO Docs when exporting the models and set dynamic to False. Here's the code:
exported_path = model.export(
    format="engine",
    int8=True,
    dynamic=False,
    nms=False,
    imgsz=640,
    simplify=True,
    device="cuda",
)
That makes sense. I brought up the dynamic axes because when I first integrated the INT8 export process, I had set dynamic=True by default since there were issues when it was not set. Since then it's been updated to allow int8=True and dynamic=False, so it's no longer forced by default and shouldn't be a concern since you're disabling dynamic axes.