Observing increasing inference time per frame during video inference

Hello,
I have trained a custom model and am using the detect.py script for video inference. The video is a static 20-minute file (not an RTSP feed). I am observing strange behavior: for the initial frames the inference time (i.e., the model predict time) is around 100 ms, but as the frame count increases it grows to around 4000 ms. I am unable to figure out what is going wrong. I am running on a P4 GPU with 16 GB of CUDA memory, using all default options for detect.py (i.e., not saving any images or text files). Please help me identify some pointers on this.
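
In case it helps to narrow things down, below is a minimal sketch (not the actual detect.py code) of how the per-frame latency and GPU memory usage could be logged independently. The weights/video paths are placeholders, and loading the custom weights via torch.hub is just one assumed route:

```python
import time

import cv2
import torch

# Placeholders: substitute your own custom weights and video file.
WEIGHTS = "best.pt"
VIDEO = "video.mp4"

# Load the custom-trained model through torch.hub (YOLOv5 "custom" entry point).
model = torch.hub.load("ultralytics/yolov5", "custom", path=WEIGHTS)
model.eval()

cap = cv2.VideoCapture(VIDEO)
frame_idx = 0
with torch.no_grad():  # inference only; prevents autograd state from accumulating
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        torch.cuda.synchronize()      # flush pending GPU work before timing
        t0 = time.time()
        model(frame[..., ::-1])       # BGR (OpenCV) -> RGB for the model
        torch.cuda.synchronize()      # wait for the forward pass to finish
        dt_ms = (time.time() - t0) * 1000
        mem_mib = torch.cuda.memory_allocated() / 2**20
        print(f"frame {frame_idx}: {dt_ms:.1f} ms, {mem_mib:.0f} MiB allocated")
        frame_idx += 1
cap.release()
```

If memory_allocated() climbs steadily along with the per-frame time, that would suggest tensors are being retained across frames somewhere; if memory stays flat while the time still grows, the slowdown is presumably coming from somewhere else in the pipeline.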