How to reproduce YOLO11n-OBB 78.4 mAP50 on DOTA-v1.0 as shown in the official docs?

Hi everyone,

I am trying to reproduce the 78.4% mAP50 reported for YOLO11n-OBB on DOTA-v1.0 (as shown in the official documentation: https://docs.ultralytics.com/tasks/obb/#visual-samples).

Below is a detailed description of my setup and process:

Environment:

  • python: 3.10
  • Ultralytics: 8.3.53
  • PyTorch version: 2.9.0+cu128
  • CUDA version: 12.8
  • OS: Ubuntu 22.04.5
  • GPU: RTX 4090

Dataset Preparation

  • Dataset: DOTA-v1.0 (downloaded from the official DOTA dataset website)

  • Test set cropped using DOTA_devkit / ImgSplit_multi_process (see the sketch after this list)

    • gap = 200

    • subsize = 800

    • num_process = 8

  • After cropping, each test image is 800×800 in size.
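
For reference, a minimal sketch of the split call (assuming the standard DOTA_devkit ImgSplit_multi_process API; the directory paths are placeholders for my local setup):

from ImgSplit_multi_process import splitbase  # DOTA_devkit

split = splitbase(
    r"DOTA/test",           # source folder (placeholder; expects an images/ subfolder)
    r"divided_test_img",    # output folder for the 800x800 tiles
    gap=200,                # overlap between adjacent tiles
    subsize=800,            # tile edge length
    num_process=8,          # parallel workers
)
split.splitdata(1)          # rate=1 keeps the original scale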

Inference Setup

  • imgsz = 1024

  • Model: yolo11n-obb.pt (downloaded from the Ultralytics official release page)

  • Task: obb

  • Dataset config: data=dota.yaml (standard DOTA-v1.0 format)

  • Inference performed with my script (attached below).

# -*- coding: utf-8 -*-

from pathlib import Path

from ultralytics import YOLO

def main():
  # Load the official checkpoint described above
  model = YOLO("yolo11n-obb.pt")
  # Streamed inference over the cropped 800x800 test tiles
  results = model.predict(
    source=r"divided_test_img",  # directory containing test images
    imgsz=1024,
    task="obb",
    device="0",
    stream=True  # yield results one image at a time to keep memory bounded
  )

  out_dir = Path(r"my_output_dir")
  out_dir.mkdir(parents=True, exist_ok=True)

  # DOTA v1 class names
  DOTA_CLASSES = [
    'plane', 'ship', 'storage-tank', 'baseball-diamond', 'tennis-court',
    'basketball-court', 'ground-track-field', 'harbor', 'bridge',
    'large-vehicle', 'small-vehicle', 'helicopter', 'roundabout',
    'soccer-ball-field', 'swimming-pool'
  ]

  # Open one result file per class, in the Task1_<class>.txt layout
  files = {c: open(out_dir / f"Task1_{c}.txt", "w") for c in DOTA_CLASSES}

  for res in results:
    imgname = Path(res.path).stem
    if res.obb is None:  # no oriented boxes detected in this tile
        continue
    for box in res.obb:
        cls = int(box.cls)
        conf = float(box.conf)
        poly = box.xyxyxyxy.reshape(-1).tolist()  # 4 corner points -> 8 values
        line = f"{imgname} {conf:.6f} " + " ".join(f"{p:.2f}" for p in poly) + "\n"
        files[DOTA_CLASSES[cls]].write(line)

  for f in files.values():
      f.close()

if __name__ == "__main__":
    main()

Result Merging

  • Used ResultMerge_multi_process from DOTA_devkit

  • nms_thresh = 0.1

  • Generated final merged result files and zipped them for submission (sketch below).
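
For packaging, a minimal sketch (assuming the merged Task1_*.txt files ended up in a local merged_results directory; paths are placeholders):

import zipfile
from pathlib import Path

merged_dir = Path("merged_results")  # output of the devkit merge step
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for txt in sorted(merged_dir.glob("Task1_*.txt")):
        zf.write(txt, arcname=txt.name)  # flat layout, no subfolders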

Evaluation

Submitted the merged results to the official DOTA evaluation server, but my test result is only 0.731 mAP50 (73.1%), which is lower than the 78.4% reported in the official YOLO documentation.

Question

Could you please help me identify what might cause this performance gap?

  • Are there specific image split parameters, test-time augmentations, or NMS settings used in the official benchmark?

  • Should I modify my imgsz, gap, or merge thresholds to better match the official setup?

  • Also, should I be using an official tool to convert YOLO predictions into the DOTA server’s accepted format, instead of writing my own conversion script? If such a tool exists, could you please let me know where to find it?

Thank you very much for your time and for maintaining this amazing project!

Prediction wouldn’t produce the same output as validation.

You can run the command provided in the docs to get the files for submission:
yolo val obb data=DOTAv1.yaml device=0 split=test
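
If you prefer the Python API, a sketch of the equivalent call (same arguments, with the model made explicit):

from ultralytics import YOLO

model = YOLO("yolo11n-obb.pt")
metrics = model.val(data="DOTAv1.yaml", split="test", device=0)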

Thank you very much for your reply! I’ll try it again.

Hi,

I’m trying to reproduce the official YOLO11n-OBB results on the DOTA-v1.0 dataset.

First, I ran the following command:

yolo val obb data=Dota1_0.yaml device=2 split=test

After inference, no labels folder or result files were generated.

Then I modified the command as:

yolo val obb model=yolo11n-obb.pt data=Dota1_0.yaml device=2 split=test imgsz=1024 save_txt=True save_conf=True

This time, the labels folder was created, but it contains one .txt file per cropped image (for example, P0006__1__0___0.txt) instead of the 15 Task1_<class>.txt files required by the official DOTA evaluation server.

Each .txt file looks like this (example lines below):

9 0.877344 0.373613 0.90012 0.350302 0.815076 0.267208 0.792299 0.29052 0.893054
9 0.367070 0.453821 0.45460 0.367616 0.432601 0.345272 0.345066 0.431477 0.887902
9 0.364843 0.701617 0.45466 0.618353 0.433906 0.595954 0.344080 0.679218 0.885718

Each line follows the format:
cls x1 y1 x2 y2 x3 y3 x4 y4 conf
where the coordinates are normalized (relative) values between 0 and 1 and the confidence is appended last (because of save_conf=True).
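
As a sanity check, a small sketch that parses one such line and converts the normalized polygon back to pixel coordinates, assuming an 800x800 tile from my split step:

line = "9 0.877344 0.373613 0.90012 0.350302 0.815076 0.267208 0.792299 0.29052 0.893054"
tokens = line.split()
cls_id, conf = int(tokens[0]), float(tokens[-1])   # class first, confidence last
norm = [float(v) for v in tokens[1:-1]]            # x1 y1 ... x4 y4, normalized
w = h = 800                                        # tile size from the split step
poly = [v * (w if i % 2 == 0 else h) for i, v in enumerate(norm)]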

For reference, my test set was cropped using DOTA_devkit with:

subsize = 800
gap = 200
num_process = 8

My questions are:

  1. Is this output format normal for yolo val obb when running on the test split?

  2. Is there any official Ultralytics tool to convert or merge these per-image .txt predictions into the DOTA Task1 format (i.e., Task1_plane.txt, Task1_ship.txt, etc.)?

  3. Or should I manually merge them with a custom script to generate the Task1 submission files?

Environment:

  • Python 3.10

  • Ultralytics 8.3.53

  • PyTorch 2.9.0+cu128

  • CUDA 12.8

  • Ubuntu 22.04.5

  • GPU: RTX 4090

Thank you very much for your time and support!

Try with save_json=True

Thanks for the quick reply! I tried save_json=True, which generates a single COCO-style predictions.json.
However, DOTA’s server expects Task1_*.txt files (format: image_id score x1 y1 x2 y2 x3 y3 x4 y4), not COCO JSON.
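
For example, a single line in Task1_plane.txt should look like this (values illustrative, coordinates in pixels of the original full-size image):

P0006 0.894 2213.0 1533.5 2286.0 1540.2 2279.1 1612.8 2206.1 1606.1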

To clarify my original questions:

  1. Is the per-image .txt output from yolo val obb ... save_txt=True expected on the test split?

  2. Is there any built-in Ultralytics tool/flag to export Task1_*.txt directly for DOTA, or should users manually merge per-image txt into Task1 format?

If manual merging is the current approach, could you confirm there’s no official exporter and (if available) point to a recommended reference implementation?

Thanks again!

save_json would generate the txt files if you use DOTAv1.yaml. You need to use the official DOTAv1.yaml that Ultralytics provides, without renaming it: the filename must be exactly DOTAv1.yaml.

Hello,

Thank you very much for your response, and I apologize for my late reply.

I found the eval_json function in ultralytics/ultralytics/models/yolo/obb/val.py, which converts the validation results of the cropped sub-images into the DOTA server-compatible format and returns the merged results. When the dataset path contains the string “DOTA” and the save_json parameter is enabled, this function is called automatically to save the output.
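
In other words, the trigger is roughly equivalent to the following (a paraphrase for illustration, not verbatim library source):

data = {"test": "/datasets/DOTAv1/images/test"}  # illustrative contents of the data YAML
val_path = data.get("test", "")
is_dota = isinstance(val_path, str) and "DOTA" in val_path  # True for the official YAML
# eval_json() only writes the Task1 files when is_dota and save_json are both set.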

I’ve successfully saved the merged validation results locally and am now preparing to submit them to the DOTA evaluation server.

Thanks again for your kind help.

Below are the trigger logic and the eval_json source I’m using (excerpted from the OBBValidator class in ultralytics/ultralytics/models/yolo/obb/val.py; torch, ops, and LOGGER are imported at module level there).

def eval_json(self, stats):
    """Evaluates YOLO output in JSON format and returns performance statistics."""
    if self.args.save_json and self.is_dota and len(self.jdict):
        import json
        import re
        from collections import defaultdict

        pred_json = self.save_dir / "predictions.json"  # predictions
        pred_txt = self.save_dir / "predictions_txt"  # predictions
        pred_txt.mkdir(parents=True, exist_ok=True)
        data = json.load(open(pred_json))
        # Save split results
        LOGGER.info(f"Saving predictions with DOTA format to {pred_txt}...")
        for d in data:
            image_id = d["image_id"]
            score = d["score"]
            classname = self.names[d["category_id"] - 1].replace(" ", "-")
            p = d["poly"]

            with open(f'{pred_txt / f"Task1_{classname}"}.txt', "a") as f:
                f.writelines(f"{image_id} {score} {p[0]} {p[1]} {p[2]} {p[3]} {p[4]} {p[5]} {p[6]} {p[7]}\n")
        # Save merged results. This could produce a slightly lower mAP than the official
        # merging script because of the ProbIoU calculation.
        pred_merged_txt = self.save_dir / "predictions_merged_txt"  # predictions
        pred_merged_txt.mkdir(parents=True, exist_ok=True)
        merged_results = defaultdict(list)
        LOGGER.info(f"Saving merged predictions with DOTA format to {pred_merged_txt}...")
        for d in data:
            image_id = d["image_id"].split("__")[0]
            pattern = re.compile(r"\d+___\d+")
            x, y = (int(c) for c in re.findall(pattern, d["image_id"])[0].split("___"))
            bbox, score, cls = d["rbox"], d["score"], d["category_id"] - 1
            bbox[0] += x
            bbox[1] += y
            bbox.extend([score, cls])
            merged_results[image_id].append(bbox)
        for image_id, bbox in merged_results.items():
            bbox = torch.tensor(bbox)
            max_wh = torch.max(bbox[:, :2]).item() * 2
            c = bbox[:, 6:7] * max_wh  # per-class offsets (batched-NMS trick: shift boxes apart by class)
            scores = bbox[:, 5]  # scores
            b = bbox[:, :5].clone()
            b[:, :2] += c
            # An IoU threshold of 0.3 gives results close to the official merging script, sometimes slightly better.
            i = ops.nms_rotated(b, scores, 0.3)
            bbox = bbox[i]

            b = ops.xywhr2xyxyxyxy(bbox[:, :5]).view(-1, 8)
            for x in torch.cat([b, bbox[:, 5:7]], dim=-1).tolist():
                classname = self.names[int(x[-1])].replace(" ", "-")
                p = [round(i, 3) for i in x[:-2]]  # poly
                score = round(x[-2], 3)

                with open(f'{pred_merged_txt / f"Task1_{classname}"}.txt', "a") as f:
                    f.writelines(f"{image_id} {score} {p[0]} {p[1]} {p[2]} {p[3]} {p[4]} {p[5]} {p[6]} {p[7]}\n")

    return stats

Great find — that’s exactly how it works.

To reproduce and auto-create DOTA Task1 files, run the official val command with the official DOTAv1.yaml (name/path must contain “DOTA” to trigger the exporter) and save_json=True:

yolo obb val model=yolo11n-obb.pt data=DOTAv1.yaml split=test imgsz=1024 save_json=True

This will save:

  • split results in runs/val-obb/…/predictions_txt/Task1_*.txt
  • merged results in runs/val-obb/…/predictions_merged_txt/Task1_*.txt

The export/merge logic you referenced is implemented in the validator; see the OBBValidator eval_json reference for details, including the merging step and thresholds, which are designed to match the DOTA devkit closely. For the full workflow and the exact reproduce command, see the OBB task page in the docs.

Notes:

  • save_txt=True writes per-image, normalized debug labels; save_json=True triggers the DOTA Task1 exporters.
  • The merged files may differ slightly from the official devkit merge; if you want exact scoreboard behavior, you can still feed predictions_txt into the DOTA devkit merge (sketched below). Otherwise, submit predictions_merged_txt.
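
A sketch of that devkit hand-off (assuming ResultMerge.py from DOTA_devkit is importable; the entry-point name can vary between devkit versions, and the run directory is a placeholder):

import os
from ResultMerge import mergebypoly  # DOTA_devkit

os.makedirs("merged_for_submission", exist_ok=True)  # devkit may not create it
mergebypoly(
    "runs/val-obb/exp/predictions_txt",  # placeholder run dir with per-tile Task1_*.txt
    "merged_for_submission",             # output dir for merged Task1_*.txt
)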

If your mAP still differs from the 78.4 mAP50, ensure you’re on the latest ultralytics release and using imgsz=1024, single-scale, default NMS (no TTA). Happy to help compare if needed.
