SimpleDet Docs

Datasets

SimpleDet can inspect several dataset formats, but the suite-plus-pipeline workflow is still centered on COCO-style annotations.


Use this page to decide which dataset formats are suitable for exploration and which ones must be converted before full training.

Supported loader formats

Format	Status	Notes
`coco`	Implemented	COCO JSON with boxes normalized from xywh to xyxy
`csv`	Implemented	Flat annotation tables with path, bbox, label, and optional split columns
`json` / `jsonl` / `ndjson`	Implemented	Simple records as a JSON list, records object, or line-delimited objects
`yolo`	Implemented	YOLO TXT labels with split inferred from label subfolders
`voc`	Implemented	Pascal VOC XML with ImageSets split files when present
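
As a rough illustration of what normalizing to xyxy involves for a non-COCO format, the sketch below converts one YOLO TXT label line (class index, then box center and size as fractions of the image dimensions) into pixel corner coordinates. The function name and the returned dictionary keys are hypothetical, and whether SimpleDet's `yolo` adapter works this way internally is an assumption; only the standard YOLO label layout is relied on.

```python
def yolo_line_to_xyxy(line, image_width, image_height):
    """Convert 'class cx cy w h' (fractions of image size) to pixel xyxy.

    Hypothetical helper for illustration; not a SimpleDet function.
    """
    class_id, cx, cy, w, h = line.split()
    cx, w = float(cx) * image_width, float(w) * image_width
    cy, h = float(cy) * image_height, float(h) * image_height
    return {
        "category_id": int(class_id),
        "x_min": cx - w / 2,
        "y_min": cy - h / 2,
        "x_max": cx + w / 2,
        "y_max": cy + h / 2,
    }


# One label line for a 1024x1024 image: a box centered in the image.
box = yolo_line_to_xyxy("0 0.5 0.5 0.25 0.125", 1024, 1024)
print(box)
```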

Recommended layout

dataset_root/
  annotations/
    instances_train.json
    instances_val.json
    instances_test.json
  images/
    image_0001.png
    image_0002.png
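
Before training, it can help to confirm that a dataset folder matches the layout above. The following is a minimal sketch using only the standard library; `missing_layout_entries` is a hypothetical helper written for this page, not part of SimpleDet.

```python
import tempfile
from pathlib import Path

# File names taken from the recommended layout above.
EXPECTED_ANNOTATIONS = (
    "instances_train.json",
    "instances_val.json",
    "instances_test.json",
)


def missing_layout_entries(dataset_root):
    """Return the layout entries that are absent under dataset_root.

    Hypothetical helper for illustration; not a SimpleDet function.
    """
    root = Path(dataset_root)
    missing = []
    if not (root / "images").is_dir():
        missing.append("images/")
    for name in EXPECTED_ANNOTATIONS:
        if not (root / "annotations" / name).is_file():
            missing.append(f"annotations/{name}")
    return missing


# Demo on a throwaway directory that only has a train annotation file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    (root / "annotations").mkdir()
    (root / "annotations" / "instances_train.json").write_text("{}")
    missing = missing_layout_entries(root)

print(missing)
```

An empty list means every entry from the recommended layout is present.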

Normalized loader payload

load_dataset() returns images, annotations, samples, categories, category_map, splits, and meta for every supported adapter. Annotation boxes use x_min, y_min, x_max, and y_max. Image, annotation, and sample records carry the resolved split so training helpers can filter deterministically.
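
The sketch below shows how split filtering could look on a payload of this shape. It does not call load_dataset(); the `payload` dictionary is a hand-written stand-in, and the exact record layout (a `split` key on each record, box coordinates stored as flat keys) is an assumption made for illustration.

```python
# Hand-written stand-in for part of a loader payload.
# Assumed layout: each record carries a "split" key and flat box keys.
payload = {
    "images": [
        {"id": 1, "file_name": "image_0001.png", "split": "train"},
        {"id": 2, "file_name": "image_0002.png", "split": "val"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "x_min": 100, "y_min": 120,
         "x_max": 150, "y_max": 200, "split": "train"},
        {"id": 2, "image_id": 1, "x_min": 10, "y_min": 20,
         "x_max": 40, "y_max": 60, "split": "train"},
        {"id": 3, "image_id": 2, "x_min": 0, "y_min": 0,
         "x_max": 32, "y_max": 32, "split": "val"},
    ],
}


def records_for_split(records, split):
    """Keep only the records whose resolved split matches."""
    return [record for record in records if record.get("split") == split]


train_images = records_for_split(payload["images"], "train")
train_annotations = records_for_split(payload["annotations"], "train")

# Box width and height follow directly from the xyxy corners.
train_sizes = [
    (ann["x_max"] - ann["x_min"], ann["y_max"] - ann["y_min"])
    for ann in train_annotations
]
print(train_sizes)
```

Because the split is stored on each record, the same filter applies to images, annotations, and samples without joining them first.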

COCO example

{
  "images": [{"id": 1, "file_name": "image_0001.png", "width": 1024, "height": 1024}],
  "annotations": [{"id": 1, "image_id": 1, "category_id": 0, "bbox": [100, 120, 50, 80], "area": 4000, "iscrowd": 0}],
  "categories": [{"id": 0, "name": "vessel"}]
}
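
COCO stores each box as [x, y, width, height], while the normalized payload uses corner coordinates. The sketch below applies that conversion to the example above and checks the stored area; `xywh_to_xyxy` is a hypothetical helper name, not a SimpleDet function.

```python
import json

# The COCO example from this page, embedded as text.
COCO_TEXT = """
{
  "images": [{"id": 1, "file_name": "image_0001.png", "width": 1024, "height": 1024}],
  "annotations": [{"id": 1, "image_id": 1, "category_id": 0, "bbox": [100, 120, 50, 80], "area": 4000, "iscrowd": 0}],
  "categories": [{"id": 0, "name": "vessel"}]
}
"""


def xywh_to_xyxy(bbox):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


coco = json.loads(COCO_TEXT)
annotation = coco["annotations"][0]
box_xyxy = xywh_to_xyxy(annotation["bbox"])

# For a plain box annotation, area equals width * height.
area_matches = annotation["bbox"][2] * annotation["bbox"][3] == annotation["area"]

print(box_xyxy)      # [100, 120, 150, 200]
print(area_matches)  # True
```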

Practical rule

Use the generic loaders for data exploration and with the lightweight helpers. Convert to COCO JSON before using the native runtime helpers with detector_spec=.... The native datamodule resolves Annotations/train_annotations.json, Annotations/val_annotations.json, and Annotations/test_annotations.json, or the matching annotations/instances_*.json files. It supports shared and split-specific paired image/target transforms, and it raises NativeDataValidationError during setup if the requested training, validation, or test split has no samples.
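
To check ahead of time which annotation file a split would pick up, the lookup can be approximated with the standard library. This is a sketch only: the function names are hypothetical, and the order in which the two naming schemes are tried is an assumption, not confirmed behavior of the native datamodule.

```python
import tempfile
from pathlib import Path


def candidate_annotation_paths(dataset_root, split):
    """List the two file names mentioned above for one split.

    The order of the candidates is an assumption made for this sketch.
    """
    root = Path(dataset_root)
    return [
        root / "Annotations" / f"{split}_annotations.json",
        root / "annotations" / f"instances_{split}.json",
    ]


def resolve_annotation_file(dataset_root, split):
    """Return the first candidate that exists, or None if none do."""
    for path in candidate_annotation_paths(dataset_root, split):
        if path.is_file():
            return path
    return None


# Demo on a throwaway directory that only provides a val annotation file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "annotations").mkdir()
    (root / "annotations" / "instances_val.json").write_text("{}")
    val_file = resolve_annotation_file(root, "val")
    train_file = resolve_annotation_file(root, "train")
    val_name = val_file.name if val_file else None

print(val_name)    # instances_val.json
print(train_file)  # None
```

A None result for a split you intend to use is a sign that setup would likely fail for that split, so it is worth fixing the layout before starting a run.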