Getting started
From zero to a first RFInject workflow
This page covers the shortest practical path to a local setup and a first exploratory run with burst-based SAR products stored as Zarr.
1) Understand the data model
SAR products in RFInject are organized as bursts. Each burst is a compact acquisition unit that can be inspected, sampled, visualized, and evaluated independently.
Zarr stores those bursts as hierarchical, chunked arrays alongside their metadata, so you can inspect the structure first and download only the payload chunks you need.
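The chunk-selective access idea can be sketched in a few lines: given an axis length and chunk size, only the chunks that overlap a requested slice need to be fetched. This is an illustrative stand-in, not RFInject or Zarr code; the real store does this bookkeeping for you.

```python
def chunks_for_slice(start, stop, chunk):
    """Chunk indices along one axis touched by a [start, stop) slice."""
    return list(range(start // chunk, (stop - 1) // chunk + 1))

# A 20000-sample axis stored in chunks of 4096: reading samples 5000-9000
# touches only chunks 1 and 2, so only those payload chunks need downloading.
print(chunks_for_slice(5000, 9000, 4096))  # → [1, 2]
```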
2) Install dependencies
PDM

```shell
curl -sSL https://pdm-project.org/install-pdm.py | python3 -
pdm install
pdm install -G jupyter_env -G viz -G docs
pdm run python -m pip install torch
```
virtualenv

```shell
python3 -m pip install --user virtualenv
python3 -m virtualenv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[jupyter_env,viz,docs]"
python -m pip install torch
```
uv

```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync --extra jupyter_env --extra viz --extra docs
uv pip install --python .venv/bin/python torch
```
Register the matching Jupyter kernel after install so your notebook server and dependencies point to the same environment.
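For example, with PDM this might look as follows; this assumes ipykernel is already installed in the environment, and the kernel name `rfinject` is an arbitrary choice:

```shell
# Register the project environment as a named Jupyter kernel
pdm run python -m ipykernel install --user --name rfinject
```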
3) Inspect a burst group
```python
from rfinject import (
    DEFAULT_HF_BUCKET_ID,
    access_attributes,
    explore_zarr_structure,
    get_burst_info,
    open_hf_bucket_zarr,
)

# metadata_only=True inspects the structure without downloading payload chunks
zarr_data = open_hf_bucket_zarr(
    DEFAULT_HF_BUCKET_ID,
    "s1a-iw-raw-s-hh-20240116t204634-20240116t204707-052137-064d52.zarr",
    metadata_only=True,
)

explore_zarr_structure(zarr_data, max_depth=3)
burst_info = get_burst_info(zarr_data)
attrs = access_attributes(zarr_data, "burst_0")
```
4) Browse or mirror the bucket
```python
from rfinject import (
    DEFAULT_HF_BUCKET_ID,
    download_hf_bucket_path,
    list_hf_bucket_zarrs,
)

products = list_hf_bucket_zarrs(DEFAULT_HF_BUCKET_ID, limit=3)
print(products)

download_hf_bucket_path(
    DEFAULT_HF_BUCKET_ID,
    products[0],
    "/path/to/folder",
)
```
5) Open the notebooks
notebooks/how_to_start.ipynb
notebooks/how_to_pytorch_dataloader.ipynb
notebooks/rfi_iou_evaluation.ipynb
The PyTorch dataloader notebook mirrors the selected train, validation, and test burst subsets under ./data before iteration, then reads from that local mirror. Start with a small sample_fraction such as 0.001, or set max_scenes_per_split for a smoke run; even a small fraction of a large raw scene can translate into several GiB of local data.
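A back-of-envelope calculation makes the warning concrete; the 5 TiB total below is an assumed, illustrative figure, not a measured size of the bucket.

```python
TIB = 1024**4
raw_total_bytes = 5 * TIB        # assumed total size of the raw scene set
sample_fraction = 0.001          # the suggested smoke-run fraction

local_gib = raw_total_bytes * sample_fraction / 1024**3
print(f"{local_gib:.2f} GiB")    # → 5.12 GiB even at a 0.1% sample
```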
6) Run your first script
Store your helper code in a small Python script and run it with PDM so it executes inside the locked environment.

```shell
pdm run python your_script.py
```
Need full API details? Go to API reference.