Getting started
From zero to a first RFInject workflow
This page covers the shortest practical path to a local setup and a first exploratory run with burst-based SAR products stored as Zarr.
1) Understand the data model
SAR products in RFInject are organized as bursts. Each burst is a compact acquisition unit that can be inspected, sampled, visualized, and evaluated independently.
Zarr stores those bursts as hierarchical, chunked arrays alongside their metadata, so you can inspect the structure first and download only the payload chunks you need.
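The chunk-selective access idea can be sketched in a few lines: given an axis length and chunk size, only the chunks that overlap a requested slice need to be fetched. This is an illustrative stand-in, not RFInject or Zarr code; the real store does this bookkeeping for you.

```python
def chunks_for_slice(start, stop, chunk):
    """Chunk indices along one axis touched by a [start, stop) slice."""
    return list(range(start // chunk, (stop - 1) // chunk + 1))

# A 20000-sample axis stored in chunks of 4096: reading samples 5000-9000
# touches only chunks 1 and 2, so only those payload chunks need downloading.
print(chunks_for_slice(5000, 9000, 4096))  # → [1, 2]
```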
2) Install dependencies
PDM

```shell
curl -sSL https://pdm-project.org/install-pdm.py | python3 -
pdm install
pdm install -G jupyter_env -G viz -G docs
pdm run python -m pip install torch
```
virtualenv

```shell
python3 -m pip install --user virtualenv
python3 -m virtualenv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[jupyter_env,viz,docs]"
python -m pip install torch
```
uv

```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync --extra jupyter_env --extra viz --extra docs
uv pip install --python .venv/bin/python torch
```
Register the matching Jupyter kernel after install so your notebook server and dependencies point to the same environment.
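For example, with PDM this might look as follows; this assumes ipykernel is already installed in the environment, and the kernel name `rfinject` is an arbitrary choice:

```shell
# Register the project environment as a named Jupyter kernel
pdm run python -m ipykernel install --user --name rfinject
```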
3) Inspect a burst group
```python
from rfinject import (
    DEFAULT_HF_BUCKET_ID,
    access_attributes,
    explore_zarr_structure,
    get_burst_info,
    open_hf_bucket_zarr,
)

# metadata_only=True inspects the structure without downloading payload chunks
zarr_data = open_hf_bucket_zarr(
    DEFAULT_HF_BUCKET_ID,
    "s1a-iw-raw-s-hh-20240116t204634-20240116t204707-052137-064d52.zarr",
    metadata_only=True,
)

explore_zarr_structure(zarr_data, max_depth=3)
burst_info = get_burst_info(zarr_data)
attrs = access_attributes(zarr_data, "burst_0")
```
4) Browse or mirror the bucket
```python
from rfinject import (
    DEFAULT_HF_BUCKET_ID,
    download_hf_bucket_path,
    list_hf_bucket_zarrs,
)

products = list_hf_bucket_zarrs(DEFAULT_HF_BUCKET_ID, limit=3)
print(products)

download_hf_bucket_path(
    DEFAULT_HF_BUCKET_ID,
    products[0],
    "/path/to/folder",
)
```
5) Open the notebooks
notebooks/how_to_start.ipynb
notebooks/how_to_pytorch_dataloader.ipynb
notebooks/rfi_iou_evaluation.ipynb
The PyTorch dataloader notebook mirrors the selected train, validation, and test burst subsets under ./data before iteration, then reads from that local mirror. Start with a small sample_fraction such as 0.001, or set max_scenes_per_split for a smoke run; even a small fraction of a large raw scene can translate into several GiB of local data.
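A back-of-envelope calculation makes the warning concrete; the 5 TiB total below is an assumed, illustrative figure, not a measured size of the bucket.

```python
TIB = 1024**4
raw_total_bytes = 5 * TIB        # assumed total size of the raw scene set
sample_fraction = 0.001          # the suggested smoke-run fraction

local_gib = raw_total_bytes * sample_fraction / 1024**3
print(f"{local_gib:.2f} GiB")    # → 5.12 GiB even at a 0.1% sample
```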
6) Run your first script
Store your helper code in a small Python script and run it with PDM so it executes inside the locked environment.

```shell
pdm run python your_script.py
```
Need full API details? Go to API reference.