API reference

Core modules and helpers

This reference focuses on the practical entry points you will call from scripts and notebooks.

rfinject.utils

parse_hf_bucket_reference(bucket)

Normalizes Hugging Face bucket references to owner/name format.

bucket: Bucket ID, prefixed path, or public bucket URL.

Returns: normalized bucket ID string.
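One way to picture the normalization is to strip a known prefix or URL and keep the first two path segments. This sketch is hypothetical: normalize_bucket_ref, the hf://buckets/ and buckets/ prefixes, and the https://huggingface.co/buckets/<owner>/<name> URL shape are illustrative assumptions, not confirmed behavior of parse_hf_bucket_reference.

```python
from urllib.parse import urlparse

# Hypothetical prefixes; the real function may accept different forms.
_PREFIXES = ("hf://buckets/", "buckets/")


def normalize_bucket_ref(ref: str) -> str:
    """Reduce a bucket reference to 'owner/name' (illustrative sketch only)."""
    ref = ref.strip()
    if ref.startswith(("http://", "https://")):
        # Assumed URL shape: https://huggingface.co/buckets/<owner>/<name>/...
        ref = urlparse(ref).path.lstrip("/")
    for prefix in _PREFIXES:
        if ref.startswith(prefix):
            ref = ref[len(prefix):]
            break
    parts = [p for p in ref.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"cannot extract owner/name from {ref!r}")
    # Anything after owner/name (e.g. a product path) is dropped.
    return f"{parts[0]}/{parts[1]}"


print(normalize_bucket_ref("https://huggingface.co/buckets/acme/l0-data"))  # acme/l0-data
```

Here acme/l0-data is a made-up bucket ID; a prefixed path such as hf://buckets/acme/l0-data/product.zarr reduces to the same string.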

get_hf_bucket_info(bucket=DEFAULT_HF_BUCKET_ID)

Fetches high-level bucket metadata such as size, visibility, and total file count.

bucket: Bucket ID or URL. Defaults to the RFInject L0 bucket.

Returns: Hugging Face bucket info object.

list_hf_bucket_zarrs(bucket=DEFAULT_HF_BUCKET_ID, limit=None)

Lists top-level .zarr products available in a bucket without walking every file recursively.

bucket: Bucket ID or URL.
limit: Optional maximum number of product paths to return.

Returns: list of bucket-relative Zarr paths.

download_hf_bucket_path(bucket, remote_path, local_dir)

Mirrors a single file or a recursive prefix from a bucket to the local filesystem.

bucket: Bucket ID or URL.
remote_path: File path or directory-like prefix inside the bucket.
local_dir: Destination root for the mirrored files.

Returns: list of local file paths downloaded for the request.
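A recursive prefix request has to decide which bucket files belong to it and where each one lands locally. The sketch below assumes that a file listing is already available and that the mirror keeps each file's full bucket-relative path under local_dir; plan_mirror is a hypothetical helper, not part of rfinject.

```python
from pathlib import Path, PurePosixPath


def plan_mirror(bucket_files, remote_path, local_dir):
    """Map the bucket files selected by remote_path to local destinations (sketch)."""
    remote = remote_path.strip("/")
    # Exact file match, or anything under the prefix treated as a directory.
    selected = [f for f in bucket_files if f == remote or f.startswith(remote + "/")]
    root = Path(local_dir)
    # Assumption: the full bucket-relative path is preserved under local_dir.
    return {f: root.joinpath(*PurePosixPath(f).parts) for f in selected}


files = ["a.zarr/zarr.json", "a.zarr/burst_0/zarr.json", "ab.zarr/zarr.json", "notes.txt"]
print(sorted(plan_mirror(files, "a.zarr/", "mirror")))  # ['a.zarr/burst_0/zarr.json', 'a.zarr/zarr.json']
```

Requiring the "/" separator keeps a prefix such as a.zarr from also matching a sibling product like ab.zarr.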

open_hf_bucket_zarr(bucket, zarr_path, metadata_only=True)

Downloads the required Zarr files from a bucket to a local mirror and opens the result with Zarr.

bucket: Bucket ID or URL.
zarr_path: Bucket-relative path to the product root ending in .zarr.
metadata_only: If true, syncs only zarr.json files for fast hierarchy inspection.

Returns: opened zarr.Group backed by the local mirror.
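In Zarr format v3, every group and array stores its metadata (shape, data type, chunk grid, attributes) in a file named zarr.json, while array values live in separate chunk files. A metadata-only sync therefore reduces to a filename filter. The sketch uses a hypothetical files_to_sync helper and a made-up listing; the actual selection logic inside open_hf_bucket_zarr may differ.

```python
from pathlib import PurePosixPath


def files_to_sync(product_files, metadata_only=True):
    """Select which files of a Zarr v3 product to download (hypothetical helper)."""
    if not metadata_only:
        return list(product_files)
    # Zarr v3 keeps group/array metadata in files named zarr.json; chunk
    # files (by default under a c/ prefix) are skipped in metadata-only mode.
    return [f for f in product_files if PurePosixPath(f).name == "zarr.json"]


listing = [
    "demo.zarr/zarr.json",
    "demo.zarr/burst_0/zarr.json",
    "demo.zarr/burst_0/echo/zarr.json",
    "demo.zarr/burst_0/echo/c/0/0",
]
print(len(files_to_sync(listing)), len(files_to_sync(listing, metadata_only=False)))  # 3 4
```

Because shapes, dtypes, and attributes sit in zarr.json, a metadata-only mirror should be enough for structure and attribute inspection; reading array values presumably requires metadata_only=False.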

explore_zarr_structure(zarr_group, max_depth=3)

Walks a Zarr hierarchy recursively and prints each subgroup and array with shape/type metadata.

zarr_group: Root zarr.Group to inspect, such as the group returned by open_hf_bucket_zarr.
max_depth: Maximum recursion depth for printing nested groups. Lower values keep the output short; higher values show deeper structure.

Returns: None (side-effect is console output).
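The depth limit behaves like a standard recursive walk that stops descending once max_depth is reached. So that the sketch runs without Zarr, groups are modeled as plain dicts and arrays as (shape, dtype) tuples. describe_tree, that stand-in model, and the way depth is counted are assumptions; the real function reads members from a zarr.Group and prints as it goes.

```python
def describe_tree(node, name="root", max_depth=3, _depth=0):
    """Return indented description lines, expanding groups only while _depth < max_depth.

    Stand-in model (hypothetical): a dict is a group mapping child names to
    nodes; a (shape, dtype) tuple is an array.
    """
    indent = "  " * _depth
    if isinstance(node, tuple):
        shape, dtype = node
        return [f"{indent}{name}: array shape={shape} dtype={dtype}"]
    lines = [f"{indent}{name}/ (group)"]
    if _depth < max_depth:
        for child_name in sorted(node):
            lines.extend(describe_tree(node[child_name], child_name, max_depth, _depth + 1))
    return lines


tree = {"burst_0": {"echo": ((4, 8), "complex64"), "rfi": ((4, 8), "complex64")}}
print("\n".join(describe_tree(tree, max_depth=1)))
```

With max_depth=1 only the root and burst_0 appear; the default depth also lists the echo and rfi arrays with their (made-up) shapes.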

access_array_data(zarr_group, burst_name, array_name)

Reads one data array from a burst group, selected explicitly by burst name and array name.

zarr_group: Zarr group that contains burst children such as burst_0.
burst_name: Name of the burst group (for example burst_0). The function fails if this key is missing.
array_name: Array name inside the burst, such as echo, rfi, or echo_w_rfi.

Returns: zarr.Array.

get_array_slice(array, slice_params=None)

Extracts a NumPy slice from a Zarr array for a quick visual or numeric inspection.

array: Input Zarr array to slice.
slice_params: Optional tuple of Python slices or indices. If omitted, the function selects a default window:
- the first 10x10 block for 2D arrays
- the first 10x10 block of the first plane for 3D arrays
- slice(0, 10) along every dimension for higher-dimensional arrays

Returns: np.ndarray containing the selected window.
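The default windows can be expressed as a tuple of slices built from the array's number of dimensions. In this hypothetical default_window helper, treating the leading axis as the "plane" for 3D arrays is an assumption; the source does not say which axis is fixed.

```python
def default_window(ndim: int, size: int = 10) -> tuple:
    """Default index tuple mirroring the documented get_array_slice windows (sketch)."""
    block = slice(0, size)
    if ndim == 2:
        return (block, block)  # first 10x10 block
    if ndim == 3:
        # Assumption: the "plane" is the leading axis; the real helper may
        # fix a different axis.
        return (0, block, block)
    # Higher dimensions (and, in this sketch, 1D): slice(0, 10) on every axis.
    return tuple(block for _ in range(ndim))


print(default_window(3))  # (0, slice(0, 10, None), slice(0, 10, None))
```

Indexing as arr[default_window(arr.ndim)] works for both NumPy and Zarr arrays, since both accept tuples of integers and slices for basic indexing.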

get_burst_info(zarr_group)

Collects per-burst statistics (shape, dtype, chunking, and estimated memory footprint).

zarr_group: Root group that contains one or more burst_* children.

Returns: Dict[str, Dict[str, Any]] keyed by burst name.
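An estimated memory footprint is usually the uncompressed size: number of elements times bytes per element. The sketch assumes get_burst_info uses this formula; estimate_nbytes and its small dtype table are hypothetical (with NumPy available, np.dtype(name).itemsize gives the element size directly).

```python
import math

# Bytes per element for a few common dtypes (complex64 = two float32 parts).
ITEMSIZE = {"int16": 2, "float32": 4, "float64": 8, "complex64": 8, "complex128": 16}


def estimate_nbytes(shape, dtype):
    """Uncompressed size in bytes: element count times bytes per element."""
    return math.prod(shape) * ITEMSIZE[dtype]


nbytes = estimate_nbytes((1000, 2000), "complex64")
print(nbytes, f"{nbytes / 2**20:.1f} MiB")  # 16000000 15.3 MiB
```

For a hypothetical (1000, 2000) complex64 burst this gives 1000 × 2000 × 8 = 16,000,000 bytes, regardless of how well the chunks compress on disk.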

access_attributes(zarr_item, path=None)

Reads metadata attributes from a target object in the hierarchy, optionally navigating to a child path first.

zarr_item: Root zarr.Group (or array-like) from which navigation starts.
path: Optional slash-separated path (e.g. burst_0/echo). If omitted, reads only the starting item.

Returns: plain dict of attributes (empty if none exist).
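Path navigation amounts to splitting on "/" and descending one key at a time before copying the attributes into a plain dict. The Node class and read_attrs below are hypothetical stand-ins so the sketch runs without Zarr, and the attribute names are made up; with real Zarr objects, group["burst_0/echo"] and dict(node.attrs) play the same roles.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a Zarr group or array: attributes plus named children."""

    attrs: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)


def read_attrs(item: Node, path: Optional[str] = None) -> dict:
    """Descend along a slash-separated path, then return a plain copy of .attrs."""
    if path:
        for key in path.strip("/").split("/"):
            item = item.children[key]  # raises KeyError if a segment is missing
    return dict(item.attrs)


root = Node(
    attrs={"mission": "demo"},
    children={"burst_0": Node(children={"echo": Node(attrs={"units": "counts"})})},
)
print(read_attrs(root, "burst_0/echo"))  # {'units': 'counts'}
```

Returning dict(item.attrs) rather than the attribute object itself means callers can modify the result without touching the stored metadata.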

explore_all_attributes(zarr_group)

Traverses root, burst, and burst-array nodes and extracts all available attributes in a single pass.

zarr_group: Root Zarr group containing SAR burst groups and arrays.

Returns: nested dictionary keyed by element path (for example root, burst_0, burst_0/echo).

rfinject.viz

plot_complex_array(array, title='Complex Array Visualization', figsize=(15,5))

Creates a 3-panel figure: magnitude, phase, and real-part views for a complex-valued array.

array: Complex np.ndarray or coercible array-like input.
title: Figure title shown above the three plots.
figsize: Tuple (width, height) in inches passed to Matplotlib.

Returns: None (renders and displays the plot).
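Each of the three panels shows one elementwise quantity of the complex samples. The pure-Python sketch below computes them with abs, cmath.phase, and .real; complex_panels is a hypothetical helper, and NumPy's np.abs, np.angle, and np.real give the same results on arrays.

```python
import cmath


def complex_panels(values):
    """Elementwise quantities behind the three panels (hypothetical helper)."""
    magnitude = [abs(z) for z in values]      # |z|
    phase = [cmath.phase(z) for z in values]  # angle in radians, within [-pi, pi]
    real = [z.real for z in values]           # real part
    return magnitude, phase, real


mag, ph, re_part = complex_panels([3 + 4j, -1 + 0j, 1j])
print(mag)  # [5.0, 1.0, 1.0]
```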

plot_magnitude(array, title='Magnitude', ...)

Draws a single magnitude plot with optional normalization, dB conversion, color-range limits, and file export.

array: Complex input array.
title: Panel title text.
figsize: Figure size in inches, as a tuple.
normalize: If true, divides the magnitude by its maximum value before display.
db_scale: If true, converts to dB using 20*log10() and updates the colorbar label accordingly.
vmin: Optional minimum color limit. Useful for clipping low values.
vmax: Optional maximum color limit. Useful for clipping high values.
savefig: Optional output path. If set, saves the figure at dpi=300.

Returns: None (renders the figure and optionally saves it).
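Normalization followed by dB conversion maps the peak magnitude to 0 dB and every smaller value to a negative number. The sketch assumes normalization happens before the dB conversion and adds a floor to avoid log10(0); magnitude_db and the floor value are hypothetical, not the helper's documented internals.

```python
import math


def magnitude_db(values, normalize=True, floor=1e-12):
    """Magnitude in dB via 20*log10, optionally normalized so the peak is 0 dB.

    `floor` is a hypothetical guard against log10(0); the documented helper
    may treat zeros differently.
    """
    mags = [abs(z) for z in values]
    if normalize:
        peak = max(mags)
        if peak > 0:
            mags = [m / peak for m in mags]
    return [20 * math.log10(max(m, floor)) for m in mags]


print([round(v, 2) for v in magnitude_db([2 + 0j, 1 + 0j])])  # [0.0, -6.02]
```

20*log10 is the amplitude convention (power quantities use 10*log10), so halving the magnitude lowers the value by about 6.02 dB. On this scale, a vmin of -60 clips everything more than 60 dB below the peak.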

rfinject.trainer

train_model(...)

End-to-end training entry point that builds a default data module and callbacks, then runs fit and test.

model: Optional pl.LightningModule. If omitted, the trainer creates a default MyModel.
data_module: Optional pl.LightningDataModule. If omitted, a default data module pointing at ./data is created.
max_epochs: Maximum training epochs passed directly to pl.Trainer.
accelerator: Hardware backend string passed to Lightning (commonly cpu or gpu).
devices: Number of accelerator devices to allocate.
log_dir: Root directory for the TensorBoard logger.
checkpoint_dir: Directory where ModelCheckpoint files are stored.

Returns: the trained model instance.

run_inference(model_checkpoint_path, data_module)

Loads a trained checkpoint and runs prediction using the provided data module.

model_checkpoint_path: Filesystem path to a saved .ckpt checkpoint.
data_module: Initialized data module exposing prediction dataloader(s).

Returns: predictions returned by trainer.predict().

setup_callbacks(checkpoint_dir='./checkpoints')

Builds standard Lightning callbacks for checkpointing, early stopping, and learning-rate logging.

checkpoint_dir: Directory passed to ModelCheckpoint.

Returns: callback list used by train_model().

Example call

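from rfinject import (
    DEFAULT_HF_BUCKET_ID,
    access_attributes,
    get_burst_info,
    list_hf_bucket_zarrs,
    open_hf_bucket_zarr,
)

# Pick the first .zarr product listed in the default RFInject L0 bucket.
products = list_hf_bucket_zarrs(DEFAULT_HF_BUCKET_ID, limit=1)

# Mirror only the zarr.json metadata files for fast hierarchy inspection.
zarr_data = open_hf_bucket_zarr(DEFAULT_HF_BUCKET_ID, products[0], metadata_only=True)

# Per-burst statistics and the attributes of the echo array in burst_0.
burst_info = get_burst_info(zarr_data)
metadata = access_attributes(zarr_data, "burst_0/echo")

# Product path, number of bursts, and the first five attribute names.
print(products[0], len(burst_info), list(metadata.keys())[:5])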