Samples and provenance

heron models production inputs as samples built from art ROOT files. Each sample tracks its provenance, protons-on-target (POT) totals, and a normalisation factor, so that scaling stays consistent across analyses.

From FermiGrid production to SampleIO

FermiGrid production delivers art ROOT files. heron first scans them to record run/subrun metadata, then aggregates the resulting provenance outputs into a SampleIO file.

# Register art provenance for a production file list.
heron art nue_run1:data/run1_nue.list

# Build a list of art provenance outputs from the previous step.
ls scratch/out/template/art/art_prov_nue_run1*.root > scratch/out/template/lists/nue_run1.txt

# Build the SampleIO file and update samples.tsv.
heron sample nue_run1:scratch/out/template/lists/nue_run1.txt

The sample step reads the beam database, sums POT, and writes a SampleIO ROOT file that is referenced by samples.tsv.
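The POT summation in the sample step can be pictured with a short Python sketch. It is not heron's implementation: the record layout (run, subrun, pot), the name sum_pot, and the choice to count each run/subrun pair only once are all assumptions made for illustration.

```python
from typing import Iterable, NamedTuple


class SubrunRecord(NamedTuple):
    """Hypothetical per-subrun record; not heron's actual schema."""
    run: int
    subrun: int
    pot: float


def sum_pot(records: Iterable[SubrunRecord]) -> float:
    """Sum POT over records, counting each (run, subrun) pair once.

    De-duplication is an assumption: it guards against the same subrun
    appearing in more than one provenance output.
    """
    seen = set()
    total = 0.0
    for rec in records:
        key = (rec.run, rec.subrun)
        if key in seen:
            continue  # this subrun was already counted
        seen.add(key)
        total += rec.pot
    return total


# Made-up numbers, for illustration only.
records = [
    SubrunRecord(run=1, subrun=1, pot=2.0e16),
    SubrunRecord(run=1, subrun=2, pot=3.0e16),
    SubrunRecord(run=1, subrun=2, pot=3.0e16),  # duplicate, skipped
]
total_pot = sum_pot(records)
```

Keying on the (run, subrun) pair keeps the total stable if a file list happens to include overlapping inputs.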

Sample list format

Sample lists live in scratch/out/<set>/sample/samples.tsv by default. They are TSV files with a header row followed by one line per sample, for example (\t marks a tab character):

# sample_name\tsample_origin\tbeam_mode\toutput_path
nue_run1\tdata\tbeam\tscratch/out/template/sample/sample_root_nue_run1.root

Applications that build event outputs read this list to locate each sample ROOT file and its metadata. The <set> segment defaults to out and is controlled by HERON_SET or heron --set.
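A downstream application might read the list along the following lines. This is a minimal Python sketch, not heron's own reader: the function name read_sample_list is hypothetical, and it assumes the header row starts with "# " and that fields are separated by real tab characters.

```python
import csv
import io
from typing import Dict, List


def read_sample_list(stream) -> List[Dict[str, str]]:
    """Parse a samples.tsv stream into one dict per sample.

    Assumes a header row that starts with "# " and tab-separated fields.
    """
    lines = [line for line in stream if line.strip()]  # drop blank lines
    if not lines:
        return []
    # Strip the leading "# " from the header to recover the column names.
    header = lines[0].lstrip("#").strip().split("\t")
    reader = csv.reader(lines[1:], delimiter="\t")
    return [dict(zip(header, row)) for row in reader]


# In-memory stand-in for samples.tsv, using the example row shown above.
example = io.StringIO(
    "# sample_name\tsample_origin\tbeam_mode\toutput_path\n"
    "nue_run1\tdata\tbeam\tscratch/out/template/sample/sample_root_nue_run1.root\n"
)
samples = read_sample_list(example)
sample_paths = {s["sample_name"]: s["output_path"] for s in samples}
```

Here sample_paths maps each sample name to its SampleIO ROOT file, which is the lookup an event-building application needs.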

Normalisation inputs

SampleIO stores:

The list of input files and their provenance.
POT totals from the art provenance scan.
Beam database totals (for example tortgt and tor101).
Derived normalisation factors used when constructing event weights.

Keeping these quantities together in SampleIO ensures consistent scaling when multiple samples are combined in an analysis.
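As a worked illustration of how a normalisation factor can follow from POT totals, the Python sketch below scales a sample to a reference exposure using the ratio of target POT to sample POT. That ratio is a common convention and is assumed here; the text above does not state heron's actual formula, and the function name and numbers are hypothetical.

```python
def normalisation_factor(sample_pot: float, target_pot: float) -> float:
    """Return the factor that scales a sample to the target exposure.

    Assumed convention: factor = target POT / sample POT.
    """
    if sample_pot <= 0.0:
        raise ValueError("sample POT must be positive")
    return target_pot / sample_pot


# Made-up exposures: a sample with 4e20 POT scaled to a 1e20 POT reference.
factor = normalisation_factor(sample_pot=4.0e20, target_pot=1.0e20)

# An event with unit weight would then carry the normalisation factor.
event_weight = 1.0 * factor
```

Under this convention, a sample with more POT than the reference is scaled down, and one with less is scaled up.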