Samples and provenance

heron models production inputs as samples built from art ROOT files. Each sample tracks its provenance, its POT (protons on target) totals, and a normalisation factor, so that scaling stays consistent across analyses.

From FermiGrid production to SampleIO

FermiGrid production delivers art ROOT files. heron first scans them to record run/subrun metadata, then aggregates the resulting provenance outputs into a SampleIO file.

# Register art provenance for a production file list.
heron art nue_run1:data/run1_nue.list

# Build a list of art provenance outputs from the previous step.
ls scratch/out/template/art/art_prov_nue_run1*.root > scratch/out/template/lists/nue_run1.txt

# Build the SampleIO file and update samples.tsv.
heron sample nue_run1:scratch/out/template/lists/nue_run1.txt

The sample step reads the beam database, sums POT, and writes a SampleIO ROOT file. It then updates samples.tsv so that the list references the new file.
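The ls redirection in the second step creates the list file even when nothing matches, so a failed art step can go unnoticed until the sample step runs. The sketch below is one way to guard against that. build_prov_list is a hypothetical helper, not part of heron; it uses only standard POSIX shell and assumes the art_prov_<sample>*.root naming shown above.

```shell
# Hypothetical helper (not a heron command): collect the art provenance
# outputs for one sample into a list file and fail if none are found.
# Usage: build_prov_list <art_dir> <sample_name> <list_file>
build_prov_list() {
    art_dir=$1
    sample=$2
    list_file=$3

    # Make sure the directory that holds the list exists.
    mkdir -p "$(dirname "$list_file")" || return 1

    # Start from an empty list, then append every matching file.
    : > "$list_file"
    for f in "$art_dir"/art_prov_"$sample"*.root; do
        # An unmatched glob stays literal, so test that the path exists.
        if [ -e "$f" ]; then
            printf '%s\n' "$f" >> "$list_file"
        fi
    done

    # An empty list means the art step produced nothing for this sample.
    if [ ! -s "$list_file" ]; then
        echo "no art provenance outputs for ${sample} in ${art_dir}" >&2
        return 1
    fi
}
```

With the paths used on this page, the call would be build_prov_list scratch/out/template/art nue_run1 scratch/out/template/lists/nue_run1.txt, followed by the heron sample command only if the helper succeeds.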

Sample list format

The sample list lives at scratch/out/<set>/sample/samples.tsv by default. It is a tab-separated (TSV) file with a header row followed by one line per sample. In the example below, \t stands for a tab character:

# sample_name\tsample_origin\tbeam_mode\toutput_path
nue_run1\tdata\tbeam\tscratch/out/template/sample/sample_root_nue_run1.root

Applications that build event outputs read this list to locate each sample's ROOT file and its metadata. The <set> segment defaults to out and is controlled by HERON_SET or heron --set; the example paths on this page have template in that position.
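As an illustration of how a script might consume the list, the sketch below looks up the output_path column for a named sample. sample_path is a hypothetical helper, not a heron command. It assumes the four-column layout shown above, real tab separators in the file, and a header line that starts with #.

```shell
# Hypothetical helper (not a heron command): print the output_path
# column of samples.tsv for one sample name.
# Usage: sample_path <samples.tsv> <sample_name>
sample_path() {
    awk -F '\t' -v name="$2" '
        /^#/ { next }                             # skip the header line
        $1 == name { print $4; found = 1; exit }  # first match wins
        END { exit !found }                       # non-zero status if absent
    ' "$1"
}
```

For the example row above, sample_path scratch/out/template/sample/samples.tsv nue_run1 would print scratch/out/template/sample/sample_root_nue_run1.root, assuming the list sits under the template set. A missing sample name yields no output and a non-zero exit status.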

Normalisation inputs

SampleIO stores:

  • The list of input files and their provenance.

  • POT totals from the art provenance scan.

  • Beam database totals (for example tortgt and tor101).

  • Derived normalisation factors used when constructing event weights.

Storing these quantities alongside each sample keeps scaling consistent when multiple samples are combined in an analysis.
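This page does not state how heron derives the normalisation factor. Purely as an illustration, a common convention in POT-normalised analyses is to scale a sample to a reference exposure by the ratio of the two POT totals. The sketch below computes that ratio; pot_scale and the choice of reference POT are assumptions, not heron behaviour.

```shell
# Hypothetical helper (an assumed convention, not heron's implementation):
# scale factor = reference POT / sample POT.
# Usage: pot_scale <reference_pot> <sample_pot>
pot_scale() {
    awk -v target="$1" -v sample="$2" 'BEGIN {
        # Reject a zero or negative sample POT instead of dividing by it.
        if (sample + 0 <= 0) {
            print "sample POT must be positive" > "/dev/stderr"
            exit 1
        }
        printf "%.6g\n", target / sample
    }'
}
```

For example, pot_scale 1.0e20 4.0e20 prints 0.25: a sample with four times the reference exposure would have each event weighted down by a factor of four under this convention.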