2. Benchmark initialization

Maps the upstream call chain, records honest baselines at every tier, picks metrics + thresholds for the concordance gate, profiles to find hotspots.

In a fresh coding-agent session from the task directory, paste:

373 lines · 17,514 chars
379 lines · 17,082 chars

Time: 20–60 minutes (depends on baseline cost; the agent runs the slow function across tiers).

What to watch for:

  • task.yaml::metrics filled in with thresholds the agent justified from upstream behavior
  • reference_outputs/<tier>/ populated for every tier
  • memory/discoveries.md has the profiling notes

Stop here and run the Auditor. Copy the initialization auditor prompt below and paste it into a fresh coding-agent session from this task directory. Continue only after the verdict is PASS or WEAK.

133 lines · 11,492 chars

Optional: reflect before you move on. While this session is still open, run the Reflection agent right here (no fresh session) to report anything confusing or missing about initialization. Recommended before you contribute.

101 lines · 6,237 chars

Next -> Auditor agent.