4. Validation

Stress-tests the converged patch on held-out OOD tiers and non-default thread counts: conditions the optimization agent never saw. The release gate before packaging or shipping.

In a fresh coding-agent session from the task directory, paste:

369 lines · 44,806 chars
369 lines · 44,874 chars

Time: 30–90 minutes (depends on OOD tier size).

What to watch for:

  • verify.tsv: per-(tier, thread, replicate) verdicts: PASS / SOFT / HARD
  • memory/discoveries.md has the scaling assessment

If a regression appears, the agent loops back into a constrained 30-round optimization to fix the specific failure (e.g. “+20% slower at 8 threads on ood_xlarge”). You don’t drive that loop; the agent handles it.

For local use, you’re done after this. Your pipeline/run.{py,R} is the fast version. Call it from your own scripts.

Want to share it?Contributing (one paste runs packaging + opens the PR).