Building a patch

Pick a slow upstream function. Paste one prompt per phase into a coding agent (Claude Code, Codex, …).

The agents handle the work: benchmarking, profiling, optimization rounds, and OOD validation.

Model matters more than the agent UI. Claude Code, Cursor, Codex, etc. all work: use a model at least Claude Opus–class (e.g. Opus with max reasoning, or GPT-5.5).

The pipeline

Auditor Auditor

Agent 1

Task setup

Confirm target. Scaffold task.

Agent 2

Initialization

Benchmark + gate + baselines.

Agent 3

Optimization

50-round optimization loop.

Agent 4

Validation

OOD + threading sweep.

Agent 5

Packaging

Lift into library and open the PR.

Reflectionoptional, recommended after each step before contributing

Main agentAuditor gate (after Initialization & Optimization)Contribute step

Two checks sit alongside the main agents. The Auditor runs after Initialization and again after Optimization: a fresh-session gate that inspects the setup, then the converged patch, for benchmark-specific shortcuts, and returns a verdict before you continue. Reflection is optional. Run it after any step to report friction; it is recommended before you contribute a patch upstream, since that feedback is what sharpens the prompts and checks for the next run.

Agent	What you do	Time
1. Task setup	Copy prompt → paste → answer scaffolding questions	~2 min
2. Benchmark initialization	Copy prompt → paste → wait	20–60 min
3. Optimization loop	Copy prompt → paste → wait	hours
4. Validation	Copy prompt → paste → wait	30–90 min
5. Packaging (contribute only)	Use the Contributing flow: one paste lifts the patch, runs `zyme attest` across tiers, and opens the PR	1–3 hrs

Each agent runs in a fresh coding-agent session. The fresh-session rule is the safety design: no single agent context defines success, optimizes toward it, AND validates the result. You can interrupt any agent mid-run; commit-per-round means the next session picks up where it stopped.

Two domain prompt sets

Each phase page shows two copy buttons:

Bio: single-cell, genomics, Bioconductor, scanpy, Seurat.
OtherField: astronomy, climate, chemistry, physics, statistics, anything else.

Same agent structure; the prompts differ in domain-specific intuitions.

Optional: dispatch mode

Don’t want to paste a prompt per phase? Dispatch mode is hands-off: you paste a single Manager prompt, tell it the repo and function, walk away. The Manager runs every phase on your behalf.

Building a patch

The pipeline

Two domain prompt sets

Optional: dispatch mode

Read next