Building a patch

Pick a slow upstream function. Paste one prompt per phase into a coding agent (Claude Code, Codex, …).

The agents handle the work: benchmarking, profiling, optimization rounds, and OOD validation.

Model matters more than the agent UI. Claude Code, Cursor, Codex, etc. all work: use a model at least Claude Opus–class (e.g. Opus with max reasoning, or GPT-5.5).

The pipeline

AuditorAuditor

Agent 1

Task setup

Confirm target. Scaffold task.

Agent 2

Initialization

Benchmark + gate + baselines.

Agent 3

Optimization

50-round optimization loop.

Agent 4

Validation

OOD + threading sweep.

Agent 5

Packaging

Lift into library and open the PR.

Reflectionoptional, recommended after each step before contributing
Main agentAuditor gate (after Initialization & Optimization)Contribute step

Two checks sit alongside the main agents. The Auditor runs after Initialization and again after Optimization: a fresh-session gate that inspects the setup, then the converged patch, for benchmark-specific shortcuts, and returns a verdict before you continue. Reflection is optional. Run it after any step to report friction; it is recommended before you contribute a patch upstream, since that feedback is what sharpens the prompts and checks for the next run.

AgentWhat you doTime
1. Task setupCopy prompt → paste → answer scaffolding questions~2 min
2. Benchmark initializationCopy prompt → paste → wait20–60 min
3. Optimization loopCopy prompt → paste → waithours
4. ValidationCopy prompt → paste → wait30–90 min
5. Packaging (contribute only)Use the Contributing flow: one paste lifts the patch, runs zyme attest across tiers, and opens the PR1–3 hrs

Each agent runs in a fresh coding-agent session. The fresh-session rule is the safety design: no single agent context defines success, optimizes toward it, AND validates the result. You can interrupt any agent mid-run; commit-per-round means the next session picks up where it stopped.

Two domain prompt sets

Each phase page shows two copy buttons:

  • Bio: single-cell, genomics, Bioconductor, scanpy, Seurat.
  • OtherField: astronomy, climate, chemistry, physics, statistics, anything else.

Same agent structure; the prompts differ in domain-specific intuitions.

Optional: dispatch mode

Don’t want to paste a prompt per phase? Dispatch mode is hands-off: you paste a single Manager prompt, tell it the repo and function, walk away. The Manager runs every phase on your behalf.