Benchmark charts
Figure: Speedup distribution. Each dot is one finalized dataset/thread run on Windows.
Figure: Thread sweep. Speedup across finalized thread counts on Windows.
Figure: Memory. Baseline vs. optimized peak memory on Windows.
What is accelerated
This patch targets squidpy.gr.co_occurrence. On the listed benchmark datasets it reduces CPU runtime while passing the declared concordance gate on the scientific output.
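To make concrete what the accelerated helper computes, here is a minimal NumPy sketch of a co-occurrence score modeled on the ratio p(exp|cond) / p(exp) that squidpy documents: for each distance threshold, the probability of seeing cluster j near cluster i, divided by the overall probability of cluster j. This is an assumed reference formulation, not squidpy's or AutoZyme's code. It uses dense O(n²) distances instead of tiles, counts self-pairs, treats interval[1:] as the thresholds, and the function name cooccurrence_scores is hypothetical. Distances are binned by bisecting the threshold array with np.searchsorted, the per-bin histogram is turned into cumulative counts, and empty rows or marginals yield zeros.

```python
import numpy as np


def cooccurrence_scores(coords, labels, n_clusters, interval):
    """Sketch of a co-occurrence score p(j | near i) / p(j) per threshold.

    Assumptions (hypothetical reference, not squidpy/AutoZyme code):
    dense pairwise distances, integer labels in [0, n_clusters),
    self-pairs counted, thresholds are interval[1:].
    """
    coords = np.asarray(coords, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64)
    thresholds = np.asarray(interval, dtype=np.float64)[1:]
    n_t = thresholds.shape[0]
    n = labels.shape[0]

    # Euclidean distances for any number of spatial dimensions, flattened
    # so that entry a * n + b is the distance from point a to point b.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1)).ravel()

    # Bisect: first index k with thresholds[k] >= dist (n_t means out of range).
    bins = np.searchsorted(thresholds, dist, side="left")
    lab_i = np.repeat(labels, n)  # label of the first point in each ordered pair
    lab_j = np.tile(labels, n)  # label of the second point
    keep = bins < n_t

    # Histogram of pairs per (cluster i, cluster j, first threshold bin).
    hist = np.zeros((n_clusters, n_clusters, n_t))
    np.add.at(hist, (lab_i[keep], lab_j[keep], bins[keep]), 1.0)
    # A pair within threshold k is also within every larger threshold.
    counts = np.cumsum(hist, axis=2)

    out = np.zeros_like(counts)
    for k in range(n_t):
        c = counts[:, :, k]
        total = c.sum()
        if total == 0.0:
            continue  # no pairs at all -> scores stay zero
        p_j = c.sum(axis=0) / total  # marginal probability of the neighbor cluster
        for i in range(n_clusters):
            rs = c[i].sum()
            if rs == 0.0:
                continue  # cluster i has no neighbors -> zeros
            for j in range(n_clusters):
                if p_j[j] == 0.0:
                    continue  # zero marginal -> zero instead of NaN
                out[i, j, k] = (c[i, j] / rs) / p_j[j]
    return out


scores = cooccurrence_scores(
    [[0, 0], [1, 0], [10, 0], [11, 0]], [0, 0, 1, 1], n_clusters=2, interval=[0.0, 2.0, 20.0]
)
print(scores[:, :, 0])  # same-cluster pairs score 2, cross-cluster pairs 0
print(scores[:, :, 1])  # every pair lies within 20, so all scores are 1
```

In the usage example, two well-separated two-point clusters are strongly self-enriched at the small threshold, and the enrichment disappears once the threshold spans the whole layout.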
Supported scope
The patch replaces the internal helper squidpy.gr._ppatterns._co_occurrence_helper (register_patch targets=[("squidpy.gr._ppatterns", "_co_occurrence_helper", fast_co_occurrence_helper)]), so it is invoked on every co_occurrence call regardless of the public-API arguments. Because it binds below the public entry point, it supports:
- any cluster_key and any spatial_key;
- copy=True and copy=False;
- any interval, given as an int (expanded to a linspace) or as an array. Upstream builds the float interval array before the helper is reached, and the helper only sees it as a numeric array that it bisects, so non-default interval sizes and values work;
- any n_splits, automatic or explicit, because the helper consumes whatever tile collection co_occurrence built;
- any n_jobs/backend. squidpy.parallelize chunks idx_splits across joblib workers; the patched helper handles an arbitrary sub-list of upper-triangular (triu) tile pairs per chunk and re-derives its own numba thread count at call time via auto_threads.

Per-pair same-split symmetry and the divide-by-zero and zero-marginal guards mirror upstream: the "if rs==0.0: continue" and "if m==0.0: continue" branches reproduce upstream's "np.sum==0 -> zeros" behavior, so empty rows and empty marginals produce zeros instead of NaN.

There are two internal kernels: a fused 2D distance + bin + histogram path (spatial.shape[1] == 2) and an n-dimensional fallback whose dimension is resolved at runtime (spatial.shape[1] != 2). An all-tiles kernel that parallelizes over tiles is used only when the data are 2D (is_2d) and len(idx_splits) >= _ALL_TILES_THRESHOLD (default 2000, tunable through the AUTOZYME_COOCCURRENCE_ALL_TILES environment variable, which is read at import time).

Concordance was verified at pearson_occ = 1.0 and q99_abs_diff_occ <= 2e-6 across all five tiers (small, medium, large, ood_large, ood_xlarge) at 1, 4, and 14 threads, with pass_rate = 1.0. Tested against squidpy 1.6.5.
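The statement above that the patched helper can receive any sub-list of upper-triangular tile pairs rests on pair counts being additive. A minimal sketch of that property, assuming a simplified single-threshold count: tiles are enumerated as upper-triangular pairs including same-tile pairs, a cross-tile pair contributes its counts in both directions, and any partition of the pair list sums to the same total as a dense computation. The names tile_pairs, pair_counts and accumulate, the x-quantile tiling, and the round-robin chunking are hypothetical and are not squidpy's or AutoZyme's code; no joblib workers are involved here.

```python
import itertools

import numpy as np


def tile_pairs(n_tiles):
    # Upper-triangular tile pairs (a <= b), including same-tile pairs (a, a).
    return list(itertools.combinations_with_replacement(range(n_tiles), 2))


def pair_counts(xa, la, xb, lb, n_clusters, threshold):
    # Count ordered point pairs (p in tile a, q in tile b) within threshold, by cluster pair.
    dist = np.linalg.norm(xa[:, None, :] - xb[None, :, :], axis=-1)
    ia, ib = np.nonzero(dist <= threshold)
    counts = np.zeros((n_clusters, n_clusters))
    np.add.at(counts, (la[ia], lb[ib]), 1.0)
    return counts


def accumulate(pairs, tiles, n_clusters, threshold):
    # Sum counts over an arbitrary sub-list of tile pairs (one worker's chunk).
    total = np.zeros((n_clusters, n_clusters))
    for a, b in pairs:
        c = pair_counts(*tiles[a], *tiles[b], n_clusters, threshold)
        # Same-tile pairs already hold both orders; a cross-tile pair stands for (a, b) and (b, a).
        total += c if a == b else c + c.T
    return total


rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(60, 2))
labels = rng.integers(0, 3, size=60)
# Split points into 4 tiles by x coordinate (a stand-in for spatial splits).
edges = np.quantile(coords[:, 0], [0.25, 0.5, 0.75])
tile_id = np.searchsorted(edges, coords[:, 0])
tiles = [(coords[tile_id == t], labels[tile_id == t]) for t in range(4)]

pairs = tile_pairs(len(tiles))
n_jobs = 3
chunks = [pairs[w::n_jobs] for w in range(n_jobs)]  # hypothetical round-robin split
chunked = sum(accumulate(ch, tiles, 3, 2.5) for ch in chunks)
dense = pair_counts(coords, labels, coords, labels, 3, 2.5)
print(np.array_equal(chunked, dense))
```

Because each chunk returns only partial counts, the order and grouping of pairs do not matter; in this sketch, normalization into scores would happen only after the chunk totals are summed.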
Out-of-scope behavior
None: the patch handles all co_occurrence calls and arguments.
Detailed speedup table (10 runs)
| Dataset | Tier | Platform | Threads | Baseline time | Optimized time | Speedup | Peak memory | Concordance | Pass |
|---|---|---|---|---|---|---|---|---|---|
four_i_mouse_cortex_80k | ood_large | Windows | 8 | 11.82 min | 18.52 s | 36.5× | 0.6 → 0.6 GB | — | pass |
four_i_mouse_cortex_full | ood_xlarge | Windows | 4 | 128.16 min | 2.18 min | 72.3× | 0.9 → 0.9 GB | — | pass |
merfish_mouse_preoptic | large | Windows | 4 | 7.86 min | 12.55 s | 35.4× | 0.7 → 0.7 GB | — | pass |
seqfish_mouse_gastrulation | small | Windows | 4 | 48.75 s | 8.60 s | 5.58× | 6.2 → 0.6 GB | — | pass |
slideseqv2_mouse_hippocampus | medium | Windows | 8 | 2.18 min | 10.03 s | 13.2× | 0.8 → 0.8 GB | — | pass |
four_i_mouse_cortex_80k | ood_large | macOS | 14 | 7.80 min | 10.55 s | 42.7× | 2.0 → 0.7 GB | — | pass |
four_i_mouse_cortex_full | ood_xlarge | macOS | — | 80.62 min | 1.11 min | 72.3× | 2.5 → 1.0 GB | — | pass |
merfish_mouse_preoptic | large | macOS | 8 | 4.97 min | 7.59 s | 42.1× | 2.5 → 0.7 GB | — | pass |
seqfish_mouse_gastrulation | small | macOS | 14 | 41.92 s | 3.02 s | 13.6× | 13.3 → 0.7 GB | — | pass |
slideseqv2_mouse_hippocampus | medium | macOS | 14 | 1.91 min | 5.00 s | 22.0× | 2.1 → 0.9 GB | — | pass |
Frequently asked questions
Why is Squidpy co_occurrence slow?
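Squidpy co_occurrence is CPU-bound: for every pair of spatial tiles it computes pairwise distances, bins them against the interval thresholds, and accumulates per-cluster-pair counts, and the stock squidpy implementation leaves substantial performance on the table in this core numerical work. On four_i_mouse_cortex_full, the largest benchmark dataset, the original takes 128.16 min on Windows versus 2.18 min with the AutoZyme path, and 80.62 min versus 1.11 min on macOS (72.3× faster).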
How do I make Squidpy co_occurrence faster?
Install AutoZyme and activate the squidpy patch, then keep using Squidpy co_occurrence exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 72.3× faster on the benchmark datasets, with no pipeline or API changes.
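A patch target written as (module, attribute, replacement), like the register_patch call quoted under Supported scope, can be applied by replacing that module attribute. This works without API changes because the public function resolves its helper by global name on every call. The sketch below illustrates the mechanism on a throwaway module; toy_ppatterns, apply_patch and fast_helper are hypothetical stand-ins, not AutoZyme's register_patch or squidpy's code.

```python
import importlib
import sys
import types

# A throwaway module standing in for a library module (hypothetical).
toy = types.ModuleType("toy_ppatterns")
exec(
    "def _helper(xs):\n"
    "    return sum(x * x for x in xs)\n"
    "\n"
    "def public_api(xs):\n"
    "    # _helper is resolved in this module's globals on every call.\n"
    "    return _helper(xs)\n",
    toy.__dict__,
)
sys.modules["toy_ppatterns"] = toy

calls = []


def fast_helper(xs):
    # Same result as _helper; records calls so the swap is observable.
    calls.append(len(xs))
    total = 0
    for x in xs:
        total += x * x
    return total


def apply_patch(module_name, attr, replacement):
    """Swap module.attr for replacement and return a function that undoes it."""
    module = importlib.import_module(module_name)
    original = getattr(module, attr)
    setattr(module, attr, replacement)

    def undo():
        setattr(module, attr, original)

    return undo


before = toy.public_api([1, 2, 3])
undo = apply_patch("toy_ppatterns", "_helper", fast_helper)
during = toy.public_api([1, 2, 3])  # unchanged call site, patched helper
undo()
after = toy.public_api([1, 2, 3])
print(before, during, after, calls)  # 14 14 14 [3]
```

One consequence of this design: code that copied the helper earlier with `from module import _helper` keeps the original function object, so only lookups through the module's namespace see the replacement.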
Does the AutoZyme speedup change the Squidpy co_occurrence output?
No, within the tolerance of a frozen concordance gate checked on every benchmark dataset: the accelerated path's output has a Pearson correlation of 1.0 with the original squidpy implementation (pearson_occ) and a 99th-percentile absolute difference of at most 2e-6 (q99_abs_diff_occ).
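The two gate metrics can be computed as in the minimal sketch below, under assumed definitions: pearson_occ as the Pearson correlation of the flattened co-occurrence arrays, q99_abs_diff_occ as the 99th percentile of element-wise absolute differences, non-finite entries required to coincide and then excluded, and a hypothetical min_pearson of 0.999999 standing in for "1.0" after rounding. Only the 2e-6 bound comes from this page (Supported scope); the function names are illustrative.

```python
import numpy as np


def concordance(baseline, optimized):
    """Return (pearson_occ, q99_abs_diff_occ) under assumed definitions."""
    a = np.asarray(baseline, dtype=np.float64)
    b = np.asarray(optimized, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("shape mismatch")
    a, b = a.ravel(), b.ravel()
    finite_a, finite_b = np.isfinite(a), np.isfinite(b)
    if not np.array_equal(finite_a, finite_b):
        raise ValueError("non-finite entries differ")  # treated as a hard failure
    a, b = a[finite_a], b[finite_b]
    pearson = float(np.corrcoef(a, b)[0, 1])
    q99 = float(np.quantile(np.abs(a - b), 0.99))
    return pearson, q99


def passes_gate(baseline, optimized, min_pearson=0.999999, max_q99=2e-6):
    # min_pearson is a hypothetical threshold; max_q99 matches the reported 2e-6 bound.
    pearson, q99 = concordance(baseline, optimized)
    return pearson >= min_pearson and q99 <= max_q99


ref = np.linspace(0.0, 3.0, 1000).reshape(10, 10, 10)
print(passes_gate(ref, ref.copy()))  # True
print(passes_gate(ref, ref + 1e-3))  # False: q99 is about 1e-3
```

The second call shows why both metrics are useful: a constant offset keeps the correlation at 1.0, so only the absolute-difference bound catches it.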
How do I install the squidpy speedup?
Run pip install autozyme in your shell, then in Python run import autozyme followed by autozyme.activate("squidpy"). The patch takes effect on the next call to squidpy.gr.co_occurrence.