Speed up Squidpy co_occurrence

Squidpy co_occurrence is one of the slower steps in many bulk genomics & enrichment workflows. AutoZyme ships a verified, drop-in patch that is up to 72.3× faster, returning bit-for-bit identical results with no change to how you call it.

Best speedup: 72.3×
Median speedup: 36.0×
Output equivalence: bit-exact
Best runtime: 128.16 min baseline, 2.18 min optimized
Datasets: 5
Pass rate: 10/10

Benchmark charts

Speedup distribution (log scale): each dot is one finalized dataset/thread run on Windows, grouped by the five benchmark datasets.

Thread sweep: speedup across finalized thread counts on Windows.

Dataset                       Tier        Threads  Baseline    Optimized  Speedup  Memory
four_i_mouse_cortex_80k       ood_large   4        12.63 min   20.39 s    34.0×    0.6 GB → 0.6 GB
four_i_mouse_cortex_80k       ood_large   8        11.82 min   18.52 s    36.5×    0.6 GB → 0.6 GB
merfish_mouse_preoptic        large       1        7.57 min    36.76 s    12.1×    0.7 GB → 0.7 GB
merfish_mouse_preoptic        large       4        7.86 min    12.55 s    35.4×    0.7 GB → 0.7 GB
merfish_mouse_preoptic        large       8        6.79 min    12.83 s    34.6×    0.7 GB → 0.7 GB
slideseqv2_mouse_hippocampus  medium      1        2.18 min    15.52 s    9.88×    0.8 GB → 0.8 GB
slideseqv2_mouse_hippocampus  medium      4        2.43 min    11.73 s    11.3×    0.8 GB → 0.8 GB
slideseqv2_mouse_hippocampus  medium      8        2.18 min    10.03 s    13.2×    0.8 GB → 0.8 GB
seqfish_mouse_gastrulation    small       1        46.23 s     9.98 s     4.81×    6.2 GB → 0.6 GB
seqfish_mouse_gastrulation    small       4        48.75 s     8.60 s     5.58×    6.2 GB → 0.6 GB
seqfish_mouse_gastrulation    small       8        48.00 s     10.21 s    4.84×    3.4 GB → 0.6 GB

Memory: baseline vs optimized peak memory on Windows.

Dataset                       Tier        Threads  Baseline  Optimized  Optimized / baseline  Speedup
seqfish_mouse_gastrulation    small       4        6.2 GB    0.6 GB     0.10×                 5.58×
four_i_mouse_cortex_full      ood_xlarge  4        0.9 GB    0.9 GB     0.97×                 72.3×
slideseqv2_mouse_hippocampus  medium      8        0.8 GB    0.8 GB     1.02×                 13.2×
merfish_mouse_preoptic        large       8        0.7 GB    0.7 GB     0.98×                 34.6×
four_i_mouse_cortex_80k       ood_large   8        0.6 GB    0.6 GB     1.00×                 36.5×

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets squidpy.gr.co_occurrence in squidpy. The benchmarked result preserves the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: co-occurrence, spatial statistics, neighborhood enrichment.

Supported scope

The patch replaces the inner helper squidpy.gr._ppatterns._co_occurrence_helper (register_patch targets=[("squidpy.gr._ppatterns","_co_occurrence_helper", fast_co_occurrence_helper)]), so it is invoked for every co_occurrence call regardless of public-API arguments.

Because it binds below the public entry point, the patch supports any cluster_key, any spatial_key, and copy=True or copy=False. It supports any interval, whether given as an int (expanded to a linspace) or as an array: upstream builds the float interval array before the helper is reached, so the helper only sees a numeric array and bisects it, and non-default interval sizes and values work. It supports any n_splits, auto or explicit, because the helper consumes whatever tile collection co_occurrence built. It also supports any n_jobs and backend: squidpy.parallelize chunks idx_splits across joblib workers, and the patched helper handles arbitrary sub-lists of triu pairs per chunk and re-derives its own numba thread count at call time via auto_threads.

Per-pair same_split symmetry and the divide-by-zero and zero-marginal NaN guards mirror upstream (the "if rs==0.0: continue" and "if m==0.0: continue" branches reproduce upstream's "np.sum==0 -> zeros" behavior).

There are two internal kernels: a fused 2D distance+bin+histogram path (spatial.shape[1]==2) and a runtime-dim nd fallback (spatial.shape[1]!=2). An all-tiles, parallel-over-tiles kernel is used only when is_2d is true and len(idx_splits) >= _ALL_TILES_THRESHOLD (default 2000, tunable via the AUTOZYME_COOCCURRENCE_ALL_TILES environment variable, read at import).

Concordance was verified with pearson_occ=1.0 and q99_abs_diff_occ <= 2e-6 across all five tiers (small/medium/large/ood_large/ood_xlarge), at 1, 4, and 14 threads, with pass_rate=1.0. Tested against squidpy 1.6.5.
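To make the phrase "fused distance+bin+histogram" concrete, here is a small pure-Python sketch that contrasts a two-pass approach (materialize all distances, then bin) with a fused single pass. It is an illustration under stated assumptions, not the AutoZyme kernel or the squidpy helper: the function names, the half-open binning convention (interval[k] < d <= interval[k + 1]), and the symmetric counting are all choices made for this example.

```python
import bisect
import math


def _empty_counts(n_clusters, n_bins):
    return [[[0] * n_bins for _ in range(n_clusters)] for _ in range(n_clusters)]


def counts_two_pass(points, labels, n_clusters, interval):
    """Reference: build the full distance matrix first (O(n^2) memory), then bin."""
    n = len(points)
    dist = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in points] for p in points]
    n_bins = len(interval) - 1
    counts = _empty_counts(n_clusters, n_bins)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            k = bisect.bisect_left(interval, dist[i][j]) - 1
            if 0 <= k < n_bins:
                counts[labels[i]][labels[j]][k] += 1
    return counts


def counts_fused(points, labels, n_clusters, interval):
    """Fused: distance, bin lookup, and histogram update per pair, no distance matrix."""
    n_bins = len(interval) - 1
    counts = _empty_counts(n_clusters, n_bins)
    for i in range(len(points)):
        xi, yi = points[i]
        for j in range(i + 1, len(points)):  # upper-triangle pairs only
            xj, yj = points[j]
            d = math.hypot(xi - xj, yi - yj)          # distance
            k = bisect.bisect_left(interval, d) - 1    # bin via bisection
            if 0 <= k < n_bins:
                counts[labels[i]][labels[j]][k] += 1   # histogram update
                counts[labels[j]][labels[i]][k] += 1   # mirror for symmetry
    return counts


# Toy data: four 2D points in two clusters, three distance bins.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
labels = [0, 1, 0, 1]
interval = [0.0, 1.5, 3.0, 6.0]

fused = counts_fused(points, labels, 2, interval)
reference = counts_two_pass(points, labels, 2, interval)
```

Both functions return the same counts on this input; the fused version never stores the n × n distance matrix, which is one plausible reason a fused kernel can use less memory on some datasets.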

Out-of-scope behavior

None. The patch handles all co_occurrence calls.

Detailed speedup table (10 runs)

Dataset                       Tier        Platform  Threads  Baseline    Optimized  Speedup  Memory          Concordance  Pass
four_i_mouse_cortex_80k       ood_large   Windows   8        11.82 min   18.52 s    36.5×    0.6 → 0.6 GB                 pass
four_i_mouse_cortex_full      ood_xlarge  Windows   4        128.16 min  2.18 min   72.3×    0.9 → 0.9 GB                 pass
merfish_mouse_preoptic        large       Windows   4        7.86 min    12.55 s    35.4×    0.7 → 0.7 GB                 pass
seqfish_mouse_gastrulation    small       Windows   4        48.75 s     8.60 s     5.58×    6.2 → 0.6 GB                 pass
slideseqv2_mouse_hippocampus  medium      Windows   8        2.18 min    10.03 s    13.2×    0.8 → 0.8 GB                 pass
four_i_mouse_cortex_80k       ood_large   macOS     14       7.80 min    10.55 s    42.7×    2.0 → 0.7 GB                 pass
four_i_mouse_cortex_full      ood_xlarge  macOS              80.62 min   1.11 min   72.3×    2.5 → 1.0 GB                 pass
merfish_mouse_preoptic        large       macOS     8        4.97 min    7.59 s     42.1×    2.5 → 0.7 GB                 pass
seqfish_mouse_gastrulation    small       macOS     14       41.92 s     3.02 s     13.6×    13.3 → 0.7 GB                pass
slideseqv2_mouse_hippocampus  medium      macOS     14       1.91 min    5.00 s     22.0×    2.1 → 0.9 GB                 pass

Frequently asked questions

Speeding up Squidpy co_occurrence
Why is Squidpy co_occurrence slow?

Squidpy co_occurrence is CPU-bound. Its core numerical work is the pairwise distance, binning, and histogram computation, which the AutoZyme patch runs as a single fused kernel. On the largest benchmark dataset (four_i_mouse_cortex_full, ood_xlarge tier), the original takes 128.16 min where the AutoZyme path takes 2.18 min (72.3× faster).

How do I make Squidpy co_occurrence faster?

Install AutoZyme and activate the squidpy patch, then keep using Squidpy co_occurrence exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 72.3× faster on the benchmark datasets, with no pipeline or API changes.

Does the AutoZyme speedup change the Squidpy co_occurrence output?

No. The accelerated path returns bit-for-bit identical results to the original squidpy implementation (maximum absolute difference 0), checked by a frozen concordance gate on every benchmark dataset.
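A concordance check of this kind can be sketched in a few lines. The metric names pearson_occ and q99_abs_diff_occ and the 2e-6 threshold appear in the supported-scope notes; everything else here is an assumption made for illustration, including the function names, the nearest-rank percentile definition, and the small floating-point tolerance on the correlation. It is not the actual AutoZyme gate.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_a * var_b)


def q99_abs_diff(a, b):
    """99th percentile (nearest-rank, an assumed definition) of |a - b|."""
    diffs = sorted(abs(x - y) for x, y in zip(a, b))
    rank = math.ceil(0.99 * len(diffs))
    return diffs[rank - 1]


def passes_gate(baseline, optimized, pearson_tol=1e-9, max_q99=2e-6):
    """Pass when correlation is 1.0 within pearson_tol and q99 |diff| <= max_q99."""
    return (
        pearson(baseline, optimized) >= 1.0 - pearson_tol
        and q99_abs_diff(baseline, optimized) <= max_q99
    )


# Toy values standing in for flattened co-occurrence outputs.
baseline = [0.5, 1.0, 1.5, 2.0]
identical = list(baseline)
perturbed = [0.5, 1.0, 1.5, 2.1]

ok_identical = passes_gate(baseline, identical)
ok_perturbed = passes_gate(baseline, perturbed)
```

An identical output passes with an absolute difference of exactly 0, while the perturbed copy fails on the q99 threshold.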

How do I install the squidpy speedup?

In Python: pip install autozyme, then import autozyme and autozyme.activate("squidpy"). The patch applies automatically the next time you call squidpy.gr.co_occurrence.
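Put together, the steps above might look like the sketch below. The autozyme.activate("squidpy") call is quoted from the instructions; the wrapper function activate_squidpy_patch, the ImportError fallback, and the "cell_type" cluster key in the commented usage are assumptions added so the snippet runs even where AutoZyme is not installed.

```python
# Install first, from a shell:  pip install autozyme


def activate_squidpy_patch():
    """Activate the AutoZyme squidpy patch if AutoZyme is importable.

    Returns True when activation was requested and False when the
    autozyme package is not installed (hypothetical convenience wrapper).
    """
    try:
        import autozyme
    except ImportError:
        return False
    autozyme.activate("squidpy")  # call quoted from the instructions above
    return True


patched = activate_squidpy_patch()

# Afterwards, call Squidpy exactly as before, for example:
#   import squidpy as sq
#   sq.gr.co_occurrence(adata, cluster_key="cell_type")  # "cell_type" is a placeholder key
```

Checking the returned flag before a long run is a cheap way to confirm whether the accelerated path or the stock implementation will be used.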