Speed up Scanpy leiden

Scanpy leiden is one of the slower steps in many single-cell genomics workflows. AutoZyme ships a verified, drop-in patch that runs up to 1.77× faster on the benchmark datasets. Its output stays within a validated, bounded difference of the upstream result, and the way you call leiden does not change.

Best speedup: 1.77×
Median speedup: 1.60×
Output equivalence: Bounded
Best runtime: baseline 3.35 s, optimized 1.89 s
Datasets: 6
Pass rate: 11/11

Benchmark charts

[Chart: Speedup distribution. Each dot is one finalized dataset/thread run on Windows.]
[Chart: Thread sweep. Speedup across finalized thread counts on Windows. No finalized multi-thread sweep for this platform.]
[Chart: Memory. Baseline vs optimized peak memory on Windows, per dataset.]

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets leiden in Scanpy. The benchmarked result preserves the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: clustering, community detection, louvain, FindClusters, cluster cells, tl.leiden.

Supported scope

The fast path handles the igraph-flavor Leiden case: flavor="igraph" (the default for the patch arg, _leiden.py:347), directed=False, restrict_to=None, partition_type=None.

It honors resolution (passed through to community_leiden, _leiden.py:388-389), use_weights (default True; sets the weight attribute, :386-387), random_state (default 0; applied via set_igraph_random_state in both the direct and fork paths, :278/:303), key_added, copy, the adjacency override, neighbors_key/obsp graph selection (_choose_graph, :379-380), and objective_function via clustering_args (defaults to "modularity", :390).

The patch builds a deduplicated upper-triangle simple graph with 2x weights (upstream builds a multi-edge graph) and runs igraph community_leiden in a forked child on Unix and directly on Windows. The output partition is similar but not bit-identical to upstream (ARI ~0.90-0.98; benchmark threshold ari>=0.90). It effectively reproduces upstream's flavor="igraph", n_iterations=2 result, which is exactly the benchmarked configuration.
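The graph deduplication described above can be illustrated with a small pure-Python sketch. This is not AutoZyme's code: the dictionary representation and the function names are hypothetical, and it assumes the upstream multi-edge graph contains one edge per nonzero entry of a symmetric adjacency matrix with no diagonal entries. Under those assumptions, keeping only pairs with i < j and summing both directions gives each surviving edge twice the original weight, so the total weight between any two cells is unchanged.

```python
from collections import defaultdict


def multi_edge_list(adj):
    """Assumed upstream-style edge list: one edge per nonzero matrix entry,
    so a symmetric matrix yields both (i, j) and (j, i)."""
    return [(i, j, w) for (i, j), w in sorted(adj.items()) if w != 0]


def simple_upper_edges(adj):
    """One edge per unordered pair (i < j), carrying the summed weight of both
    directions. For a symmetric matrix that sum is 2 * w."""
    merged = defaultdict(float)
    for (i, j), w in adj.items():
        if i == j or w == 0:
            continue  # sketch assumption: ignore the diagonal and explicit zeros
        merged[(min(i, j), max(i, j))] += w
    return [(i, j, w) for (i, j), w in sorted(merged.items())]


# Toy symmetric connectivity matrix stored as {(row, col): weight}.
adj = {(0, 1): 0.5, (1, 0): 0.5, (1, 2): 0.25, (2, 1): 0.25}

multi = multi_edge_list(adj)      # 4 parallel-direction edges
simple = simple_upper_edges(adj)  # 2 edges with doubled weights

total_multi = sum(w for _, _, w in multi)
total_simple = sum(w for _, _, w in simple)
```

Here the simple graph has half as many edges as the multi-edge list while the total edge weight stays the same, which is the property the 2x weights are meant to preserve in this sketch.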

Out-of-scope behavior

Calls outside the supported scope fall back silently to the upstream implementation.
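A fallback of this kind can be pictured as a scope check in front of two implementations. The sketch below is hypothetical and does not reproduce AutoZyme's dispatch logic: the names in_fast_path_scope and dispatch are invented for illustration, and the only conditions checked are the four listed under Supported scope.

```python
def in_fast_path_scope(flavor="igraph", directed=False,
                       restrict_to=None, partition_type=None):
    """Return True when the arguments match the supported scope listed above.
    Hypothetical helper; the defaults mirror the supported case."""
    return (
        flavor == "igraph"
        and directed is False
        and restrict_to is None
        and partition_type is None
    )


def dispatch(fast, upstream, **kwargs):
    """Call `fast` for in-scope arguments, otherwise fall back to `upstream`
    without raising or warning (a silent fallback)."""
    scope_keys = ("flavor", "directed", "restrict_to", "partition_type")
    scope = {k: kwargs[k] for k in scope_keys if k in kwargs}
    impl = fast if in_fast_path_scope(**scope) else upstream
    return impl(**kwargs)


# Stand-in implementations so the sketch runs without Scanpy installed.
def fast_impl(**kwargs):
    return "fast"


def upstream_impl(**kwargs):
    return "upstream"


in_scope = dispatch(fast_impl, upstream_impl, flavor="igraph", directed=False)
out_of_scope = dispatch(fast_impl, upstream_impl, flavor="leidenalg", directed=False)
```

Because the fallback is silent, a caller who wants to know which path ran would have to check the arguments against the supported scope themselves.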

Detailed speedup table (11 runs)

Dataset                  Tier        Platform  Threads  Baseline  Optimized  Speedup  Memory (baseline → optimized)  Concordance  Pass
gastrulation_pijuansala  ood_large3  Windows   1        4.33 s    2.71 s     1.60×    14.8 → 15.0 GB                              pass
heart_adult              large       Windows   1        18.37 s   11.64 s    1.58×    21.1 → 19.7 GB                              pass
pbmc200k_glaucoma        medium      Windows   1        7.75 s    4.77 s     1.62×    8.7 → 8.0 GB                                pass
pbmc68k                  small       Windows   1        1.39 s    896 ms     1.55×    1.9 → 1.3 GB                                pass
splitseq_rosenberg       ood_large1  Windows   1        5.08 s    3.07 s     1.66×    5.8 → 4.8 GB                                pass
tms_ss2                  ood_large2  Windows   1        3.35 s    1.89 s     1.77×    8.6 → 8.9 GB                                pass
gastrulation_pijuansala  ood_large3  macOS     1        2.63 s    1.63 s     1.62×    13.4 → 13.7 GB                              pass
pbmc200k_glaucoma        medium      macOS     1        5.09 s    3.35 s     1.52×    11.8 → 10.3 GB                              pass
pbmc68k                  small       macOS     1        1.20 s    728 ms     1.65×    3.7 → 2.3 GB                                pass
splitseq_rosenberg       ood_large1  macOS     1        3.69 s    2.30 s     1.60×    9.9 → 7.3 GB                                pass
tms_ss2                  ood_large2  macOS     1        1.96 s    1.24 s     1.57×    9.7 → 9.0 GB                                pass
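The headline figures can be recomputed from the table as speedup = baseline / optimized. The sketch below uses the rounded runtimes printed in the table, so individual ratios may differ from the listed speedups in the last digit; the variable names are illustrative only.

```python
from statistics import median

# (dataset, platform, baseline seconds, optimized seconds), copied from the table.
runs = [
    ("gastrulation_pijuansala", "Windows", 4.33, 2.71),
    ("heart_adult", "Windows", 18.37, 11.64),
    ("pbmc200k_glaucoma", "Windows", 7.75, 4.77),
    ("pbmc68k", "Windows", 1.39, 0.896),
    ("splitseq_rosenberg", "Windows", 5.08, 3.07),
    ("tms_ss2", "Windows", 3.35, 1.89),
    ("gastrulation_pijuansala", "macOS", 2.63, 1.63),
    ("pbmc200k_glaucoma", "macOS", 5.09, 3.35),
    ("pbmc68k", "macOS", 1.20, 0.728),
    ("splitseq_rosenberg", "macOS", 3.69, 2.30),
    ("tms_ss2", "macOS", 1.96, 1.24),
]

# Speedup is the ratio of baseline runtime to optimized runtime.
speedups = [baseline / optimized for _, _, baseline, optimized in runs]

best = max(speedups)
med = median(speedups)
print(f"best {best:.2f}x, median {med:.2f}x")  # prints "best 1.77x, median 1.60x"
```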

Frequently asked questions

Speeding up Scanpy leiden
Why is Scanpy leiden slow?

Scanpy leiden is CPU-bound, and the stock implementation in Scanpy leaves performance on the table in its core numerical work. In the best benchmarked run (tms_ss2 on Windows, 1 thread), the original takes 3.35 s and the AutoZyme path takes 1.89 s (1.77× faster).

How do I make Scanpy leiden faster?

Install AutoZyme and activate the Scanpy patch, then keep calling Scanpy leiden exactly as before. AutoZyme transparently substitutes the faster, output-validated path (up to 1.77× faster on the benchmark datasets) with no pipeline or API changes.

Does the AutoZyme speedup change the Scanpy leiden output?

The output is not bit-identical, but the differences are small and bounded: on every benchmark dataset the result is concordance-validated to within roughly 1.5 to 5% of the original Scanpy result, inside a frozen gate.
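The supported-scope notes express the gate as an adjusted Rand index (ARI) threshold of 0.90. As an illustration of what such a check measures, here is a minimal pure-Python ARI with a threshold test. It is a sketch for small label lists, not the benchmark harness; the function names are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: trivially identical
    # Pairs of cells that share a cluster in both, in A only, and in B only.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = same_a * same_b / total_pairs
    max_index = (same_a + same_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both
    return (same_both - expected) / (max_index - expected)


def passes_gate(labels_a, labels_b, threshold=0.90):
    """Threshold taken from the benchmark description (ari>=0.90)."""
    return adjusted_rand_index(labels_a, labels_b) >= threshold


# Same partition with renamed cluster labels: ARI is 1.0.
relabeled = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Partitions that disagree on every pair: ARI is negative.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

ARI ignores cluster label names, so two runs that number the same clusters differently still score 1.0.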

How do I install the Scanpy speedup?

Run pip install autozyme, then in Python import autozyme and call autozyme.activate("scanpy"). The patch applies automatically the next time you call leiden.
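Put together, a script might look like the sketch below. The autozyme.activate("scanpy") call is taken from the instructions above; the helper names activate_scanpy_patch and cluster, the import guard, and the specific leiden arguments (chosen to match the benchmarked flavor="igraph", n_iterations=2 configuration) are assumptions for illustration.

```python
import importlib.util


def activate_scanpy_patch() -> bool:
    """Activate the AutoZyme Scanpy patch if the package is importable.
    Returns False, leaving Scanpy unpatched, when autozyme is not installed."""
    if importlib.util.find_spec("autozyme") is None:
        return False
    import autozyme

    autozyme.activate("scanpy")  # call as quoted in the install instructions
    return True


def cluster(adata):
    """Unchanged Scanpy call; assumes sc.pp.neighbors was already run on adata."""
    import scanpy as sc

    # Arguments chosen to stay inside the supported scope described earlier.
    sc.tl.leiden(adata, flavor="igraph", n_iterations=2, directed=False)
    return adata.obs["leiden"]


patched = activate_scanpy_patch()
```

The cluster function is the same with or without the patch, which is the point of a drop-in replacement: only the activation step is new.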