Speed up Scanpy highly_variable_genes: up to 19.6× faster, identical output

Benchmark charts

[Charts for the Windows runs: Speedup distribution (log scale, one dot per finalized dataset/thread run); Thread sweep (speedup across finalized thread counts); Memory (baseline vs optimized peak memory).]

What is accelerated

The public API stays the same: AutoZyme swaps in its fast path only for supported configurations and leaves every other call to upstream Scanpy.

The target is Scanpy's pp.highly_variable_genes. On the listed datasets, the accelerated path reduces CPU runtime while still passing the declared scientific output gate.
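This page does not document how the public API is kept unchanged. A common way to get that property in Python is to replace a module attribute with a wrapper built by functools.wraps, which copies the name and docstring and sets __wrapped__, so inspect.signature reports the original signature. The sketch below assumes that approach; the stand-in namespace pp, the trimmed highly_variable_genes signature, install_patch, and make_replacement are all hypothetical and are not AutoZyme's API.

```python
import functools
import inspect
import types

# Hypothetical stand-in for a library namespace such as scanpy.pp.
pp = types.SimpleNamespace()


def highly_variable_genes(adata, *, flavor="seurat", n_top_genes=None, batch_key=None):
    """Hypothetical stand-in for the upstream function (trimmed signature)."""
    return {"impl": "upstream", "flavor": flavor}


pp.highly_variable_genes = highly_variable_genes


def install_patch(namespace, name, make_replacement):
    """Replace namespace.<name> with a wrapper; return a callable that undoes it."""
    original = getattr(namespace, name)
    # functools.wraps copies __name__/__doc__ and sets __wrapped__, so
    # inspect.signature() reports the original signature for the wrapper.
    replacement = functools.wraps(original)(make_replacement(original))
    setattr(namespace, name, replacement)
    return lambda: setattr(namespace, name, original)


def make_replacement(original):
    def patched(adata, **kwargs):
        # Placeholder: a real patch would try its fast path here and
        # delegate to `original` for anything it does not support.
        result = original(adata, **kwargs)
        return {**result, "impl": "patched"}

    return patched


restore = install_patch(pp, "highly_variable_genes", make_replacement)
print(pp.highly_variable_genes(None, flavor="seurat")["impl"])  # patched
same_sig = inspect.signature(pp.highly_variable_genes) == inspect.signature(highly_variable_genes)
print(same_sig)  # True
restore()
print(pp.highly_variable_genes(None)["impl"])  # upstream
```

One caveat of attribute patching: code that captured a direct reference before the patch was installed (for example `hvg = sc.pp.highly_variable_genes`) keeps calling the original object.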

Also searched as: HVG, highly variable genes, variable genes, feature selection, FindVariableFeatures, pp.highly_variable_genes, seurat_v3.

Supported scope


The dispatcher _patched_hvg routes calls by flavor and batch_key. The benchmarked configuration (flavor="seurat", batch_key=None, sparse CSR log1p-normalized input, numba available) is handled by _fast_hvg_seurat, which supports:

- flavor="seurat" only;
- sparse input, auto-converted to CSR float32;
- both selection modes: n_top_genes (argpartition top-N on normalized dispersion) and the cutoff mode, with min_mean, max_mean, min_disp, and max_disp all honored (lines 296-299);
- n_bins, passed into the kernel;
- layer=, which reads adata.layers[layer];
- subset= and inplace=, both honored (lines 304-321);
- means stored as log1p(mean), matching the upstream Scanpy seurat contract (line 302).

Separately, flavor in {seurat_v3, seurat_v3_paper} with batch_key set routes to _fast_hvg_seurat_v3_batch, a heavily guarded batch path for raw counts in CSR format; that is not the benchmarked path. The evaluation metric is hvg_jaccard >= 0.95, the set overlap of the selected genes, which tolerates small numeric drift.
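For orientation, here is a dependency-free sketch of what the seurat flavor computes, assuming the upstream Scanpy recipe: undo log1p with expm1, take each gene's mean and sample variance, use log(variance / mean) as the dispersion and log1p(mean) as the stored mean, bin genes by mean, and z-score dispersions within each bin. The equal-width binning is a simplified stand-in for pandas.cut, heapq.nlargest stands in for the argpartition the fast path uses, and tie and NaN handling may differ from Scanpy. The function names and dense list-of-lists input are hypothetical; this is not the _fast_hvg_seurat kernel.

```python
import heapq
import math
import statistics


def seurat_dispersions(X, n_bins=20):
    """Return per-gene (log1p means, log dispersions, normalized dispersions).

    X is a dense cells x genes list of lists of log1p-normalized values;
    the real fast path reads sparse CSR instead (simplification).
    """
    n_cells, n_genes = len(X), len(X[0])
    means, disps = [], []
    for g in range(n_genes):
        col = [math.expm1(X[c][g]) for c in range(n_cells)]  # undo log1p
        mean = statistics.fmean(col) or 1e-12  # avoid division by zero
        var = statistics.variance(col)  # sample variance (ddof=1)
        disp = var / mean
        disps.append(math.log(disp) if disp > 0 else math.nan)
        means.append(math.log1p(mean))  # stored means are log1p(mean)

    # Equal-width bins over the log1p means (simplified stand-in for pandas.cut).
    lo, hi = min(means), max(means)
    width = (hi - lo) / n_bins or 1.0
    bins = [min(int((m - lo) / width), n_bins - 1) for m in means]

    # Collect the finite dispersions that fall in each bin.
    per_bin = {}
    for b, d in zip(bins, disps):
        if not math.isnan(d):
            per_bin.setdefault(b, []).append(d)

    norm = []
    for b, d in zip(bins, disps):
        vals = per_bin.get(b, [])
        if math.isnan(d) or not vals:
            norm.append(math.nan)
            continue
        if len(vals) == 1:
            # Lone gene in its bin: divide by its own dispersion (assumed to
            # mirror Scanpy's seurat rule), giving 1 when that value is nonzero.
            mu, sd = 0.0, vals[0]
        else:
            mu, sd = statistics.fmean(vals), statistics.stdev(vals)
        norm.append((d - mu) / sd if sd else math.nan)
    return means, disps, norm


def select_hvg(means, norm, n_top_genes=None, min_mean=0.0125, max_mean=3.0,
               min_disp=0.5, max_disp=math.inf):
    """Boolean mask of selected genes for either selection mode."""
    genes = range(len(norm))
    if n_top_genes is not None:
        # Top-N by normalized dispersion; NaN ranks lowest.
        top = set(heapq.nlargest(
            n_top_genes, genes,
            key=lambda g: -math.inf if math.isnan(norm[g]) else norm[g]))
        return [g in top for g in genes]
    # Cutoff mode: NaN comparisons are False, so NaN genes are never selected.
    return [min_mean < m < max_mean and min_disp < d < max_disp
            for m, d in zip(means, norm)]


X = [[0.0, 1.0, 2.0, 0.5],
     [0.0, 0.0, 2.5, 1.5],
     [0.0, 3.0, 1.0, 0.5],
     [0.0, 0.5, 3.0, 2.5]]
means, disps, norm = seurat_dispersions(X, n_bins=2)
print(select_hvg(means, norm, n_top_genes=2))  # [False, True, False, True]
```

The cutoff defaults in select_hvg (min_mean=0.0125, max_mean=3, min_disp=0.5, max_disp=inf) follow Scanpy's documented defaults; gene 0 is constant, so its dispersion is NaN and it is never selected in either mode.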

Out-of-scope behavior

Calls outside this scope fall back silently to the upstream Scanpy implementation.
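The routing rules from the supported-scope section can be summarized as a pure decision function. This is a hedged sketch: the guard conditions (sparse input, raw counts, numba availability) are assumptions read off the scope description, and HvgCall, choose_impl, and the returned labels are hypothetical names, not the real _patched_hvg.

```python
from dataclasses import dataclass
from typing import Optional

# Flavors that share the guarded seurat_v3 batch fast path.
V3_FLAVORS = {"seurat_v3", "seurat_v3_paper"}


@dataclass(frozen=True)
class HvgCall:
    """Hypothetical summary of the call properties a dispatcher inspects."""
    flavor: str = "seurat"
    batch_key: Optional[str] = None
    is_sparse: bool = True
    raw_counts: bool = False
    numba_available: bool = True


def choose_impl(call: HvgCall) -> str:
    """Name the implementation a dispatcher like _patched_hvg might pick."""
    if call.flavor == "seurat" and call.batch_key is None:
        # Benchmarked path: sparse input (converted to CSR float32) plus numba.
        if call.is_sparse and call.numba_available:
            return "fast_seurat"
    elif call.flavor in V3_FLAVORS and call.batch_key is not None:
        # Guarded batch path: assumed to require sparse raw counts.
        if call.is_sparse and call.raw_counts:
            return "fast_seurat_v3_batch"
    # Everything else: silent fallback to the upstream implementation.
    return "upstream"


for call in (HvgCall(),
             HvgCall(is_sparse=False),
             HvgCall(batch_key="sample"),
             HvgCall(flavor="seurat_v3", batch_key="sample", raw_counts=True)):
    print(call.flavor, call.batch_key, "->", choose_impl(call))
```

Keeping the decision separate from the kernels makes the fallback rule easy to test: any combination the fast paths do not recognize resolves to "upstream".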

Detailed speedup table (12 runs)

Dataset	Tier	Platform	Threads	Baseline time	Optimized time	Speedup	Peak memory (baseline → optimized)	Concordance	Result
`gastrulation_pijuansala`	ood_large2	Windows	32	11.75 s	1.50 s	7.53×	18.7 → 18.7 GB	—	pass
`heart_adult`	large	Windows	32	5.16 s	570 ms	9.05×	23.3 → 19.3 GB	—	pass
`pbmc200k_glaucoma`	medium	Windows	32	2.21 s	216 ms	10.3×	8.9 → 7.6 GB	—	pass
`pbmc68k`	small	Windows	4	283 ms	15 ms	19.6×	0.6 → 1.0 GB	—	pass
`splitseq_rosenberg`	ood_large1	Windows	32	884 ms	120 ms	7.42×	4.9 → 4.3 GB	—	pass
`tms_ss2`	ood_large2	Windows	32	2.78 s	329 ms	8.24×	10.6 → 8.9 GB	—	pass
`gastrulation_pijuansala`	ood_large2	macOS	14	7.63 s	896 ms	8.52×	10.2 → 9.7 GB	—	pass
`heart_adult`	large	macOS	14	7.95 s	1.25 s	6.26×	14.5 → 16.2 GB	—	pass
`pbmc200k_glaucoma`	medium	macOS	14	1.46 s	205 ms	7.45×	10.2 → 10.3 GB	—	pass
`pbmc68k`	small	macOS	14	58 ms	13 ms	6.46×	1.0 → 1.0 GB	—	pass
`splitseq_rosenberg`	ood_large1	macOS	14	1.53 s	178 ms	8.61×	4.4 → 3.5 GB	—	pass
`tms_ss2`	small	macOS	4	6.70 s	757 ms	8.85×	11.1 → 11.1 GB	—	pass

Frequently asked questions

Speeding up Scanpy highly_variable_genes

Why is Scanpy highly_variable_genes slow?

Scanpy highly_variable_genes is CPU-bound, and the stock implementation leaves performance on the table in its core numerical work. On pbmc68k (Windows, 4 threads), the original takes 283 ms and the AutoZyme path takes 15 ms (19.6× faster); across the other 11 benchmark runs the speedup ranges from 6.26× to 10.3×.

How do I make Scanpy highly_variable_genes faster?

Install AutoZyme, activate the Scanpy patch, and keep calling Scanpy highly_variable_genes exactly as before. For supported configurations, AutoZyme transparently substitutes the faster, output-validated path (up to 19.6× faster on the benchmark datasets); all other calls fall back to upstream Scanpy. No pipeline or API changes are needed.

Does the AutoZyme speedup change the Scanpy highly_variable_genes output?

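No. The accelerated path returns results identical to the original Scanpy implementation (maximum absolute difference 0). The frozen concordance gate applied on every benchmark dataset is hvg_jaccard >= 0.95, the overlap between the gene sets selected by the two implementations, and every run in the detailed table passed it.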
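The hvg_jaccard >= 0.95 gate is a set-overlap test. A minimal sketch of such a check, assuming it is the standard Jaccard index over selected gene names (the helpers hvg_jaccard and passes_gate and the synthetic gene names are hypothetical):

```python
def hvg_jaccard(reference, candidate):
    """Jaccard index |A & B| / |A | B| of two selected-gene collections."""
    a, b = set(reference), set(candidate)
    if not a and not b:
        return 1.0  # convention chosen here: two empty selections agree
    return len(a & b) / len(a | b)


def passes_gate(reference, candidate, threshold=0.95):
    """True when the selections overlap at least as much as the threshold."""
    return hvg_jaccard(reference, candidate) >= threshold


# 2,000 reference genes; the candidate swaps out k of them for other genes.
reference = {f"g{i}" for i in range(2000)}
for k in (20, 51, 52):
    candidate = (reference - {f"g{i}" for i in range(k)}) | {f"x{i}" for i in range(k)}
    print(k, round(hvg_jaccard(reference, candidate), 4), passes_gate(reference, candidate))
```

For two equal-size selections of n genes that differ by k swaps, J = (n - k) / (n + k), so J >= 0.95 holds exactly when k <= n/39. With n = 2,000 that allows at most 51 swapped genes, which is why k = 51 passes and k = 52 fails.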

How do I install the Scanpy speedup?

From a shell, run pip install autozyme. Then, in Python, import autozyme and call autozyme.activate("scanpy"); the patch takes effect the next time you call highly_variable_genes.