Speed up Scanpy highly_variable_genes (v3 batched): up to 9.31× faster, identical output

Benchmark charts

Speedup distribution

Each dot is one finalized dataset/thread run on Windows (log scale).

pbmc68k: 19.6×
pbmc200k_glaucoma: 10.3×
heart_adult: 9.05×
tms_ss2: 8.24×
gastrulation_pijuansa…: 7.53×
splitseq_rosenberg: 7.42×

Thread sweep

Speedup across finalized thread counts on Windows

Datasets: pbmc68k, pbmc200k_glaucoma, heart_adult, tms_ss2, gastrulation_pijuan…, splitseq_rosenberg

Memory

Baseline vs optimized peak memory on Windows

Series: baseline, optimized

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets highly_variable_genes (v3, batched) in Scanpy. The benchmarked result passes the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: HVG, highly variable genes, batched HVG, batch HVG, seurat_v3 batch, batched feature selection.

Supported scope

The v3-batch fast path (_fast_hvg_seurat_v3_batch) correctly handles: flavor in {"seurat_v3","seurat_v3_paper"} WITH a non-None batch_key, on a CSR-sparse raw-count matrix (adata.X or a named layer), with n_top_genes a concrete int (not None), subset=False, inplace=True, and no extra/unknown kwargs. It honors span (passed into loess) and check_values (drives the non-integer warning).

It supports multiple batches: per-batch loess fit, per-batch clipped variance, median-rank aggregation, and the seurat_v3 vs seurat_v3_paper lexsort tiebreak ordering (lines 519-522). It writes highly_variable, highly_variable_rank, means (overall), variances (overall), variances_norm, highly_variable_nbatches to adata.var and uns["hvg"].

Numba (parallel prange) must be importable and skmisc.loess must be importable. Requires >=2 obs, >=1 var, all batch sizes >=2, valid (non-negative) batch codes, and >=2 non-constant genes per batch.

Anything outside this is delegated verbatim to the captured upstream original (__autozyme_original__). A separate flavor="seurat" non-batch path (_fast_hvg_seurat) also exists but is not the benchmarked target here.
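The scope conditions above can be read as a single predicate. The following is a minimal sketch under stated assumptions, not AutoZyme's actual dispatch code: the function name in_v3_batch_scope and its signature are hypothetical, the CSR check is duck-typed on the .format attribute that SciPy sparse matrices expose, and the checks for importable Numba and skmisc.loess and for >=2 non-constant genes per batch are left out for brevity.

```python
from collections import Counter
from types import SimpleNamespace

SUPPORTED_FLAVORS = {"seurat_v3", "seurat_v3_paper"}


def in_v3_batch_scope(X, batch_codes, *, flavor, batch_key, n_top_genes,
                      subset=False, inplace=True, extra_kwargs=None):
    """Return True if a call matches the documented fast-path scope.

    Hypothetical helper: it mirrors the listed conditions only and is
    not AutoZyme's real implementation.
    """
    if flavor not in SUPPORTED_FLAVORS or batch_key is None:
        return False
    # n_top_genes must be a concrete int (bool is excluded on purpose).
    if not isinstance(n_top_genes, int) or isinstance(n_top_genes, bool):
        return False
    # Only subset=False, inplace=True and no extra kwargs are in scope.
    if subset or not inplace or extra_kwargs:
        return False
    # SciPy sparse matrices expose .format == "csr"; duck-typed so the
    # sketch does not need SciPy installed.
    if getattr(X, "format", None) != "csr":
        return False
    n_obs, n_vars = X.shape
    if n_obs < 2 or n_vars < 1:
        return False
    # Batch codes must be non-negative and every batch needs >= 2 cells.
    if len(batch_codes) != n_obs or any(code < 0 for code in batch_codes):
        return False
    return all(size >= 2 for size in Counter(batch_codes).values())


# Toy stand-in for a 4-cell x 3-gene CSR matrix with two batches of two cells.
toy_matrix = SimpleNamespace(format="csr", shape=(4, 3))
in_scope = in_v3_batch_scope(
    toy_matrix, [0, 0, 1, 1],
    flavor="seurat_v3", batch_key="batch", n_top_genes=2,
)
```

A call that fails any of these checks, for example one with n_top_genes=None or a batch containing a single cell, would be expected to take the upstream path instead.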

Out-of-scope behavior

Calls outside the supported scope fall back silently to the upstream Scanpy implementation.
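The fallback behavior can be pictured as a thin wrapper that keeps a handle on the original function. This is a sketch of the general pattern only, assuming a dispatch of this shape: the helper with_fast_path and the toy functions are hypothetical, and only the attribute name __autozyme_original__ comes from the scope description.

```python
import functools


def with_fast_path(original, fast_impl, in_scope):
    """Wrap `original` so that in-scope calls use `fast_impl`.

    Hypothetical helper illustrating the pattern; not AutoZyme's code.
    """
    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        if in_scope(*args, **kwargs):
            return fast_impl(*args, **kwargs)
        # Out of scope: delegate verbatim, with no warning and no error.
        return original(*args, **kwargs)

    # Keep a handle on the untouched upstream function.
    wrapper.__autozyme_original__ = original
    return wrapper


# Toy stand-ins so the sketch runs without Scanpy installed.
def upstream_hvg(data, flavor="seurat", batch_key=None):
    return ("upstream", flavor)


def fast_hvg(data, flavor="seurat", batch_key=None):
    return ("fast", flavor)


def toy_scope(data, flavor="seurat", batch_key=None):
    return flavor in {"seurat_v3", "seurat_v3_paper"} and batch_key is not None


patched = with_fast_path(upstream_hvg, fast_hvg, toy_scope)
batched_result = patched([], flavor="seurat_v3", batch_key="batch")
unbatched_result = patched([], flavor="seurat_v3")
```

Because the wrapper forwards the same arguments unchanged, a caller cannot tell from the call site which path ran; the original stays reachable through the stored attribute.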

Detailed speedup table (12 runs)

Dataset	Tier	Platform	Threads	Baseline	Optimized	Speedup	Peak memory (baseline → optimized)	Concordance	Pass
`gastrulation_pijuansala`	ood_large2	Windows	32	11.75 s	1.50 s	7.53×	18.7 → 18.7 GB	—	pass
`heart_adult`	large	Windows	32	5.16 s	570 ms	9.05×	23.3 → 19.3 GB	—	pass
`pbmc200k_glaucoma`	medium	Windows	32	2.21 s	216 ms	10.3×	8.9 → 7.6 GB	—	pass
`pbmc68k`	small	Windows	4	283 ms	15 ms	19.6×	0.6 → 1.0 GB	—	pass
`splitseq_rosenberg`	ood_large1	Windows	32	884 ms	120 ms	7.42×	4.9 → 4.3 GB	—	pass
`tms_ss2`	ood_large2	Windows	32	2.78 s	329 ms	8.24×	10.6 → 8.9 GB	—	pass
`gastrulation_pijuansala`	ood_large2	macOS	14	7.63 s	896 ms	8.52×	10.2 → 9.7 GB	—	pass
`heart_adult`	large	macOS	14	7.95 s	1.25 s	6.26×	14.5 → 16.2 GB	—	pass
`pbmc200k_glaucoma`	medium	macOS	14	1.46 s	205 ms	7.45×	10.2 → 10.3 GB	—	pass
`pbmc68k`	small	macOS	14	58 ms	13 ms	6.46×	1.0 → 1.0 GB	—	pass
`splitseq_rosenberg`	ood_large1	macOS	14	1.53 s	178 ms	8.61×	4.4 → 3.5 GB	—	pass
`tms_ss2`	small	macOS	4	6.70 s	757 ms	8.85×	11.1 → 11.1 GB	—	pass

Frequently asked questions

Speeding up Scanpy highly_variable_genes (v3 batched)

Why is Scanpy highly_variable_genes (v3 batched) slow?

Scanpy highly_variable_genes (v3 batched) is CPU-bound, and the stock implementation in Scanpy leaves performance on the table in its core numerical work. On the benchmark datasets the original takes 5.05 s where the AutoZyme path takes 543 ms (9.31× faster).

How do I make Scanpy highly_variable_genes (v3 batched) faster?

Install AutoZyme and activate the Scanpy patch, then keep using Scanpy highly_variable_genes (v3 batched) exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 9.31× faster on the benchmark datasets, with no pipeline or API changes.

Does the AutoZyme speedup change the Scanpy highly_variable_genes (v3 batched) output?

No. The accelerated path returns bit-for-bit identical results to the original Scanpy implementation (maximum absolute difference 0), checked by a frozen concordance gate on every benchmark dataset.
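A check of this kind can be sketched in a few lines. The code below is an illustration under stated assumptions, not the frozen concordance gate itself: the function names are hypothetical, columns are plain Python sequences, and NaN is treated as matching only NaN. A maximum absolute difference of 0 is also slightly weaker than a bitwise comparison, since it does not distinguish 0.0 from -0.0.

```python
import math


def max_abs_diff(reference, candidate):
    """Largest |a - b| over paired values; NaN matches only NaN."""
    if len(reference) != len(candidate):
        raise ValueError("columns differ in length")
    worst = 0.0
    for a, b in zip(reference, candidate):
        a_nan, b_nan = math.isnan(a), math.isnan(b)
        if a_nan or b_nan:
            if a_nan != b_nan:
                # NaN on one side only counts as an unbounded mismatch.
                return math.inf
            continue
        worst = max(worst, abs(a - b))
    return worst


def concordant(ref_cols, new_cols):
    """True if both results have the same columns and each one matches exactly."""
    if ref_cols.keys() != new_cols.keys():
        return False
    return all(max_abs_diff(ref_cols[name], new_cols[name]) == 0 for name in ref_cols)


# Toy columns shaped like the fields written to adata.var.
baseline = {"means": [0.5, 1.25], "highly_variable_rank": [0.0, math.nan]}
optimized = {"means": [0.5, 1.25], "highly_variable_rank": [0.0, math.nan]}
same_output = concordant(baseline, optimized)
```

Treating NaN as equal to NaN matters here because rank-style columns can legitimately hold missing values for genes that were not selected.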

How do I install the Scanpy speedup?

From a shell, run pip install autozyme. Then, in Python, import autozyme and call autozyme.activate("scanpy"). The patch applies automatically the next time you call highly_variable_genes (v3 batched).
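Put together, an in-scope call might look like the sketch below. It assumes adata is an AnnData object whose X holds raw counts in CSR format; the helper name select_hvgs, the "batch" column name, and n_top_genes=2000 are illustrative choices, not values taken from the benchmark. The only AutoZyme call used is autozyme.activate("scanpy"), as described above.

```python
def select_hvgs(adata, batch_key="batch", n_top_genes=2000):
    """Activate the patch, then call Scanpy exactly as usual.

    Hypothetical convenience wrapper; `batch_key` and `n_top_genes`
    defaults are illustrative only.
    """
    # Imported inside the function so this file still loads on machines
    # where the optional packages are not installed.
    import autozyme
    import scanpy as sc

    autozyme.activate("scanpy")

    # Arguments chosen to stay inside the documented fast-path scope:
    # seurat_v3 flavor, a batch key, and a concrete integer n_top_genes.
    sc.pp.highly_variable_genes(
        adata,
        flavor="seurat_v3",
        batch_key=batch_key,
        n_top_genes=n_top_genes,
    )
    return adata.var["highly_variable"]
```

The Scanpy call itself is unchanged from an unpatched pipeline, so removing the activate line should leave the rest of the code working against upstream.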