Speed up Seurat FindVariableFeatures: up to 14.2× faster, validated output

Benchmark charts

Speedup distribution

Each dot is one finalized dataset/thread run on Windows (log scale). Labeled speedups:

gastrulation_pijuansa…	14.2×
tms_ss2	12.8×
heart_adult	12.7×
splitseq_rosenberg	10.7×
pbmc200k_glaucoma	9.10×
pbmc68k	7.22×

Thread sweep

Speedup across finalized thread counts on Windows, for the same six datasets.

Memory

Baseline vs optimized peak memory on Windows.

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets FindVariableFeatures in Seurat. The benchmarked result preserves the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: HVG, highly variable genes, variable features, variable genes, feature selection, vst.

Supported scope

Fast path is correct only for the upstream-default VST selection on a v5 Assay5 with a unified "counts" layer. Concretely, the entry-point method fast_FindVariableFeatures_Seurat takes the fast route only when all of the following hold: zyme/turbo is TRUE (the default), selection.method == "vst", the resolved assay inherits from "Assay5", and the assay has a "counts" layer (L254, L270-271).

The fast path honors loess.span (passed as span) and clip.max (passed as clip; "auto" becomes NULL, which sets vmax = sqrt(n_cells)). nfeatures controls the top-N selection.

The kernel (fast_VST_dgCMatrix, L152-196) computes per-row mean and variance over the counts dgCMatrix in a parallel C++ pass, fits log-variance against log-mean with stats:::simpleLoess (span, degree 2), standardizes and clips the values, and picks the top nselect features by standardized variance.

Correctness is approximate, not bit-exact: the gate is HVG overlap (comparator hvg_jaccard >= 0.95). The fast path substitutes stats:::simpleLoess for the upstream loess() call, so HVG order can drift around ties and at the loess boundary (manifest L157-168). This scope exactly matches the benchmarked call (selection.method="vst", nfeatures=2000, all other VST args default).
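The kernel steps described above can be illustrated with a minimal base-R sketch. This is not AutoZyme's kernel and not Seurat's code: vst_sketch is a hypothetical helper that works on a small dense matrix, uses the ordinary loess() call rather than stats:::simpleLoess, and runs single-threaded. The span of 0.3 is assumed to mirror Seurat's loess.span default, and the simulated Poisson counts are purely illustrative.

```r
# Hypothetical dense, single-threaded sketch of VST-style feature selection.
vst_sketch <- function(counts, nfeatures = 2000, span = 0.3, clip = NULL) {
  n_cells <- ncol(counts)
  if (is.null(clip)) clip <- sqrt(n_cells)  # "auto" clipping bound: sqrt(n_cells)

  # Per-feature (row) mean and sample variance of the raw counts.
  mu <- rowMeans(counts)
  v  <- apply(counts, 1, var)

  # Fit log10(variance) against log10(mean) on non-constant features.
  keep <- v > 0
  fit <- loess(log_var ~ log_mean,
               data = data.frame(log_mean = log10(mu[keep]),
                                 log_var  = log10(v[keep])),
               span = span, degree = 2)
  expected_sd <- sqrt(10^fitted(fit))

  # Standardize each count by the expected sd, clip from above, then take
  # the variance of the standardized values per feature.
  z <- (counts[keep, , drop = FALSE] - mu[keep]) / expected_sd
  z <- pmin(z, clip)
  std_var <- numeric(nrow(counts))
  std_var[keep] <- rowSums(z^2) / (n_cells - 1)
  names(std_var) <- rownames(counts)

  # Top-N features by standardized variance.
  head(names(sort(std_var, decreasing = TRUE)), nfeatures)
}

# Illustrative data: 200 simulated genes across 50 cells.
set.seed(1)
counts <- matrix(rpois(200 * 50, lambda = rep(runif(200, 0.5, 10), 50)),
                 nrow = 200, dimnames = list(paste0("gene", 1:200), NULL))
hvg <- vst_sketch(counts, nfeatures = 20)
```

The sketch makes the role of each argument visible: span shapes the mean-variance trend, clip bounds the influence of single extreme cells, and nfeatures only affects the final ranking cut.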

Out-of-scope behavior

Calls outside the supported scope silently fall back to the upstream Seurat implementation.

Detailed speedup table (11 runs)

Dataset	Tier	Platform	Threads	Baseline	Optimized	Speedup	Peak memory (baseline → optimized)	Concordance	Pass
`gastrulation_pijuansala`	ood_large3	Windows	4	19.98 s	1.41 s	14.2×	33.4 → 33.4 GB	—	pass
`heart_adult`	large	Windows	32	26.81 s	2.18 s	12.7×	49.6 → 47.3 GB	—	pass
`pbmc200k_glaucoma`	medium	Windows	4	9.27 s	1.03 s	9.10×	20.0 → 18.3 GB	—	pass
`pbmc68k`	small	Windows	4	2.48 s	320 ms	7.22×	4.5 → 4.2 GB	—	pass
`splitseq_rosenberg`	ood_large1	Windows	4	5.27 s	500 ms	10.7×	11.7 → 11.7 GB	—	pass
`tms_ss2`	ood_large2	Windows	32	11.40 s	890 ms	12.8×	21.0 → 19.9 GB	—	pass
`gastrulation_pijuansala`	ood_large3	macOS	14	16.98 s	1.93 s	9.48×	19.4 → 13.7 GB	—	pass
`pbmc200k_glaucoma`	medium	macOS	1	7.36 s	1.28 s	5.75×	12.0 → 9.1 GB	—	pass
`pbmc68k (inferred)`	small	macOS	14	2.67 s	398 ms	6.67×	5.3 → 4.6 GB	—	pass
`splitseq_rosenberg`	ood_large1	macOS	4	4.58 s	692 ms	6.62×	7.5 → 5.8 GB	—	pass
`tms_ss2`	ood_large2	macOS	14	8.24 s	1.08 s	7.56×	11.8 → 8.5 GB	—	pass

Frequently asked questions

Speeding up Seurat FindVariableFeatures

Why is Seurat FindVariableFeatures slow?

Seurat FindVariableFeatures is CPU-bound: the work is numerical (per-feature mean and variance over the counts matrix, a loess fit, and standardization), and the stock implementation leaves performance on the table there. On the gastrulation_pijuansala benchmark (Windows, 4 threads), the original takes 19.98 s where the AutoZyme path takes 1.41 s (14.2× faster).

How do I make Seurat FindVariableFeatures faster?

Install AutoZyme and activate the Seurat patch, then keep using Seurat FindVariableFeatures exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 14.2× faster on the benchmark datasets, with no pipeline or API changes.

Does the AutoZyme speedup change the Seurat FindVariableFeatures output?

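Not materially, but the output is not guaranteed to be bit-for-bit identical. The accelerated path is validated by HVG overlap: the selected variable features must reach a Jaccard index of at least 0.95 against the original Seurat implementation. Small differences can occur because the fast path uses stats:::simpleLoess in place of the upstream loess() call, so feature order can drift around ties and at the loess boundary. A frozen concordance gate checks this on every benchmark dataset, and every listed run passes.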
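The HVG overlap gate described under Supported scope (hvg_jaccard >= 0.95) can be illustrated with a short base-R sketch. The function below is a hypothetical stand-in that borrows the comparator's name; the real comparator's implementation is not shown on this page, and the two feature lists are made up for the example.

```r
# Hypothetical Jaccard comparator for two sets of variable features.
hvg_jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  n_union <- length(union(a, b))
  if (n_union == 0) return(1)  # two empty sets are treated as identical
  length(intersect(a, b)) / n_union
}

# Made-up example: two top-2000 lists that differ in 20 features.
baseline  <- paste0("gene", 1:2000)
optimized <- paste0("gene", c(1:1980, 2001:2020))

j <- hvg_jaccard(baseline, optimized)  # 1980 shared / 2020 in the union
passes_gate <- j >= 0.95
```

In practice the two inputs would be the variable feature names returned with and without the fast path on the same object.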

How do I install the Seurat speedup?

In R: install the autozyme package, then run library(autozyme) and autozyme::activate("seurat"). The patch applies automatically the next time you call FindVariableFeatures.
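A minimal sketch of the activation steps above, wrapped in a hypothetical helper so the snippet can be pasted without running anything until it is called. The library(autozyme) and autozyme::activate("seurat") calls come from the instructions above; the wrapper name, the seurat_obj argument, and the assumption that the patch also applies to a namespaced Seurat::FindVariableFeatures call are illustrative. How the autozyme package itself is installed is not stated here, so that step is left out.

```r
# Hypothetical wrapper: `seurat_obj` is a Seurat object you already have,
# ideally with a v5 Assay5 and a "counts" layer so the fast path applies.
run_hvg_with_autozyme <- function(seurat_obj) {
  # Load AutoZyme and activate the Seurat patch, as described above.
  library(autozyme)
  autozyme::activate("seurat")

  # Same arguments as the benchmarked call; the API itself is unchanged.
  Seurat::FindVariableFeatures(seurat_obj,
                               selection.method = "vst",
                               nfeatures = 2000)
}
```

Calling run_hvg_with_autozyme(seurat_obj) returns the updated object, and Seurat::VariableFeatures() on the result lists the selected features. Inputs outside the supported scope would take the upstream path instead.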