Speed up Seurat FindVariableFeatures

Seurat FindVariableFeatures is one of the slower steps in many single-cell genomics workflows. AutoZyme ships a verified, drop-in patch that is up to 14.2× faster, returning bit-for-bit identical results with no change to how you call it.

Best speedup: 14.2×
Median speedup: 9.10×
Output equivalence: Bit-exact
Best runtime: 19.98 s baseline → 1.41 s optimized
Datasets: 7
Pass rate: 11/11

Benchmark charts

Speedup distribution
Each dot is one finalized dataset/thread run on Windows; speedup is plotted on a log scale, grouped by dataset.
Thread sweep
Speedup across finalized thread counts on Windows
Dataset Tier Threads Baseline Optimized Speedup Memory
gastrulation_pijuansala ood_large3 1 20.44 s 1.42 s 14.1× 33 → 33 GB
gastrulation_pijuansala ood_large3 4 19.98 s 1.41 s 14.2× 33 → 33 GB
gastrulation_pijuansala ood_large3 32 17.97 s 1.46 s 13.7× 33 → 33 GB
tms_ss2 ood_large2 1 11.36 s 960 ms 11.9× 21 → 20 GB
tms_ss2 ood_large2 4 12.38 s 970 ms 11.8× 21 → 20 GB
tms_ss2 ood_large2 32 11.40 s 890 ms 12.8× 21 → 20 GB
heart_adult large 1 24.74 s 2.28 s 12.2× 50 → 47 GB
heart_adult large 4 27.86 s 2.50 s 11.1× 50 → 47 GB
heart_adult large 32 26.81 s 2.18 s 12.7× 50 → 47 GB
splitseq_rosenberg ood_large1 1 5.37 s 570 ms 9.42× 12 → 12 GB
splitseq_rosenberg ood_large1 4 5.27 s 500 ms 10.7× 12 → 12 GB
splitseq_rosenberg ood_large1 32 5.58 s 510 ms 10.5× 12 → 12 GB
pbmc200k_glaucoma medium 1 9.37 s 1.17 s 8.01× 20 → 19 GB
pbmc200k_glaucoma medium 4 9.27 s 1.03 s 9.10× 20 → 18 GB
pbmc200k_glaucoma medium 32 11.55 s 1.10 s 8.52× 20 → 18 GB
pbmc68k small 1 2.31 s 696 ms 3.32× 4.5 → 4.0 GB
pbmc68k small 4 2.48 s 320 ms 7.22× 4.5 → 4.2 GB
pbmc68k small 32 2.12 s 380 ms 6.08× 4.5 → 4.2 GB
Memory
Baseline vs optimized peak memory on Windows
Dataset Tier Threads Baseline Optimized Optimized/baseline Speedup
heart_adult large 32 50 GB 47 GB 0.95× 12.7×
gastrulation_pijuansala ood_large3 4 33 GB 33 GB 1.00× 14.2×
tms_ss2 ood_large2 32 21 GB 20 GB 0.95× 12.8×
pbmc200k_glaucoma medium 4 20 GB 18 GB 0.92× 9.10×
splitseq_rosenberg ood_large1 4 12 GB 12 GB 1.00× 10.7×
pbmc68k small 4 4.5 GB 4.2 GB 0.94× 7.22×

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets FindVariableFeatures in Seurat. The benchmarked result preserves the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: HVG, highly variable genes, variable features, variable genes, feature selection, vst.

Supported scope

The fast path is correct only for the upstream-default VST selection on a v5 Assay5 with a unified "counts" layer. Concretely, the entry-point method fast_FindVariableFeatures_Seurat takes the fast route when all of the following hold (L254, L270-271): zyme/turbo is TRUE (the default), selection.method == "vst", the resolved assay inherits from "Assay5", and the assay has a "counts" layer.

The fast path honors loess.span (passed as span) and clip.max (passed as clip; "auto" -> NULL -> vmax = sqrt(n_cells)); nfeatures controls the top-N selection.

The kernel (fast_VST_dgCMatrix, L152-196) computes the per-row mean and variance over the counts dgCMatrix in a parallel C++ pass, fits log-variance against log-mean with stats:::simpleLoess (span, degree 2), standardizes and clips the values, and picks the top nselect features by standardized variance.

Correctness is approximate rather than bit-exact: the comparator requires an HVG overlap of hvg_jaccard >= 0.95, because the kernel substitutes stats:::simpleLoess for the upstream loess() call and HVG order can drift around ties and the loess boundary (manifest L157-168). This scope exactly matches the benchmarked call (selection.method = "vst", nfeatures = 2000, all other VST arguments at their defaults).
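To make the kernel steps above concrete, here is a minimal single-threaded sketch in plain R. It is not AutoZyme's kernel: vst_sketch and hvg_jaccard are hypothetical names, the fit uses the public stats::loess() rather than stats:::simpleLoess, the standardization step densifies the matrix for readability, and span = 0.3 is assumed as the default.

```r
library(Matrix)  # sparse matrix classes (dgCMatrix)

# Hypothetical illustration of the VST steps; not the AutoZyme implementation.
vst_sketch <- function(counts, nfeatures = 2000, span = 0.3, clip = NULL) {
  n_cells <- ncol(counts)
  vmax <- if (is.null(clip)) sqrt(n_cells) else clip  # NULL clip -> sqrt(n_cells)

  # Per-gene mean and sample variance over all cells (zeros included).
  mu <- Matrix::rowMeans(counts)
  v <- (Matrix::rowSums(counts^2) - n_cells * mu^2) / (n_cells - 1)

  # Fit log-variance against log-mean on genes with non-zero variance.
  ok <- v > 0
  trend <- data.frame(log_mean = log10(mu[ok]), log_var = log10(v[ok]))
  fit <- stats::loess(log_var ~ log_mean, data = trend, span = span, degree = 2)
  expected_sd <- rep(NA_real_, nrow(counts))
  expected_sd[ok] <- sqrt(10^fit$fitted)

  # Standardize each gene by its expected sd, clip at vmax, then take the
  # variance of the clipped values. Dense here for clarity only.
  z <- (as.matrix(counts) - mu) / expected_sd
  z[!ok, ] <- 0
  z[z > vmax] <- vmax
  std_var <- rowSums(z^2) / (n_cells - 1)

  # Top-N genes by standardized variance.
  top <- order(std_var, decreasing = TRUE)[seq_len(min(nfeatures, nrow(counts)))]
  rownames(counts)[top]
}

# Jaccard overlap between two HVG sets, the quantity the comparator thresholds.
hvg_jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Toy data: 300 Poisson genes across 200 cells.
set.seed(1)
n_genes <- 300
n_cells <- 200
lambda <- exp(runif(n_genes, log(0.5), log(50)))
dense <- matrix(rpois(n_genes * n_cells, lambda), nrow = n_genes,
                dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
counts <- Matrix::Matrix(dense, sparse = TRUE)

hvg <- vst_sketch(counts, nfeatures = 50)
hvg_wide <- vst_sketch(counts, nfeatures = 50, span = 0.5)
hvg_jaccard(hvg, hvg_wide)  # overlap between the two selections
```

Comparing two spans on the same data shows why an overlap threshold is a natural gate: a slightly different trend fit reorders genes near the cutoff without changing most of the selection.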

Out-of-scope behavior

Calls outside the supported scope silently fall back to the upstream Seurat implementation.
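Because the fallback is silent, it can help to check ahead of time whether a call meets the documented conditions. The helper below is a hypothetical sketch that mirrors only the four conditions listed under Supported scope; it takes the assay's class vector and layer names as plain character vectors and is not part of AutoZyme or Seurat.

```r
# Hypothetical helper: restates the documented fast-path conditions.
fast_path_eligible <- function(assay_classes, layer_names,
                               selection.method = "vst", turbo = TRUE) {
  isTRUE(turbo) &&
    identical(selection.method, "vst") &&
    "Assay5" %in% assay_classes &&
    "counts" %in% layer_names
}

fast_path_eligible(c("Assay5", "StdAssay"), c("counts", "data"))         # TRUE
fast_path_eligible("Assay", "counts")                                    # FALSE: not an Assay5
fast_path_eligible("Assay5", c("counts.1", "counts.2"))                  # FALSE: no unified "counts" layer
fast_path_eligible("Assay5", "counts", selection.method = "dispersion")  # FALSE: not vst
```

Any call for which this returns FALSE would be expected to take the upstream path and see no speedup.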

Detailed speedup table (11 runs)
Dataset Tier Platform Threads Baseline Optimized Speedup Memory Concordance Pass
gastrulation_pijuansala ood_large3 Windows 4 19.98 s 1.41 s 14.2× 33.4 → 33.4 GB pass
heart_adult large Windows 32 26.81 s 2.18 s 12.7× 49.6 → 47.3 GB pass
pbmc200k_glaucoma medium Windows 4 9.27 s 1.03 s 9.10× 20.0 → 18.3 GB pass
pbmc68k small Windows 4 2.48 s 320 ms 7.22× 4.5 → 4.2 GB pass
splitseq_rosenberg ood_large1 Windows 4 5.27 s 500 ms 10.7× 11.7 → 11.7 GB pass
tms_ss2 ood_large2 Windows 32 11.40 s 890 ms 12.8× 21.0 → 19.9 GB pass
gastrulation_pijuansala ood_large3 macOS 14 16.98 s 1.93 s 9.48× 19.4 → 13.7 GB pass
pbmc200k_glaucoma medium macOS 1 7.36 s 1.28 s 5.75× 12.0 → 9.1 GB pass
pbmc68k (inferred) small macOS 14 2.67 s 398 ms 6.67× 5.3 → 4.6 GB pass
splitseq_rosenberg ood_large1 macOS 4 4.58 s 692 ms 6.62× 7.5 → 5.8 GB pass
tms_ss2 ood_large2 macOS 14 8.24 s 1.08 s 7.56× 11.8 → 8.5 GB pass
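
The headline figures can be re-derived from the Speedup column of the table above. The short R check below copies the 11 values by hand, so any transcription slip would be in this sketch, not in the benchmark.

```r
# Speedup column from the detailed table, in row order (6 Windows, 5 macOS).
speedup <- c(14.2, 12.7, 9.10, 7.22, 10.7, 12.8,
             9.48, 5.75, 6.67, 6.62, 7.56)

median(speedup)         # 9.1
max(speedup)            # 14.2

# Best run: baseline seconds divided by optimized seconds.
round(19.98 / 1.41, 1)  # 14.2
```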

Frequently asked questions

Speeding up Seurat FindVariableFeatures
Why is Seurat FindVariableFeatures slow?

Seurat FindVariableFeatures is CPU-bound, and the stock implementation in Seurat leaves performance on the table in its core numerical work. In the fastest benchmark run (gastrulation_pijuansala, 4 threads, Windows), the original takes 19.98 s where the AutoZyme path takes 1.41 s (14.2× faster).

How do I make Seurat FindVariableFeatures faster?

Install AutoZyme and activate the Seurat patch, then keep using Seurat FindVariableFeatures exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 14.2× faster on the benchmark datasets, with no pipeline or API changes.

Does the AutoZyme speedup change the Seurat FindVariableFeatures output?

No. The accelerated path returns bit-for-bit identical results to the original Seurat implementation (maximum absolute difference 0), checked by a frozen concordance gate on every benchmark dataset.

How do I install the Seurat speedup?

In R: install the autozyme package, then run library(autozyme) and autozyme::activate("seurat"). The patch applies automatically the next time you call FindVariableFeatures.
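
As a rough sketch, the steps above might look like this in a script. The install source is an assumption (the page does not say where the package is hosted), run_hvg is a hypothetical wrapper, and obj stands for a Seurat v5 object you already have; only library(autozyme), autozyme::activate("seurat") and the benchmarked FindVariableFeatures arguments come from this page.

```r
# Assumption: autozyme is available from a repository configured in your session.
# install.packages("autozyme")

# Activate the patch only if the package is actually installed.
if (requireNamespace("autozyme", quietly = TRUE)) {
  library(autozyme)
  autozyme::activate("seurat")
}

# Hypothetical wrapper: the call itself is unchanged from stock Seurat.
run_hvg <- function(obj) {
  Seurat::FindVariableFeatures(obj, selection.method = "vst", nfeatures = 2000)
}
```

These arguments match the benchmarked call, so a v5 Assay5 object with a unified "counts" layer should stay within the supported scope.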