Speed up Seurat FindAllMarkers

Seurat FindAllMarkers is one of the slower steps in many single-cell genomics workflows. AutoZyme ships a verified, drop-in patch that is up to 271.4× faster, returning bit-for-bit identical results with no change to how you call it.

Best speedup: 271.4×
Median speedup: 183.8×
Output equivalence: bit-exact
Best runtime: baseline 24.90 min, optimized 5.50 s
Datasets: 7
Pass rate: 9/10

Benchmark charts

Speedup distribution
Each dot is one finalized dataset/thread run on Windows (log scale). Datasets shown: tms_ss2, heart_adult, splitseq_rosenberg, gastrulation_pijuansala, pbmc200k_glaucoma, pbmc68k.

Thread sweep
Speedup across finalized thread counts on Windows (1, 4, and full machine at 32 threads).

| Dataset | Tier | Threads | Baseline | Optimized | Speedup | Memory |
|---|---|---|---|---|---|---|
| tms_ss2 | ood_large2 | 1 | 23.85 min | 23.60 s | 63.3× | 38 GB → 12 GB |
| tms_ss2 | ood_large2 | 4 | 25.33 min | 9.68 s | 154.3× | 38 GB → 12 GB |
| tms_ss2 | ood_large2 | 32 | 24.90 min | 5.50 s | 271.4× | 38 GB → 12 GB |
| heart_adult | large | 1 | 51.87 min | 1.00 min | 51.8× | 65 GB → 29 GB |
| heart_adult | large | 4 | 64.11 min | 22.79 s | 136.6× | 71 GB → 29 GB |
| heart_adult | large | 32 | 50.03 min | 12.47 s | 249.7× | 74 GB → 29 GB |
| splitseq_rosenberg | ood_large1 | 1 | 8.79 min | 12.46 s | 42.8× | 18 GB → 7.1 GB |
| splitseq_rosenberg | ood_large1 | 4 | 8.89 min | 4.61 s | 115.6× | 18 GB → 7.1 GB |
| splitseq_rosenberg | ood_large1 | 32 | 8.90 min | 2.40 s | 222.3× | 18 GB → 7.2 GB |
| gastrulation_pijuansala | ood_large3 | 1 | 31.82 min | 45.31 s | 42.1× | 65 GB → 19 GB |
| gastrulation_pijuansala | ood_large3 | 4 | 41.58 min | 17.74 s | 107.7× | 61 GB → 19 GB |
| gastrulation_pijuansala | ood_large3 | 32 | 33.03 min | 9.55 s | 199.8× | 64 GB → 19 GB |
| pbmc200k_glaucoma | medium | 1 | 12.34 min | 21.46 s | 33.3× | 29 GB → 12 GB |
| pbmc200k_glaucoma | medium | 4 | 11.81 min | 8.10 s | 88.3× | 29 GB → 12 GB |
| pbmc200k_glaucoma | medium | 32 | 11.91 min | 4.26 s | 167.8× | 29 GB → 12 GB |
| pbmc68k | small | 1 | 1.48 min | 3.72 s | 25.5× | 6.6 GB → 2.8 GB |
| pbmc68k | small | 4 | 1.62 min | 1.37 s | 69.1× | 6.6 GB → 2.8 GB |
| pbmc68k | small | 32 | 1.61 min | 751 ms | 126.3× | 6.6 GB → 2.9 GB |

Memory
Baseline vs optimized peak memory on Windows (axis range 0 to 100 GB).

| Dataset | Tier | Peak memory | Optimized / baseline | Run shown |
|---|---|---|---|---|
| heart_adult | large | 74 GB → 29 GB | 0.39× | 249.7× speedup, 32 threads |
| gastrulation_pijuansala | ood_large3 | 65 GB → 19 GB | 0.29× | 42.1× speedup, 1 thread |
| tms_ss2 | ood_large2 | 38 GB → 12 GB | 0.31× | 154.3× speedup, 4 threads |
| pbmc200k_glaucoma | medium | 29 GB → 12 GB | 0.40× | 88.3× speedup, 4 threads |
| splitseq_rosenberg | ood_large1 | 18 GB → 7.1 GB | 0.40× | 42.8× speedup, 1 thread |
| pbmc68k | small | 6.6 GB → 2.8 GB | 0.43× | 120.5× speedup (thread count not shown) |

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets FindAllMarkers in Seurat. The benchmarked results pass the declared scientific-output gate while CPU runtime drops on the listed datasets.

Also searched as: FindMarkers, FindConservedMarkers, marker genes, differential expression, DEG, DE genes, wilcoxon, wilcox.

Supported scope

The shipped default path (fast_FindAllMarkers_fusion) computes a Wilcoxon rank-sum (normal-approximation, presto-style) marker test per cluster-vs-rest using a single fused RcppParallel C++ kernel. It is taken only when all of the following hold (gate at patch.R:466-480):

- zyme/turbo is enabled (default TRUE)
- test.use == 'wilcox'
- slot == 'data'
- features is NULL (all features)
- node is NULL
- latent.vars is NULL
- mean.fxn is NULL
- fc.name is NULL
- only.pos is FALSE
- densify is FALSE
- max.cells.per.ident is Inf
- min.diff.pct == -Inf
- base == 2
- no extra (...) args (length(dots) == 0)
- group.by is NULL or 'ident'

Additional runtime guards fall back to the upstream implementation: the data layer must be a single (joined) dgCMatrix (patch.R:487-493), and Idents() must name all cells (patch.R:499-501).

Within that gate, the fast path does honor user-supplied values of the arguments it actually consumes:

- logfc.threshold (default 0.1, used at patch.R:570)
- min.pct (default 0.01, used at patch.R:569)
- return.thresh (default 1e-2, used at patch.R:587)
- min.cells.group (default 3, used at patch.R:561 to skip small clusters)
- assay
- base (only base == 2)

Per-cluster small-group skipping matches Seurat behavior (warn and skip rare clusters rather than fall back globally). p-value adjustment is Bonferroni over n.features. The benchmarked call (object plus verbose=FALSE, every other argument at its default) satisfies this gate exactly, so the benchmark exercises the supported fast path.
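To make the statistical test concrete, here is a minimal base-R sketch of a tie-corrected, normal-approximation Wilcoxon rank-sum test applied per gene, cluster vs. rest, followed by Bonferroni adjustment over the number of features. It is not the fused C++ kernel: the helper name wilcox_normal_approx and the toy matrix are illustrative assumptions, and the formula is the one used by stats::wilcox.test(exact = FALSE, correct = TRUE). Whether the shipped kernel applies the continuity correction in exactly this way is not stated above.

```r
# Normal-approximation Wilcoxon rank-sum p-value for one gene, cluster vs. rest.
# Same formula as stats::wilcox.test(exact = FALSE, correct = TRUE).
wilcox_normal_approx <- function(expr, in_cluster) {
  n1 <- sum(in_cluster)
  n2 <- sum(!in_cluster)
  n <- n1 + n2
  r <- rank(expr)                               # average ranks for ties
  u <- sum(r[in_cluster]) - n1 * (n1 + 1) / 2   # U statistic for the cluster
  ties <- table(r)                              # tie group sizes
  sigma <- sqrt((n1 * n2 / 12) *
                  ((n + 1) - sum(ties^3 - ties) / (n * (n - 1))))
  z <- u - n1 * n2 / 2
  z <- (z - sign(z) * 0.5) / sigma              # continuity correction
  2 * pnorm(-abs(z))                            # two-sided p-value
}

# Toy data (hypothetical): 3 genes x 40 cells, two clusters of 20 cells.
set.seed(1)
n_cells <- 40
cluster <- rep(c("A", "B"), each = n_cells / 2)
expr_mat <- rbind(
  gene1 = rpois(n_cells, lambda = ifelse(cluster == "A", 5, 1)),
  gene2 = rpois(n_cells, lambda = 2),
  gene3 = rpois(n_cells, lambda = ifelse(cluster == "A", 1, 3))
)

in_a <- cluster == "A"
p_val <- apply(expr_mat, 1, wilcox_normal_approx, in_cluster = in_a)

# Bonferroni adjustment over the number of features tested.
p_val_adj <- p.adjust(p_val, method = "bonferroni", n = nrow(expr_mat))
```

A fused kernel would presumably compute the ranks, tie counts, and statistics for all clusters in one pass over the sparse matrix rather than looping gene by gene as this sketch does.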

Out-of-scope behavior

Silent, possibly wrong.
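Because out-of-scope behavior is flagged as silent, it can help to check a planned call against the documented gate before relying on the patch. The sketch below is a hypothetical helper, not part of AutoZyme or Seurat: fast_path_violations takes the arguments you intend to pass to FindAllMarkers and returns the names of those that would leave the fast path. It covers only the argument conditions listed under Supported scope; the zyme/turbo switch and the runtime guards (joined dgCMatrix, Idents() naming all cells) need the actual object and are not checked here.

```r
# Hypothetical helper: names of arguments that violate the documented gate.
fast_path_violations <- function(args = list()) {
  # Arguments that must keep one specific value.
  fixed <- list(test.use = "wilcox", slot = "data", only.pos = FALSE,
                densify = FALSE, max.cells.per.ident = Inf,
                min.diff.pct = -Inf, base = 2)
  # Arguments that must stay NULL.
  must_be_null <- c("features", "node", "latent.vars", "mean.fxn", "fc.name")
  # Arguments the fast path accepts with user-supplied values.
  honored <- c("object", "assay", "logfc.threshold", "min.pct",
               "return.thresh", "min.cells.group", "verbose")

  bad <- character(0)
  for (nm in names(args)) {
    val <- args[[nm]]
    ok <- if (nm %in% names(fixed)) {
      isTRUE(val == fixed[[nm]])
    } else if (nm %in% must_be_null) {
      is.null(val)
    } else if (nm == "group.by") {
      is.null(val) || identical(val, "ident")
    } else {
      nm %in% honored   # anything else counts as an extra (...) argument
    }
    if (!ok) bad <- c(bad, nm)
  }
  bad
}

fast_path_violations(list(verbose = FALSE, min.pct = 0.1))
# character(0)
fast_path_violations(list(test.use = "MAST", only.pos = TRUE))
# "test.use" "only.pos"
```

An empty result means the arguments alone do not rule out the fast path; any returned name means the call falls outside the benchmarked scope.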

Detailed speedup table (10 runs)

| Dataset | Tier | Platform | Threads | Baseline | Optimized | Speedup | Memory | Concordance | Pass |
|---|---|---|---|---|---|---|---|---|---|
| gastrulation_pijuansala | ood_large3 | Windows | 32 | 33.03 min | 9.55 s | 199.8× | 63.5 → 19.1 GB | | pass |
| heart_adult | large | Windows | 32 | 50.03 min | 12.47 s | 249.7× | 74.4 → 29.1 GB | | pass |
| pbmc200k_glaucoma | medium | Windows | 32 | 11.91 min | 4.26 s | 167.8× | 28.9 → 11.6 GB | | pass |
| pbmc68k | small | Windows | 32 | 1.61 min | 751 ms | 126.3× | 6.6 → 2.9 GB | | pass |
| splitseq_rosenberg | ood_large1 | Windows | 32 | 8.90 min | 2.40 s | 222.3× | 18.0 → 7.2 GB | | pass |
| tms_ss2 | ood_large2 | Windows | 32 | 24.90 min | 5.50 s | 271.4× | 38.0 → 11.7 GB | | pass |
| gastrulation_pijuansala | ood_large3 | macOS | 1 | 7.42 min | 14.11 s | 31.5× | 20.5 → 28.6 GB | | fail |
| pbmc68k_full | medium | macOS | 1 | 2.12 min | 840 ms | 151.6× | 4.8 → 2.4 GB | | pass |
| splitseq_rosenberg | ood_large1 | macOS | 1 | 5.63 min | 2.67 s | 126.8× | 18.7 → 9.6 GB | | pass |
| tms_ss2 | ood_large2 | macOS | 1 | 21.48 min | 5.83 s | 221.3× | 25.8 → 17.4 GB | | pass |
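The headline figures can be re-derived from the Speedup and Pass columns of the ten runs above. The short sketch below copies those values by hand (the variable names are illustrative) and recomputes the best speedup, the median speedup, and the pass rate.

```r
# Speedup and pass/fail values copied from the ten rows of the detailed table.
runs <- data.frame(
  dataset  = c("gastrulation_pijuansala", "heart_adult", "pbmc200k_glaucoma",
               "pbmc68k", "splitseq_rosenberg", "tms_ss2",
               "gastrulation_pijuansala", "pbmc68k_full",
               "splitseq_rosenberg", "tms_ss2"),
  platform = c(rep("Windows", 6), rep("macOS", 4)),
  speedup  = c(199.8, 249.7, 167.8, 126.3, 222.3, 271.4,
               31.5, 151.6, 126.8, 221.3),
  passed   = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
               FALSE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

best_speedup   <- max(runs$speedup)            # 271.4
median_speedup <- median(runs$speedup)         # 183.8
pass_rate      <- sprintf("%d/%d", sum(runs$passed), nrow(runs))  # "9/10"
n_datasets     <- length(unique(runs$dataset)) # 7
```

The median is taken over all ten runs on both platforms, which is why it sits below every Windows 32-thread speedup except pbmc68k and pbmc200k_glaucoma.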

Frequently asked questions

Speeding up Seurat FindAllMarkers
Why is Seurat FindAllMarkers slow?

Seurat FindAllMarkers is CPU-bound, and the stock implementation leaves performance on the table in its core numerical work. On the best benchmark run (tms_ss2, 32 threads, Windows), the original takes 24.90 min where the AutoZyme path takes 5.50 s (271.4× faster).

How do I make Seurat FindAllMarkers faster?

Install AutoZyme and activate the Seurat patch, then keep using Seurat FindAllMarkers exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 271.4× faster on the benchmark datasets, with no pipeline or API changes.

Does the AutoZyme speedup change the Seurat FindAllMarkers output?

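No, within the supported scope. The accelerated path returns bit-for-bit identical results to the original Seurat implementation (maximum absolute difference 0), checked by a frozen concordance gate on the benchmark datasets. Of the 10 benchmark runs, 9 are marked pass; the macOS gastrulation_pijuansala run is marked fail in the detailed speedup table.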
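One way to spot-check equivalence on your own data is to compare the marker tables from a stock run and a patched run directly. This is a minimal sketch, not the AutoZyme concordance gate, whose implementation is not described here: baseline and patched are hypothetical stand-ins for the two data frames returned by FindAllMarkers, filled with made-up values, and max_abs_diff is an illustrative helper.

```r
# Toy stand-ins for two FindAllMarkers() result tables (made-up values).
baseline <- data.frame(
  gene       = c("CD3E", "MS4A1", "LYZ"),
  cluster    = c("0", "1", "2"),
  p_val      = c(1e-30, 2e-12, 5e-8),
  avg_log2FC = c(2.1, 3.4, 1.7),
  stringsAsFactors = FALSE
)
patched <- baseline  # pretend the patched run returned the same table

# Maximum absolute difference over all numeric columns of two tables
# that share the same shape and column names.
max_abs_diff <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)), identical(names(a), names(b)))
  num <- vapply(a, is.numeric, logical(1))
  if (!any(num)) return(0)
  max(abs(as.matrix(a[num]) - as.matrix(b[num])))
}

identical(baseline, patched)      # TRUE
max_abs_diff(baseline, patched)   # 0
```

identical() is the stricter check, since it also compares row order, column types, and the non-numeric gene and cluster columns.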

How do I install the Seurat speedup?

In R: install the autozyme package, then run library(autozyme) and autozyme::activate("seurat"). The patch applies automatically the next time you call FindAllMarkers.
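As a sketch of those steps in a script: the activation call is the one given above, while the installation source for autozyme is not stated here, so no install command is shown and the block simply skips activation when the package is absent. The wrapper find_markers_default and its seurat_obj argument are hypothetical names; the call inside it is the benchmarked form (object plus verbose = FALSE, all other arguments at their defaults).

```r
# Assumes autozyme is already installed; activation is skipped otherwise.
if (requireNamespace("autozyme", quietly = TRUE)) {
  library(autozyme)
  autozyme::activate("seurat")   # activation call as given in the text
}

# Hypothetical wrapper around the benchmarked call: the object plus
# verbose = FALSE, with every other argument left at its default.
find_markers_default <- function(seurat_obj) {
  Seurat::FindAllMarkers(seurat_obj, verbose = FALSE)
}
```

Keeping the call at its defaults matters because non-default arguments such as only.pos = TRUE or a features subset fall outside the supported fast path.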