
Speed up Seurat RunPCA

Seurat RunPCA is one of the slower steps in many single-cell genomics workflows. AutoZyme ships a drop-in patch that runs up to 85.5× faster on the benchmark datasets and returns output that matches upstream within a verified tolerance. You call RunPCA exactly as before.

Best speedup: 85.5×
Median speedup: 24.6×
Output equivalence: within tolerance
Best runtime: 2.01 min baseline, 1.42 s optimized
Datasets: 7
Pass rate: 11/11

Benchmark charts

Charts shown for the Windows platform.

Speedup distribution (log scale): each dot is one finalized dataset/thread run on Windows, for pbmc68k, pbmc200k_glaucoma, heart_adult, splitseq_rosenberg, tms_ss2, and gastrulation_pijuansala.

Thread sweep: speedup across finalized thread counts on Windows (1, 4, and full = 32 threads)
pbmc68k (small): 1 thread 27.2× (2.05 min → 4.46 s); 4 threads 75.4× (2.02 min → 1.61 s); 32 threads 85.5× (2.01 min → 1.42 s)
pbmc200k_glaucoma (medium): 1 thread 13.0× (2.65 min → 12.33 s); 4 threads 28.8× (2.66 min → 5.56 s); 32 threads 48.8× (2.93 min → 3.33 s)
heart_adult (large): 1 thread 7.31× (3.52 min → 29.20 s); 4 threads 21.8× (3.72 min → 9.80 s); 32 threads 35.4× (3.56 min → 6.02 s)
splitseq_rosenberg (ood_large1): 1 thread 7.42× (1.19 min → 9.62 s); 4 threads 24.0× (1.19 min → 2.97 s); 32 threads 33.0× (1.19 min → 2.16 s)
tms_ss2 (ood_large2): 1 thread 5.67× (41.82 s → 7.16 s); 4 threads 10.7× (40.57 s → 3.81 s); 32 threads 24.6× (36.95 s → 1.65 s)
gastrulation_pijuansala (ood_large3): 1 thread 5.45× (48.23 s → 8.67 s); 4 threads 16.2× (47.22 s → 2.91 s); 32 threads 23.0× (45.29 s → 2.05 s)

Memory: baseline vs optimized peak memory on Windows (optimized / baseline ratio in parentheses; reported peak memory is the same at every thread count)
heart_adult (large): 73 GB → 52 GB (0.71×)
gastrulation_pijuansala (ood_large3): 41 GB → 37 GB (0.91×)
pbmc200k_glaucoma (medium): 29 GB → 20 GB (0.71×)
tms_ss2 (ood_large2): 24 GB → 24 GB (0.99×)
splitseq_rosenberg (ood_large1): 21 GB → 14 GB (0.69×)
pbmc68k (small): 7.3 GB → 5.3 GB (0.73×)

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets RunPCA in Seurat. On the listed datasets, the benchmarked result stays within the declared output-concordance gate while reducing CPU runtime.

Also searched as: PCA, principal component analysis, dimensionality reduction, dim reduction.

Supported scope

The fast path computes PCA via a Gram-matrix eigendecomposition in Python (numpy/scipy via reticulate). It is correct, meaning embeddings, loadings, and stdev are close to upstream up to sign, only for the upstream-default configuration:

rev.pca = FALSE
weight.by.var = TRUE or FALSE (both handled; FALSE divides embeddings by singular values, line 1107)
seed.use = any value (set.seed is honored)
npcs <= nrow - 1 (clamped, line 1070)

Input (the object passed to .default) must be a dense matrix, or an array convertible to dense float64, of features x cells that is already mean-centered (i.e. scale.data), with all requested features present and of nonzero variance. The StdAssay (Seurat-object) entry requires layer = 'scale.data' to be present and centered, and the requested features to be a subset of that layer's features. Python with numpy and scipy must be importable.

On Darwin the fast path uses numpy.linalg.eigh (full decomposition, then slice); elsewhere it uses scipy.linalg.eigh with a partial decomposition (driver='evr'). Both yield the same supported result. This exactly covers the benchmarked default call.
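To make the Gram-matrix idea concrete, here is a minimal sketch of PCA computed from the feature-by-feature Gram matrix of an already-centered features x cells array. It is not AutoZyme's code. The function name gram_pca and its arguments are hypothetical, and the choice of the feature-side Gram matrix and of the stdev scaling by sqrt(cells - 1) are assumptions made for illustration. It uses the full numpy.linalg.eigh followed by a slice, as the scope note describes for Darwin.

```python
import numpy as np


def gram_pca(scaled, npcs=50, weight_by_var=True):
    """PCA of a mean-centered features x cells matrix via its Gram matrix.

    Hypothetical helper for illustration only; not the shipped fast path.
    """
    x = np.asarray(scaled, dtype=np.float64)
    n_features, n_cells = x.shape
    npcs = min(npcs, n_features - 1)  # clamp, mirroring "npcs <= nrow - 1"

    # Assumption: decompose the (usually smaller) features x features Gram matrix.
    gram = x @ x.T

    # eigh returns eigenvalues in ascending order; reverse and keep the top npcs.
    evals, evecs = np.linalg.eigh(gram)
    evals = np.clip(evals[::-1][:npcs], 0.0, None)
    loadings = evecs[:, ::-1][:, :npcs]

    # Eigenvalues of the Gram matrix are squared singular values of the data.
    singular = np.sqrt(evals)

    # Projecting cells onto the loadings gives U * singular (variance-weighted).
    embeddings = x.T @ loadings
    if not weight_by_var:
        embeddings = embeddings / singular

    stdev = singular / np.sqrt(max(1, n_cells - 1))
    return embeddings, loadings, stdev
```

Because eigenvectors are only defined up to sign, any comparison against another PCA implementation should compare components after aligning signs, which is why the scope note says "close up to sign".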

Out-of-scope behavior

Calls outside the supported scope silently fall back to the upstream Seurat implementation.
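The scope-check-then-fallback pattern can be sketched as follows. This is an illustration of the idea, not AutoZyme's dispatcher: in_fast_path_scope, run_pca, and the atol threshold are hypothetical, and the checks cover only a subset of the preconditions listed under Supported scope (rev.pca off, a 2-D array convertible to float64, centered rows, nonzero variance).

```python
import numpy as np


def in_fast_path_scope(scaled, rev_pca=False, atol=1e-8):
    """Return True only if the input meets the preconditions checked here."""
    if rev_pca:
        return False
    try:
        x = np.asarray(scaled, dtype=np.float64)
    except (TypeError, ValueError):
        return False  # not convertible to a dense float64 array
    if x.ndim != 2 or x.shape[0] < 2:
        return False
    # Rows are features: each must be mean-centered with nonzero variance.
    centered = bool(np.allclose(x.mean(axis=1), 0.0, atol=atol))
    nonzero_var = bool(np.all(x.var(axis=1) > 0.0))
    return centered and nonzero_var


def run_pca(scaled, fast, upstream, **kwargs):
    """Use `fast` when the call is in scope, otherwise silently use `upstream`."""
    if in_fast_path_scope(scaled, rev_pca=kwargs.get("rev_pca", False)):
        return fast(scaled, **kwargs)
    return upstream(scaled, **kwargs)
```

A guard of this shape means an out-of-scope call costs the upstream runtime and never returns a result from a code path that was not validated for it.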

Detailed speedup table (11 runs)
Dataset Tier Platform Threads Baseline Optimized Speedup Memory Concordance Pass
gastrulation_pijuansala ood_large3 Windows 32 45.29 s 2.05 s 23.0× 40.6 → 37.1 GB pass
heart_adult large Windows 32 3.56 min 6.02 s 35.4× 73.5 → 52.1 GB pass
pbmc200k_glaucoma medium Windows 32 2.93 min 3.33 s 48.8× 28.7 → 20.2 GB pass
pbmc68k small Windows 32 2.01 min 1.42 s 85.5× 7.3 → 5.3 GB pass
splitseq_rosenberg ood_large1 Windows 32 1.19 min 2.16 s 33.0× 20.5 → 14.2 GB pass
tms_ss2 ood_large2 Windows 32 36.95 s 1.65 s 24.6× 24.1 → 23.8 GB pass
gastrulation_pijuansala ood_large3 macOS 14 17.10 s 1.47 s 12.2× 22.8 → 13.7 GB pass
pbmc200k_glaucoma medium macOS 4 44.19 s 1.98 s 22.4× 21.5 → 9.2 GB pass
pbmc68k (inferred) small macOS 4 40.75 s 1.26 s 32.3× 11.1 → 7.6 GB pass
splitseq_rosenberg ood_large1 macOS 14 23.13 s 1.55 s 14.8× 14.6 → 5.9 GB pass
tms_ss2 ood_large2 macOS 4 14.61 s 1.24 s 11.0× 15.8 → 8.5 GB pass
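The headline figures can be rechecked from the Speedup column of the table above. This short sketch copies the 11 values by hand, in row order, and recomputes the best and median speedup.

```python
from statistics import median

# Speedup column of the 11 runs, in table order (6 Windows rows, then 5 macOS rows).
speedups = [23.0, 35.4, 48.8, 85.5, 33.0, 24.6, 12.2, 22.4, 32.3, 14.8, 11.0]

best = max(speedups)
mid = median(speedups)

print(f"runs: {len(speedups)}, best: {best}x, median: {mid}x")
# prints "runs: 11, best: 85.5x, median: 24.6x"
```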

Frequently asked questions

Speeding up Seurat RunPCA
Why is Seurat RunPCA slow?

Seurat RunPCA is CPU-bound, and the stock implementation in Seurat leaves performance on the table in its core numerical work. In the best benchmarked case (pbmc68k, 32 threads, Windows) the original takes 2.01 min where the AutoZyme path takes 1.42 s (85.5× faster); the median speedup across the benchmark runs is 24.6×.

How do I make Seurat RunPCA faster?

Install AutoZyme and activate the Seurat patch, then keep using Seurat RunPCA exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 85.5× faster on the benchmark datasets, with no pipeline or API changes.

Does the AutoZyme speedup change the Seurat RunPCA output?

Effectively no. The output is tolerance-equivalent: held within a frozen concordance gate (up to about 0.6% drift from the original Seurat result) on every benchmark dataset.
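The page does not say how the concordance gate is computed. As one plausible way to check drift yourself, the sketch below aligns the sign of each component and reports the largest per-component relative difference. The function name max_relative_drift and the choice of metric are assumptions, not AutoZyme's gate.

```python
import numpy as np


def max_relative_drift(reference, candidate):
    """Largest per-component relative L2 difference after sign alignment.

    Both inputs are cells x components (or features x components) arrays.
    """
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)

    # PCA components are defined up to sign: flip each candidate column
    # so that it points the same way as the matching reference column.
    signs = np.sign(np.sum(ref * cand, axis=0))
    signs[signs == 0] = 1.0
    aligned = cand * signs

    scale = np.linalg.norm(ref, axis=0)
    return float(np.max(np.linalg.norm(aligned - ref, axis=0) / scale))
```

Under this metric, a value below roughly 0.006 would correspond to the "about 0.6%" figure quoted above, assuming the gate measures something comparable.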

How do I install the Seurat speedup?

In R: install the autozyme package, then run library(autozyme) and autozyme::activate("seurat"). The patch applies automatically the next time you call RunPCA.