Speed up Seurat RunPCA: up to 85.5× faster, near-identical output

Q: Why is Seurat RunPCA slow?

Seurat RunPCA is CPU-bound, and the stock implementation in Seurat leaves performance on the table in its core numerical work. On the pbmc68k benchmark dataset (Windows, 32 threads), the original takes 2.01 min where the AutoZyme path takes 1.42 s (85.5× faster).

Q: How do I make Seurat RunPCA faster?

Install AutoZyme and activate the Seurat patch, then keep using Seurat RunPCA exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 85.5× faster on the benchmark datasets, with no pipeline or API changes.

Q: Does the AutoZyme speedup change the Seurat RunPCA output?

Effectively no. The output is tolerance-equivalent: held within a frozen concordance gate (up to about 0.6% drift from the original Seurat result) on every benchmark dataset.
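As a rough illustration of what a tolerance check of this kind can look like, the sketch below compares two PCA embedding matrices after aligning the arbitrary sign of each component, and reports the worst per-component relative difference. The function name `max_relative_drift` and the choice of a column-wise relative L2 norm are assumptions made for illustration; the actual metric behind AutoZyme's concordance gate is not described here.

```python
import numpy as np

def max_relative_drift(reference, candidate):
    """Worst per-component relative L2 difference, ignoring sign flips.

    Hypothetical helper: the metric is an assumption, not AutoZyme's gate.
    Both inputs are cells x components arrays with nonzero columns.
    """
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    # Each principal component is only defined up to sign, so flip every
    # candidate column to point the same way as the reference column.
    signs = np.sign(np.sum(ref * cand, axis=0))
    signs[signs == 0] = 1.0
    aligned = cand * signs
    # Relative L2 error per component, then take the worst one.
    num = np.linalg.norm(ref - aligned, axis=0)
    den = np.linalg.norm(ref, axis=0)
    return float(np.max(num / den))
```

Under this assumed metric, reading "about 0.6%" as a threshold would mean checking `max_relative_drift(original, optimized) <= 0.006`.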

Q: How do I install the Seurat speedup?

In R: install the autozyme package, then run library(autozyme) and autozyme::activate("seurat"). The patch applies automatically the next time you call RunPCA.

Benchmark charts

Speedup distribution (log scale; each dot is one finalized dataset/thread run on Windows)

pbmc68k: 85.5×
pbmc200k_glaucoma: 48.8×
heart_adult: 35.4×
splitseq_rosenberg: 33.0×
tms_ss2: 24.6×
gastrulation_pijuansa…: 23.0×

Thread sweep: speedup across finalized thread counts on Windows

Memory: baseline vs optimized peak memory on Windows

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This page covers RunPCA in Seurat. On the listed datasets, the benchmarked result stays within the declared output concordance gate while reducing CPU runtime.

Supported scope

The fast path computes PCA via a Gram-matrix eigendecomposition in Python (numpy/scipy via reticulate). It is correct (embeddings, loadings, and stdev close up to sign) only for the upstream-default configuration:

rev.pca = FALSE
weight.by.var = TRUE or FALSE (both handled; FALSE divides embeddings by singular values, line 1107)
seed.use: any value (set.seed is honored)
npcs <= nrow-1 (clamped, line 1070)

Input (the object passed to .default) must be a dense matrix, or a convertible-to-dense float64 array, of features x cells that is already mean-centered (i.e. scale.data), with all requested features present and of nonzero variance. The StdAssay (Seurat-object) entry requires layer = 'scale.data' to be present and centered, and the requested features to be a subset of the layer's features. Python with numpy and scipy must be importable.

On Darwin the fast path uses numpy.linalg.eigh (full) and slices the result; elsewhere it uses scipy.linalg.eigh in partial mode (driver='evr'). Both yield the same supported result. This exactly covers the benchmarked default call.
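To make the idea concrete, here is a minimal sketch of PCA through a Gram-matrix eigendecomposition, under stated assumptions. It is not AutoZyme's code: the function name `gram_pca` is hypothetical, the features x features Gram matrix is an assumed choice, and only the full numpy.linalg.eigh-plus-slice variant is shown. The stdev scaling by sqrt(max(1, ncells - 1)) follows the convention upstream Seurat is understood to use.

```python
import numpy as np

def gram_pca(scaled, npcs, weight_by_var=True):
    """PCA of a mean-centered features x cells matrix via its Gram matrix.

    Hypothetical sketch; returns (embeddings, loadings, stdev).
    """
    x = np.asarray(scaled, dtype=np.float64)
    n_features, n_cells = x.shape
    # Clamp the component count, mirroring the npcs <= nrow-1 rule.
    npcs = min(npcs, n_features - 1)
    # Gram matrix over features (assumed orientation): small when the
    # number of features is much smaller than the number of cells.
    gram = x @ x.T
    # eigh returns eigenvalues in ascending order; keep the top npcs.
    evals, evecs = np.linalg.eigh(gram)
    evals = np.clip(evals[::-1][:npcs], 0.0, None)
    loadings = evecs[:, ::-1][:, :npcs]          # features x npcs
    singular_values = np.sqrt(evals)
    # Projecting cells onto the loadings gives U * S (weight.by.var = TRUE).
    embeddings = x.T @ loadings                  # cells x npcs
    if not weight_by_var:
        # Dividing by the singular values leaves U; this assumes the
        # retained singular values are nonzero.
        embeddings = embeddings / singular_values
    stdev = singular_values / np.sqrt(max(1, n_cells - 1))
    return embeddings, loadings, stdev
```

The eigenvalues of the Gram matrix are the squared singular values of the input, which is why the result matches an SVD-based PCA only up to the sign of each component.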

Out-of-scope behavior

Calls outside the supported scope fall back silently to the upstream Seurat implementation.
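The guard-then-fallback pattern can be sketched as follows. This is an illustration only: `in_supported_scope`, `run_pca_with_fallback`, the `fast_path` and `upstream` callables, and the tolerance `atol` are all hypothetical names and values, and the checks cover just a few of the scope conditions listed above (rev.pca, a dense float64-convertible 2-D input, centered rows, nonzero variance).

```python
import numpy as np

def in_supported_scope(scaled, rev_pca=False, atol=1e-8):
    """Return True when the input looks like a centered features x cells matrix."""
    if rev_pca:
        return False
    try:
        x = np.asarray(scaled, dtype=np.float64)
    except (TypeError, ValueError):
        return False
    if x.ndim != 2 or x.shape[0] < 2:
        return False
    # Every feature (row) must be mean-centered and non-constant.
    centered = np.allclose(x.mean(axis=1), 0.0, atol=atol)
    nonconstant = bool(np.all(x.var(axis=1) > 0.0))
    return centered and nonconstant

def run_pca_with_fallback(scaled, fast_path, upstream, **kwargs):
    """Use fast_path when in scope; otherwise quietly call upstream."""
    if in_supported_scope(scaled, rev_pca=kwargs.get("rev_pca", False)):
        return fast_path(scaled, **kwargs)
    return upstream(scaled, **kwargs)
```

Because the fallback is silent, a caller who wants to know which path ran would need to log the result of the scope check separately.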

Detailed speedup table (11 runs)

Dataset	Tier	Platform	Threads	Baseline	Optimized	Speedup	Peak memory (baseline → optimized)	Concordance	Pass
`gastrulation_pijuansala`	ood_large3	Windows	32	45.29 s	2.05 s	23.0×	40.6 → 37.1 GB	—	pass
`heart_adult`	large	Windows	32	3.56 min	6.02 s	35.4×	73.5 → 52.1 GB	—	pass
`pbmc200k_glaucoma`	medium	Windows	32	2.93 min	3.33 s	48.8×	28.7 → 20.2 GB	—	pass
`pbmc68k`	small	Windows	32	2.01 min	1.42 s	85.5×	7.3 → 5.3 GB	—	pass
`splitseq_rosenberg`	ood_large1	Windows	32	1.19 min	2.16 s	33.0×	20.5 → 14.2 GB	—	pass
`tms_ss2`	ood_large2	Windows	32	36.95 s	1.65 s	24.6×	24.1 → 23.8 GB	—	pass
`gastrulation_pijuansala`	ood_large3	macOS	14	17.10 s	1.47 s	12.2×	22.8 → 13.7 GB	—	pass
`pbmc200k_glaucoma`	medium	macOS	4	44.19 s	1.98 s	22.4×	21.5 → 9.2 GB	—	pass
`pbmc68k (inferred)`	small	macOS	4	40.75 s	1.26 s	32.3×	11.1 → 7.6 GB	—	pass
`splitseq_rosenberg`	ood_large1	macOS	14	23.13 s	1.55 s	14.8×	14.6 → 5.9 GB	—	pass
`tms_ss2`	ood_large2	macOS	4	14.61 s	1.24 s	11.0×	15.8 → 8.5 GB	—	pass
