Scanpy pca is one of the slower steps in many single-cell genomics workflows. AutoZyme ships a
verified, drop-in patch that is up to 36.0× faster, returning output within a strict, verified tolerance with no change to how you call it.
Best speedup: 36.0×
Median speedup: 20.5×
Output equivalence: Tolerance
Best runtime: baseline 58.94 s → optimized 1.63 s
Datasets: 6
Pass rate: 11/11
Benchmark charts
Speedup distribution (Windows, log scale). Each dot is one finalized dataset/thread run.

| Dataset | Tier | Threads | Baseline | Optimized | Speedup | Memory |
|---|---|---|---|---|---|---|
| pbmc200k_glaucoma | medium | 1 | 1.50 min | 6.43 s | 14.0× | 13 GB → 7.7 GB |
| pbmc200k_glaucoma | medium | 4 | 57.46 s | 2.34 s | 24.7× | 13 GB → 7.7 GB |
| pbmc200k_glaucoma | medium | 32 | 58.94 s | 1.63 s | 36.0× | 14 GB → 7.9 GB |
| pbmc68k | small | 1 | 35.89 s | 2.10 s | 17.2× | 3.4 GB → 1.3 GB |
| pbmc68k | small | 4 | 20.90 s | 829 ms | 25.2× | 3.5 GB → 1.6 GB |
| pbmc68k | small | 32 | 22.31 s | 1.89 s | 12.0× | 3.6 GB → 1.6 GB |
| heart_adult | large | 1 | 1.97 min | 14.83 s | 8.05× | 32 GB → 19 GB |
| heart_adult | large | 4 | 1.25 min | 5.24 s | 14.3× | 32 GB → 19 GB |
| heart_adult | large | 32 | 1.28 min | 3.27 s | 23.7× | 33 GB → 19 GB |
| splitseq_rosenberg | ood_large1 | 1 | 42.10 s | 4.70 s | 8.98× | 9.3 GB → 4.7 GB |
| splitseq_rosenberg | ood_large1 | 4 | 27.21 s | 1.66 s | 16.4× | 9.4 GB → 4.7 GB |
| splitseq_rosenberg | ood_large1 | 32 | 27.56 s | 1.19 s | 23.1× | 9.8 GB → 4.9 GB |
| gastrulation_pijuansala | ood_large3 | 1 | 26.48 s | 5.41 s | 4.90× | 15 GB → 15 GB |
| gastrulation_pijuansala | ood_large3 | 4 | 20.70 s | 1.72 s | 12.0× | 15 GB → 15 GB |
| gastrulation_pijuansala | ood_large3 | 32 | 18.52 s | 1.17 s | 15.7× | 15 GB → 15 GB |
| tms_ss2 | ood_large2 | 1 | 23.80 s | 3.54 s | 6.67× | 9.5 GB → 8.9 GB |
| tms_ss2 | ood_large2 | 4 | 17.89 s | 1.34 s | 13.6× | 9.7 GB → 8.9 GB |
| tms_ss2 | ood_large2 | 32 | 14.88 s | 1.39 s | 10.7× | 9.9 GB → 8.9 GB |
The public API stays the same; AutoZyme replaces only the supported fast path.
This task targets pca in Scanpy. The benchmarked result passes the declared
scientific output gate while reducing CPU runtime on the listed datasets.
Also searched as: PCA, principal components, dimensionality reduction, RunPCA, pp.pca, tl.pca.
Supported scope
The fast path computes a full, zero-centered PCA via a Gram matrix (X^T X scaled by 1/(n_cells-1)) plus a symmetric eigendecomposition, then projects centered X onto the top n_comps eigenvectors.

It correctly handles: dense or sparse adata.X (sparse is densified via toarray, line 159); any n_comps valid for the matrix; copy=True/False; and the implicit upstream zero_center=True / arpack-or-auto / full deterministic-solver case (output is mathematically equivalent up to sign, which is what the eval metric min_pc_cor>=0.95 checks).

use_highly_variable is honored explicitly (lines 121-145): when an HVG annotation is present (or use_highly_variable=True) and the mask is a proper subset, it recurses on the HVG-subset matrix and lifts PCs back into full var space with zeros for non-HVG genes; use_highly_variable=False forces the full-gene path.

Matrices with n_genes>8000 fall through to upstream ARPACK (lines 154-156), where all kwargs are forwarded, and zyme=False (lines 105-106) forwards everything to upstream.

On macOS it uses Apple Accelerate cblas_sgemm + numpy.linalg.eigh; elsewhere numpy BLAS + scipy.linalg.eigh(driver="evr"). All computation is done in float32 internally regardless of requested dtype.
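The approach described above can be sketched in a few lines of NumPy/SciPy. This is an illustrative sketch only, not AutoZyme's actual code: the function names pca_via_covariance and lift_components are hypothetical, and the sketch assumes a dense input matrix with cells in rows and genes in columns.

```python
import numpy as np
from scipy.linalg import eigh


def pca_via_covariance(X, n_comps):
    """Zero-centered PCA from the gene-by-gene matrix X^T X / (n_cells - 1)."""
    X = np.asarray(X, dtype=np.float32)           # float32 throughout, as the text states
    n_cells = X.shape[0]
    Xc = X - X.mean(axis=0, keepdims=True)        # zero-center each gene
    cov = (Xc.T @ Xc) / np.float32(n_cells - 1)   # symmetric (n_genes, n_genes) matrix
    # Symmetric eigendecomposition; eigh returns eigenvalues in ascending order.
    evals, evecs = eigh(cov, driver="evr")
    order = np.argsort(evals)[::-1][:n_comps]     # indices of the top n_comps
    components = evecs[:, order].T                # shape (n_comps, n_genes)
    scores = Xc @ components.T                    # project centered X
    return scores, components, evals[order]


def lift_components(components_hvg, hvg_mask):
    """Place HVG-only loadings into full gene space, zeros for non-HVG genes."""
    full = np.zeros((components_hvg.shape[0], hvg_mask.size),
                    dtype=components_hvg.dtype)
    full[:, hvg_mask] = components_hvg
    return full
```

The eigendecomposition here acts on an n_genes × n_genes matrix, so its cost does not grow with the number of cells; that is consistent with the fast path being restricted to matrices of at most 8000 genes. Eigenvector signs are arbitrary, so scores from this sketch can differ from another solver's by a sign flip per component.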
Out-of-scope behavior
Silent fallback to upstream: calls outside the supported scope are forwarded to the stock Scanpy implementation.
Detailed speedup table (11 runs)

| Dataset | Tier | Platform | Threads | Baseline | Optimized | Speedup | Memory | Concordance | Pass |
|---|---|---|---|---|---|---|---|---|---|
| gastrulation_pijuansala | ood_large3 | Windows | 32 | 18.52 s | 1.17 s | 15.7× | 14.8 → 15.0 GB | — | pass |
| heart_adult | large | Windows | 32 | 1.28 min | 3.27 s | 23.7× | 32.7 → 19.3 GB | — | pass |
| pbmc200k_glaucoma | medium | Windows | 32 | 58.94 s | 1.63 s | 36.0× | 14.0 → 7.9 GB | — | pass |
| pbmc68k | small | Windows | 4 | 20.90 s | 829 ms | 25.2× | 3.5 → 1.6 GB | — | pass |
| splitseq_rosenberg | ood_large1 | Windows | 32 | 27.56 s | 1.19 s | 23.1× | 9.8 → 4.9 GB | — | pass |
| tms_ss2 | ood_large2 | Windows | 4 | 17.89 s | 1.34 s | 13.6× | 9.7 → 8.9 GB | — | pass |
| gastrulation_pijuansala | ood_large3 | macOS | 4 | 8.75 s | 815 ms | 12.3× | 14.4 → 14.7 GB | — | pass |
| pbmc200k_glaucoma | medium | macOS | 4 | 28.54 s | 976 ms | 29.6× | 16.5 → 10.3 GB | — | pass |
| pbmc68k | small | macOS | 14 | 12.35 s | 600 ms | 20.5× | 7.3 → 2.2 GB | — | pass |
| splitseq_rosenberg | ood_large1 | macOS | 4 | 13.93 s | 911 ms | 17.0× | 14.1 → 7.1 GB | — | pass |
| tms_ss2 | ood_large2 | macOS | 14 | 7.47 s | 766 ms | 9.30× | 9.2 → 9.0 GB | — | pass |
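The headline figures at the top of the page can be reproduced from the 11 runs listed above. The short check below assumes the best and median speedups are taken over exactly those 11 rows; the speedup values are copied from the listing.

```python
from statistics import median

# Speedups (×) for the 11 runs listed above.
speedups = [
    15.7, 23.7, 36.0, 25.2, 23.1, 13.6,   # Windows
    12.3, 29.6, 20.5, 17.0, 9.30,         # macOS
]
passes = [True] * len(speedups)           # every listed run is marked "pass"

best = max(speedups)
mid = median(speedups)
print(f"best {best:.1f}x, median {mid:.1f}x, pass rate {sum(passes)}/{len(passes)}")
```

Under that assumption the script prints a best of 36.0×, a median of 20.5× and a pass rate of 11/11, matching the summary.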
Frequently asked questions
Speeding up Scanpy pca
Why is Scanpy pca slow?
Scanpy pca is CPU-bound, and the stock implementation leaves performance on the table in its core numerical work. In the best benchmark run (pbmc200k_glaucoma, 32 threads, Windows) the original takes 58.94 s where the AutoZyme path takes 1.63 s (36.0× faster).
How do I make Scanpy pca faster?
Install AutoZyme and activate the Scanpy patch, then keep using Scanpy pca exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 36.0× faster on the benchmark datasets, with no pipeline or API changes.
Does the AutoZyme speedup change the Scanpy pca output?
Effectively no. The output is tolerance-equivalent: held within a frozen concordance gate (up to about 0.6% drift from the original Scanpy result) on every benchmark dataset.
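To check equivalence on your own data, one option is to compare the stock and patched embeddings component by component. The sketch below is an assumption about how a min_pc_cor-style metric could be computed (the exact definition used in the AutoZyme evaluation is not given here); the function name min_pc_correlation is hypothetical. It takes the absolute Pearson correlation per component, so a sign flip does not count as a difference.

```python
import numpy as np


def min_pc_correlation(ref, new):
    """Smallest absolute Pearson correlation across matching PC columns."""
    ref = np.asarray(ref, dtype=np.float64)
    new = np.asarray(new, dtype=np.float64)
    if ref.shape != new.shape:
        raise ValueError("embeddings must have the same shape")
    cors = [
        abs(np.corrcoef(ref[:, k], new[:, k])[0, 1])
        for k in range(ref.shape[1])
    ]
    return float(min(cors))
```

A value close to 1.0 means every component agrees up to sign; the supported-scope notes describe the gate as min_pc_cor>=0.95.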
How do I install the Scanpy speedup?
From a shell, run pip install autozyme. Then, in Python, import autozyme and call autozyme.activate("scanpy"). The patch applies automatically the next time you call pca.
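A minimal sketch of that sequence, wrapped in a helper so the imports run only when it is called. The helper name pca_with_autozyme is hypothetical; autozyme.activate("scanpy") is taken from the answer above, and the sketch assumes the patch leaves the usual Scanpy behavior of sc.pp.pca (results stored in adata.obsm["X_pca"]) unchanged.

```python
def pca_with_autozyme(adata, n_comps=50):
    """Activate the AutoZyme Scanpy patch, then run PCA as usual."""
    # Requires: pip install autozyme scanpy
    import autozyme
    import scanpy as sc

    autozyme.activate("scanpy")          # enable the patched fast path
    sc.pp.pca(adata, n_comps=n_comps)    # same call as with stock Scanpy
    return adata.obsm["X_pca"]           # cells × n_comps embedding
```

Because the call to sc.pp.pca is unchanged, removing the activate line should give the stock behavior back for comparison.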