Speed up Seurat ScaleData

Seurat ScaleData is one of the slower steps in many single-cell genomics workflows. AutoZyme ships a verified, drop-in patch that is up to 24.7× faster, returning bit-for-bit identical results with no change to how you call it.

Best speedup: 24.7×
Median speedup: 8.47×
Output equivalence: Bit-exact
Best runtime: 2.51 s baseline, 110 ms optimized
Datasets: 7
Pass rate: 11/11

Benchmark charts

Speedup distribution (Windows, log scale): each dot is one finalized dataset/thread run for pbmc68k, splitseq_rosenberg, pbmc200k_glaucoma, heart_adult, tms_ss2 and gastrulation_pijuansala.

Thread sweep (Windows): speedup across finalized thread counts (1, 4 and the full 32 threads).

| Dataset | Tier | Threads | Baseline | Optimized | Speedup | Memory |
|---|---|---|---|---|---|---|
| pbmc68k | small | 1 | 3.01 s | 289 ms | 9.43× | 5.4 GB → 4.3 GB |
| pbmc68k | small | 4 | 2.72 s | 140 ms | 19.4× | 5.4 GB → 4.3 GB |
| pbmc68k | small | 32 | 2.51 s | 110 ms | 24.7× | 5.4 GB → 4.3 GB |
| splitseq_rosenberg | ood_large1 | 1 | 5.23 s | 717 ms | 7.40× | 14 GB → 12 GB |
| splitseq_rosenberg | ood_large1 | 4 | 5.31 s | 410 ms | 12.9× | 14 GB → 12 GB |
| splitseq_rosenberg | ood_large1 | 32 | 5.36 s | 330 ms | 16.1× | 14 GB → 12 GB |
| pbmc200k_glaucoma | medium | 1 | 7.35 s | 1.25 s | 5.98× | 24 GB → 18 GB |
| pbmc200k_glaucoma | medium | 4 | 7.47 s | 780 ms | 9.58× | 24 GB → 18 GB |
| pbmc200k_glaucoma | medium | 32 | 8.56 s | 560 ms | 13.3× | 24 GB → 18 GB |
| heart_adult | large | 1 | 16.39 s | 2.78 s | 5.93× | 59 GB → 47 GB |
| heart_adult | large | 4 | 16.81 s | 1.63 s | 10.1× | 59 GB → 47 GB |
| heart_adult | large | 32 | 16.59 s | 1.41 s | 11.7× | 59 GB → 47 GB |
| tms_ss2 | ood_large2 | 1 | 5.08 s | 1.17 s | 4.20× | 24 GB → 20 GB |
| tms_ss2 | ood_large2 | 4 | 4.91 s | 750 ms | 6.55× | 24 GB → 20 GB |
| tms_ss2 | ood_large2 | 32 | 4.36 s | 580 ms | 8.47× | 24 GB → 20 GB |
| gastrulation_pijuansala | ood_large3 | 1 | 7.25 s | 1.52 s | 4.41× | 41 GB → 33 GB |
| gastrulation_pijuansala | ood_large3 | 4 | 6.71 s | 910 ms | 7.37× | 41 GB → 33 GB |
| gastrulation_pijuansala | ood_large3 | 32 | 6.40 s | 810 ms | 8.28× | 41 GB → 33 GB |

Memory (Windows): baseline vs optimized peak memory at 32 threads.

| Dataset | Tier | Baseline | Optimized | Optimized / baseline |
|---|---|---|---|---|
| heart_adult | large | 59 GB | 47 GB | 0.80× |
| gastrulation_pijuansala | ood_large3 | 41 GB | 33 GB | 0.82× |
| tms_ss2 | ood_large2 | 24 GB | 20 GB | 0.83× |
| pbmc200k_glaucoma | medium | 24 GB | 18 GB | 0.78× |
| splitseq_rosenberg | ood_large1 | 14 GB | 12 GB | 0.85× |
| pbmc68k | small | 5.4 GB | 4.3 GB | 0.80× |

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets ScaleData in Seurat. The benchmarked result passes the task's declared scientific output gate while reducing CPU runtime and peak memory on the listed datasets.

Also searched as: scaling, z-score, standardize, center and scale.

Supported scope

The fast path (turbo_scale_sparse_full) handles the canonical default per-feature z-scaling: a v5 Seurat object (Assay5) with a unified "data" layer that is a dgCMatrix, scaling and centering every selected feature globally over all cells.

All of these must hold simultaneously (all guarded at patch.R:301-303): vars.to.regress=NULL, split.by=NULL, model.use=="linear", use.umi=FALSE, do.scale=TRUE, do.center=TRUE.

features may be NULL (resolves to VariableFeatures, else rownames, matching upstream ScaleData.Assay) or an explicit subset; assay may be NULL (DefaultAssay) or named.

scale.max is honored and applied as a positive-tail-only cap (kernel lines 53, 69), matching Seurat's FastSparseRowScale. Variance uses the n-1 (sample) denominator; zero-variance features get sd=1 (kernel lines 45-47), matching Seurat.

The result is materialized as a dense features x cells matrix and written to the scale.data layer with the cells/features metadata flags updated (patch.R:343-353). The task's correctness gate is gene_cor_min >= 0.99 (not bit-exact), consistent with this numeric-close contract.
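The scaling rules described in the supported scope can be illustrated with a small base-R reference. This is a minimal sketch, not AutoZyme's kernel or Seurat's own code: the function name zscale_rows_reference is hypothetical, it works on a dense matrix for clarity, and its scale.max default of 10 is taken from ScaleData's default.

```r
# Reference per-feature z-scaling on a features x cells matrix.
# Hypothetical helper for illustration only.
zscale_rows_reference <- function(x, scale.max = 10) {
  x <- as.matrix(x)                 # densify; the scaled result is dense anyway
  mu <- rowMeans(x)                 # per-feature mean over all cells
  centered <- x - mu                # mu is recycled down the rows
  # Sample standard deviation with the n - 1 denominator
  sdev <- sqrt(rowSums(centered^2) / (ncol(x) - 1))
  sdev[sdev == 0] <- 1              # zero-variance features get sd = 1
  z <- centered / sdev
  z[z > scale.max] <- scale.max     # cap the positive tail only
  z
}

m <- rbind(
  geneA = c(0, 0, 0, 10),           # one large outlier
  geneB = c(1, 1, 1, 1),            # zero variance
  geneC = c(2, 4, 6, 8)
)
colnames(m) <- paste0("cell", 1:4)

# A deliberately low cap makes the one-sided clipping visible:
# values above 1 are clipped, values below -1 are left alone.
z <- zscale_rows_reference(m, scale.max = 1)
print(z)
```

In this toy input geneB scales to all zeros because its standard deviation is replaced by 1, and geneC keeps its value below -1 while its value above 1 is capped.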

Out-of-scope behavior

Calls outside the supported scope fall back silently to the upstream Seurat implementation.
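The argument conditions listed under Supported scope amount to a simple predicate. The sketch below is a hypothetical illustration of that check, not the guard at patch.R:301-303: the helper name uses_fast_path is an assumption, its defaults mirror ScaleData's defaults, and it covers only the call arguments, not the object requirements (Assay5 with a dgCMatrix "data" layer).

```r
# Hypothetical predicate: would these ScaleData arguments stay on the fast path?
uses_fast_path <- function(vars.to.regress = NULL, split.by = NULL,
                           model.use = "linear", use.umi = FALSE,
                           do.scale = TRUE, do.center = TRUE) {
  is.null(vars.to.regress) &&
    is.null(split.by) &&
    identical(model.use, "linear") &&
    isFALSE(use.umi) &&
    isTRUE(do.scale) &&
    isTRUE(do.center)
}

default_call   <- uses_fast_path()                               # TRUE
with_regress   <- uses_fast_path(vars.to.regress = "percent.mt") # FALSE
scale_only     <- uses_fast_path(do.center = FALSE)              # FALSE
```

A check like this can help predict which ScaleData calls in a pipeline would take the upstream path instead, since the fallback itself is silent.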

Detailed speedup table (11 runs)

| Dataset | Tier | Platform | Threads | Baseline | Optimized | Speedup | Memory | Concordance | Pass |
|---|---|---|---|---|---|---|---|---|---|
| gastrulation_pijuansala | ood_large3 | Windows | 32 | 6.40 s | 810 ms | 8.28× | 40.6 → 33.4 GB | | pass |
| heart_adult | large | Windows | 32 | 16.59 s | 1.41 s | 11.7× | 59.1 → 47.3 GB | | pass |
| pbmc200k_glaucoma | medium | Windows | 32 | 8.56 s | 560 ms | 13.3× | 23.6 → 18.3 GB | | pass |
| pbmc68k | small | Windows | 32 | 2.51 s | 110 ms | 24.7× | 5.4 → 4.3 GB | | pass |
| splitseq_rosenberg | ood_large1 | Windows | 32 | 5.36 s | 330 ms | 16.1× | 14.0 → 11.8 GB | | pass |
| tms_ss2 | ood_large2 | Windows | 32 | 4.36 s | 580 ms | 8.47× | 24.1 → 19.9 GB | | pass |
| gastrulation_pijuansala | ood_large3 | macOS | 1 | 6.40 s | 1.24 s | 5.16× | 19.0 → 17.2 GB | | pass |
| pbmc200k_glaucoma | medium | macOS | 1 | 7.47 s | 1.00 s | 7.36× | 19.2 → 11.9 GB | | pass |
| pbmc68k (inferred) | small | macOS | 1 | 2.65 s | 283 ms | 9.37× | 7.6 → 5.5 GB | | pass |
| splitseq_rosenberg | ood_large1 | macOS | 1 | 3.95 s | 548 ms | 7.24× | 16.3 → 13.5 GB | | pass |
| tms_ss2 | ood_large2 | macOS | 4 | 3.45 s | 605 ms | 5.58× | 13.9 → 10.0 GB | | pass |
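The headline figures can be reproduced from the Speedup column of the 11 runs listed above. A short sketch, with the values copied from the table and the variable names chosen here for illustration:

```r
# Speedup column of the detailed table, in row order
speedups <- c(8.28, 11.7, 13.3, 24.7, 16.1, 8.47,  # Windows, 32 threads
              5.16, 7.36, 9.37, 7.24, 5.58)        # macOS, 1 or 4 threads

best_speedup   <- max(speedups)     # 24.7
median_speedup <- median(speedups)  # 8.47 (6th of 11 sorted values)

cat("best:", best_speedup, " median:", median_speedup, "\n")
```

The median is taken over all 11 runs on both platforms, so it sits well below the Windows 32-thread results on their own.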

Frequently asked questions

Speeding up Seurat ScaleData
Why is Seurat ScaleData slow?

Seurat ScaleData is CPU-bound, and the stock implementation in Seurat leaves performance on the table in its core numerical work. In the best benchmark case (pbmc68k on Windows, 32 threads) the original takes 2.51 s where the AutoZyme path takes 110 ms (24.7× faster); the median speedup across all benchmark runs is 8.47×.

How do I make Seurat ScaleData faster?

Install AutoZyme and activate the Seurat patch, then keep using Seurat ScaleData exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 24.7× faster on the benchmark datasets, with no pipeline or API changes.

Does the AutoZyme speedup change the Seurat ScaleData output?

No. The accelerated path returns bit-for-bit identical results to the original Seurat implementation (maximum absolute difference 0), checked by a frozen concordance gate on every benchmark dataset.
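To check equivalence on your own data, you can compare the scaled matrices from an unpatched and a patched run. The sketch below is an assumption about how such a comparison could look, not AutoZyme's concordance gate: compare_scaled is a hypothetical helper, and the per-gene correlation minimum is named after the gene_cor_min gate mentioned in the supported scope.

```r
# Compare two features x cells matrices of scaled values.
# Hypothetical helper for illustration only.
compare_scaled <- function(baseline, optimized) {
  stopifnot(identical(dim(baseline), dim(optimized)))
  gene_cor <- vapply(seq_len(nrow(baseline)), function(i) {
    b <- baseline[i, ]
    o <- optimized[i, ]
    if (sd(b) == 0 || sd(o) == 0) {
      # cor() is undefined for constant rows; count matching rows as 1
      if (isTRUE(all.equal(b, o))) 1 else 0
    } else {
      cor(b, o)
    }
  }, numeric(1))
  list(
    max_abs_diff = max(abs(baseline - optimized)),
    gene_cor_min = min(gene_cor),
    bit_exact    = identical(baseline, optimized)
  )
}

set.seed(1)
a <- matrix(rnorm(20), nrow = 4)           # stand-in for a scale.data matrix
res_same  <- compare_scaled(a, a)          # identical inputs
res_noisy <- compare_scaled(a, a + 1e-9)   # tiny numeric perturbation
str(res_same)
str(res_noisy)
```

With real objects, the two inputs would be the scale.data layers extracted after running ScaleData with and without the patch active. A maximum absolute difference of 0 corresponds to the bit-exact claim, while gene_cor_min is the looser, correlation-based measure.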

How do I install the Seurat speedup?

In R: install the autozyme package, then run library(autozyme) and autozyme::activate("seurat"). The patch applies automatically the next time you call ScaleData.
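As a sketch, the activation steps look like this in a script. The library(autozyme) and autozyme::activate("seurat") calls are quoted from the answer above; everything else is an assumption. In particular, the page does not say where the autozyme package is distributed, so no install command is shown, and the wrapper name scale_default is hypothetical.

```r
# Activate the patch only if autozyme is available in this R library.
if (requireNamespace("autozyme", quietly = TRUE)) {
  library(autozyme)
  autozyme::activate("seurat")   # call as given in the answer above
  patched <- TRUE
} else {
  message("autozyme is not installed; ScaleData will run unpatched.")
  patched <- FALSE
}

# Hypothetical wrapper: a default ScaleData call, which stays within
# the supported scope (no regression, no split, center and scale on).
scale_default <- function(obj) {
  Seurat::ScaleData(obj)
}
```

Because the call to ScaleData is unchanged, the same script runs with or without the patch; only the runtime differs.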