Speed up Seurat NormalizeData: up to 43.7× faster, identical output

Benchmark charts

Speedup distribution (log scale; each dot is one finalized dataset/thread run on Windows)

Dataset	Speedup
pbmc200k_glaucoma	43.7×
splitseq_rosenberg	43.2×
pbmc68k	37.4×
heart_adult	35.5×
tms_ss2	34.7×
gastrulation_pijuansa…	33.4×

Thread sweep: speedup across finalized thread counts on Windows

Memory: baseline vs optimized peak memory on Windows

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets NormalizeData in Seurat. On the listed datasets, the optimized path passes the declared scientific output gate while reducing CPU runtime.

Also searched as: normalization, LogNormalize, log normalize, normalize_total.

Supported scope

The fast path handles LogNormalize column normalization on a Seurat v5 object whose target assay is an Assay5/StdAssay with a single "counts" layer, producing the default "data" layer.

It is gated to:

- zyme/turbo TRUE
- normalization.method exactly "LogNormalize"
- scale.factor a finite numeric scalar (length 1)
- margin a finite numeric scalar == 1
- block.size NULL
- no extra dot arguments (length(...)==0)
- assay NULL or a single non-NA character assay name that exists
- assay inherits StdAssay with layers/cells/features slots
- Layers(search="counts") returns exactly "counts"
- counts layer non-null and coercible to dgCMatrix

Algorithm: for each column, col_sum is the sum of the nonzero entries (which equals the full column total for sparse counts), then x := fast_log1p(x * scale.factor/col_sum). Columns with col_sum<=0 are left untouched, matching upstream zero-column behavior. The fast path writes the data layer, adds the "data" column to the cells/features LogMaps, then runs LogSeuratCommand.

Numeric equivalence: output agrees with upstream to within ~1e-5. The fast_log1p polynomial approximation has ~1.8e-11 max abs error, far inside the task's max_rel_err<=0.01 / data_cor>=0.999 gates.
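The per-column step described above can be pictured with a small, language-neutral sketch. It is written in plain Python rather than the package's own code, and it makes several assumptions: the function name `log_normalize_csc` and its arguments are hypothetical, the counts are held as CSC-style arrays (values plus column pointers, the layout a dgCMatrix uses), and the standard library's exact `math.log1p` stands in for the fast_log1p approximation, whose implementation is not shown in the source. The default of 10000 mirrors Seurat's documented scale.factor default.

```python
import math

def log_normalize_csc(data, indptr, scale_factor=10000.0):
    """Per-column LogNormalize over CSC-style sparse values (illustrative only).

    data   : stored (nonzero) values, laid out column by column
    indptr : column pointers; column j occupies data[indptr[j]:indptr[j + 1]]
    """
    out = list(data)  # copy, so the input counts are not modified
    for j in range(len(indptr) - 1):
        start, end = indptr[j], indptr[j + 1]
        # Summing stored entries gives the full column total,
        # because implicit zeros contribute nothing.
        col_sum = sum(data[start:end])
        if col_sum <= 0:
            continue  # zero-total column: leave its entries untouched
        factor = scale_factor / col_sum
        for k in range(start, end):
            # math.log1p is exact; the real fast path uses an approximation.
            out[k] = math.log1p(data[k] * factor)
    return out

# Three cells: column 0 holds counts 1 and 3, column 1 is empty, column 2 holds 5.
normalized = log_normalize_csc([1.0, 3.0, 5.0], [0, 2, 2, 3])
```

Only the stored values change; the sparsity pattern (row indices and column pointers) is reused as-is, which is one reason this kind of normalization can be done in a single pass over the value array.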

Out-of-scope behavior

Calls outside the supported scope silently fall back to the upstream Seurat implementation.
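A gate-then-fallback dispatcher of this kind might look roughly like the sketch below. Everything here is illustrative: `in_fast_path_scope`, `normalize`, `fast_impl`, and `upstream_impl` are hypothetical names, not AutoZyme or Seurat functions, and only the argument-level conditions from the supported scope are modeled. The zyme/turbo switch and the object-level checks (assay class, layers, dgCMatrix coercion) are left out.

```python
import math

def in_fast_path_scope(method, scale_factor, margin, block_size, extra_args):
    """Return True only if every (simplified) argument-level gate holds."""
    def finite_scalar(value):
        # bool is excluded because Python treats it as an int subclass.
        return (isinstance(value, (int, float))
                and not isinstance(value, bool)
                and math.isfinite(value))

    return (method == "LogNormalize"
            and finite_scalar(scale_factor)
            and finite_scalar(margin) and margin == 1
            and block_size is None
            and len(extra_args) == 0)

def normalize(counts, fast_impl, upstream_impl, method="LogNormalize",
              scale_factor=10000.0, margin=1, block_size=None, **extra_args):
    """Use the fast path when in scope; otherwise defer to upstream silently."""
    if in_fast_path_scope(method, scale_factor, margin, block_size, extra_args):
        return fast_impl(counts, scale_factor)
    # Out of scope: pass every argument through unchanged, with no warning.
    return upstream_impl(counts, method=method, scale_factor=scale_factor,
                         margin=margin, block_size=block_size, **extra_args)
```

Keeping the gate as a single predicate means any unrecognized argument or unusual value routes to the upstream code path, so unsupported calls behave exactly as they did before the patch.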

Detailed speedup table (11 runs)

Dataset	Tier	Platform	Threads	Baseline	Optimized	Speedup	Peak memory (baseline → optimized)	Concordance	Pass
`gastrulation_pijuansala`	ood_large3	Windows	32	30.41 s	910 ms	33.4×	33.4 → 33.4 GB	—	pass
`heart_adult`	large	Windows	32	38.72 s	1.09 s	35.5×	49.5 → 47.3 GB	—	pass
`pbmc200k_glaucoma`	medium	Windows	32	15.00 s	340 ms	43.7×	19.2 → 18.3 GB	—	pass
`pbmc68k`	small	Windows	32	2.95 s	80 ms	37.4×	3.9 → 3.9 GB	—	pass
`splitseq_rosenberg`	ood_large1	Windows	32	8.24 s	190 ms	43.2×	11.3 → 10.8 GB	—	pass
`tms_ss2`	ood_large2	Windows	32	18.13 s	530 ms	34.7×	20.9 → 19.9 GB	—	pass
`gastrulation_pijuansala`	ood_large3	macOS	14	18.16 s	1.49 s	12.2×	26.0 → 17.2 GB	—	pass
`pbmc200k_glaucoma`	medium	macOS	1	5.89 s	795 ms	8.30×	16.5 → 10.7 GB	—	pass
`pbmc68k`	small	macOS	14	1.56 s	144 ms	9.00×	3.9 → 2.6 GB	—	pass
`splitseq_rosenberg`	ood_large1	macOS	1	3.15 s	360 ms	8.22×	14.2 → 12.0 GB	—	pass
`tms_ss2`	ood_large2	macOS	1	10.32 s	912 ms	11.5×	19.8 → 18.8 GB	—	pass

Frequently asked questions

Speeding up Seurat NormalizeData

Why is Seurat NormalizeData slow?

Seurat NormalizeData is CPU-bound, and the stock implementation leaves performance on the table in its core numerical work. For example, on pbmc200k_glaucoma (Windows, 32 threads) the original takes 15.00 s where the AutoZyme path takes 340 ms (43.7× faster).

How do I make Seurat NormalizeData faster?

Install AutoZyme and activate the Seurat patch, then keep calling NormalizeData exactly as before. For calls within the supported scope, AutoZyme transparently substitutes the faster, output-validated path (up to 43.7× faster on the benchmark datasets) with no pipeline or API changes. Calls outside that scope silently fall back to the upstream implementation.

Does the AutoZyme speedup change the Seurat NormalizeData output?

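No, within the stated tolerance. The accelerated path is checked against the original Seurat implementation by a frozen concordance gate (max_rel_err<=0.01, data_cor>=0.999) on every benchmark dataset. Because the fast path uses a polynomial log1p approximation with ~1.8e-11 max abs error, results agree with upstream to within ~1e-5 rather than bit for bit.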
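For readers who want to run a similar comparison on their own data, a concordance check along the lines of the max_rel_err<=0.01 / data_cor>=0.999 gates could be sketched as follows. The exact formulas the benchmark uses are not given in the source, so this is an assumption: relative error is measured against the reference values with a small floor `eps` to avoid dividing by zero, and data_cor is taken to be the Pearson correlation. The function names are hypothetical.

```python
import math

def max_rel_err(reference, candidate, eps=1e-12):
    """Largest elementwise relative error, measured against the reference."""
    return max(abs(c - r) / max(abs(r), eps)
               for r, c in zip(reference, candidate))

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def passes_gate(reference, candidate, max_rel=0.01, min_cor=0.999):
    """True when both thresholds named in the supported-scope text are met."""
    return (max_rel_err(reference, candidate) <= max_rel
            and pearson(reference, candidate) >= min_cor)
```

In practice the two sequences would be the nonzero values of the upstream and accelerated "data" layers, flattened in the same order.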

How do I install the Seurat speedup?

In R: install the autozyme package, then run library(autozyme) and autozyme::activate("seurat"). The patch applies automatically the next time you call NormalizeData.