Speed up MAST

MAST is one of the slower steps in many single-cell genomics workflows. AutoZyme ships a verified, drop-in patch that is up to 3.37× faster, returning output within a strict, verified tolerance with no change to how you call it.

Best speedup: 3.37×
Median speedup: 3.33×
Output equivalence: tolerance
Best runtime: baseline 3.28 min, optimized 58.45 s
Datasets: 5
Pass rate: 10/10

Benchmark charts

Platform: Windows

Speedup distribution
Each dot is one finalized dataset/thread run on Windows.
Datasets: pbmc68k_18k, pbmc68k_50k, pbmc68k_8k, tms_ss2_50k, pbmc200k_glaucoma

Thread sweep
Speedup across finalized thread counts on Windows.
No finalized multi-thread sweep for this platform.

Memory
Baseline vs optimized peak memory on Windows (axis: 0.0 GB to 20 GB).
pbmc200k_glaucoma · ood_xlarge: memory 18 GB → 18 GB, optimized / baseline 1.00×, 2.17× speedup, 1 thread
tms_ss2_50k · ood_large: memory 18 GB → 18 GB, optimized / baseline 1.00×, 2.28× speedup, 1 thread
pbmc68k_50k · large: memory 5.6 GB → 5.6 GB, optimized / baseline 1.00×, 3.29× speedup, 1 thread
pbmc68k_18k · medium: memory 2.4 GB → 2.4 GB, optimized / baseline 0.99×, 3.37× speedup, 1 thread
pbmc68k_8k · small: memory 1.9 GB → 1.9 GB, optimized / baseline 1.00×, 3.20× speedup, 1 thread

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets the MAST method as implemented in the MAST R package. The benchmarked result preserves the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: differential expression, hurdle model, zero-inflated, DEG.

Supported scope

The fast path is an APPROXIMATION (explicitly not bit-exact; documented in CHANGELOG/final_audit).

Four namespace overrides on MAST internals:

(1) fast_validateState strips IRLS sanity checks down to the convergence test.
(2) fast_bgf no-ops family$validmu/$valideta and loosens the IRLS epsilon from 1e-8 to 1e-5 for ALL families and ALL fits.
(3) fast_updateState by default delegates to MAST's original updater. The native C++ cpp_updateState path is gated OFF behind getOption('autozyme.mast.native_update_state') / env AUTOZYME_MAST_NATIVE_UPDATE_STATE=1, because it segfaults under package attest. The C++ kernel only handles binomial+logit and gaussian+identity with all(state$good)==TRUE.
(4) fast_lrTest_hybrid is a two-pass hybrid that computes a cheap Wald statistic for every gene (chisq df=1 for cont/disc, df=2 for hurdle), then runs the EXACT upstream lrTest only on the top-K=200 genes ranked by Wald hurdle p-value, overwriting those K rows with exact LR results.

The lrTest fast path activates only for a hypothesis of class CoefficientHypothesis or a length-1 character that resolves to exactly one matching column name in coefC.

Correct/accepted regime as benchmarked: bayesglm method, a single-coefficient (1 df per component) contrast on a model with discrete+continuous (hurdle) components, where DE conclusions are driven by top-ranked genes (top-100 jaccard ~1, pearson on coef and -log10 p ~1; max coef deviation ~1e-4).

The loosened 1e-5 epsilon and stripped validators apply unconditionally to all genes/families on the IRLS path.
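The two-pass idea behind override (4) can be illustrated with a minimal base-R sketch. This is not AutoZyme's or MAST's code: it runs on simulated data, uses plain glm() in place of MAST's bayesglm fits, and uses K = 10 instead of 200. The helper names wald_hurdle and lr_hurdle are hypothetical.

```r
# Sketch only: two-pass Wald/LR hybrid on a simulated hurdle model.
set.seed(1)
n_cells <- 200
n_genes <- 50
top_k   <- 10                       # the page describes K = 200
group   <- rep(c(0, 1), each = n_cells / 2)

# Zero-inflated expression; the first 5 genes differ between groups.
expr <- sapply(seq_len(n_genes), function(g) {
  shift    <- if (g <= 5) 1 else 0
  detected <- rbinom(n_cells, 1, plogis(0.2 + shift * group))
  detected * pmax(rnorm(n_cells, mean = 3 + shift * group, sd = 1), 0.01)
})

# Pass 1: cheap Wald test. One chi-square (df = 1) per component,
# summed into a df = 2 hurdle statistic.
wald_hurdle <- function(y, group) {
  pos  <- y > 0
  dp   <- as.integer(pos)
  yp   <- y[pos]
  gp   <- group[pos]
  disc <- glm(dp ~ group, family = binomial())
  cont <- glm(yp ~ gp, family = gaussian())
  z_disc <- coef(summary(disc))[2, 3]
  z_cont <- coef(summary(cont))[2, 3]
  pchisq(z_disc^2 + z_cont^2, df = 2, lower.tail = FALSE)
}

# Pass 2: likelihood-ratio test, refitting a null model per component.
lr_hurdle <- function(y, group) {
  pos <- y > 0
  dp  <- as.integer(pos)
  yp  <- y[pos]
  gp  <- group[pos]
  lr_disc <- 2 * (logLik(glm(dp ~ group, family = binomial())) -
                  logLik(glm(dp ~ 1, family = binomial())))
  lr_cont <- 2 * (logLik(glm(yp ~ gp)) - logLik(glm(yp ~ 1)))
  pchisq(as.numeric(lr_disc) + as.numeric(lr_cont), df = 2, lower.tail = FALSE)
}

wald_p <- apply(expr, 2, wald_hurdle, group = group)
result <- data.frame(gene = seq_len(n_genes), p = wald_p, exact = FALSE)

# Overwrite only the top-K rows (ranked by Wald p-value) with LR results.
top <- order(wald_p)[seq_len(top_k)]
result$p[top]     <- apply(expr[, top, drop = FALSE], 2, lr_hurdle, group = group)
result$exact[top] <- TRUE
```

The design trades accuracy for speed below the top-K cutoff: genes outside the top K keep their Wald p-values, which is why the accepted regime is limited to conclusions driven by top-ranked genes.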

Out-of-scope behavior

Silent and possibly wrong: outside the supported scope, results may be incorrect without any error or warning.
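Because out-of-scope use is silent, it can help to check a hypothesis against the stated activation rule before relying on the fast path. The helper below is a hypothetical sketch, not part of MAST or AutoZyme. It assumes that "resolves to exactly one matching column name" means exact string equality, which the page does not confirm, and the coefficient names are made up for illustration.

```r
# Hypothetical pre-flight check mirroring the stated lrTest activation rule.
in_fast_path_scope <- function(hypothesis, coef_names) {
  # Rule 1: an object of class CoefficientHypothesis.
  if (inherits(hypothesis, "CoefficientHypothesis")) return(TRUE)
  # Rule 2: a length-1 character matching exactly one coefficient column
  # (assumption: exact string match).
  is.character(hypothesis) &&
    length(hypothesis) == 1L &&
    sum(coef_names == hypothesis) == 1L
}

# Illustrative coefficient names, standing in for colnames of coefC.
coef_names <- c("(Intercept)", "conditionStim", "cngeneson")

in_fast_path_scope("conditionStim", coef_names)                  # TRUE
in_fast_path_scope(c("conditionStim", "cngeneson"), coef_names)  # FALSE
in_fast_path_scope("conditionUnknown", coef_names)               # FALSE
```

A check like this covers only the hypothesis rule. It does not verify the other conditions of the benchmarked regime, such as the bayesglm method or a hurdle model with both components.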

Detailed speedup table (10 runs)

Dataset            Tier        Platform  Threads  Baseline   Optimized  Speedup  Memory          Concordance  Pass
pbmc200k_glaucoma  ood_xlarge  Windows   1        39.60 min  18.30 min  2.17×    18.2 → 18.2 GB               pass
pbmc68k_18k        medium      Windows   1        3.28 min   58.45 s    3.37×    2.4 → 2.4 GB                 pass
pbmc68k_50k        large       Windows   1        8.20 min   2.49 min   3.29×    5.6 → 5.6 GB                 pass
pbmc68k_8k         small       Windows   1        1.41 min   26.32 s    3.20×    1.9 → 1.9 GB                 pass
tms_ss2_50k        ood_large   Windows   1        9.87 min   4.34 min   2.28×    17.8 → 17.8 GB               pass
pbmc200k_glaucoma  ood_xlarge  macOS     1        15.56 min  4.00 min   3.91×    18.9 → 19.0 GB               pass
pbmc68k_18k        medium      macOS     8        23.06 s    5.26 s     4.56×    6.1 → 5.9 GB                 pass
pbmc68k_50k        large       macOS     8        1.56 min   14.39 s    6.38×    10.1 → 9.8 GB                pass
pbmc68k_8k         small       macOS     4        12.15 s    3.99 s     3.10×    4.7 → 4.2 GB                 pass
tms_ss2_50k        ood_large   macOS     4        2.01 min   28.51 s    4.22×    14.0 → 13.9 GB               pass
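As a reading aid, the Speedup column is the ratio of baseline to optimized runtime. The sketch below recomputes it for the five Windows rows from the displayed times. The data frame is transcribed by hand from the table. Small differences from the reported values are expected because the displayed times are already rounded.

```r
# Recompute speedup = baseline / optimized for the Windows rows (times in seconds).
windows_runs <- data.frame(
  dataset     = c("pbmc200k_glaucoma", "pbmc68k_18k", "pbmc68k_50k",
                  "pbmc68k_8k", "tms_ss2_50k"),
  baseline_s  = c(39.60 * 60, 3.28 * 60, 8.20 * 60, 1.41 * 60, 9.87 * 60),
  optimized_s = c(18.30 * 60, 58.45, 2.49 * 60, 26.32, 4.34 * 60),
  reported    = c(2.17, 3.37, 3.29, 3.20, 2.28)
)

windows_runs$recomputed <- windows_runs$baseline_s / windows_runs$optimized_s
windows_runs$abs_diff   <- abs(windows_runs$recomputed - windows_runs$reported)

print(windows_runs[, c("dataset", "reported", "recomputed", "abs_diff")], digits = 3)
```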

Frequently asked questions

Speeding up MAST
Why is MAST slow?

MAST is CPU-bound, and the stock implementation leaves performance on the table in its core numerical work. On the pbmc68k_18k benchmark (Windows, 1 thread), the original takes 3.28 min where the AutoZyme path takes 58.45 s (3.37× faster).

How do I make MAST faster?

Install AutoZyme and activate the MAST patch, then keep using MAST exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 3.37× faster on the benchmark datasets, with no pipeline or API changes.

Does the AutoZyme speedup change the MAST output?

Only within tolerance. The fast path is an approximation and is explicitly not bit-exact, but the output is tolerance-equivalent: held within a frozen concordance gate (up to about 0.6% drift from the original MAST result) on every benchmark dataset.
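One source of drift named in the supported scope is the IRLS convergence epsilon, loosened from 1e-8 to 1e-5. The base-R sketch below shows the general effect of that kind of change using stats::glm and glm.control on simulated data. It is an illustration under those assumptions, not a measurement of MAST or of the AutoZyme patch.

```r
# Sketch: effect of a looser IRLS convergence epsilon on a logistic fit.
set.seed(42)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))

# 1e-8 is glm's default epsilon; 1e-5 is the loosened value the page mentions.
fit_tight <- glm(y ~ x, family = binomial(), control = glm.control(epsilon = 1e-8))
fit_loose <- glm(y ~ x, family = binomial(), control = glm.control(epsilon = 1e-5))

# The looser fit stops no later than the tight one, at the cost of a small
# difference in the estimated coefficients.
max_coef_dev <- max(abs(coef(fit_tight) - coef(fit_loose)))
c(iter_tight = fit_tight$iter, iter_loose = fit_loose$iter, max_coef_dev = max_coef_dev)
```

Both fits follow the same sequence of IRLS iterates; the looser threshold only changes where the sequence is cut off, which is why the deviation stays small on well-behaved fits.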

How do I install the MAST speedup?

In R: install the autozyme package, then run library(autozyme) and autozyme::activate("mast"). The patch applies automatically the next time you call MAST.
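The activation step can be sketched as follows. Only library(autozyme) and autozyme::activate("mast") come from this page. The requireNamespace guard and the `patched` flag are additions for illustration. The commented MAST calls use placeholder object and coefficient names (`sca`, `condition`, `conditionStim`).

```r
# Sketch: activate the patch if autozyme is available, otherwise run unpatched.
if (requireNamespace("autozyme", quietly = TRUE)) {
  library(autozyme)
  autozyme::activate("mast")   # call as given on this page
  patched <- TRUE
} else {
  message("autozyme is not installed; MAST will run unpatched.")
  patched <- FALSE
}

# Afterwards, call MAST exactly as before, for example:
# fit <- MAST::zlm(~ condition, sca, method = "bayesglm")
# res <- MAST::lrTest(fit, MAST::CoefficientHypothesis("conditionStim"))
```

The commented example uses a single-coefficient CoefficientHypothesis with the bayesglm method, which matches the regime described under Supported scope.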