tradeSeq fitGAM is one of the slower steps in many single-cell genomics workflows. AutoZyme ships a verified, drop-in patch that is up to 5.13× faster on Windows (7.04× on macOS) and returns bit-for-bit identical results, with no change to how you call it.
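Best speedup (Windows): 5.13×
Median speedup (all 10 runs, Windows and macOS): 5.15×
Output equivalence: bit-exact
Best runtime: 16.44 min baseline → 3.20 min optimized (fitgam_ood_2lin_6000g_5000c, 8 threads, Windows)
Datasets: 5
Pass rate: 10/10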
Speedup by thread count (Windows)
Each row is one finalized dataset/thread run.

| Dataset | Tier | Threads | Baseline | Optimized | Speedup | Memory (GB) |
|---|---|---|---|---|---|---|
| fitgam_ood_2lin_6000g_5000c | ood_xlarge | 1 | 16.44 min | 12.12 min | 1.35× | 8.2 → 4.0 |
| fitgam_ood_2lin_6000g_5000c | ood_xlarge | 4 | 15.35 min | 4.11 min | 4.00× | 5.8 → 2.8 |
| fitgam_ood_2lin_6000g_5000c | ood_xlarge | 8 | 16.44 min | 3.20 min | 5.13× | 7.0 → 2.8 |
| fitgam_ood_2lin_3000g_3500c | ood_large | 4 | 5.31 min | 1.69 min | 3.14× | 3.3 → 1.2 |
| fitgam_ood_2lin_3000g_3500c | ood_large | 8 | 5.31 min | 1.64 min | 3.23× | 3.3 → 1.6 |
| fitgam_synth_4000g_3000c | large | 1 | 4.84 min | 3.52 min | 1.33× | 3.2 → 2.1 |
| fitgam_synth_4000g_3000c | large | 4 | 4.63 min | 1.56 min | 3.00× | 2.7 → 1.6 |
| fitgam_synth_4000g_3000c | large | 8 | 4.68 min | 1.63 min | 2.87× | 3.0 → 1.6 |
| fitgam_synth_2000g_3000c | medium | 1 | 2.41 min | 2.11 min | 1.14× | 2.0 → 1.5 |
| fitgam_synth_2000g_3000c | medium | 4 | 2.32 min | 1.07 min | 2.25× | 2.0 → 1.2 |
| fitgam_synth_2000g_3000c | medium | 8 | 2.58 min | 1.59 min | 1.52× | 2.0 → 1.2 |
| fitgam_synth_750g_3000c | small | 1 | 59.69 s | 53.60 s | 1.11× | 1.2 → 1.1 |
| fitgam_synth_750g_3000c | small | 4 | 59.68 s | 41.78 s | 1.42× | 1.2 → 1.0 |
| fitgam_synth_750g_3000c | small | 8 | 57.13 s | 44.94 s | 1.32× | 1.2 → 1.0 |
The public API stays the same: AutoZyme replaces only calls that fall within the supported scope, and all other calls run the upstream code.
This patch targets fitGAM in the tradeSeq package. On the listed datasets it reduces CPU runtime while passing the declared output-concordance gate.
Supported scope
The fast path (fast_fitGAM_hoist_formula) is entered only when zyme=TRUE and conditions is NULL. Within that path it handles the standard tradeSeq NB-GAM fitting workload: family="nb" with mgcv's log link, weights=NULL, a vector offset with no dim (the .get_offset default), sce=TRUE and aic=FALSE. It supports both single-lineage input (ncol(pseudotime) == 1, quantile-based knot placement with duplicate repair) and multi-lineage input (ncol(pseudotime) >= 2, with knot placement delegated to tradeSeq::.findKnots).

Both cases are benchmarked: the single-lineage dev tiers (small, medium, large) and the 2-lineage OOD tiers (ood_large, ood_xlarge), which force ncol == 2 and therefore the .findKnots branch. All pass the concordance gates: max_abs_diff on beta, sigma, X and knots within thresholds ranging from 1e-12 to 1e-8, and converged_match = 1.0.

The hot-path acceleration (prefit_G reuse, a hoisted formula template, a fork pool via .zyme_mclapply on macOS or a PSOCK cluster on Windows, and the mgcv::nb / gam.fit4 crossprod overrides) activates only when use_prefit_G is TRUE, i.e. is.null(weights) && is.null(dim(offset)) && family == "nb" && length(id) > 0 (patch.R:335-336). The compact fork/parallel path additionally requires !verbose && sce && !aic && worker_count > 1 (patch.R:416). When use_prefit_G is FALSE, the patch falls through to a serial or pbapply per-gene mgcv::gam() fit with the same formula.

The scBLAS NB kernels (fast_dDeta and scblasR routing in fast_nb) are off by default, gated by .az_feature_enabled('scblas_nb', default=FALSE); the default path uses the inline Rcpp kernels nb_Dd_cpp, linkinv_log_cpp and nb_dev_resids_cpp, which are kept bit-exact. The mgcv gam.fit4 crossprod rewrite is a textual substitution pinned to mgcv 1.9.4; a sentinel warns if the patterns fail to match.
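To make the nesting of these gates easier to follow, here is a minimal sketch of the decision logic as the scope describes it, written in Python for illustration. It is not the patch.R code: the FitGAMCall record, the select_path function and the path labels are hypothetical. The branch where use_prefit_G holds but the compact parallel conditions fail gets a generic label, because the scope description does not say what that branch does.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FitGAMCall:
    """Hypothetical summary of the fitGAM arguments the gates inspect."""
    zyme: bool = True
    conditions: Optional[Sequence[str]] = None
    weights: Optional[Sequence[float]] = None
    offset_has_dim: bool = False  # True when !is.null(dim(offset)) in R
    family: str = "nb"
    id_length: int = 1            # mirrors length(id) in patch.R
    verbose: bool = False
    sce: bool = True
    aic: bool = False
    worker_count: int = 1


def select_path(call: FitGAMCall) -> str:
    """Return a hypothetical label for the code path a call would take."""
    # Gate 1: the fast path is entered only with zyme=TRUE and conditions=NULL;
    # anything else falls back silently to upstream tradeSeq.
    if not call.zyme or call.conditions is not None:
        return "upstream"
    # Gate 2: use_prefit_G (patch.R:335-336).
    use_prefit_G = (
        call.weights is None
        and not call.offset_has_dim
        and call.family == "nb"
        and call.id_length > 0
    )
    if not use_prefit_G:
        # Serial/pbapply per-gene mgcv::gam() with the same formula.
        return "per_gene_gam"
    # Gate 3: compact fork/parallel path (patch.R:416).
    if not call.verbose and call.sce and not call.aic and call.worker_count > 1:
        return "prefit_G_parallel_compact"
    # use_prefit_G holds, but the compact parallel path does not apply.
    return "prefit_G_other"
```

For example, select_path(FitGAMCall(worker_count=8)) yields "prefit_G_parallel_compact", while passing weights drops the call to the per-gene mgcv::gam() path even though it stays inside the fast path.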
Out-of-scope behavior
Calls outside the supported scope fall back silently to the upstream tradeSeq implementation.
Detailed speedup table (10 runs)

| Dataset | Tier | Platform | Threads | Baseline | Optimized | Speedup | Memory (GB) | Concordance | Pass |
|---|---|---|---|---|---|---|---|---|---|
| fitgam_ood_2lin_3000g_3500c | ood_large | Windows | 8 | 5.31 min | 1.64 min | 3.23× | 3.3 → 1.6 | — | pass |
| fitgam_ood_2lin_6000g_5000c | ood_xlarge | Windows | 8 | 16.44 min | 3.20 min | 5.13× | 7.0 → 2.8 | — | pass |
| fitgam_synth_2000g_3000c | medium | Windows | 4 | 2.32 min | 1.07 min | 2.25× | 2.0 → 1.2 | — | pass |
| fitgam_synth_4000g_3000c | large | Windows | 4 | 4.63 min | 1.56 min | 3.00× | 2.7 → 1.6 | — | pass |
| fitgam_synth_750g_3000c | small | Windows | 4 | 59.68 s | 41.78 s | 1.42× | 1.2 → 1.0 | — | pass |
| fitgam_ood_2lin_3000g_3500c | ood_large | macOS | 8 | 3.48 min | 29.67 s | 7.04× | 5.1 → 1.5 | — | pass |
| fitgam_ood_2lin_6000g_5000c | ood_xlarge | macOS | 8 | 8.70 min | 1.37 min | 6.41× | 13.0 → 2.4 | — | pass |
| fitgam_synth_2000g_3000c | medium | macOS | 8 | 59.92 s | 10.66 s | 5.63× | 2.8 → 1.2 | — | pass |
| fitgam_synth_4000g_3000c | large | macOS | 8 | 2.13 min | 20.20 s | 6.32× | 4.7 → 1.4 | — | pass |
| fitgam_synth_750g_3000c | small | macOS | 8 | 23.02 s | 4.57 s | 5.18× | 1.7 → 1.1 | — | pass |
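The headline summary numbers can be checked against the per-run speedups in the detailed table. The Python sketch below assumes that "Median speedup" is a plain median over all ten finalized runs and that "Best speedup" is the Windows maximum. It uses the reported speedups directly rather than recomputing them from the displayed runtimes, because the source does not say how each run's speedup was aggregated. SPEEDUPS and summarize are hypothetical names.

```python
from statistics import median
from typing import Dict, List

# Reported per-run speedups from the detailed speedup table
# (one finalized run per dataset and platform).
SPEEDUPS: Dict[str, List[float]] = {
    "Windows": [3.23, 5.13, 2.25, 3.00, 1.42],
    "macOS": [7.04, 6.41, 5.63, 6.32, 5.18],
}


def summarize(speedups: Dict[str, List[float]]) -> Dict[str, float]:
    """Best and median speedup, pooled across platforms and per platform."""
    pooled = [s for runs in speedups.values() for s in runs]
    summary = {"best_all": max(pooled), "median_all": median(pooled)}
    for platform, runs in speedups.items():
        summary[f"best_{platform}"] = max(runs)
        summary[f"median_{platform}"] = median(runs)
    return summary
```

With these values the pooled median is about 5.155, consistent with the 5.15× headline. It exceeds the Windows-only best of 5.13× because it also includes the macOS runs. The per-platform medians are 3.00× on Windows and 6.32× on macOS.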
Frequently asked questions
Speeding up tradeSeq fitGAM
Why is tradeSeq fitGAM slow?
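tradeSeq fitGAM fits a separate negative-binomial GAM with mgcv::gam() for every gene, so it is CPU-bound and its runtime grows with the number of genes and cells. AutoZyme's fast path reuses a prefit model setup (prefit_G) and a hoisted formula template across genes, spreads genes across worker processes, and overrides crossprod-heavy steps in mgcv's fitting code. On the largest benchmark dataset (fitgam_ood_2lin_6000g_5000c, 8 threads, Windows), the original takes 16.44 min and the AutoZyme path takes 3.20 min (5.13× faster).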
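The "hoisted formula template" and prefit_G reuse are both instances of loop-invariant hoisting: work that is identical for every gene is done once, outside the per-gene loop. The Python sketch below illustrates only that pattern. It is not tradeSeq or AutoZyme code: quantile binning of pseudotime stands in for the shared model setup, and build_template, fit_gene and the two driver functions are hypothetical.

```python
import bisect
from statistics import quantiles
from typing import List, Sequence


def build_template(pseudotime: Sequence[float], n_bins: int) -> List[int]:
    """Gene-independent setup: assign each cell to a pseudotime quantile bin."""
    cuts = quantiles(pseudotime, n=n_bins)  # n_bins - 1 cut points
    return [bisect.bisect_right(cuts, t) for t in pseudotime]


def fit_gene(counts: Sequence[float], bins: Sequence[int], n_bins: int) -> List[float]:
    """Toy per-gene 'fit': mean count in each pseudotime bin."""
    sums = [0.0] * n_bins
    sizes = [0] * n_bins
    for c, b in zip(counts, bins):
        sums[b] += c
        sizes[b] += 1
    return [s / k if k else 0.0 for s, k in zip(sums, sizes)]


def fit_all_naive(matrix, pseudotime, n_bins):
    # Rebuilds the identical template once per gene.
    return [fit_gene(row, build_template(pseudotime, n_bins), n_bins) for row in matrix]


def fit_all_hoisted(matrix, pseudotime, n_bins):
    # Builds the template once and reuses it for every gene.
    bins = build_template(pseudotime, n_bins)
    return [fit_gene(row, bins, n_bins) for row in matrix]
```

Both drivers perform the same arithmetic on the same inputs, so their outputs are identical. The hoisted version simply skips the repeated setup, which is the property that lets an optimization like this pass an exact-equality output check.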
How do I make tradeSeq fitGAM faster?
Install AutoZyme and activate the tradeSeq patch, then keep calling tradeSeq fitGAM exactly as before. For calls within the supported scope, AutoZyme transparently substitutes the faster, output-validated path (up to 5.13× faster on Windows and 7.04× on macOS on the benchmark datasets), with no pipeline or API changes.
Does the AutoZyme speedup change the tradeSeq fitGAM output?
No. The accelerated path returns bit-for-bit identical results to the original tradeSeq implementation (maximum absolute difference 0), checked by a frozen concordance gate on every benchmark dataset.
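The concordance gate compares baseline and optimized outputs (beta, sigma, X and knot placements) by maximum absolute difference and requires converged_match = 1.0. Here is a minimal Python sketch of such a gate, assuming the outputs are available as flat numeric vectors. The function names are hypothetical, and the caller supplies the per-quantity tolerances: the scope gives only the overall range of 1e-12 to 1e-8 and does not say which threshold applies to which quantity. NaN handling is omitted.

```python
from typing import Dict, Sequence, Tuple


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise |a - b|; 0.0 for two empty sequences."""
    if len(a) != len(b):
        raise ValueError("outputs have different lengths")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def concordance_gate(
    baseline: Dict[str, Sequence[float]],
    optimized: Dict[str, Sequence[float]],
    converged_base: Sequence[bool],
    converged_opt: Sequence[bool],
    tolerances: Dict[str, float],
) -> Tuple[bool, Dict[str, float], float]:
    """Pass only if every checked quantity is within its tolerance and every
    gene's convergence flag matches (converged_match == 1.0)."""
    diffs = {name: max_abs_diff(baseline[name], optimized[name]) for name in tolerances}
    if not converged_base or len(converged_base) != len(converged_opt):
        raise ValueError("convergence flags must be non-empty and the same length")
    matches = sum(b == o for b, o in zip(converged_base, converged_opt))
    converged_match = matches / len(converged_base)
    within = all(diffs[name] <= tol for name, tol in tolerances.items())
    return within and converged_match == 1.0, diffs, converged_match
```

Bit-identical outputs give a difference of exactly 0.0 for every quantity. Setting every tolerance to 0.0 turns the sketch into an exact-equality check.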
How do I install the tradeSeq speedup?
In R: install the autozyme package, then run library(autozyme) and autozyme::activate("tradeseq"). The patch applies automatically the next time you call tradeSeq fitGAM.