Speed up tradeSeq fitGAM: up to 5.13× faster, identical output

Benchmark charts

[Figure: Speedup distribution. Each dot is one finalized dataset/thread run on Windows; the plotted speedups range from 1.42× to 5.13×.]

[Figure: Thread sweep. Speedup across finalized thread counts on Windows.]

[Figure: Memory. Baseline vs. optimized peak memory on Windows.]

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets fitGAM in tradeSeq. The benchmarked result passes the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: trajectory differential expression, pseudotime DE, GAM, fitGAM, trajectory DE.

Supported scope

The fast path (fast_fitGAM_hoist_formula) is entered only when zyme=TRUE and conditions is NULL. Within that scope it handles the standard tradeSeq NB-GAM fitting workload: family="nb" with mgcv's log link, weights=NULL, an offset with no dim (a vector offset, the .get_offset default), sce=TRUE, and aic=FALSE. Both lineage layouts are covered: single-lineage (ncol(pseudotime)==1, quantile-based knot placement with duplicate repair) and multi-lineage (ncol(pseudotime)>=2, which delegates knot placement to tradeSeq::.findKnots).

Both single-lineage workloads (dev tiers small/medium/large) and 2-lineage workloads (OOD tiers ood_large/ood_xlarge, which force ncol==2 and the .findKnots branch) are benchmarked and pass bit-exact gates (beta/sigma/X/knot max_abs_diff <=1e-8..1e-12, converged_match=1.0).

The hot-path acceleration (prefit_G reuse, hoisted formula template, fork-pool via .zyme_mclapply on macOS / PSOCK on Windows, and the mgcv::nb / gam.fit4 crossprod overrides) activates only when use_prefit_G is TRUE, that is, when is.null(weights) && is.null(dim(offset)) && family=="nb" && length(id)>0 (patch.R:335-336). The fork/parallel compact path additionally requires !verbose && sce && !aic && worker_count>1 (patch.R:416). When use_prefit_G is FALSE, the call falls through to a serial/pbapply per-gene mgcv::gam() with the same formula.

The scBLAS NB kernels (fast_dDeta and scblasR routing in fast_nb) are gated off by default via .az_feature_enabled('scblas_nb', default=FALSE); the default path uses the inline Rcpp kernels nb_Dd_cpp/linkinv_log_cpp/nb_dev_resids_cpp, which are kept bit-exact. The mgcv gam.fit4 crossprod rewrite is a textual substitution pinned to mgcv 1.9.4; a sentinel warns if the patterns miss.
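The gating conditions quoted above can be read as a small decision function. The sketch below is only an illustration of that reading: fitgam_path and its return labels are hypothetical names, not AutoZyme's code, and the "prefit_G" branch (prefit_G reuse without the parallel compact path) is an assumption about what happens when use_prefit_G holds but the extra parallel conditions do not.

```r
# Hypothetical helper mirroring the conditions quoted from the scope text.
# It does not call tradeSeq or AutoZyme; it only reports which path would apply.
fitgam_path <- function(zyme, conditions, weights, offset, family, id,
                        verbose, sce, aic, worker_count) {
  # Outside the entry condition, upstream tradeSeq handles the call.
  if (!isTRUE(zyme) || !is.null(conditions)) return("upstream")

  # Condition quoted for use_prefit_G.
  use_prefit_G <- is.null(weights) && is.null(dim(offset)) &&
    identical(family, "nb") && length(id) > 0
  if (!use_prefit_G) return("per_gene_gam")  # serial/pbapply mgcv::gam() per gene

  # Extra conditions quoted for the fork/parallel compact path.
  compact <- !verbose && sce && !aic && worker_count > 1
  if (compact) "prefit_G_parallel" else "prefit_G"  # second label is an assumption
}

# A vector offset, NB family, and 8 workers select the parallel compact path.
fitgam_path(zyme = TRUE, conditions = NULL, weights = NULL,
            offset = rep(0, 10), family = "nb", id = 1:5,
            verbose = FALSE, sce = TRUE, aic = FALSE, worker_count = 8)
```

Passing a matrix offset (so dim(offset) is not NULL) makes the same call return "per_gene_gam", which matches the fall-through described above.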

Out-of-scope behavior

Calls outside the supported scope silently fall back to the upstream tradeSeq implementation.

Detailed speedup table (10 runs)

Dataset	Tier	Platform	Threads	Baseline runtime	Optimized runtime	Speedup	Peak memory (baseline → optimized)	Concordance	Pass
`fitgam_ood_2lin_3000g_3500c`	ood_large	Windows	8	5.31 min	1.64 min	3.23×	3.3 → 1.6 GB	—	pass
`fitgam_ood_2lin_6000g_5000c`	ood_xlarge	Windows	8	16.44 min	3.20 min	5.13×	7.0 → 2.8 GB	—	pass
`fitgam_synth_2000g_3000c`	medium	Windows	4	2.32 min	1.07 min	2.25×	2.0 → 1.2 GB	—	pass
`fitgam_synth_4000g_3000c`	large	Windows	4	4.63 min	1.56 min	3.00×	2.7 → 1.6 GB	—	pass
`fitgam_synth_750g_3000c`	small	Windows	4	59.68 s	41.78 s	1.42×	1.2 → 1.0 GB	—	pass
`fitgam_ood_2lin_3000g_3500c`	ood_large	macOS	8	3.48 min	29.67 s	7.04×	5.1 → 1.5 GB	—	pass
`fitgam_ood_2lin_6000g_5000c`	ood_xlarge	macOS	8	8.70 min	1.37 min	6.41×	13.0 → 2.4 GB	—	pass
`fitgam_synth_2000g_3000c`	medium	macOS	8	59.92 s	10.66 s	5.63×	2.8 → 1.2 GB	—	pass
`fitgam_synth_4000g_3000c`	large	macOS	8	2.13 min	20.20 s	6.32×	4.7 → 1.4 GB	—	pass
`fitgam_synth_750g_3000c`	small	macOS	8	23.02 s	4.57 s	5.18×	1.7 → 1.1 GB	—	pass

Frequently asked questions

Speeding up tradeSeq fitGAM

Why is tradeSeq fitGAM slow?

tradeSeq fitGAM is CPU-bound: it fits a negative binomial GAM for each gene, and the stock implementation in tradeSeq leaves performance on the table in that core numerical work. On the largest benchmark dataset (fitgam_ood_2lin_6000g_5000c, Windows, 8 threads), the original takes 16.44 min where the AutoZyme path takes 3.20 min (5.13× faster).

How do I make tradeSeq fitGAM faster?

Install AutoZyme and activate the tradeSeq patch, then keep using tradeSeq fitGAM exactly as before. Within the supported scope, AutoZyme transparently substitutes the faster, output-validated path, with no pipeline or API changes. On the benchmark datasets the speedup reaches 5.13× on Windows and 7.04× on macOS.

Does the AutoZyme speedup change the tradeSeq fitGAM output?

No. The accelerated path returns bit-for-bit identical results to the original tradeSeq implementation (maximum absolute difference 0), checked by a frozen concordance gate on every benchmark dataset.
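As a rough illustration of what a maximum-absolute-difference check measures, the base R sketch below compares two numeric results. max_abs_diff is a hypothetical helper written for this example, and the matrices are random stand-ins rather than real fitGAM output; the actual concordance gate is not shown on this page.

```r
# Hypothetical helper for illustration; not part of tradeSeq or AutoZyme.
max_abs_diff <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)), length(a) == length(b))
  max(abs(a - b))
}

set.seed(1)
beta_baseline  <- matrix(rnorm(12), nrow = 3)  # stand-in for baseline coefficients
beta_optimized <- beta_baseline                # stand-in for the optimized run

# Exact agreement: the largest elementwise difference is zero.
exact_diff <- max_abs_diff(beta_baseline, beta_optimized)
exact_same <- identical(beta_baseline, beta_optimized)

# A tolerance-style check still passes when values differ by a tiny amount.
within_tol <- max_abs_diff(beta_baseline, beta_baseline + 1e-10) <= 1e-8
```

Here exact_diff is 0 and exact_same is TRUE because the two matrices are the same object contents, while within_tol shows how a thresholded comparison would treat a small perturbation.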

How do I install the tradeSeq speedup?

In R: install the autozyme package, then run library(autozyme) and autozyme::activate("tradeseq"). The patch applies automatically the next time you call tradeSeq fitGAM.
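The activation step above might look like the following in a script. This is a minimal sketch that uses only the two calls quoted in the answer; it assumes the autozyme package is already installed (the installation source is not stated here), and the activated flag is a hypothetical variable added so the snippet does nothing harmful on machines without the package.

```r
# Guarded so the snippet is a no-op where autozyme is not installed.
activated <- FALSE
if (requireNamespace("autozyme", quietly = TRUE)) {
  library(autozyme)
  autozyme::activate("tradeseq")  # call quoted in the answer above
  activated <- TRUE
}
activated
```

After activation, existing tradeSeq fitGAM calls stay in place; whether a given call takes the fast path depends on the supported scope described earlier.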