Speed up fgsea: up to 5.91× faster, near-identical output

Q: Why is fgsea slow?

fgsea is CPU-bound, and the stock implementation leaves performance on the table in its core numerical work. On the largest benchmark dataset (`tms_ss2_mouse_42cl_MGR_ext`, Windows, 8 threads) the original takes 6.53 min where the AutoZyme path takes 58.12 s, reported as 5.91× faster.

Q: How do I make fgsea faster?

Install AutoZyme and activate the fgsea patch, then keep using fgsea exactly as before. AutoZyme transparently substitutes a faster, output-validated path with no pipeline or API changes. Reported speedups on the benchmark datasets range from 2.56× (macOS, 4 threads) to 5.91× (Windows, 8 threads).

Q: Does the AutoZyme speedup change the fgsea output?

Effectively no. The output is tolerance-equivalent: held within a frozen concordance gate (up to about 0.6% drift from the original fgsea result) on every benchmark dataset.
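One way to check this on your own data is to compare a result column from stock fgsea with the same column from the AutoZyme path. The sketch below is written in Python purely for illustration. It assumes that "drift" means the largest relative difference between paired values and that the gate sits at 0.6%; the page does not define the gate's actual metric, so the formula and the names `max_relative_drift` and `within_gate` are assumptions.

```python
def max_relative_drift(reference, candidate, floor=1e-12):
    """Largest |candidate - reference| / |reference| over paired values.

    `floor` guards against division by zero when a reference value is 0.
    The metric is an assumption; the real gate may be defined differently.
    """
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    worst = 0.0
    for ref, cand in zip(reference, candidate):
        drift = abs(cand - ref) / max(abs(ref), floor)
        worst = max(worst, drift)
    return worst


def within_gate(reference, candidate, tolerance=0.006):
    """True when the worst relative drift is at or below `tolerance` (0.6%)."""
    return max_relative_drift(reference, candidate) <= tolerance


# Hypothetical NES values from a stock run and an accelerated run.
stock_nes = [1.80, -2.10, 0.95]
fast_nes = [1.805, -2.10, 0.95]
print(within_gate(stock_nes, fast_nes))
```

In practice you would export the columns of interest (for example NES or pval) from both runs, using the same seed, and feed them in pathway by pathway.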

Q: How do I install the fgsea speedup?

In R: install the autozyme package, then run `library(autozyme)` and `autozyme::activate("fgsea")`. The patch applies automatically the next time you call `fgsea::fgsea`.

Benchmark charts

Speedup distribution: each dot is one finalized dataset/thread run on Windows.

tms_ss2_mouse_42cl_MG…: 5.91×
heart_adult_33cl_HGR: 5.73×
pbmc200k_glaucoma_24c…: 5.05×
pbmc68k_16cl_HGR: 4.04×
splitseq_mouse_28cl_M…: 3.51×

Thread sweep: speedup across finalized thread counts on Windows, one series per dataset.

Memory: baseline vs optimized peak memory on Windows.

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

The patch targets fgsea::fgsea in the fgsea package. The benchmarked results stay within the declared scientific output gate while reducing CPU runtime on the listed datasets.

Supported scope

The fast path covers the default fgsea() dispatch route: fgsea() with no `nperm` argument forwards to fgseaMultilevel, which is the only top-level function this patch overrides (along with the internal helpers preparePathwaysAndStats and calcGseaStat).

Within fgseaMultilevel the fast path handles:

- scoreType in {"std","pos","neg"} (the branch is lifted in both fast_calcGseaStat and the C++ calcEsLeBatchCpp / EsRuler sign handling);
- arbitrary gseaParam (applied as abs(stats)^gseaParam inside preparePathwaysAndStats before the C++ ES kernel, which therefore does not re-apply it);
- minSize/maxSize filtering (minSize clamped to >= 1, maxSize clamped to <= length(stats)-1);
- arbitrary nPermSimple and sampleSize (the qbeta / multilevelError / trigamma lookup tables are keyed and built per (nPermSimple, sampleSize) pair);
- arbitrary eps (clamped to [0,1]);
- the all-stats-zero edge case (NR==0 falls back to the R-level fast_calcGseaStat with uniform 1/k increments).

The pathway-prep result is cached across clusters and invalidated when the length or endpoints of names(stats), or the address or endpoints of the pathways object, change. Output is intended to be bit-exact for ES (no RNG) and seed-deterministic for NES/pval/padj/log2err. The multilevel C++ batch kernel uses a per-group RNG seeded from a shared seed, so results are independent of dispatch order.
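The scoring rules in the supported scope can be pictured with a small Python sketch of the classic weighted running-sum enrichment score. It is an illustration only, not AutoZyme's R/C++ code: `prepare_weights` and `enrichment_score` are hypothetical names, and the formula is the standard GSEA statistic as commonly described, assumed here to match what the patch computes. Weighting happens once up front, so the scoring function never re-applies gseaParam, and a gene set whose weights are all zero falls back to uniform 1/k steps.

```python
def prepare_weights(stats, gsea_param=1.0):
    """Weight each ranked statistic once, as abs(stat) ** gsea_param."""
    return [abs(s) ** gsea_param for s in stats]


def enrichment_score(weights, positions, score_type="std"):
    """Running-sum enrichment score for one gene set.

    weights    already-weighted statistics for all N genes, in ranked order
    positions  0-based ranks of the genes that belong to the set
    score_type "std" (largest deviation by magnitude), "pos", or "neg"
    """
    hits = sorted(positions)
    n_genes, n_hits = len(weights), len(hits)
    if n_hits == 0 or n_hits == n_genes:
        raise ValueError("the gene set must be a non-empty proper subset")
    total = sum(weights[p] for p in hits)  # NR in the text above
    miss_step = 1.0 / (n_genes - n_hits)

    max_dev, min_dev = float("-inf"), float("inf")
    cum_hit = 0.0
    for i, pos in enumerate(hits):
        # All-zero weights: fall back to uniform 1/k increments per hit.
        step = weights[pos] / total if total > 0 else 1.0 / n_hits
        misses_before = pos - i
        before = cum_hit - misses_before * miss_step  # just before this hit
        cum_hit += step
        after = cum_hit - misses_before * miss_step   # just after this hit
        max_dev = max(max_dev, after)
        min_dev = min(min_dev, before)

    if score_type == "pos":
        return max_dev
    if score_type == "neg":
        return min_dev
    if score_type != "std":
        raise ValueError("score_type must be 'std', 'pos', or 'neg'")
    if max_dev == -min_dev:
        return 0.0
    return max_dev if max_dev > -min_dev else min_dev


# Toy ranked list; the gene set occupies ranks 0 and 2.
ranked_stats = [3, 2, 1, -1, -2]
weights = prepare_weights(ranked_stats, gsea_param=2)
print(enrichment_score(weights, [0, 2]))
```

Keeping the weighting step separate mirrors the description above, where the weights are computed in the preparation step and the kernel consumes them unchanged.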

Out-of-scope behavior

Calls outside the supported scope silently fall back to the upstream fgsea implementation.
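The fallback can be pictured as a thin dispatcher that routes each call to either the fast path or the original function. The Python sketch below is a generic illustration of that pattern, not AutoZyme's actual mechanism, which this page does not describe. The names `make_dispatcher`, `fast_fgsea`, `upstream_fgsea`, and `no_nperm` are hypothetical; the only rule taken from the text is that calls without `nperm` are the ones the fast path covers.

```python
def make_dispatcher(fast_path, upstream, in_scope):
    """Return a callable that uses `fast_path` only for supported calls."""
    def dispatch(*args, **kwargs):
        if in_scope(*args, **kwargs):
            return fast_path(*args, **kwargs)
        # Unsupported call: defer to the original implementation, with no warning.
        return upstream(*args, **kwargs)
    return dispatch


# Stand-ins for the two implementations; each just reports which one ran.
def upstream_fgsea(pathways, stats, nperm=None):
    return "upstream"


def fast_fgsea(pathways, stats, nperm=None):
    return "fast"


# Scope rule taken from the text: only calls without `nperm` are accelerated.
def no_nperm(pathways, stats, nperm=None):
    return nperm is None


fgsea = make_dispatcher(fast_fgsea, upstream_fgsea, no_nperm)
print(fgsea({"setA": ["g1"]}, {"g1": 1.0}))              # fast
print(fgsea({"setA": ["g1"]}, {"g1": 1.0}, nperm=1000))  # upstream
```

Because the caller sees the same function name and signature either way, existing pipelines keep working whether or not a given call is accelerated.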

Detailed speedup table (10 runs)

Dataset	Tier	Platform	Threads	Baseline	Optimized	Speedup	Peak memory (baseline → optimized)	Concordance	Pass
`heart_adult_33cl_HGR`	large	Windows	8	4.84 min	45.41 s	5.73×	0.5 → 0.6 GB	—	pass
`pbmc200k_glaucoma_24cl_HGR`	medium	Windows	8	2.57 min	25.39 s	5.05×	0.5 → 0.6 GB	—	pass
`pbmc68k_16cl_HGR`	small	Windows	8	1.07 min	12.94 s	4.04×	0.5 → 0.6 GB	—	pass
`splitseq_mouse_28cl_MGR`	ood_large	Windows	8	1.15 min	17.08 s	3.51×	0.5 → 0.6 GB	—	pass
`tms_ss2_mouse_42cl_MGR_ext`	ood_xlarge	Windows	8	6.53 min	58.12 s	5.91×	0.6 → 0.6 GB	—	pass
`heart_adult_33cl_HGR`	large	macOS	4	1.41 min	26.09 s	3.25×	0.5 → 0.5 GB	—	pass
`pbmc200k_glaucoma_24cl_HGR`	medium	macOS	4	43.76 s	14.08 s	3.11×	0.5 → 0.5 GB	—	pass
`pbmc68k_16cl_HGR`	small	macOS	4	17.16 s	6.71 s	2.56×	0.5 → 0.4 GB	—	pass
`splitseq_mouse_28cl_MGR`	ood_large	macOS	4	36.41 s	12.87 s	2.83×	0.4 → 0.4 GB	—	pass
`tms_ss2_mouse_42cl_MGR_ext`	ood_xlarge	macOS	4	1.95 min	36.99 s	3.17×	0.4 → 0.4 GB	—	pass
