Speed up fgsea

fgsea is one of the slower steps in many bulk genomics & enrichment workflows. AutoZyme ships a verified, drop-in patch that is up to 5.91× faster, returning output within a strict, verified tolerance with no change to how you call it.

Best speedup: 5.91×
Median speedup: 3.38×
Output equivalence: tolerance
Best runtime: 6.53 min baseline, 58.12 s optimized
Datasets: 5
Pass rate: 10/10

Benchmark charts

Speedup distribution
Each dot is one finalized dataset/thread run on Windows. Datasets: tms_ss2_mouse_42cl_MGR_ext, heart_adult_33cl_HGR, pbmc200k_glaucoma_24cl_HGR, pbmc68k_16cl_HGR, splitseq_mouse_28cl_MGR.

Thread sweep
Speedup across finalized thread counts on Windows
Dataset Tier Threads Baseline Optimized Speedup Memory
tms_ss2_mouse_42cl_MGR_ext ood_xlarge 1 5.73 min 4.40 min 1.30× 0.6 → 0.5 GB
tms_ss2_mouse_42cl_MGR_ext ood_xlarge 4 5.73 min 1.45 min 3.96× 0.6 → 0.6 GB
tms_ss2_mouse_42cl_MGR_ext ood_xlarge 8 6.53 min 58.12 s 5.91× 0.6 → 0.6 GB
heart_adult_33cl_HGR large 1 4.33 min 3.27 min 1.32× 0.5 → 0.5 GB
heart_adult_33cl_HGR large 4 4.33 min 1.11 min 3.92× 0.6 → 0.6 GB
heart_adult_33cl_HGR large 8 4.84 min 45.41 s 5.73× 0.5 → 0.6 GB
pbmc200k_glaucoma_24cl_HGR medium 1 2.13 min 1.59 min 1.34× 0.5 → 0.5 GB
pbmc200k_glaucoma_24cl_HGR medium 4 2.13 min 34.86 s 3.68× 0.5 → 0.5 GB
pbmc200k_glaucoma_24cl_HGR medium 8 2.57 min 25.39 s 5.05× 0.5 → 0.6 GB
pbmc68k_16cl_HGR small 1 52.15 s 37.20 s 1.41× 0.5 → 0.5 GB
pbmc68k_16cl_HGR small 4 52.21 s 16.28 s 3.21× 0.5 → 0.5 GB
pbmc68k_16cl_HGR small 8 1.07 min 12.94 s 4.04× 0.5 → 0.6 GB
splitseq_mouse_28cl_MGR ood_large 1 1.00 min 43.78 s 1.37× 0.5 → 0.5 GB
splitseq_mouse_28cl_MGR ood_large 4 59.76 s 20.58 s 2.92× 0.5 → 0.5 GB
splitseq_mouse_28cl_MGR ood_large 8 1.15 min 17.08 s 3.51× 0.5 → 0.6 GB

Memory
Baseline vs optimized peak memory on Windows
Dataset Tier Memory Optimized/baseline Speedup Threads
tms_ss2_mouse_42cl_MGR_ext ood_xlarge 0.6 → 0.6 GB 1.12× 5.91× 8
heart_adult_33cl_HGR large 0.6 → 0.6 GB 1.04× 3.92× 4
pbmc200k_glaucoma_24cl_HGR medium 0.5 → 0.6 GB 1.10× 5.05× 8
pbmc68k_16cl_HGR small 0.5 → 0.5 GB 0.90× 1.41× 1
splitseq_mouse_28cl_MGR ood_large 0.5 → 0.6 GB 1.11× 3.51× 8

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets fgsea::fgsea in the fgsea package. The benchmarked result preserves the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: GSEA, gene set enrichment, enrichment, pathway analysis.

Supported scope

The fast path covers the default fgsea() dispatch route: fgsea() with no `nperm` argument forwards to fgseaMultilevel, which is the only top-level function this patch overrides (along with the internal helpers preparePathwaysAndStats and calcGseaStat).

Within fgseaMultilevel, the fast path handles the following:

scoreType in {"std","pos","neg"}: the branch is lifted in both fast_calcGseaStat and the C++ calcEsLeBatchCpp / EsRuler sign handling.
Arbitrary gseaParam: re-applied as abs(stats)^gseaParam inside preparePathwaysAndStats before the C++ ES kernel, which therefore does not re-apply it.
minSize/maxSize filtering: clamped so that minSize>=1 and maxSize<=length(stats)-1.
Arbitrary nPermSimple and sampleSize: the qbeta / multilevelError / trigamma lookup tables are keyed and built per (nPermSimple, sampleSize).
Arbitrary eps: clamped to [0,1].
The all-stats-zero edge case: NR==0 falls back to the R-level fast_calcGseaStat with uniform 1/k increments.

The pathway-prep result is cached across clusters and invalidated when the length or endpoints of names(stats) change, or when the address or endpoints of the pathways object change.

Output is intended to be bit-exact for ES (no RNG is involved) and seed-deterministic for NES/pval/padj/log2err. The multilevel C++ batch kernel uses per-group RNG seeded from a shared seed, so results are independent of dispatch order.
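
To make the scoreType, gseaParam, and all-zero handling concrete, here is a minimal R sketch of the standard weighted enrichment score. It is not the patch's fast_calcGseaStat or its C++ kernel: es_sketch and its argument names are hypothetical, stats is assumed to be sorted in decreasing order, and selected is assumed to hold the pathway's positions in that ranking.

```r
# Illustrative sketch only: es_sketch is a hypothetical helper, not AutoZyme code.
es_sketch <- function(stats, selected, gseaParam = 1,
                      scoreType = c("std", "pos", "neg")) {
  scoreType <- match.arg(scoreType)
  S <- sort(selected)          # positions of pathway genes in the ranking
  N <- length(stats)
  m <- length(S)
  stopifnot(m > 0, m < N)      # ES is undefined if no gene or every gene is selected

  w <- abs(stats[S])^gseaParam # weighting by gseaParam
  NR <- sum(w)
  # All-zero edge case: use uniform 1/m increments instead of dividing by zero
  hit <- if (NR == 0) rep(1 / m, m) else w / NR

  cum_hit <- cumsum(hit)
  misses_before <- S - seq_along(S)         # non-pathway genes ranked above each hit
  tops <- cum_hit - misses_before / (N - m) # running sum just after each hit
  bottoms <- tops - hit                     # running sum just before each hit

  max_p <- max(tops)
  min_p <- min(bottoms)
  switch(scoreType,
    pos = max_p,
    neg = min_p,
    std = if (max_p > -min_p) max_p else if (max_p < -min_p) min_p else 0
  )
}

ranks <- c(3, 2, 1, -1, -2, -3)                   # toy ranking, already sorted
es_top <- es_sketch(ranks, c(1, 2))               # pathway at the top of the list
es_bottom <- es_sketch(ranks, c(5, 6))            # pathway at the bottom of the list
es_bottom_pos <- es_sketch(ranks, c(5, 6), scoreType = "pos")
es_zero <- es_sketch(rep(0, 6), c(1, 2))          # all-zero stats use 1/m increments
```

On this toy ranking the top-ranked pathway scores 1 and the bottom-ranked pathway scores -1 under "std", while "pos" reports only the positive extreme for the bottom-ranked pathway, which is 0.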

Out-of-scope behavior

Calls outside the supported scope fall back silently to the upstream fgsea implementation.

Detailed speedup table (10 runs)
Dataset Tier Platform Threads Baseline Optimized Speedup Memory Concordance Pass
heart_adult_33cl_HGR large Windows 8 4.84 min 45.41 s 5.73× 0.5 → 0.6 GB pass
pbmc200k_glaucoma_24cl_HGR medium Windows 8 2.57 min 25.39 s 5.05× 0.5 → 0.6 GB pass
pbmc68k_16cl_HGR small Windows 8 1.07 min 12.94 s 4.04× 0.5 → 0.6 GB pass
splitseq_mouse_28cl_MGR ood_large Windows 8 1.15 min 17.08 s 3.51× 0.5 → 0.6 GB pass
tms_ss2_mouse_42cl_MGR_ext ood_xlarge Windows 8 6.53 min 58.12 s 5.91× 0.6 → 0.6 GB pass
heart_adult_33cl_HGR large macOS 4 1.41 min 26.09 s 3.25× 0.5 → 0.5 GB pass
pbmc200k_glaucoma_24cl_HGR medium macOS 4 43.76 s 14.08 s 3.11× 0.5 → 0.5 GB pass
pbmc68k_16cl_HGR small macOS 4 17.16 s 6.71 s 2.56× 0.5 → 0.4 GB pass
splitseq_mouse_28cl_MGR ood_large macOS 4 36.41 s 12.87 s 2.83× 0.4 → 0.4 GB pass
tms_ss2_mouse_42cl_MGR_ext ood_xlarge macOS 4 1.95 min 36.99 s 3.17× 0.4 → 0.4 GB pass
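
The best and median speedups quoted at the top of the page can be recomputed from the Speedup column of this table. A minimal R sketch, with the ten values transcribed by hand from the rows above (the variable names are illustrative):

```r
# Speedup column of the detailed table, in row order
speedups <- c(5.73, 5.05, 4.04, 3.51, 5.91,  # Windows, 8 threads
              3.25, 3.11, 2.56, 2.83, 3.17)  # macOS, 4 threads

best_speedup <- max(speedups)       # largest single-run speedup
median_speedup <- median(speedups)  # mean of the 5th and 6th sorted values

print(c(best = best_speedup, median = median_speedup))
```

With ten runs the median is the mean of the two middle values, 3.25 and 3.51, which gives 3.38.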

Frequently asked questions

Speeding up fgsea
Why is fgsea slow?

fgsea::fgsea is CPU-bound, and the stock implementation leaves performance on the table in its core numerical work. In the best benchmark run (tms_ss2_mouse_42cl_MGR_ext on Windows, 8 threads), the original takes 6.53 min where the AutoZyme path takes 58.12 s (5.91× faster).

How do I make fgsea faster?

Install AutoZyme and activate the fgsea patch, then keep using fgsea exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 5.91× faster on the benchmark datasets, with no pipeline or API changes.

Does the AutoZyme speedup change the fgsea output?

Effectively no. The output is tolerance-equivalent: held within a frozen concordance gate (up to about 0.6% drift from the original fgsea result) on every benchmark dataset.
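
One way to run a similar check on your own data is to compare a result column from the stock run with the same column from the patched run. The sketch below is an assumption about how such a check could look, not AutoZyme's concordance gate: max_rel_drift is a hypothetical helper, the drift metric (maximum relative difference) is a guess at what "drift" means here, the 0.006 threshold is simply the "about 0.6%" figure quoted above, and the two vectors are toy placeholders standing in for real NES columns.

```r
# Hypothetical helper: largest relative difference between two result columns
max_rel_drift <- function(baseline, optimized) {
  stopifnot(length(baseline) == length(optimized))
  denom <- pmax(abs(baseline), .Machine$double.eps)  # avoid division by zero
  max(abs(optimized - baseline) / denom)
}

# Toy placeholder values, not benchmark results
nes_baseline <- c(1.80, -2.10, 0.95)
nes_optimized <- c(1.805, -2.10, 0.95)

drift <- max_rel_drift(nes_baseline, nes_optimized)
within_gate <- drift <= 0.006  # assumed threshold taken from the "about 0.6%" figure

print(within_gate)
```

In practice you would substitute the NES, pval, or padj columns from two fgsea runs made with the same seed and the same inputs.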

How do I install the fgsea speedup?

In R: install the autozyme package, then run library(autozyme) and autozyme::activate("fgsea"). The patch applies automatically the next time you call fgsea::fgsea.
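
Put together, a session might look like the sketch below. The autozyme calls are the ones named in the answer above; examplePathways and exampleRanks are the demo data shipped with fgsea. The guard skips the run when either package is missing, and the installation step for autozyme is left out because this page does not specify it.

```r
# Run only if both packages are available in this R library
have_pkgs <- requireNamespace("autozyme", quietly = TRUE) &&
  requireNamespace("fgsea", quietly = TRUE)

if (have_pkgs) {
  library(autozyme)
  autozyme::activate("fgsea")  # activation call as given in the text above

  # Demo inputs bundled with fgsea
  data(examplePathways, package = "fgsea")
  data(exampleRanks, package = "fgsea")

  set.seed(42)  # fix the seed so permutation-based columns are reproducible
  # No nperm argument, so the call stays on the default multilevel route
  res <- fgsea::fgsea(pathways = examplePathways, stats = exampleRanks,
                      minSize = 15, maxSize = 500)
  print(head(res[order(res$pval), ]))
}
```

Leaving out `nperm` keeps the call on the default route described under the supported scope; the rest of the call is unchanged from stock fgsea usage.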