Speed up maftools: up to 27.0× faster, identical output

Benchmark charts

Speedup distribution

Each dot is one finalized dataset/thread run on Windows (log scale). Labeled speedups:

Dataset	Speedup
`laml_ood_xlarge`	27.0×
`laml_large`	21.4×
`laml_ood_large`	20.5×
`laml_medium`	15.4×
`laml_tiny`	11.3×

Thread sweep

Speedup across finalized thread counts on Windows, one series per dataset: laml_ood_xlarge, laml_large, laml_ood_large, laml_medium, laml_tiny.

Memory

Baseline vs optimized peak memory on Windows (series: baseline, optimized).

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets the maftools package. The benchmarked result passes the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: mutation, MAF, somatic mutation, oncoplot.

Supported scope

The fast path (fast_read.maf -> fast_validateMaf + fast_summarizeMaf) handles a single MAF input that is either a file path (.maf or .maf.gz, tab-separated, with a Hugo_Symbol header) or an in-memory data.frame/data.table, when all of the following hold: zyme=TRUE, gisticAllLesionsFile is NULL, cnTable is NULL, isTCGA=FALSE, and useAll=TRUE.

Within that envelope it correctly supports:

- removeDuplicatedVariants TRUE/FALSE (forwarded to validateMaf multi-key duplicated)
- rmFlags FALSE/TRUE/numeric (FLAG-gene removal at lines 392-402)
- custom vc_nonSyn (non-synonymous override at lines 387-391)
- clinicalData as NULL, data.frame, or file path (handled in summarizeMaf, lines 222-237)
- presence or absence of CNV variants (has_cnv branch at lines 187-199, and Amp/Del + CNV column handling at lines 142-170, exercised by the ood tiers)
- blank and NA Hugo_Symbol mapped to 'Unknown' (lines 293-309)
- single-sample inputs and, via the nrow==0 guard, zero-variant inputs

All summary statistics (uniqueN, tabulate, colMeans/median rounded to 3, Rcpp zyme_fill_dcast integer fill) are exact equivalences to upstream; the task reports bit-exact concordance (vps_max_rel_diff=0, all *_match=1) across all 5 tiers × 2 platforms.
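The argument conditions above can be read as a single predicate. The sketch below is only an illustration of that envelope in base R: `is_fast_path_eligible` is a hypothetical helper, not a function from AutoZyme or maftools, and it checks only the file extension and argument values, not the tab-separated layout or the Hugo_Symbol header.

```r
# Hypothetical helper: TRUE when a read.maf-style call falls inside the
# supported fast-path envelope described above. Not part of any package.
is_fast_path_eligible <- function(maf,
                                  zyme = TRUE,
                                  gisticAllLesionsFile = NULL,
                                  cnTable = NULL,
                                  isTCGA = FALSE,
                                  useAll = TRUE) {
  # Input must be a single file path (.maf or .maf.gz) or an in-memory table.
  is_path <- is.character(maf) && length(maf) == 1L &&
    grepl("\\.maf(\\.gz)?$", maf)
  is_table <- is.data.frame(maf)  # data.table objects also inherit data.frame

  input_ok <- is_path || is_table
  args_ok <- isTRUE(zyme) &&
    is.null(gisticAllLesionsFile) &&
    is.null(cnTable) &&
    isFALSE(isTCGA) &&
    isTRUE(useAll)

  input_ok && args_ok
}

is_fast_path_eligible("cohort.maf.gz")                   # in scope
is_fast_path_eligible("cohort.maf.gz", isTCGA = TRUE)    # out of scope
is_fast_path_eligible(data.frame(Hugo_Symbol = "TP53"))  # in scope
```

The file name `cohort.maf.gz` is a placeholder; the predicate never opens the file.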

Out-of-scope behavior

silent fallback to upstream
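A silent fallback of this kind is commonly written as a thin dispatcher: in-scope calls go to the fast function, everything else goes to the original, with no warning either way. The sketch below shows that pattern with toy stand-ins. All names (`with_silent_fallback`, `fast_sum`, `upstream_sum`, `in_scope`) are hypothetical, and nothing here is taken from the AutoZyme source.

```r
# Hypothetical dispatcher illustrating silent fallback. `fast_fn` and
# `upstream_fn` are stand-ins, not real AutoZyme or maftools functions.
with_silent_fallback <- function(fast_fn, upstream_fn, in_scope) {
  function(...) {
    if (in_scope(...)) {
      fast_fn(...)
    } else {
      upstream_fn(...)  # no warning or message is emitted
    }
  }
}

# Toy stand-ins so the sketch runs on its own. `calls` records which path
# handled each call, purely for demonstration.
calls <- character(0)
upstream_sum <- function(x, isTCGA = FALSE) {
  calls <<- c(calls, "upstream")
  sum(x)
}
fast_sum <- function(x, isTCGA = FALSE) {
  calls <<- c(calls, "fast")
  sum(x)
}
in_scope <- function(x, isTCGA = FALSE) isFALSE(isTCGA)

patched_sum <- with_silent_fallback(fast_sum, upstream_sum, in_scope)
a <- patched_sum(1:10)                 # handled by the fast stand-in
b <- patched_sum(1:10, isTCGA = TRUE)  # out of scope: upstream stand-in
```

Both calls return the same value; only the `calls` log shows which path ran, which is what makes the fallback invisible to the caller.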

Detailed speedup table (10 runs)

Dataset	Tier	Platform	Threads	Baseline runtime	Optimized runtime	Speedup	Peak memory (baseline → optimized)	Concordance	Pass
`laml_large`	large	Windows	8	11.64 min	33.34 s	21.4×	5.1 → 4.0 GB	—	pass
`laml_medium`	medium	Windows	4	4.73 min	18.24 s	15.4×	2.5 → 2.3 GB	—	pass
`laml_ood_large`	ood_large	Windows	4	9.80 min	28.35 s	20.5×	5.7 → 3.8 GB	—	pass
`laml_ood_xlarge`	ood_xlarge	Windows	8	29.12 min	1.07 min	27.0×	13.4 → 9.4 GB	—	pass
`laml_tiny`	small	Windows	8	2.34 min	12.49 s	11.3×	1.3 → 0.8 GB	—	pass
`laml_large`	large	macOS	4	7.73 min	16.38 s	28.3×	9.7 → 5.8 GB	—	pass
`laml_medium`	medium	macOS	8	2.80 min	7.60 s	23.6×	5.1 → 3.3 GB	—	pass
`laml_ood_large`	ood_large	macOS	4	5.53 min	13.55 s	24.6×	9.5 → 5.8 GB	—	pass
`laml_ood_xlarge`	ood_xlarge	macOS	4	15.99 min	36.88 s	26.9×	19.9 → 12.8 GB	—	pass
`laml_tiny`	small	macOS	8	54.37 s	2.63 s	22.7×	2.8 → 1.2 GB	—	pass

Frequently asked questions

Speeding up maftools

Why is maftools slow?

maftools is CPU-bound, and its stock implementation leaves performance on the table in its core numerical work. On the largest benchmark dataset (laml_ood_xlarge, Windows, 8 threads), the original takes 29.12 min where the AutoZyme path takes 1.07 min (27.0× faster).

How do I make maftools faster?

Install AutoZyme and activate the maftools patch, then keep using maftools exactly as before. AutoZyme transparently substitutes the faster, output-validated path for calls within the supported scope, with no pipeline or API changes. On the benchmark datasets this is up to 27.0× faster on Windows and up to 28.3× faster on macOS.

Does the AutoZyme speedup change the maftools output?

No. The accelerated path returns bit-for-bit identical results to the original maftools implementation (maximum relative difference 0), checked by a frozen concordance gate on every benchmark dataset.
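To make the idea of a zero-difference gate concrete, here is a minimal base R sketch of one way such a check could be computed. `max_rel_diff` is a hypothetical helper, the gene counts are made up for illustration, and the actual concordance gate may be defined differently.

```r
# Hypothetical helper: largest element-wise relative difference between two
# numeric vectors; 0 means every value matches exactly.
max_rel_diff <- function(baseline, optimized) {
  stopifnot(length(baseline) == length(optimized))
  denom <- pmax(abs(baseline), .Machine$double.eps)  # avoid division by zero
  max(abs(baseline - optimized) / denom)
}

# Made-up per-gene counts, for illustration only.
baseline  <- c(TP53 = 21, FLT3 = 52, DNMT3A = 48)
optimized <- c(TP53 = 21, FLT3 = 52, DNMT3A = 48)

exact_diff  <- max_rel_diff(baseline, optimized)
exact_match <- identical(baseline, optimized)

# A tiny perturbation is enough to push the difference above zero,
# so a gate that requires exactly 0 would reject it.
perturbed <- optimized
perturbed["TP53"] <- 21 + 1e-9
perturbed_diff <- max_rel_diff(baseline, perturbed)
```

A gate fixed at exactly 0 is stricter than a tolerance-based comparison such as `all.equal`, which accepts small floating-point differences by default.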

How do I install the maftools speedup?

In R: install the autozyme package, then run library(autozyme) and autozyme::activate("maftools"). The patch applies automatically the next time you call maftools.
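As a hedged sketch, those steps might look like this in a script. It assumes the autozyme package is already installed, because the install source (CRAN, GitHub, or elsewhere) is not stated here. `activate_maftools_patch` is a hypothetical wrapper added so the script still runs when autozyme is missing. The `read.maf` call uses the example MAF file bundled with maftools and is left exactly as it would be written without the patch.

```r
# Hypothetical wrapper around the two documented calls. Returns TRUE if the
# patch was activated, FALSE if autozyme is not available.
activate_maftools_patch <- function() {
  if (!requireNamespace("autozyme", quietly = TRUE)) {
    message("autozyme is not installed; maftools will run unpatched.")
    return(invisible(FALSE))
  }
  library(autozyme)
  autozyme::activate("maftools")
  invisible(TRUE)
}

activated <- activate_maftools_patch()

# Existing maftools code stays as it is.
if (requireNamespace("maftools", quietly = TRUE)) {
  maf_path <- system.file("extdata", "tcga_laml.maf.gz", package = "maftools")
  laml <- maftools::read.maf(maf = maf_path)
}
```

Whether a given call actually takes the fast path depends on the supported scope; calls outside it fall back to upstream.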