Benchmark charts
Speedup distribution: each dot is one finalized dataset/thread run on Windows.
Thread sweep: speedup across finalized thread counts on Windows.
Memory: baseline vs optimized peak memory on Windows.
What is accelerated
This task targets WGCNA::blockwiseModules in the WGCNA R package. The benchmarked result
passes the declared scientific output gate while reducing CPU runtime on the listed datasets.
Supported scope
The fast path is correct only for WGCNA's "common case" as explicitly gated in .fast_tom_kernel_dispatch (patch.R lines 359-367): corType="pearson" (CcorType==0), networkType="unsigned" (CnetworkType==0), TOMType="signed" (CTOMType==2), TOMDenom="min" (TOMDenomC==0), no observation weights (weights NULL), cosineCorrelation FALSE, replaceMissingAdjacencies FALSE, suppressTOMForZeroAdjacencies FALSE, suppressNegativeTOM FALSE, useInternalMatrixAlgebra FALSE, and no NAs in the per-block expression submatrix (!anyNA(selExpr)).
When ALL of those hold, the per-block TOM is computed via matrixStats column z-score + BLAS crossprod (Apple Accelerate on macOS, dynamic BLAS on Windows, forked-chunk crossprod on other Unix, direct crossprod fallback) and is claimed bit-perfect vs WGCNA's C kernel. For ANY other combination the dispatch falls through to the original .Call("tomSimilarity_call", PACKAGE="WGCNA"), so non-common-case TOM is handled correctly by upstream.
The other three namespace overrides are independently guarded: fast_moduleEigengenes defers to the original when zyme=FALSE and reimplements the upstream eigengene pipeline (irlba truncated SVD, matrixStats row-scale) for arbitrary colors/nPC/align/impute/subHubs; fast_goodSamplesGenes short-circuits to all-TRUE only after verifying no weights, no NAs, and all-finite nonzero column variances, otherwise defers to upstream; fast_collectGarbage is an unconditional no-op.
blockwiseModules itself is body-patched (dead scale() skip + TOM .Call redirection) with a guarded fallback to the unmodified original if either string substitution fails to match (e.g. upstream version drift); tested_against WGCNA 1.74. The benchmarked params (power=4, signed TOM, unsigned network, pearson, min denom, no weights, clean HVG matrix) sit squarely inside the common-case gate.
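The gate and the z-score + crossprod idea described above can be illustrated with a minimal base-R sketch. It is not the contents of patch.R: the helper names is_common_case and zscore_cor are hypothetical, base R stands in for matrixStats, and the check against cor() uses a numerical tolerance rather than the bit-perfect comparison claimed for the real kernel.

```r
# Hypothetical helper: mirrors the gate conditions listed in the text.
# It is NOT the actual body of .fast_tom_kernel_dispatch.
is_common_case <- function(selExpr, weights = NULL,
                           corType = "pearson", networkType = "unsigned",
                           TOMType = "signed", TOMDenom = "min",
                           cosineCorrelation = FALSE,
                           replaceMissingAdjacencies = FALSE,
                           suppressTOMForZeroAdjacencies = FALSE,
                           suppressNegativeTOM = FALSE,
                           useInternalMatrixAlgebra = FALSE) {
  corType == "pearson" &&
    networkType == "unsigned" &&
    TOMType == "signed" &&
    TOMDenom == "min" &&
    is.null(weights) &&
    !cosineCorrelation &&
    !replaceMissingAdjacencies &&
    !suppressTOMForZeroAdjacencies &&
    !suppressNegativeTOM &&
    !useInternalMatrixAlgebra &&
    !anyNA(selExpr)
}

# Hypothetical helper: Pearson correlation between columns via
# column z-scores followed by a single crossprod (one BLAS call).
zscore_cor <- function(x) {
  n <- nrow(x)
  centered <- sweep(x, 2, colMeans(x), "-")
  sds <- sqrt(colSums(centered^2) / (n - 1))
  z <- sweep(centered, 2, sds, "/")
  crossprod(z) / (n - 1)
}

# Synthetic data: 20 samples (rows) by 10 genes (columns).
set.seed(1)
x <- matrix(rnorm(200), nrow = 20, ncol = 10)

# Inside the gate: use the fast correlation and compare with stats::cor.
stopifnot(is_common_case(x))
stopifnot(isTRUE(all.equal(zscore_cor(x), cor(x))))

# A single NA pushes the block outside the gate, which in the real patch
# means falling through to the upstream kernel.
x_na <- x
x_na[1, 1] <- NA
stopifnot(!is_common_case(x_na))
```

The point of the gate is that the shortcut is only taken when every simplifying assumption holds; any other input takes the unmodified upstream route.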
Out-of-scope behavior
Out-of-scope calls silently fall back to the upstream WGCNA implementation.
Detailed speedup table (10 runs)
| Dataset | Tier | Platform | Threads | Baseline | Optimized | Speedup | Peak memory (baseline → optimized) | Concordance | Pass |
|---|---|---|---|---|---|---|---|---|---|
| gastrulation_mouse_20kx4k | ood_large | Windows | 8 | 9.50 min | 15.30 s | 36.9× | 6.1 → 3.9 GB | — | pass |
| heart_adult_25kx5p5k | ood_xlarge | Windows | 8 | 22.96 min | 21.24 s | 66.4× | 10.2 → 7.0 GB | — | pass |
| pbmc200k_glaucoma_20kx4k | large | Windows | 8 | 11.11 min | 11.06 s | 60.2× | 6.4 → 4.4 GB | — | pass |
| pbmc68k_15kx3k | medium | Windows | 8 | 4.50 min | 6.72 s | 40.2× | 3.6 → 2.5 GB | — | pass |
| pbmc68k_5kx2k | small | Windows | 8 | 55.29 s | 2.69 s | 20.6× | 1.3 → 0.9 GB | — | pass |
| gastrulation_mouse_20kx4k | ood_large | macOS | 4 | 6.68 min | 7.13 s | 56.6× | 10.6 → 7.4 GB | — | pass |
| heart_adult_25kx5p5k | ood_xlarge | macOS | 4 | 19.51 min | 10.33 s | 115.7× | 15.1 → 11.2 GB | — | pass |
| pbmc200k_glaucoma_20kx4k | large | macOS | 8 | 9.24 min | 5.53 s | 101.9× | 10.2 → 6.7 GB | — | pass |
| pbmc68k_15kx3k | medium | macOS | 8 | 3.73 min | 3.27 s | 69.7× | 6.2 → 3.8 GB | — | pass |
| pbmc68k_5kx2k | small | macOS | 8 | 39.45 s | 1.25 s | 31.6× | 2.3 → 1.3 GB | — | pass |
Frequently asked questions
Why is WGCNA blockwise slow?
WGCNA::blockwiseModules is CPU-bound, and the stock implementation leaves performance on the table in its core numerical work. For example, on the heart_adult_25kx5p5k dataset (Windows, 8 threads) the original takes 22.96 min where the AutoZyme path takes 21.24 s (66.4× faster).
How do I make WGCNA blockwise faster?
Install AutoZyme and activate the WGCNA patch, then keep calling WGCNA::blockwiseModules exactly as before. Within the supported scope, AutoZyme transparently substitutes the faster, output-validated path, with no pipeline or API changes. On the benchmark datasets the speedup reaches 66.4× on Windows and 115.7× on macOS.
Does the AutoZyme speedup change the WGCNA blockwise output?
No. The accelerated path returns bit-for-bit identical results to the original WGCNA implementation (maximum absolute difference 0), checked by a frozen concordance gate on every benchmark dataset.
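As an illustration of what a "maximum absolute difference 0" check means, here is a minimal sketch in base R. The helper max_abs_diff and the toy matrices are hypothetical and are not the actual concordance gate; the sketch only shows why an exact-zero criterion is stricter than the tolerance-based all.equal().

```r
# Hypothetical helper: largest elementwise absolute difference between
# two numeric matrices of the same shape.
max_abs_diff <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)))
  max(abs(a - b))
}

# Toy stand-ins for a baseline result and an optimized result.
baseline  <- matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2)
optimized <- baseline

# Exact agreement: zero difference and identical() both hold.
stopifnot(max_abs_diff(baseline, optimized) == 0)
stopifnot(identical(baseline, optimized))

# A tiny perturbation passes a tolerance-based comparison
# but fails the exact-zero criterion.
perturbed <- baseline
perturbed[1, 1] <- perturbed[1, 1] + 1e-12
stopifnot(isTRUE(all.equal(baseline, perturbed)))
stopifnot(max_abs_diff(baseline, perturbed) > 0)
stopifnot(!identical(baseline, perturbed))
```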
How do I install the WGCNA speedup?
In R: install the autozyme package, then run library(autozyme) and autozyme::activate("wgcna"). The patch applies automatically the next time you call WGCNA::blockwiseModules.
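A session following those steps might look like the sketch below. It assumes both autozyme and WGCNA are already installed (the text does not say where autozyme is distributed, so no install command is shown), uses a small synthetic matrix as hypothetical placeholder data, and passes the benchmarked parameters so the call stays inside the supported scope. The requireNamespace guard only keeps the script from failing when a package is missing.

```r
# Check that both packages are available before doing anything.
pkgs_ready <- requireNamespace("autozyme", quietly = TRUE) &&
  requireNamespace("WGCNA", quietly = TRUE)

if (pkgs_ready) {
  library(autozyme)
  autozyme::activate("wgcna")  # activation call as given in the text

  # Hypothetical placeholder data: 50 samples (rows) by 200 genes (columns),
  # with no NAs and no observation weights.
  set.seed(42)
  datExpr <- matrix(
    rnorm(50 * 200), nrow = 50, ncol = 200,
    dimnames = list(paste0("sample", 1:50), paste0("gene", 1:200))
  )

  # Same call as without the patch; these arguments match the benchmarked
  # parameters (power 4, pearson, unsigned network, signed TOM, min denom).
  net <- WGCNA::blockwiseModules(
    datExpr,
    power       = 4,
    corType     = "pearson",
    networkType = "unsigned",
    TOMType     = "signed",
    TOMDenom    = "min",
    verbose     = 0
  )

  # Module assignment per gene.
  print(table(net$colors))
}
```

Changing any of the gated arguments, for example corType = "bicor", would by the scope description above route the TOM computation back to upstream WGCNA rather than the fast path.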