Speed up WGCNA blockwise

WGCNA blockwise is one of the slower steps in many bulk genomics & enrichment workflows. AutoZyme ships a verified, drop-in patch that is up to 66.4× faster on the Windows benchmarks (and up to 115.7× on macOS), returning bit-for-bit identical results with no change to how you call it.

Best speedup: 66.4×
Median speedup: 58.4×
Output equivalence: bit-exact
Best runtime: 22.96 min baseline, 21.24 s optimized
Datasets: 5
Pass rate: 10/10

Benchmark charts

Speedup distribution
Each dot is one finalized dataset/thread run on Windows (log scale). Datasets shown: heart_adult_25kx5p5k, pbmc200k_glaucoma_20kx4k, pbmc68k_15kx3k, gastrulation_mouse_20kx4k, pbmc68k_5kx2k.

Thread sweep
Speedup across finalized thread counts on Windows (1, 4, and the full 8 threads)
Dataset Tier Threads Baseline Optimized Speedup Memory
heart_adult_25kx5p5k ood_xlarge 1 28.02 min 42.98 s 39.1× 10 → 7.0 GB
heart_adult_25kx5p5k ood_xlarge 4 19.72 min 29.73 s 39.8× 10 → 7.0 GB
heart_adult_25kx5p5k ood_xlarge 8 22.96 min 21.24 s 66.4× 10 → 7.0 GB
pbmc200k_glaucoma_20kx4k large 1 9.76 min 19.38 s 30.2× 6.5 → 4.4 GB
pbmc200k_glaucoma_20kx4k large 4 9.67 min 12.07 s 48.1× 6.4 → 4.4 GB
pbmc200k_glaucoma_20kx4k large 8 11.11 min 11.06 s 60.2× 6.4 → 4.4 GB
pbmc68k_15kx3k medium 1 3.82 min 10.21 s 22.4× 3.6 → 2.5 GB
pbmc68k_15kx3k medium 4 3.81 min 7.14 s 32.0× 3.6 → 2.5 GB
pbmc68k_15kx3k medium 8 4.50 min 6.72 s 40.2× 3.6 → 2.5 GB
gastrulation_mouse_20kx4k ood_large 1 7.33 min 22.11 s 19.9× 6.3 → 3.9 GB
gastrulation_mouse_20kx4k ood_large 4 6.22 min 15.31 s 24.4× 6.1 → 3.9 GB
gastrulation_mouse_20kx4k ood_large 8 9.50 min 15.30 s 36.9× 6.1 → 3.9 GB
pbmc68k_5kx2k small 1 36.17 s 3.31 s 12.7× 1.3 → 0.8 GB
pbmc68k_5kx2k small 4 36.05 s 2.72 s 13.2× 1.3 → 0.9 GB
pbmc68k_5kx2k small 8 55.29 s 2.69 s 20.6× 1.3 → 0.9 GB

Memory
Baseline vs optimized peak memory on Windows
Dataset Tier Baseline Optimized Optimized/baseline Run shown
heart_adult_25kx5p5k ood_xlarge 10 GB 7.0 GB 0.68× 4 threads, 39.8× speedup
pbmc200k_glaucoma_20kx4k large 6.5 GB 4.4 GB 0.67× 1 thread, 30.2× speedup
gastrulation_mouse_20kx4k ood_large 6.3 GB 3.9 GB 0.62× 1 thread, 19.9× speedup
pbmc68k_15kx3k medium 3.6 GB 2.5 GB 0.69× 4 threads, 32.0× speedup
pbmc68k_5kx2k small 1.3 GB 0.9 GB 0.66× 4 threads, 13.2× speedup

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets WGCNA::blockwiseModules in WGCNA. The benchmarked result preserves the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: co-expression, gene network, weighted gene co-expression, blockwiseModules, network.

Supported scope

The fast path is correct only for WGCNA's "common case", as explicitly gated in .fast_tom_kernel_dispatch (patch.R lines 359-367). All of the following must hold:

- corType="pearson" (CcorType==0)
- networkType="unsigned" (CnetworkType==0)
- TOMType="signed" (CTOMType==2)
- TOMDenom="min" (TOMDenomC==0)
- no observation weights (weights is NULL)
- cosineCorrelation is FALSE
- replaceMissingAdjacencies is FALSE
- suppressTOMForZeroAdjacencies is FALSE
- suppressNegativeTOM is FALSE
- useInternalMatrixAlgebra is FALSE
- no NAs in the per-block expression submatrix (!anyNA(selExpr))

When all of these hold, the per-block TOM is computed via a matrixStats column z-score followed by a BLAS crossprod (Apple Accelerate on macOS, dynamic BLAS on Windows, forked-chunk crossprod on other Unix, with a direct crossprod fallback) and is claimed bit-perfect against WGCNA's C kernel. For any other combination, the dispatch falls through to the original .Call("tomSimilarity_call", PACKAGE="WGCNA"), so non-common-case TOM is handled correctly by upstream.

The other three namespace overrides are independently guarded:

- fast_moduleEigengenes defers to the original when zyme=FALSE; otherwise it reimplements the upstream eigengene pipeline (irlba truncated SVD, matrixStats row-scale) for arbitrary colors/nPC/align/impute/subHubs.
- fast_goodSamplesGenes short-circuits to all-TRUE only after verifying no weights, no NAs, and all-finite nonzero column variances; otherwise it defers to upstream.
- fast_collectGarbage is an unconditional no-op.

blockwiseModules itself is body-patched (dead scale() skip plus TOM .Call redirection), with a guarded fallback to the unmodified original if either string substitution fails to match (e.g. upstream version drift). Tested against WGCNA 1.74.

The benchmarked parameters (power=4, signed TOM, unsigned network, pearson, min denom, no weights, clean HVG matrix) sit squarely inside the common-case gate.
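The "column z-score + crossprod" step rests on a standard identity: the cross-product of column-standardized data, divided by n - 1, is the Pearson correlation matrix. The sketch below only illustrates that identity in base R. It is not the patch's kernel: scale() and cor() stand in for the matrixStats and BLAS calls named above, the data are random placeholders, and the comparison uses a tolerance, so it says nothing about the bit-exact claim.

```r
# Illustrative only: base-R check that column z-scores plus crossprod()
# reproduce Pearson correlation. Placeholder data, not a benchmark dataset.
set.seed(1)
n_samples <- 50
n_genes   <- 8
expr <- matrix(rnorm(n_samples * n_genes),
               nrow = n_samples, ncol = n_genes)  # samples x genes, no NAs

# Column z-score: centre each gene and divide by its sample standard deviation.
z <- scale(expr, center = TRUE, scale = TRUE)

# A single matrix product yields all pairwise Pearson correlations.
cor_fast <- crossprod(z) / (n_samples - 1)
cor_ref  <- cor(expr, method = "pearson")

# Unsigned adjacency at the benchmarked soft-threshold power (power = 4).
adjacency <- abs(cor_fast)^4

# Tolerance-based agreement between the two correlation matrices.
agreement <- all.equal(cor_fast, cor_ref, check.attributes = FALSE)
```

Replacing many pairwise correlation computations with one dense matrix product is what lets an optimized BLAS do most of the work.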

Out-of-scope behavior

Calls outside the supported scope fall back silently to the upstream WGCNA implementation.
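A minimal sketch of how such a gate-and-fallback dispatch could look, using the conditions listed under Supported scope. The names in_common_case and dispatch_tom are hypothetical and are not taken from patch.R; the fast and upstream kernels are stubbed with placeholder functions.

```r
# Hypothetical helper (not part of WGCNA or the patch): TRUE only when every
# condition listed in the supported scope holds.
in_common_case <- function(corType = "pearson", networkType = "unsigned",
                           TOMType = "signed", TOMDenom = "min",
                           weights = NULL, cosineCorrelation = FALSE,
                           replaceMissingAdjacencies = FALSE,
                           suppressTOMForZeroAdjacencies = FALSE,
                           suppressNegativeTOM = FALSE,
                           useInternalMatrixAlgebra = FALSE,
                           selExpr) {
  corType == "pearson" &&
    networkType == "unsigned" &&
    TOMType == "signed" &&
    TOMDenom == "min" &&
    is.null(weights) &&
    !cosineCorrelation &&
    !replaceMissingAdjacencies &&
    !suppressTOMForZeroAdjacencies &&
    !suppressNegativeTOM &&
    !useInternalMatrixAlgebra &&
    !anyNA(selExpr)
}

# Hypothetical dispatcher: fast path inside the gate, upstream otherwise.
# No warning or message is emitted on fallback, mirroring the silent behavior.
dispatch_tom <- function(selExpr, fast_fn, upstream_fn, ...) {
  if (in_common_case(selExpr = selExpr, ...)) fast_fn(selExpr) else upstream_fn(selExpr)
}

# Stub kernels that only report which path ran.
fast_stub     <- function(x) "fast"
upstream_stub <- function(x) "upstream"

clean   <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)
with_na <- clean
with_na[1, 1] <- NA

path_clean <- dispatch_tom(clean, fast_stub, upstream_stub)                     # "fast"
path_na    <- dispatch_tom(with_na, fast_stub, upstream_stub)                   # "upstream"
path_bicor <- dispatch_tom(clean, fast_stub, upstream_stub, corType = "bicor")  # "upstream"
```

Because the fallback is silent, a call with, say, corType="bicor" still returns correct results but runs at upstream speed.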

Detailed speedup table (10 runs)
Dataset Tier Platform Threads Baseline Optimized Speedup Memory Concordance Pass
gastrulation_mouse_20kx4k ood_large Windows 8 9.50 min 15.30 s 36.9× 6.1 → 3.9 GB pass
heart_adult_25kx5p5k ood_xlarge Windows 8 22.96 min 21.24 s 66.4× 10.2 → 7.0 GB pass
pbmc200k_glaucoma_20kx4k large Windows 8 11.11 min 11.06 s 60.2× 6.4 → 4.4 GB pass
pbmc68k_15kx3k medium Windows 8 4.50 min 6.72 s 40.2× 3.6 → 2.5 GB pass
pbmc68k_5kx2k small Windows 8 55.29 s 2.69 s 20.6× 1.3 → 0.9 GB pass
gastrulation_mouse_20kx4k ood_large macOS 4 6.68 min 7.13 s 56.6× 10.6 → 7.4 GB pass
heart_adult_25kx5p5k ood_xlarge macOS 4 19.51 min 10.33 s 115.7× 15.1 → 11.2 GB pass
pbmc200k_glaucoma_20kx4k large macOS 8 9.24 min 5.53 s 101.9× 10.2 → 6.7 GB pass
pbmc68k_15kx3k medium macOS 8 3.73 min 3.27 s 69.7× 6.2 → 3.8 GB pass
pbmc68k_5kx2k small macOS 8 39.45 s 1.25 s 31.6× 2.3 → 1.3 GB pass
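
The headline median speedup (58.4×) can be reproduced from the Speedup column of this table. The short R sketch below copies the ten values in row order; the variable names are illustrative.

```r
# Speedup column of the detailed table, in row order
# (five Windows rows, then five macOS rows).
speedup  <- c(36.9, 66.4, 60.2, 40.2, 20.6,
              56.6, 115.7, 101.9, 69.7, 31.6)
platform <- rep(c("Windows", "macOS"), each = 5)

# Median over all 10 runs: the mean of the two middle values, 56.6 and 60.2.
median_all <- median(speedup)

# Best speedup per platform.
best_by_platform <- tapply(speedup, platform, max)
```

Here median_all comes out at 58.4, and best_by_platform gives 66.4 for Windows and 115.7 for macOS, so the 66.4× headline figure corresponds to the Windows runs.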

Frequently asked questions

Speeding up WGCNA blockwise
Why is WGCNA blockwise slow?

WGCNA blockwise is CPU-bound, and the stock implementation in WGCNA leaves performance on the table in its core numerical work. In the best Windows benchmark run (heart_adult_25kx5p5k, 8 threads), the original takes 22.96 min where the AutoZyme path takes 21.24 s (66.4× faster).

How do I make WGCNA blockwise faster?

Install AutoZyme and activate the WGCNA patch, then keep using WGCNA blockwise exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 66.4× faster on the benchmark datasets, with no pipeline or API changes.

Does the AutoZyme speedup change the WGCNA blockwise output?

No. The accelerated path returns bit-for-bit identical results to the original WGCNA implementation (maximum absolute difference 0), checked by a frozen concordance gate on every benchmark dataset.
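
To make "bit-for-bit identical" concrete, here is a minimal sketch of an exact comparison in base R. How AutoZyme's concordance gate is actually implemented is not described here; compare_results is a hypothetical helper and the matrices are random placeholders.

```r
# Hypothetical helper: exact comparison of two numeric results of equal shape.
compare_results <- function(reference, candidate) {
  stopifnot(is.numeric(reference), is.numeric(candidate),
            identical(dim(reference), dim(candidate)))
  list(
    max_abs_diff = max(abs(reference - candidate)),  # 0 only for an exact match
    identical    = identical(reference, candidate)   # strict, no tolerance
  )
}

set.seed(42)
reference <- matrix(runif(12), nrow = 3)
same      <- reference
perturbed <- reference
perturbed[2, 2] <- perturbed[2, 2] + 1e-12  # tiny numerical drift

exact <- compare_results(reference, same)       # max_abs_diff 0, identical TRUE
off   <- compare_results(reference, perturbed)  # max_abs_diff > 0, identical FALSE

# A tolerance-based check would still accept the perturbed matrix.
tolerant <- isTRUE(all.equal(reference, perturbed))  # TRUE
```

The difference between the last two checks is the point: a maximum absolute difference of 0 is a stricter criterion than agreement within floating-point tolerance.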

How do I install the WGCNA speedup?

In R: install the autozyme package, then run library(autozyme) and autozyme::activate("wgcna"). The patch applies automatically the next time you call WGCNA::blockwiseModules.
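
As a sketch of what a session might look like after installation, the snippet below activates the patch and calls WGCNA::blockwiseModules with the benchmarked parameters from the supported scope. The autozyme calls are the ones named above; how the package is installed is not specified here, so the code simply skips itself if either package is missing. The expression matrix is random placeholder data, and benchmark_args is an illustrative name.

```r
# Parameters named in the supported scope as the benchmarked configuration.
benchmark_args <- list(
  power       = 4,
  corType     = "pearson",
  networkType = "unsigned",
  TOMType     = "signed",
  TOMDenom    = "min"
)

# Run only when both packages are installed, so the script is safe to source.
if (requireNamespace("autozyme", quietly = TRUE) &&
    requireNamespace("WGCNA", quietly = TRUE)) {
  library(autozyme)
  autozyme::activate("wgcna")

  # Placeholder data: 40 samples x 200 genes of random noise, with no NAs.
  set.seed(1)
  datExpr <- matrix(rnorm(40 * 200), nrow = 40,
                    dimnames = list(NULL, paste0("gene", 1:200)))

  # Same call as without the patch; only the arguments from the gate are set.
  net <- do.call(WGCNA::blockwiseModules,
                 c(list(datExpr = datExpr), benchmark_args))
  print(table(net$colors))
}
```

Changing any of these arguments to a value outside the supported scope should still work, but the call would then take the upstream path.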