NicheNet is one of the slower steps in many single-cell genomics workflows. AutoZyme ships a verified, drop-in patch that is up to 1,482× faster on the benchmark datasets. It returns bit-for-bit identical results, and you call it exactly as before.
Best speedup: 1,482×
Median speedup: 1,187×
Output equivalence: bit-exact
Best runtime: 2.08 min baseline → 94 ms optimized
Datasets: 5
Pass rate: 10/10
Benchmark charts
Speedup distribution (log scale): each dot is one finalized dataset/thread run on Windows. Chart data:

| Dataset | Tier | Threads | Baseline | Optimized | Speedup | Memory |
|---|---|---|---|---|---|---|
| lcmv_mouse_full | small | 1 | 2.07 min | 630 ms | 224.7× | 1.0 → 0.8 GB |
| lcmv_mouse_full | small | 4 | 1.81 min | 166 ms | 645.2× | 1.0 → 0.8 GB |
| lcmv_mouse_full | small | 8 | 2.08 min | 94 ms | 1,482× | 1.0 → 0.8 GB |
| ifnb_human_x2 | medium | 1 | 6.08 min | 2.06 s | 189.5× | 1.7 → 1.2 GB |
| ifnb_human_x2 | medium | 4 | 5.02 min | 519 ms | 575.4× | 1.7 → 1.2 GB |
| ifnb_human_x2 | medium | 8 | 5.70 min | 285 ms | 1,293× | 1.7 → 1.2 GB |
| ifnb_human_x4 | large | 1 | 10.91 min | 4.06 s | 164.2× | 2.8 → 1.8 GB |
| ifnb_human_x4 | large | 4 | 11.36 min | 1.18 s | 577.0× | 2.8 → 1.8 GB |
| ifnb_human_x4 | large | 8 | 10.84 min | 549 ms | 1,250× | 2.8 → 1.8 GB |
| tms_spleen_BvsT_x6 | ood_large | 1 | 11.89 min | 6.04 s | 121.0× | 2.9 → 1.9 GB |
| tms_spleen_BvsT_x6 | ood_large | 4 | 13.26 min | 1.54 s | 520.9× | 2.5 → 1.9 GB |
| tms_spleen_BvsT_x6 | ood_large | 8 | 13.78 min | 832 ms | 972.7× | 2.5 → 1.9 GB |
| gbm_malig_vs_macro_x10 | ood_xlarge | 1 | 27.72 min | 15.35 s | 109.4× | 5.1 → 3.6 GB |
| gbm_malig_vs_macro_x10 | ood_xlarge | 4 | 29.51 min | 4.13 s | 429.2× | 5.1 → 3.6 GB |
| gbm_malig_vs_macro_x10 | ood_xlarge | 8 | 27.37 min | 2.30 s | 737.0× | 5.1 → 3.6 GB |
The public API stays the same; AutoZyme replaces only the supported fast path.
This patch targets NicheNet in the nichenetr package. On the listed datasets, the benchmarked result passes the declared scientific output gate and uses less CPU time.
Supported scope
The fast path is taken only when zyme=TRUE (the default) and single=TRUE (the default). This is the per-ligand scoring mode, nichenetr's common default use case. In that mode the fast path computes the four output metrics (auroc, aupr, aupr_corrected, pearson) correctly under these conditions:
- every entry of potential_ligands is present in colnames(ligand_target_matrix);
- the geneset and background each have at least one gene matching rownames(ligand_target_matrix);
- ligand_target_matrix is a dense base R numeric matrix.

Dispatch depends on the number of ligands:
- n_ligands >= 64: the parallel C++ kernel score_ligands_cpp runs with threads = min(getOption('autozyme.threads', 14), n_ligands, physical cores). Inside the kernel, n_pos <= 2048 uses the binary-search LigandScoreBinaryWorker and n_pos > 2048 uses the heap-vector LigandScoreWorker. Both produce the same metrics.
- n_ligands < 64: a vectorized R fallback, .nichenetr_fast_score_selected_metrics, is used. It is bit-exact with the upstream caTools::trapz formulation.

The benchmark exercises exactly this path. Every dev and out-of-distribution (OOD) tier passes only the four data arguments, so single keeps its upstream default of TRUE. The benchmarked call therefore uses pure upstream defaults apart from the data. Agreement is reported as bit-exact (max_abs_diff_aupr_corrected = 0; all correlations = 1).
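The scope description says two things about the metrics. The R fallback is bit-exact with the caTools::trapz formulation. The binary-search and heap-vector workers produce the same metrics. The Python sketch below illustrates both points using AUROC, one of the four metrics. It computes AUROC two ways. The first applies the trapezoid rule to ROC points taken at each distinct score. The second binary-searches each positive gene's score among the sorted negative genes.

This is not nichenetr's or AutoZyme's code. All function names are hypothetical. It shows one way binary search can compute AUROC and does not describe how LigandScoreBinaryWorker actually works. We assume labels are 1 for geneset genes and 0 for the other background genes. The sketch covers only AUROC, not aupr, aupr_corrected or pearson.

```python
from bisect import bisect_left, bisect_right


def trapz(x, y):
    """Trapezoid rule over consecutive points (same formula as caTools::trapz)."""
    return sum((x[i + 1] - x[i]) * (y[i + 1] + y[i]) for i in range(len(x) - 1)) / 2


def auroc_trapz(scores, labels):
    """AUROC as the trapezoid-rule area under ROC points taken at each distinct score."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Absorb every gene tied at this score before emitting a ROC point,
        # so a run of ties becomes one diagonal segment.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return trapz(fpr, tpr)


def auroc_binary_search(scores, labels):
    """AUROC as the Mann-Whitney statistic: binary-search each positive's score
    among the sorted negatives and count those below it (ties count one half)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = sorted(s for s, y in zip(scores, labels) if not y)
    wins = 0.0
    for s in positives:
        below = bisect_left(negatives, s)          # negatives scored strictly lower
        ties = bisect_right(negatives, s) - below  # negatives with an equal score
        wins += below + 0.5 * ties
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6]  # toy regulatory potentials for one ligand
    labels = [1, 0, 1, 0]          # 1 = gene in geneset, 0 = other background gene
    print(auroc_trapz(scores, labels), auroc_binary_search(scores, labels))
```

On this toy input both functions print 0.75. The two formulations are mathematically identical, including the handling of ties. They agree only up to floating-point rounding, though, because they sum in a different order. A genuinely bit-exact replacement must also pin down the order of operations, which this sketch does not attempt.

The binary-search variant sorts the negatives once and then costs O(log N) per positive gene. That is a plausible reason to prefer it when the positive set is small.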
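The dispatch rules in the scope description can be restated as a small decision function. This Python sketch mirrors only the documented thresholds and the thread formula. plan_dispatch, Plan and the backend labels are hypothetical names. physical_cores is passed in explicitly because Python's standard library (os.cpu_count) reports logical cores, not physical ones. The sketch assumes the call is already on the fast path (zyme=TRUE, single=TRUE).

```python
from typing import NamedTuple, Optional

PARALLEL_MIN_LIGANDS = 64     # n_ligands >= 64 -> parallel C++ kernel
BINARY_SEARCH_MAX_POS = 2048  # n_pos <= 2048 -> binary-search worker
DEFAULT_THREADS = 14          # default value in getOption('autozyme.threads', 14)


class Plan(NamedTuple):
    backend: str            # "cpp_kernel" or "r_fallback"
    threads: Optional[int]  # None: the scope text gives no thread count for the R fallback
    worker: Optional[str]   # C++ worker class named in the scope text


def plan_dispatch(n_ligands: int, n_pos: int, physical_cores: int,
                  autozyme_threads: int = DEFAULT_THREADS) -> Plan:
    """Restate the documented fast-path dispatch rules (hypothetical helper)."""
    if n_ligands < PARALLEL_MIN_LIGANDS:
        return Plan("r_fallback", None, None)
    threads = min(autozyme_threads, n_ligands, physical_cores)
    if n_pos <= BINARY_SEARCH_MAX_POS:
        worker = "LigandScoreBinaryWorker"
    else:
        worker = "LigandScoreWorker"
    return Plan("cpp_kernel", threads, worker)


if __name__ == "__main__":
    print(plan_dispatch(n_ligands=100, n_pos=3000, physical_cores=8))
```

Running it prints Plan(backend='cpp_kernel', threads=8, worker='LigandScoreWorker'). The three-way min means the option value acts as an upper bound, never a guarantee. A small ligand set or a machine with few physical cores lowers the thread count.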
Out-of-scope behavior: errors
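The supported-scope conditions can also be checked before making a call. This hypothetical Python pre-flight check returns the list of documented conditions a call breaks; an empty list means the call is in scope. scope_violations is our own name, not an AutoZyme function. The dense base R numeric matrix requirement is modelled as a boolean flag, because Python cannot inspect an R matrix. The geneset/background condition is read as applying to each set separately, which is our interpretation.

```python
def scope_violations(potential_ligands, geneset, background, row_names, col_names,
                     zyme=True, single=True, dense_numeric=True):
    """Return the documented fast-path conditions that a call breaks (empty list = in scope).

    Hypothetical helper: row_names/col_names stand in for rownames()/colnames()
    of ligand_target_matrix, and dense_numeric stands in for "dense base R numeric matrix".
    """
    problems = []
    if not (zyme and single):
        problems.append("fast path requires zyme=TRUE and single=TRUE")
    known_ligands = set(col_names)
    missing = [lig for lig in potential_ligands if lig not in known_ligands]
    if missing:
        problems.append(f"ligands missing from colnames: {missing}")
    target_genes = set(row_names)
    if target_genes.isdisjoint(geneset):
        problems.append("no geneset gene matches rownames")
    if target_genes.isdisjoint(background):
        problems.append("no background gene matches rownames")
    if not dense_numeric:
        problems.append("ligand_target_matrix must be a dense numeric matrix")
    return problems


if __name__ == "__main__":
    # Toy gene and ligand names, all present in the toy matrix dimnames.
    print(scope_violations(["TNF", "IL6"], ["SOCS3"], ["SOCS3", "ACTB"],
                           row_names=["SOCS3", "ACTB"], col_names=["TNF", "IL6"]))
```

For the in-scope toy call above it prints an empty list.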
Detailed speedup table (10 runs)

| Dataset | Tier | Platform | Threads | Baseline | Optimized | Speedup | Memory | Concordance | Pass |
|---|---|---|---|---|---|---|---|---|---|
| gbm_malig_vs_macro_x10 | ood_xlarge | Windows | 8 | 27.37 min | 2.30 s | 737.0× | 5.1 → 3.6 GB | — | pass |
| ifnb_human_x2 | medium | Windows | 8 | 5.70 min | 285 ms | 1,293× | 1.7 → 1.2 GB | — | pass |
| ifnb_human_x4 | large | Windows | 8 | 10.84 min | 549 ms | 1,250× | 2.8 → 1.8 GB | — | pass |
| lcmv_mouse_full | small | Windows | 8 | 2.08 min | 94 ms | 1,482× | 1.0 → 0.8 GB | — | pass |
| tms_spleen_BvsT_x6 | ood_large | Windows | 8 | 13.78 min | 832 ms | 972.7× | 2.5 → 1.9 GB | — | pass |
| gbm_malig_vs_macro_x10 | ood_xlarge | macOS | 4 | 15.63 min | 1.26 s | 752.7× | 6.5 → 3.7 GB | — | pass |
| ifnb_human_x2 | medium | macOS | 8 | 3.23 min | 247 ms | 773.7× | 3.4 → 1.2 GB | — | pass |
| ifnb_human_x4 | large | macOS | 1 | 6.34 min | 329 ms | 1,160× | 4.2 → 1.8 GB | — | pass |
| lcmv_mouse_full | small | macOS | 4 | 1.14 min | 49 ms | 1,423× | 2.2 → 0.8 GB | — | pass |
| tms_spleen_BvsT_x6 | ood_large | macOS | 4 | 6.90 min | 343 ms | 1,214× | 3.8 → 1.9 GB | — | pass |
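The two headline figures can be recomputed from the ten runs in this table. The short Python check below is our own and uses the speedups as displayed, which are already rounded. The median speedup of 1,187× is the mean of the fifth and sixth sorted values (1,160× and 1,214×). The best run is lcmv_mouse_full on Windows. The macOS rows use 1, 4 or 8 threads, so the median mixes thread counts.

```python
from statistics import median

# Per-run speedups exactly as displayed in the detailed table, keyed by (dataset, platform).
speedups = {
    ("gbm_malig_vs_macro_x10", "Windows"): 737.0,
    ("ifnb_human_x2", "Windows"): 1293.0,
    ("ifnb_human_x4", "Windows"): 1250.0,
    ("lcmv_mouse_full", "Windows"): 1482.0,
    ("tms_spleen_BvsT_x6", "Windows"): 972.7,
    ("gbm_malig_vs_macro_x10", "macOS"): 752.7,
    ("ifnb_human_x2", "macOS"): 773.7,
    ("ifnb_human_x4", "macOS"): 1160.0,
    ("lcmv_mouse_full", "macOS"): 1423.0,
    ("tms_spleen_BvsT_x6", "macOS"): 1214.0,
}

best_run = max(speedups, key=speedups.get)  # key with the largest speedup
print(f"median: {median(speedups.values()):.1f}x")
print(f"best:   {speedups[best_run]:.1f}x on {best_run}")
```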
Frequently asked questions
Speeding up NicheNet
Why is NicheNet slow?
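NicheNet is CPU-bound. The stock nichenetr implementation leaves performance on the table in its core numerical work, the per-ligand scoring that AutoZyme replaces. On the best-case benchmark run (lcmv_mouse_full, 8 threads, Windows), the original takes 2.08 min and the AutoZyme path takes 94 ms (1,482× faster). The median speedup across all 10 runs is 1,187×.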
How do I make NicheNet faster?
Install AutoZyme and activate the nichenetr patch, then keep using NicheNet exactly as before. For calls within the supported scope, AutoZyme transparently substitutes the faster, output-validated path, which is up to 1,482× faster on the benchmark datasets. No pipeline or API changes are needed.
Does the AutoZyme speedup change the NicheNet output?
No. Within the supported scope, the accelerated path returns results bit-for-bit identical to the original nichenetr implementation (maximum absolute difference 0). A frozen concordance gate checks this on every benchmark dataset.
How do I install the nichenetr speedup?
In R: install the autozyme package, then run library(autozyme) and autozyme::activate("nichenetr"). The patch applies automatically the next time you call NicheNet.