Speed up CellPhoneDB v5

CellPhoneDB v5 is one of the slower steps in many single-cell genomics workflows. AutoZyme ships a verified, drop-in patch that is up to 17.1× faster on the Windows benchmarks (up to 81.8× on macOS) and returns output within a strict, verified tolerance, with no change to how you call it.

Best speedup (Windows): 17.1×
Median speedup (all 10 runs, both platforms): 20.8×
Output equivalence: tolerance
Best runtime (Windows): baseline 59.77 s, optimized 3.49 s
Datasets: 5
Pass rate: 10/10

Benchmark charts

Speedup distribution (log scale): each dot is one finalized dataset/thread run on Windows.

Thread sweep: no finalized multi-thread sweep for this platform.

Memory: baseline vs. optimized peak memory on Windows, 1 thread.
cpdb_glaucoma_t12_c1200 (small): 0.7 GB → 0.6 GB, optimized / baseline 0.85×, 17.1× speedup
cpdb_glaucoma_t18_c1500 (medium): 1.1 GB → 0.9 GB, optimized / baseline 0.83×, 13.5× speedup
cpdb_glaucoma_t22_c2000 (large): 1.4 GB → 1.3 GB, optimized / baseline 0.92×, 9.83× speedup
cpdb_heart_t20_c1800 (ood_large): 1.7 GB → 1.5 GB, optimized / baseline 0.92×, 9.31× speedup
cpdb_heart_t22_c2200 (ood_xlarge): 2.0 GB → 2.0 GB, optimized / baseline 0.96×, 9.30× speedup

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets cellphonedb.src.core.methods.cpdb_statistical_analysis_method::call in CellphoneDB. The benchmarked result stays within the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: cpdb, cell-cell communication, ligand-receptor, CCC.

Supported scope

Correct for the statistical analysis method run with counts_data in {ensembl, gene_name, hgnc_symbol} (column-name driven) and upstream-default-style args: any threshold (flows into fast_percent_analysis and fast_build_clusters), any separator, any iterations (>= ~50 for the batched matmul to pay off; BATCH=50 is hardcoded), any result_precision/pvalue (applied downstream of the patched helpers, unchanged), simple and complex interactions (complexes handled via min-over-protein-rows), and any threads value (silently ignored by the fast path, result-identical).

The patch rewrites 8 internal cpdb_statistical_analysis_helper functions plus 1 no-op file_utils.save_dfs_as_tsv, all activated together as one coupled unit.

Outputs are NOT bit-exact vs upstream. Means are bit-exact, but np.random.shuffle yields a different permutation sequence than upstream's Categorical-setitem shuffle, so p-values differ (Spearman ~0.99, Jaccard ~0.99: within the task's noise floor, not 1.0).
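To make the "batched matmul" and "different permutation sequence" points concrete, here is a minimal sketch on toy data. It is not the patch's code: the function names, the toy matrix, the use of `np.random.default_rng` instead of `np.random.shuffle`, and the simplified per-gene p-value are all assumptions chosen for illustration. Only the batch size of 50 is taken from the scope text.

```python
import numpy as np

BATCH = 50  # batch size named in the scope text; everything else below is illustrative


def cluster_means(counts, labels, n_clusters):
    """Mean expression per gene per cluster. counts: (genes, cells), labels: (cells,)."""
    one_hot = np.eye(n_clusters)[labels]            # (cells, clusters)
    return counts @ one_hot / one_hot.sum(axis=0)   # (genes, clusters)


def permuted_cluster_means(counts, labels, n_clusters, rng, batch=BATCH):
    """Cluster means under `batch` label shuffles, computed with one batched matmul."""
    shuffled = np.stack([rng.permutation(labels) for _ in range(batch)])  # (batch, cells)
    one_hot = np.eye(n_clusters)[shuffled]          # (batch, cells, clusters)
    sizes = one_hot.sum(axis=1, keepdims=True)      # (batch, 1, clusters)
    return counts @ one_hot / sizes                 # (batch, genes, clusters)


# Toy data: 6 genes, 40 cells, 4 clusters of 10 cells each (hypothetical values).
counts = np.random.default_rng(0).poisson(2.0, size=(6, 40)).astype(float)
labels = np.repeat(np.arange(4), 10)

# Observed means never touch the random number generator, so they are reproducible exactly.
observed = cluster_means(counts, labels, 4)

# Two different shuffle sequences give two different null distributions.
null_a = permuted_cluster_means(counts, labels, 4, np.random.default_rng(1))
null_b = permuted_cluster_means(counts, labels, 4, np.random.default_rng(2))

# Simplified empirical p-value: fraction of shuffles whose mean is at least the observed mean.
p_a = (null_a >= observed).mean(axis=0)
p_b = (null_b >= observed).mean(axis=0)
print("largest p-value difference between the two shuffle sequences:", np.abs(p_a - p_b).max())
```

The observed means depend only on the data and the labels, which is consistent with means being bit-exact, while the p-values move slightly with the shuffle sequence.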

Out-of-scope behavior

Calls outside the supported scope silently fall back to the upstream implementation.

Detailed speedup table (10 runs)
Dataset Tier Platform Threads Baseline Optimized Speedup Memory Concordance Pass
cpdb_glaucoma_t12_c1200 small Windows 1 59.77 s 3.49 s 17.1× 0.7 → 0.6 GB pass
cpdb_glaucoma_t18_c1500 medium Windows 1 1.80 min 7.98 s 13.5× 1.1 → 0.9 GB pass
cpdb_glaucoma_t22_c2000 large Windows 1 2.32 min 14.16 s 9.83× 1.4 → 1.3 GB pass
cpdb_heart_t20_c1800 ood_large Windows 1 4.01 min 25.87 s 9.31× 1.7 → 1.5 GB pass
cpdb_heart_t22_c2200 ood_xlarge Windows 1 4.52 min 29.14 s 9.30× 2.0 → 2.0 GB pass
cpdb_glaucoma_t12_c1200 small macOS 1 1.31 min 2.40 s 32.8× 1.1 → 0.9 GB pass
cpdb_glaucoma_t18_c1500 medium macOS 1 3.94 min 4.99 s 47.4× 1.6 → 1.4 GB pass
cpdb_glaucoma_t22_c2000 large macOS 1 4.17 min 10.24 s 24.4× 2.0 → 1.7 GB pass
cpdb_heart_t20_c1800 ood_large macOS 1 13.48 min 12.94 s 62.5× 2.4 → 2.1 GB pass
cpdb_heart_t22_c2200 ood_xlarge macOS 1 22.63 min 16.59 s 81.8× 3.1 → 2.6 GB pass
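The headline figures can be rechecked from the table with a few lines of arithmetic. The sketch below transcribes the baseline and optimized runtimes, converting minutes to seconds, so ratios may differ from the published speedups in the last digit because the minute values are already rounded.

```python
from statistics import median

# (dataset, platform, baseline_seconds, optimized_seconds), transcribed from the table.
RUNS = [
    ("cpdb_glaucoma_t12_c1200", "Windows", 59.77, 3.49),
    ("cpdb_glaucoma_t18_c1500", "Windows", 1.80 * 60, 7.98),
    ("cpdb_glaucoma_t22_c2000", "Windows", 2.32 * 60, 14.16),
    ("cpdb_heart_t20_c1800", "Windows", 4.01 * 60, 25.87),
    ("cpdb_heart_t22_c2200", "Windows", 4.52 * 60, 29.14),
    ("cpdb_glaucoma_t12_c1200", "macOS", 1.31 * 60, 2.40),
    ("cpdb_glaucoma_t18_c1500", "macOS", 3.94 * 60, 4.99),
    ("cpdb_glaucoma_t22_c2000", "macOS", 4.17 * 60, 10.24),
    ("cpdb_heart_t20_c1800", "macOS", 13.48 * 60, 12.94),
    ("cpdb_heart_t22_c2200", "macOS", 22.63 * 60, 16.59),
]


def speedups(platform=None):
    """Baseline / optimized ratios, optionally restricted to one platform."""
    return [base / opt for _, plat, base, opt in RUNS if platform is None or plat == platform]


best_windows = max(speedups("Windows"))
best_macos = max(speedups("macOS"))
median_all = median(speedups())  # median over all 10 runs, both platforms

print(f"best Windows speedup: {best_windows:.1f}x")
print(f"best macOS speedup:   {best_macos:.1f}x")
print(f"median over all runs: {median_all:.1f}x")
```

The median is taken over both platforms, which is why it can exceed the best Windows speedup.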

Frequently asked questions

Speeding up CellPhoneDB v5
Why is CellPhoneDB v5 slow?

CellPhoneDB v5 is CPU-bound, and the stock implementation in CellphoneDB leaves performance on the table in its core numerical work. On the smallest benchmark dataset (cpdb_glaucoma_t12_c1200, Windows, 1 thread) the original takes 59.77 s where the AutoZyme path takes 3.49 s (17.1× faster).

How do I make CellPhoneDB v5 faster?

Install AutoZyme and activate the CellphoneDB patch, then keep using CellPhoneDB v5 exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 17.1× faster on the Windows benchmark datasets and up to 81.8× on macOS, with no pipeline or API changes.

Does the AutoZyme speedup change the CellPhoneDB v5 output?

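Only slightly. The output is tolerance-equivalent rather than bit-exact: means match upstream exactly, while p-values differ because the patch draws a different permutation sequence. On every benchmark dataset the result stays within a frozen concordance gate (up to about 0.6% drift from the original CellphoneDB result).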
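If you want to compare a patched run against an upstream run yourself, rank correlation and overlap of significant calls are two natural checks. The sketch below is an assumption about how such a comparison could look, not the definition of AutoZyme's concordance gate: the helper names, the 0.05 cutoff, and the toy p-values are all hypothetical.

```python
import numpy as np


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float).ravel()
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return rank_sums[inverse] / counts[inverse]


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


def jaccard_significant(p_ref, p_new, alpha=0.05):
    """Jaccard index of the sets of entries called significant (p < alpha) in each run."""
    ref_sig = np.asarray(p_ref).ravel() < alpha
    new_sig = np.asarray(p_new).ravel() < alpha
    union = np.logical_or(ref_sig, new_sig).sum()
    if union == 0:
        return 1.0  # neither run calls anything significant: treat as full agreement
    return float(np.logical_and(ref_sig, new_sig).sum() / union)


# Toy p-values standing in for flattened upstream and patched p-value tables.
p_upstream = [0.0, 0.01, 0.04, 0.2, 0.5, 1.0]
p_patched = [0.0, 0.02, 0.06, 0.2, 0.4, 1.0]

rho = spearman(p_upstream, p_patched)
jac = jaccard_significant(p_upstream, p_patched)
print(f"Spearman: {rho:.3f}, Jaccard of significant calls: {jac:.3f}")
```

In practice you would pass the flattened p-value tables from the two runs, aligned on the same interaction and cluster-pair order.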

How do I install the CellphoneDB speedup?

Run pip install autozyme, then in Python import autozyme and call autozyme.activate("cellphonedb"). The patch applies automatically the next time you call cellphonedb.src.core.methods.cpdb_statistical_analysis_method::call.
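As a sketch, the activation step can be wrapped so a pipeline still runs on machines where AutoZyme is not installed. The `autozyme.activate("cellphonedb")` call is taken from the answer above; the wrapper function and its name are hypothetical.

```python
import importlib.util


def activate_cellphonedb_patch() -> bool:
    """Activate the AutoZyme CellphoneDB patch if autozyme is installed.

    Returns True when the patch was activated and False when autozyme is
    missing, in which case CellPhoneDB runs its stock implementation.
    """
    if importlib.util.find_spec("autozyme") is None:
        return False
    import autozyme  # installed with: pip install autozyme

    autozyme.activate("cellphonedb")  # call as described in the answer above
    return True


activated = activate_cellphonedb_patch()
print("AutoZyme patch active:", activated)
```

Call this once at the top of your script, before the CellPhoneDB statistical analysis; the analysis call itself stays unchanged.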