Speed up CellPhoneDB v5: up to 17.1× faster, near-identical output

Benchmark charts

Speedup distribution

Each dot is one finalized dataset/thread run on Windows (log scale). Labeled runs: cpdb_glaucoma_t12_c12… (17.1×), cpdb_glaucoma_t18_c15… (13.5×), cpdb_glaucoma_t22_c20… (9.83×), cpdb_heart_t20_c1800 (9.31×), cpdb_heart_t22_c2200 (9.30×).

Thread sweep

Speedup across finalized thread counts on Windows

No finalized multi-thread sweep for this platform.

Memory

Baseline vs optimized peak memory on Windows

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets cellphonedb.src.core.methods.cpdb_statistical_analysis_method::call in CellphoneDB. The benchmarked result stays within the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: cpdb, cell-cell communication, ligand-receptor, CCC.

Supported scope

Correct for the statistical analysis method when counts_data is one of {ensembl, gene_name, hgnc_symbol} (column-name driven) and the arguments are upstream-default-style: any threshold (flows into fast_percent_analysis and fast_build_clusters), any separator, any iterations (about 50 or more are needed for the batched matmul to pay off; BATCH=50 is hardcoded), any result_precision/pvalue (applied downstream of the patched helpers, unchanged), simple and complex interactions (complexes handled via min-over-protein-rows), and any threads value (silently ignored by the fast path; results are identical).

The patch rewrites 8 internal cpdb_statistical_analysis_helper functions plus 1 no-op replacement for file_utils.save_dfs_as_tsv, all activated together as one coupled unit.

Outputs are NOT bit-exact vs upstream. Means are bit-exact, but np.random.shuffle yields a different permutation sequence than upstream's Categorical-setitem shuffle, so p-values differ (Spearman ~0.99, Jaccard ~0.99, not 1.0, which is within the task's noise floor).
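Why the means can match exactly while the p-values do not is easiest to see on a toy permutation test. The sketch below is an illustration only, not the AutoZyme or CellphoneDB code: it uses a single gene and a single cluster, the standard-library random module instead of np.random.shuffle, and hypothetical names (cluster_mean, permutation_pvalue). Two different seeds stand in for two different permutation sequences.

```python
import random
from statistics import fmean


def cluster_mean(values, labels, cluster):
    """Mean expression of the cells currently labeled `cluster`."""
    return fmean(v for v, lab in zip(values, labels) if lab == cluster)


def permutation_pvalue(values, labels, cluster, iterations, seed):
    """Toy permutation test: shuffle the labels and count how often the
    shuffled cluster mean is at least as large as the observed one."""
    rng = random.Random(seed)          # the seed fixes the permutation sequence
    observed = cluster_mean(values, labels, cluster)
    shuffled = list(labels)            # copy so the caller's labels are untouched
    hits = 0
    for _ in range(iterations):
        rng.shuffle(shuffled)
        if cluster_mean(values, shuffled, cluster) >= observed:
            hits += 1
    return observed, hits / iterations


# Hypothetical data: 10 cells, 3 of them in cluster "A".
values = [5.0, 4.0, 6.0, 1.0, 0.5, 2.0, 1.5, 0.0, 3.0, 2.5]
labels = ["A"] * 3 + ["B"] * 7

obs1, p1 = permutation_pvalue(values, labels, "A", iterations=1000, seed=1)
obs2, p2 = permutation_pvalue(values, labels, "A", iterations=1000, seed=2)
print(obs1, obs2)   # identical: the observed mean does not depend on the shuffle
print(p1, p2)       # close, but generally not equal
```

The observed mean is computed before any shuffling, so it is the same for every seed. The p-value is a Monte Carlo estimate, so a different shuffle sequence moves it by a small amount that shrinks as iterations grows.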

Out-of-scope behavior

Inputs outside the supported scope silently fall back to the upstream implementation.

Detailed speedup table (10 runs)

Dataset	Tier	Platform	Threads	Baseline	Optimized	Speedup	Peak memory (baseline → optimized)	Concordance	Pass
`cpdb_glaucoma_t12_c1200`	small	Windows	1	59.77 s	3.49 s	17.1×	0.7 → 0.6 GB	—	pass
`cpdb_glaucoma_t18_c1500`	medium	Windows	1	1.80 min	7.98 s	13.5×	1.1 → 0.9 GB	—	pass
`cpdb_glaucoma_t22_c2000`	large	Windows	1	2.32 min	14.16 s	9.83×	1.4 → 1.3 GB	—	pass
`cpdb_heart_t20_c1800`	ood_large	Windows	1	4.01 min	25.87 s	9.31×	1.7 → 1.5 GB	—	pass
`cpdb_heart_t22_c2200`	ood_xlarge	Windows	1	4.52 min	29.14 s	9.30×	2.0 → 2.0 GB	—	pass
`cpdb_glaucoma_t12_c1200`	small	macOS	1	1.31 min	2.40 s	32.8×	1.1 → 0.9 GB	—	pass
`cpdb_glaucoma_t18_c1500`	medium	macOS	1	3.94 min	4.99 s	47.4×	1.6 → 1.4 GB	—	pass
`cpdb_glaucoma_t22_c2000`	large	macOS	1	4.17 min	10.24 s	24.4×	2.0 → 1.7 GB	—	pass
`cpdb_heart_t20_c1800`	ood_large	macOS	1	13.48 min	12.94 s	62.5×	2.4 → 2.1 GB	—	pass
`cpdb_heart_t22_c2200`	ood_xlarge	macOS	1	22.63 min	16.59 s	81.8×	3.1 → 2.6 GB	—	pass
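The Speedup column is the baseline time divided by the optimized time once both are in the same unit. A minimal sketch of that arithmetic for three rows of the table follows; the helper names (to_seconds, speedup) are made up for illustration. Because it starts from the rounded times shown here, a recomputed value can differ from the published one in the last digit.

```python
def to_seconds(value, unit):
    """Convert a (value, unit) timing from the table into seconds."""
    factors = {"s": 1.0, "min": 60.0}
    return value * factors[unit]


def speedup(baseline, optimized):
    """Baseline time divided by optimized time, both given as (value, unit)."""
    return to_seconds(*baseline) / to_seconds(*optimized)


# (dataset, platform) -> (baseline, optimized), copied from the table above.
rows = {
    ("cpdb_glaucoma_t12_c1200", "Windows"): ((59.77, "s"), (3.49, "s")),
    ("cpdb_glaucoma_t18_c1500", "Windows"): ((1.80, "min"), (7.98, "s")),
    ("cpdb_heart_t22_c2200", "macOS"): ((22.63, "min"), (16.59, "s")),
}

for (dataset, platform), (baseline, optimized) in rows.items():
    print(f"{dataset} on {platform}: {speedup(baseline, optimized):.1f}x")
```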

Frequently asked questions

Speeding up CellPhoneDB v5

Why is CellPhoneDB v5 slow?

CellPhoneDB v5 is CPU-bound, and the stock implementation leaves performance on the table in its core numerical work. On the smallest benchmark dataset (cpdb_glaucoma_t12_c1200, Windows, 1 thread) the original takes 59.77 s where the AutoZyme path takes 3.49 s (17.1× faster).

How do I make CellPhoneDB v5 faster?

Install AutoZyme and activate the CellphoneDB patch, then keep using CellPhoneDB v5 exactly as before. AutoZyme transparently substitutes the faster, output-validated path, with no pipeline or API changes. In the benchmark runs this is up to 17.1× faster on Windows and up to 81.8× faster on macOS.

Does the AutoZyme speedup change the CellPhoneDB v5 output?

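Not bit-for-bit, but within tolerance. Means are bit-exact; p-values differ slightly because the fast path draws a different permutation sequence (Spearman ~0.99, Jaccard ~0.99). The output is held within a frozen concordance gate (up to about 0.6% drift from the original CellphoneDB result) on every benchmark dataset.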
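The supported-scope notes report concordance as a Spearman correlation and a Jaccard index. The page does not define the exact gate, so the sketch below is only one plausible reading: Spearman over paired p-values, and Jaccard over the sets of interactions called significant. The function names, the 0.05 cutoff, and the example p-values are all assumptions made for illustration.

```python
from statistics import fmean


def average_ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = fmean(ra), fmean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


def jaccard_significant(a, b, alpha=0.05):
    """Jaccard index of the index sets with p-value below alpha."""
    sig_a = {i for i, p in enumerate(a) if p < alpha}
    sig_b = {i for i, p in enumerate(b) if p < alpha}
    union = sig_a | sig_b
    return len(sig_a & sig_b) / len(union) if union else 1.0


# Hypothetical p-values for six interactions from two runs.
baseline_p = [0.001, 0.020, 0.300, 0.700, 1.000, 0.040]
optimized_p = [0.002, 0.018, 0.310, 0.690, 1.000, 0.060]

rho = spearman(baseline_p, optimized_p)
jac = jaccard_significant(baseline_p, optimized_p)
print(rho, jac)
```

In this made-up example the ranking of the interactions is unchanged, so the Spearman correlation is 1, while one interaction crosses the 0.05 cutoff and the Jaccard index drops to 2/3. That is why the two metrics are worth reporting together.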

How do I install the CellphoneDB speedup?

From the shell, run pip install autozyme. Then, in Python, import autozyme and call autozyme.activate("cellphonedb"). The patch applies automatically the next time you call cellphonedb.src.core.methods.cpdb_statistical_analysis_method::call.