
Speed up scCODA

scCODA is one of the slower steps in many single-cell genomics workflows. AutoZyme ships a verified, drop-in patch that makes it up to 1.69× faster, keeps its output within a validated, bounded difference of the original, and requires no change to how you call it.

Best speedup (Windows): 1.69×
Median speedup (all 10 runs): 1.46×
Output equivalence: bounded
Best runtime (ifnb_pseudo, Windows): baseline 34.30 s, optimized 20.31 s
Datasets: 5
Pass rate: 10/10

Benchmark charts (Windows)

Speedup distribution: one point per finalized dataset/thread run.
Thread sweep: no finalized multi-thread sweep exists for this platform; every run uses 1 thread.
Memory: baseline vs optimized peak memory; the optimized/baseline ratio ranges from 0.56× (gastrulation_pijuansala) to 0.86× (ifnb_pseudo).

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This patch targets the HMC sampling step of scCODA's compositional analysis in the scCODA package. On the listed datasets it reduces CPU runtime while keeping the scientific output within the declared concordance gate.


Supported scope


The fast path is correct only for scCODA's standard spike-and-slab compositional model sampled with HMC, as configured by CompositionalAnalysis. The model's target_log_prob_fn must match the hard-coded prior reconstructed in _build_fast_tlpf: HalfCauchy(0,1) on sigma_d (D×1); Normal(0,1) on b_offset and ind_raw (D×(K-1)); Normal(0,5) on alpha (K); ind=sigmoid(50*ind_raw); beta=ind*sigma_d*b_offset, with a single zero slope inserted at an INTEGER reference_cell_type; and a DirichletMultinomial(total=n_total, concentration=exp(alpha+x@beta)) likelihood.

The sampler must be HamiltonianMonteCarlo (not NUTS) with a STATIC integer num_leapfrog_steps in [0,50] (default 10), step_size (default 0.01; any scalar is passed through to the real kernel), and num_adapt_steps=None (resolved to int(0.8*num_burnin)), using SimpleStepSizeAdaptation(target_accept_prob=0.75) and the model's constraining_bijectors. Both the verbose=True and verbose=False branches are reproduced.

The leapfrog unroll is gated by a contextvar, so it fires only inside the scCODA sample_hmc call; unrelated TFP HMC users in the same process fall back to upstream unchanged.
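To make the model requirements concrete, here is a plain-Python sketch of the joint log-density described above, written only from the listed priors and likelihood. It is an illustration, not _build_fast_tlpf or scCODA's TensorFlow code: the function names are hypothetical, sigma_d is passed as a flat length-D list, and the density is evaluated in the constrained space, so the Jacobian terms for the constraining bijectors are left out.

```python
import math


def normal_logpdf(x, loc, scale):
    """Log density of Normal(loc, scale) at x."""
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale) - 0.5 * math.log(2.0 * math.pi)


def half_cauchy_logpdf(x):
    """Log density of HalfCauchy(0, 1): 2 / (pi * (1 + x^2)) for x >= 0."""
    if x < 0:
        return -math.inf
    return math.log(2.0 / math.pi) - math.log1p(x * x)


def sigmoid(z):
    """Numerically stable logistic function (avoids exp overflow)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dirichlet_multinomial_logpmf(counts, concentration):
    """log DirichletMultinomial(counts | total=sum(counts), concentration)."""
    n = sum(counts)
    a0 = sum(concentration)
    out = math.lgamma(n + 1) + math.lgamma(a0) - math.lgamma(n + a0)
    for c, a in zip(counts, concentration):
        out += math.lgamma(c + a) - math.lgamma(a) - math.lgamma(c + 1)
    return out


def build_beta(sigma_d, b_offset, ind_raw, reference_cell_type):
    """beta = sigmoid(50*ind_raw) * sigma_d * b_offset (D x (K-1)),
    with a zero column inserted at the integer reference_cell_type."""
    beta = []
    for d, s in enumerate(sigma_d):
        row = [sigmoid(50.0 * r) * s * b for r, b in zip(ind_raw[d], b_offset[d])]
        row.insert(reference_cell_type, 0.0)
        beta.append(row)
    return beta


def sccoda_target_log_prob(x, y, sigma_d, b_offset, ind_raw, alpha,
                           reference_cell_type):
    """Joint log density for covariates x (N x D) and counts y (N x K)."""
    # Priors.
    lp = sum(half_cauchy_logpdf(s) for s in sigma_d)
    for d in range(len(sigma_d)):
        lp += sum(normal_logpdf(v, 0.0, 1.0) for v in b_offset[d])
        lp += sum(normal_logpdf(v, 0.0, 1.0) for v in ind_raw[d])
    lp += sum(normal_logpdf(a, 0.0, 5.0) for a in alpha)

    # Likelihood: one DirichletMultinomial per sample, total = n_total.
    beta = build_beta(sigma_d, b_offset, ind_raw, reference_cell_type)
    for x_i, y_i in zip(x, y):
        conc = []
        for k in range(len(alpha)):
            # Linear predictor alpha_k + (x_i @ beta)_k, then exp.
            eta = alpha[k] + sum(x_i[d] * beta[d][k] for d in range(len(beta)))
            conc.append(math.exp(eta))
        lp += dirichlet_multinomial_logpmf(y_i, conc)
    return lp
```

Evaluating this density and its gradient is the work each HMC leapfrog step repeats, so its cost grows with the number of samples N, covariates D, and cell types K.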

Out-of-scope behavior

Outside this scope the failure mode is silent: no error or warning is raised, and results may be wrong.

Detailed speedup table (10 runs)
| Dataset | Tier | Platform | Threads | Baseline | Optimized | Speedup | Peak memory (GB) | Concordance |
|---|---|---|---|---|---|---|---|---|
| gastrulation_pijuansala | ood_xlarge | Windows | 1 | 18.87 min | 13.25 min | 1.43× | 6.3 → 3.6 | pass |
| heart_adult | large | Windows | 1 | 5.33 min | 3.62 min | 1.47× | 3.1 → 1.9 | pass |
| ifnb_pseudo | small | Windows | 1 | 34.30 s | 20.31 s | 1.69× | 0.9 → 0.8 | pass |
| pbmc200k_glaucoma | medium | Windows | 1 | 2.06 min | 1.45 min | 1.42× | 1.2 → 1.0 | pass |
| tms_ss2_aging | ood_large | Windows | 1 | 7.22 min | 5.15 min | 1.40× | 2.5 → 1.6 | pass |
| gastrulation_pijuansala | ood_xlarge | macOS | 1 | 11.10 min | 7.49 min | 1.48× | 6.6 → 3.7 | pass |
| heart_adult | large | macOS | 1 | 5.48 min | 4.16 min | 1.32× | 2.8 → 1.5 | pass |
| ifnb_pseudo | small | macOS | 1 | 34.23 s | 19.43 s | 1.76× | 1.0 → 1.0 | pass |
| pbmc200k_glaucoma | medium | macOS | 1 | 2.08 min | 1.36 min | 1.53× | 1.5 → 1.2 | pass |
| tms_ss2_aging | ood_large | macOS | 1 | 7.60 min | 5.19 min | 1.46× | 2.3 → 1.3 | pass |
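The headline figures can be recomputed from the per-run speedups in the detailed table. This short sketch (the RUNS list and summarize helper are illustrative, not part of AutoZyme) shows that the 1.69× best speedup is the Windows figure, since macOS reaches 1.76× on ifnb_pseudo, while the 1.46× median is consistent with the median of all ten runs (1.465); the Windows-only median is 1.43×.

```python
import statistics

# (dataset, platform, speedup) as listed in the detailed table.
RUNS = [
    ("gastrulation_pijuansala", "Windows", 1.43),
    ("heart_adult", "Windows", 1.47),
    ("ifnb_pseudo", "Windows", 1.69),
    ("pbmc200k_glaucoma", "Windows", 1.42),
    ("tms_ss2_aging", "Windows", 1.40),
    ("gastrulation_pijuansala", "macOS", 1.48),
    ("heart_adult", "macOS", 1.32),
    ("ifnb_pseudo", "macOS", 1.76),
    ("pbmc200k_glaucoma", "macOS", 1.53),
    ("tms_ss2_aging", "macOS", 1.46),
]


def summarize(runs, platform=None):
    """Return (best, median) speedup, optionally filtered to one platform."""
    speedups = [s for _, p, s in runs if platform is None or p == platform]
    return max(speedups), statistics.median(speedups)
```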

Frequently asked questions

Speeding up scCODA
Why is scCODA slow?

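scCODA is CPU-bound: its runtime is dominated by Hamiltonian Monte Carlo sampling, which evaluates the model's log-probability and its gradient at every leapfrog step, and the stock implementation leaves performance on the table in that core numerical work. On the ifnb_pseudo benchmark (Windows, 1 thread), the original takes 34.30 s where the AutoZyme path takes 20.31 s (1.69× faster).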
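A minimal sketch of why HMC sampling is expensive, assuming plain HMC with a fixed leapfrog count as in the supported scope: each leapfrog step needs one gradient of the log-probability. The 1-D standard-normal toy target and the function names are hypothetical; this is a textbook leapfrog integrator, not scCODA's or AutoZyme's code.

```python
def grad_standard_normal(q):
    """Gradient of log N(q | 0, 1) = -q^2 / 2 + const (toy target)."""
    return -q


def leapfrog(q, p, grad_log_prob, step_size, num_leapfrog_steps, grad_q=None):
    """Standard leapfrog integrator (1-D for brevity).

    Returns (q, p, grad_at_q, evals), where evals counts the gradient calls
    made inside the loop. Reusing grad_q from the previous transition means
    each step costs exactly one gradient evaluation.
    """
    if grad_q is None:
        grad_q = grad_log_prob(q)
    evals = 0
    for _ in range(num_leapfrog_steps):
        p = p + 0.5 * step_size * grad_q  # half step on momentum
        q = q + step_size * p             # full step on position
        grad_q = grad_log_prob(q)         # the expensive part
        evals += 1
        p = p + 0.5 * step_size * grad_q  # second half step on momentum
    return q, p, grad_q, evals


def estimated_gradient_evals(num_results, num_burnin, num_leapfrog_steps):
    """Rough total gradient evaluations for one HMC run."""
    return (num_results + num_burnin) * num_leapfrog_steps
```

With num_leapfrog_steps=10 (the default in the supported scope), an illustrative run of 20,000 samples after 5,000 burn-in steps needs about 250,000 gradients of the full model's log-probability, which is why a faster log-probability and integrator pay off.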

How do I make scCODA faster?

Install AutoZyme and activate the scCODA patch, then keep using scCODA exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 1.69× faster on the benchmark datasets, with no pipeline or API changes.

Does the AutoZyme speedup change the scCODA output?

Slightly, and only within bounds: on every benchmark dataset the output is concordance-validated to within roughly 1.5 to 5% of the original scCODA result, inside a fixed (frozen) validation gate.
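AutoZyme's exact concordance metric is not specified here. As an illustration only, a relative-difference check against a 5% tolerance, applied to paired summary values (for example, hypothetical posterior means from the original and patched runs), could look like this; max_relative_difference and within_gate are hypothetical helpers.

```python
def max_relative_difference(baseline, optimized, eps=1e-12):
    """Largest |optimized - baseline| / |baseline| over paired values."""
    if len(baseline) != len(optimized):
        raise ValueError("baseline and optimized must have the same length")
    # eps guards against division by zero when a baseline value is 0.
    return max(abs(o - b) / max(abs(b), eps) for b, o in zip(baseline, optimized))


def within_gate(baseline, optimized, tolerance=0.05):
    """True if every value is within `tolerance` (relative) of the baseline."""
    return max_relative_difference(baseline, optimized) <= tolerance
```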

How do I install the scCODA speedup?

Run pip install autozyme in a shell, then in Python import autozyme and call autozyme.activate("sccoda"). The patch applies automatically the next time you call scCODA.
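The Python half of these steps as a reusable snippet. It calls only autozyme.activate("sccoda") as given above; the availability guard and the function name activate_sccoda_patch are additions for illustration, so the same script still runs on stock scCODA when AutoZyme is not installed.

```python
import importlib.util


def activate_sccoda_patch():
    """Activate the AutoZyme scCODA patch if autozyme is installed.

    Returns True if the patch was activated, False if autozyme is missing,
    in which case scCODA runs unpatched at upstream speed.
    """
    if importlib.util.find_spec("autozyme") is None:
        return False
    import autozyme

    autozyme.activate("sccoda")
    return True
```

Call it once near the top of the analysis script, before any scCODA sampling, and leave the rest of the pipeline unchanged.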