Speed up spacexr RCTD

spacexr RCTD is one of the slower steps in many single-cell genomics workflows. AutoZyme ships a verified, drop-in patch that is up to 23.7× faster and returns output within a validated, bounded difference of the original, with no change to how you call it.

Best speedup: 23.7×
Median speedup: 19.9×
Output equivalence: bounded
Best runtime: 38.95 min baseline, 1.72 min optimized
Datasets: 5
Pass rate: 9/10

Benchmark charts

Speedup distribution: each dot is one finalized dataset/thread run on Windows (log scale).

Thread sweep: speedup across finalized thread counts on Windows.

Dataset                     Tier        Threads  Baseline   Optimized  Speedup
lymph_node_rctd_10000_boot  ood_xlarge  1        38.95 min  1.72 min   23.7×
lymph_node_rctd_10000_boot  ood_xlarge  4        12.40 min  50.59 s    16.3×
lymph_node_rctd_10000_boot  ood_xlarge  8        6.56 min   59.12 s    6.66×
lymph_node_rctd_4033        large       1        13.51 min  34.21 s    23.7×
lymph_node_rctd_4033        large       4        5.12 min   40.77 s    7.54×
lymph_node_rctd_4033        large       8        2.87 min   40.18 s    4.29×
lymph_node_rctd_1498        medium      1        5.12 min   16.34 s    18.8×
lymph_node_rctd_1498        medium      4        1.88 min   18.17 s    6.22×
lymph_node_rctd_1498        medium      8        1.34 min   17.92 s    4.50×
lymph_node_rctd_500         small       1        1.83 min   9.50 s     11.6×
lymph_node_rctd_500         small       4        51.97 s    10.17 s    5.11×
lymph_node_rctd_500         small       8        48.51 s    10.47 s    4.63×

Memory: baseline vs optimized peak memory on Windows (1 thread).

Dataset                     Tier        Baseline  Optimized  Optimized / baseline
lymph_node_rctd_10000_boot  ood_xlarge  2.1 GB    1.8 GB     0.87×
lymph_node_rctd_4033        large       1.6 GB    1.3 GB     0.80×
lymph_node_rctd_1498        medium      1.2 GB    1.0 GB     0.82×
lymph_node_rctd_500         small       1.1 GB    1.0 GB     0.87×

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets run.RCTD in spacexr. The benchmarked result preserves the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: spacexr, deconvolution, spatial deconvolution, cell type deconvolution.

Supported scope

Accelerates ONLY the doublet-mode pixel-fitting loop reached by run.RCTD(., doublet_mode="doublet"), and the "full" mode loop (decompose_batch is patched).

It patches 10 spacexr namespace internals with C++/closed-form kernels: calc_log_l_vec, get_der_fast, solveWLS, solveIRWLS.weights, psd, process_bead_doublet, decompose_sparse, gather_results, process_beads_batch, decompose_batch.

Fast C++ paths fire only for the non-bulk, non-constrained branches that fitPixels invokes (fitPixels calls process_beads_batch with constrain=F). Specifically:

get_der_fast: non-bulk only.
solveWLS, p=1: closed-form scalar Newton (exact), only when !bulk_mode && !constrain.
solveWLS, p=2: 4-vertex active-set, equivalent to solve.QP for the 2x2 bound problem, only when !bulk_mode && !constrain.
solveWLS, p>2: non-bulk, unconstrained; uses an R quadprog single Newton step with the patched get_der_fast.
solveIRWLS.weights: non-bulk, unconstrained, ncol(S)>2, via rctd_cpp_irwls_full_nonbulk.
decompose_sparse: unconstrained, p<=2, via rctd_cpp_irwls_sparse_p12.
psd: closed form for 1x1 and 2x2.
process_bead_doublet: fused C++ candidate scoring plus sparse pair refit in the !constrain branch.

Full cell-type weights are reproduced exactly (pearson_weights=1.0), and the small, medium, large, and ood_xlarge tiers pass all thresholds on macOS and Windows.

Threads {1,4,8} are supported: mclapply fork on non-Windows when cores>1; PSOCK on Windows only when N>=6000 pixels, capped at 4 workers; otherwise serial.

The likelihood globals Q_mat, SQ_mat, X_vals, and K_val must be populated by spacexr::set_likelihood_vars (done by run.RCTD before fitPixels) before any fast_* runs.
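The threading rule in the supported scope can be sketched as a small decision function. This is only an illustration of the rule as worded there, not AutoZyme's code: the name choose_backend, its arguments, and the returned list are hypothetical, and treating a single-thread request as serial on every platform is an assumption.

```r
# Hypothetical helper: restates the documented threading rule in base R.
# It does not call AutoZyme or spacexr.
choose_backend <- function(n_pixels, threads,
                           on_windows = .Platform$OS.type == "windows") {
  stopifnot(threads %in% c(1, 4, 8))  # thread counts listed as supported

  # Assumption: one thread always means serial execution.
  if (threads == 1) {
    return(list(backend = "serial", workers = 1L))
  }
  # Non-Windows: fork via mclapply when more than one core is requested.
  if (!on_windows) {
    return(list(backend = "fork", workers = as.integer(threads)))
  }
  # Windows: PSOCK only for N >= 6000 pixels, capped at 4 workers.
  if (n_pixels >= 6000) {
    return(list(backend = "psock", workers = as.integer(min(threads, 4))))
  }
  # Everything else runs serially.
  list(backend = "serial", workers = 1L)
}

big_windows   <- choose_backend(10000, 8, on_windows = TRUE)
small_windows <- choose_backend(4033, 8, on_windows = TRUE)
small_unix    <- choose_backend(500, 4, on_windows = FALSE)
```

Under this reading, the 4033-pixel dataset stays serial on Windows even when 8 threads are requested, because it is below the 6000-pixel PSOCK threshold.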

Out-of-scope behavior

Silent and possibly wrong: calls outside the supported scope are not flagged, and their output may be incorrect.
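Because out-of-scope calls are silent, a pipeline may want its own guard before calling run.RCTD. The sketch below is a hypothetical pre-flight check, not part of AutoZyme or spacexr. It assumes, from the supported-scope text, that only doublet_mode values "doublet" and "full" are covered, and it does not check the bulk or constrain conditions.

```r
# Hypothetical guard: warns when the requested mode is not one of the two
# modes the supported-scope text names. in_patched_scope is an illustrative name.
in_patched_scope <- function(doublet_mode) {
  supported <- c("doublet", "full")  # assumption: read from the scope text
  ok <- doublet_mode %in% supported
  if (!ok) {
    warning(sprintf(
      "doublet_mode = \"%s\" is outside the documented patch scope; validate the output yourself",
      doublet_mode
    ), call. = FALSE)
  }
  ok
}

doublet_ok <- in_patched_scope("doublet")
full_ok    <- in_patched_scope("full")
```

A call such as in_patched_scope("multi") returns FALSE and raises a warning, which gives the pipeline a place to fall back to stock spacexr or to add its own validation.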

Detailed speedup table (10 runs). Memory is peak memory, baseline → optimized.

Dataset                      Tier        Platform  Threads  Baseline   Optimized  Speedup  Memory        Pass
lymph_node_rctd_10000_boot   ood_xlarge  Windows   1        38.95 min  1.72 min   23.7×    2.1 → 1.8 GB  pass
lymph_node_rctd_1498         medium      Windows   1        5.12 min   16.34 s    18.8×    1.2 → 1.0 GB  pass
lymph_node_rctd_3000_seed42  ood_large   Windows   1        11.31 min  36.80 s    18.4×    1.4 → 1.2 GB  fail
lymph_node_rctd_4033         large       Windows   1        13.51 min  34.21 s    23.7×    1.6 → 1.3 GB  pass
lymph_node_rctd_500          small       Windows   1        1.83 min   9.50 s     11.6×    1.1 → 1.0 GB  pass
lymph_node_rctd_10000_boot   ood_xlarge  macOS     1        18.94 min  53.51 s    21.4×    3.5 → 2.6 GB  pass
lymph_node_rctd_1498         medium      macOS     1        2.95 min   8.45 s     21.0×    2.5 → 1.5 GB  pass
lymph_node_rctd_3000_seed42  ood_large   macOS     1        5.96 min   21.96 s    16.5×    2.1 → 1.7 GB  pass
lymph_node_rctd_4033         large       macOS     1        7.74 min   21.83 s    21.3×    2.6 → 2.0 GB  pass
lymph_node_rctd_500          small       macOS     1        1.12 min   8.70 s     7.70×    2.0 → 1.3 GB  pass
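The headline figures at the top of the page can be recomputed from the Speedup and Pass columns of this table. The snippet below is a small base-R check with the ten rows typed in by hand; the data frame and variable names are illustrative only.

```r
# Speedup and pass/fail for the ten single-thread runs, in table order
# (five Windows rows, then five macOS rows).
runs <- data.frame(
  platform = rep(c("Windows", "macOS"), each = 5),
  speedup  = c(23.7, 18.8, 18.4, 23.7, 11.6,
               21.4, 21.0, 16.5, 21.3, 7.70),
  pass     = c(TRUE, TRUE, FALSE, TRUE, TRUE,
               TRUE, TRUE, TRUE, TRUE, TRUE)
)

best_speedup   <- max(runs$speedup)      # largest speedup across all runs
median_speedup <- median(runs$speedup)   # mean of the two middle values
pass_rate      <- sprintf("%d/%d", sum(runs$pass), nrow(runs))

print(c(best = best_speedup, median = median_speedup))
print(pass_rate)
```

This reproduces the 23.7× best speedup, the 19.9× median (the midpoint of 18.8× and 21.0×), and the 9/10 pass rate.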

Frequently asked questions

Speeding up spacexr RCTD
Why is spacexr RCTD slow?

spacexr RCTD is CPU-bound, and the stock implementation in spacexr leaves performance on the table in its core numerical work, the per-pixel fitting loop. On the largest benchmark dataset (lymph_node_rctd_10000_boot, Windows, 1 thread), the original takes 38.95 min where the AutoZyme path takes 1.72 min (23.7× faster).

How do I make spacexr RCTD faster?

Install AutoZyme and activate the spacexr patch, then keep using spacexr RCTD exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 23.7× faster on the benchmark datasets, with no pipeline or API changes.

Does the AutoZyme speedup change the spacexr RCTD output?

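Differences are small and bounded: on the 9 of 10 benchmark runs that pass, results are concordance-validated to within roughly 1.5 to 5% of the original spacexr result, inside a frozen gate. The remaining run (lymph_node_rctd_3000_seed42, ood_large tier, Windows) fails the gate.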
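To check equivalence on your own data, one option is to run RCTD once with and once without the patch and compare the two cell-type weight matrices. The sketch below is an assumption-laden illustration: compare_weights is a hypothetical helper, the two metrics (Pearson correlation and largest absolute difference) are stand-ins rather than AutoZyme's actual gate, and the matrices are toy values, not real RCTD output.

```r
# Hypothetical comparison of two pixel-by-cell-type weight matrices.
compare_weights <- function(w_base, w_opt) {
  stopifnot(identical(dim(w_base), dim(w_opt)))
  list(
    pearson      = cor(as.vector(w_base), as.vector(w_opt)),
    max_abs_diff = max(abs(w_base - w_opt))
  )
}

# Toy weights: 3 pixels x 3 cell types, each row summing to 1.
w_base <- matrix(c(0.7, 0.2, 0.1,
                   0.1, 0.8, 0.1,
                   0.3, 0.3, 0.4), nrow = 3, byrow = TRUE)
w_opt <- w_base
w_opt[1, ] <- c(0.699, 0.201, 0.1)  # small perturbation in one pixel

res <- compare_weights(w_base, w_opt)
print(res)
```

With real runs, the two inputs would be the weight matrices taken from the baseline and patched RCTD results; the threshold to apply is a project decision and is not specified here.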

How do I install the spacexr speedup?

In R: install the autozyme package, then run library(autozyme) and autozyme::activate("spacexr"). The patch applies automatically the next time you call run.RCTD.
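A minimal sketch of that activation step, wrapped so a script still runs when autozyme is not installed. Only library(autozyme), autozyme::activate("spacexr"), and run.RCTD come from the text above; the wrapper activate_spacexr_patch and the object name my_rctd are hypothetical, and how the package is installed is left to AutoZyme's own instructions.

```r
# Hypothetical wrapper: activates the patch if autozyme is available.
# Returns TRUE when activation was attempted, FALSE when the package is missing.
activate_spacexr_patch <- function() {
  if (!requireNamespace("autozyme", quietly = TRUE)) {
    message("autozyme is not installed; run.RCTD will use stock spacexr")
    return(FALSE)
  }
  autozyme::activate("spacexr")  # call as given in the answer above
  TRUE
}

patched <- activate_spacexr_patch()

# The spacexr call itself is unchanged. `my_rctd` stands for an RCTD object
# you have already built with spacexr (for example via create.RCTD):
# my_rctd <- spacexr::run.RCTD(my_rctd, doublet_mode = "doublet")
```

Keeping the activation in one place makes it easy to compare patched and unpatched runs by toggling a single call.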