Benchmark charts
Speedup distribution: each dot is one finalized dataset/thread run on Windows.
Thread sweep: speedup across finalized thread counts on Windows.
Memory: baseline vs optimized peak memory on Windows.
What is accelerated
This task targets run.RCTD in spacexr. The benchmarked result preserves the declared scientific output gate while reducing CPU runtime on the listed datasets.
Also searched as: spacexr, deconvolution, spatial deconvolution, cell type deconvolution.
Supported scope
Accelerates ONLY the doublet-mode pixel-fitting loop reached by run.RCTD(., doublet_mode="doublet"), and the "full" mode loop (decompose_batch is patched).
It patches 10 spacexr namespace internals with C++/closed-form kernels: calc_log_l_vec, get_der_fast, solveWLS, solveIRWLS.weights, psd, process_bead_doublet, decompose_sparse, gather_results, process_beads_batch, decompose_batch.
Fast C++ paths fire only for the non-bulk, non-constrained branches that fitPixels invokes (fitPixels calls process_beads_batch with constrain=F). Specifically:
- get_der_fast: non-bulk.
- solveWLS p=1 (closed-form scalar Newton, exact) and p=2 (4-vertex active-set, equivalent to solve.QP for the 2x2 bound problem): only when !bulk_mode && !constrain.
- solveWLS p>2, non-bulk unconstrained: uses R quadprog single Newton step with the patched get_der_fast.
- solveIRWLS.weights: non-bulk, unconstrained, ncol(S)>2, via rctd_cpp_irwls_full_nonbulk.
- decompose_sparse: unconstrained, p<=2, via rctd_cpp_irwls_sparse_p12.
- psd: closed-form for 1x1 and 2x2.
- process_bead_doublet: fused C++ candidate scoring + sparse pair refit in the !constrain branch.
Full cell-type weights are reproduced exactly (pearson_weights=1.0), and the small/medium/large/ood_xlarge tiers pass all thresholds on macOS and Windows.
Threads {1,4,8} are supported: mclapply fork on non-Windows when cores>1; PSOCK on Windows only when N>=6000 pixels, capped at 4 workers; otherwise serial.
The likelihood globals Q_mat/SQ_mat/X_vals/K_val must be populated by spacexr::set_likelihood_vars (done by run.RCTD before fitPixels) before any fast_* runs.
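The threading rule described above can be restated as a small decision function. This is only a sketch of the rule as written here, not AutoZyme's actual code: `choose_backend` and its argument names are hypothetical, and treating a single requested core on Windows as serial is an assumption.

```r
# Hypothetical helper restating the documented threading rule.
# Not part of spacexr or autozyme.
choose_backend <- function(n_pixels, cores,
                           is_windows = .Platform$OS.type == "windows") {
  if (cores <= 1) {
    # Assumption: one requested core always means a serial run.
    return(list(backend = "serial", workers = 1L))
  }
  if (!is_windows) {
    # Non-Windows: fork-based mclapply whenever cores > 1.
    return(list(backend = "mclapply", workers = as.integer(cores)))
  }
  if (n_pixels >= 6000) {
    # Windows: PSOCK only for N >= 6000 pixels, capped at 4 workers.
    return(list(backend = "PSOCK", workers = as.integer(min(cores, 4))))
  }
  # Windows with fewer than 6000 pixels falls back to serial.
  list(backend = "serial", workers = 1L)
}

choose_backend(n_pixels = 10000, cores = 8, is_windows = TRUE)$backend  # "PSOCK"
choose_backend(n_pixels = 10000, cores = 8, is_windows = TRUE)$workers  # 4
choose_backend(n_pixels = 4033,  cores = 8, is_windows = TRUE)$backend  # "serial"
```

Under this reading, requesting 8 threads on Windows never yields more than 4 workers, and datasets below 6000 pixels run serially on Windows regardless of the requested thread count.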
Out-of-scope behavior
Silent and possibly wrong: calls outside the supported scope raise no warning, and their output may be incorrect.
Detailed speedup table (10 runs)
| Dataset | Tier | Platform | Threads | Baseline | Optimized | Speedup | Memory | Concordance | Pass |
|---|---|---|---|---|---|---|---|---|---|
| lymph_node_rctd_10000_boot | ood_xlarge | Windows | 1 | 38.95 min | 1.72 min | 23.7× | 2.1 → 1.8 GB | — | pass |
| lymph_node_rctd_1498 | medium | Windows | 1 | 5.12 min | 16.34 s | 18.8× | 1.2 → 1.0 GB | — | pass |
| lymph_node_rctd_3000_seed42 | ood_large | Windows | 1 | 11.31 min | 36.80 s | 18.4× | 1.4 → 1.2 GB | — | fail |
| lymph_node_rctd_4033 | large | Windows | 1 | 13.51 min | 34.21 s | 23.7× | 1.6 → 1.3 GB | — | pass |
| lymph_node_rctd_500 | small | Windows | 1 | 1.83 min | 9.50 s | 11.6× | 1.1 → 1.0 GB | — | pass |
| lymph_node_rctd_10000_boot | ood_xlarge | macOS | 1 | 18.94 min | 53.51 s | 21.4× | 3.5 → 2.6 GB | — | pass |
| lymph_node_rctd_1498 | medium | macOS | 1 | 2.95 min | 8.45 s | 21.0× | 2.5 → 1.5 GB | — | pass |
| lymph_node_rctd_3000_seed42 | ood_large | macOS | 1 | 5.96 min | 21.96 s | 16.5× | 2.1 → 1.7 GB | — | pass |
| lymph_node_rctd_4033 | large | macOS | 1 | 7.74 min | 21.83 s | 21.3× | 2.6 → 2.0 GB | — | pass |
| lymph_node_rctd_500 | small | macOS | 1 | 1.12 min | 8.70 s | 7.70× | 2.0 → 1.3 GB | — | pass |
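The Baseline and Optimized columns mix minutes and seconds, so a ratio only makes sense after converting both to one unit. The sketch below assumes that Speedup is baseline time divided by optimized time. `to_seconds` and `speedup` are hypothetical helpers, and the published values may come from unrounded timings, so ratios recomputed from the rounded table entries can differ somewhat.

```r
# Hypothetical helpers for reading the table; not part of autozyme.
to_seconds <- function(x) {
  parts <- strsplit(trimws(x), "\\s+")[[1]]
  value <- as.numeric(parts[1])
  unit  <- parts[2]
  switch(unit,
         s   = value,
         min = value * 60,
         stop("unknown unit: ", unit))
}

# Assumption: speedup = baseline time / optimized time.
speedup <- function(baseline, optimized) {
  to_seconds(baseline) / to_seconds(optimized)
}

# lymph_node_rctd_4033, Windows, 1 thread
round(speedup("13.51 min", "34.21 s"), 1)  # 23.7
# lymph_node_rctd_1498, Windows, 1 thread
round(speedup("5.12 min", "16.34 s"), 1)   # 18.8
```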
Frequently asked questions
Why is spacexr RCTD slow?
spacexr RCTD is CPU-bound, and the stock implementation leaves performance on the table in its core numerical work. On the ood_xlarge benchmark dataset (lymph_node_rctd_10000_boot, Windows, 1 thread), the original takes 38.95 min where the AutoZyme path takes 1.72 min (23.7× faster).
How do I make spacexr RCTD faster?
Install AutoZyme and activate the spacexr patch, then keep using spacexr RCTD exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 23.7× faster on the benchmark datasets, with no pipeline or API changes.
Does the AutoZyme speedup change the spacexr RCTD output?
Differences are small and bounded: results are concordance-validated to within roughly 1.5 to 5% of the original spacexr result on the benchmark datasets, inside a frozen gate. One run in the detailed table, lymph_node_rctd_3000_seed42 on Windows, is marked fail.
How do I install the spacexr speedup?
In R: install the autozyme package, then run library(autozyme) and autozyme::activate("spacexr"). The patch applies automatically the next time you call run.RCTD.
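A minimal sketch of those steps follows, wrapped in a function so it can be defined before the packages are loaded. It assumes autozyme and spacexr are already installed (the install source is not stated here) and that `autozyme::activate("spacexr")` behaves as described above. `run_rctd_accelerated` is a hypothetical wrapper; `puck` and `reference` stand for a SpatialRNA object and a Reference object prepared in the usual spacexr way.

```r
# Hypothetical wrapper; not part of autozyme or spacexr.
run_rctd_accelerated <- function(puck, reference, cores = 1) {
  # Activate the patch as described in the text above.
  autozyme::activate("spacexr")

  # Standard spacexr calls, unchanged from an unpatched pipeline.
  rctd <- spacexr::create.RCTD(puck, reference, max_cores = cores)

  # Doublet mode is one of the supported, accelerated paths.
  spacexr::run.RCTD(rctd, doublet_mode = "doublet")
}

# Usage, given existing `puck` and `reference` objects:
# myRCTD <- run_rctd_accelerated(puck, reference, cores = 4)
```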