Speed up BayesSpace: up to 13.1× faster, validated output

Benchmark charts

Speedup distribution

Each dot is one finalized dataset/thread run on Windows (log scale).

10x_visium_human_lymp…: 13.1×
10x_visium_human_brea…: 11.9×
10x_visium_mouse_brai…: 10.3×
10x_visium_mouse_brai…: 8.92×
thrane_melanoma_ST_me…: 8.41×

Thread sweep

Speedup across finalized thread counts on Windows

No finalized multi-thread sweep for this platform.

Memory

Baseline vs optimized peak memory on Windows

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets BayesSpace::spatialCluster. The benchmarked result passes the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: spatial clustering, spatial domains, spatialCluster.

Supported scope

The patch overrides ONLY BayesSpace's internal iterate_t, the Gibbs/MH MCMC inner loop invoked when spatialCluster is called with model="t". Within that path the fast kernel (fast_iterate_t_impl in src/bayesspace.cpp) reproduces the full upstream t-model math for arbitrary n (spots), d (PC dimensions), q >= 2 (clusters), gamma, nrep, thin, and burn.in, and for any df_j neighbor structure, including empty neighbor lists.

The kernel is platform-agnostic: platform only affects how spatialCluster builds df_j (the neighbor list) before iterate_t runs, and any q, d, gamma, and nrep are honored.

The output is NOT bit-exact, for two reasons. First, the BLAS-batched rooti projection reorders floating-point operations. Second, and more importantly, the proposal draw was changed from Rcpp::sample to R::unif_rand, which shifts the entire RNG trajectory: the equilibrium distribution is preserved, but individual trajectories diverge. Correctness is therefore gated statistically (ARI/NMI, which are permutation-invariant clustering similarity measures, with noise_multiplier widened 5%), not element-wise.

Validated tiers span platform in {ST, Visium}, q in 4-12, d in 7-20, and nrep 10000-50000.
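To illustrate why a permutation-invariant score suits a sampler whose cluster labels and trajectory can differ between runs, here is a minimal sketch of the adjusted Rand index (ARI) in plain Python. It is not the benchmark's gate code: the function name adjusted_rand_index and the toy label vectors are assumptions made for illustration, and the actual gate thresholds are not given in this page.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        # Fewer than two items: nothing to disagree about.
        return 1.0
    # Contingency counts: items falling in each (cluster_a, cluster_b) cell.
    cell_counts = Counter(zip(labels_a, labels_b))
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings use a single cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy data (hypothetical): nine spots, three clusters.
reference = [1, 1, 1, 2, 2, 2, 3, 3, 3]
relabeled = [3, 3, 3, 1, 1, 1, 2, 2, 2]  # same partition, labels permuted
perturbed = [1, 1, 2, 2, 2, 2, 3, 3, 3]  # one spot moved to another cluster

ari_relabeled = adjusted_rand_index(reference, relabeled)
ari_perturbed = adjusted_rand_index(reference, perturbed)
print(f"relabeled: {ari_relabeled:.3f}, perturbed: {ari_perturbed:.3f}")
```

Renaming the clusters leaves the score at 1, while moving a single spot lowers it. An element-wise comparison of the two label vectors would instead flag the relabeled run as entirely different, which is why such a comparison is unsuitable here.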

Out-of-scope behavior

Silent fallback to upstream: calls outside the supported scope run the original BayesSpace code.

Detailed speedup table (10 runs)

Dataset	Tier	Platform	Threads	Baseline	Optimized	Speedup	Memory (baseline → optimized)	Concordance	Pass
`10x_visium_human_breast_cancer_block_a`	ood_xlarge	Windows	1	14.05 min	1.18 min	11.9×	1.7 → 1.7 GB	—	pass
`10x_visium_human_lymph_node`	ood_large	Windows	1	8.27 min	37.87 s	13.1×	1.8 → 1.8 GB	—	pass
`10x_visium_mouse_brain_sagittal_anterior`	medium	Windows	1	2.24 min	15.07 s	8.92×	1.5 → 1.5 GB	—	pass
`10x_visium_mouse_brain_sagittal_posterior`	large	Windows	1	5.59 min	32.63 s	10.3×	1.5 → 1.5 GB	—	pass
`thrane_melanoma_ST_mel1_rep2`	small	Windows	1	37.66 s	4.48 s	8.41×	0.9 → 0.8 GB	—	pass
`10x_visium_human_breast_cancer_block_a`	ood_xlarge	macOS	1	12.28 min	1.32 min	9.34×	1.9 → 1.8 GB	—	pass
`10x_visium_human_lymph_node`	ood_large	macOS	1	5.40 min	38.73 s	8.37×	1.9 → 1.9 GB	—	pass
`10x_visium_mouse_brain_sagittal_anterior`	medium	macOS	1	1.75 min	13.71 s	7.65×	1.6 → 1.5 GB	—	pass
`10x_visium_mouse_brain_sagittal_posterior`	large	macOS	1	4.42 min	31.32 s	8.48×	1.7 → 1.6 GB	—	pass
`thrane_melanoma_ST_mel1_rep2`	small	macOS	1	29.23 s	3.66 s	7.99×	1.1 → 0.9 GB	—	pass
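The Speedup column is the baseline runtime divided by the optimized runtime. As a quick check, the sketch below recomputes it for the Windows rows after converting minutes to seconds. Because the table values are already rounded, the recomputed ratios can differ from the reported ones in the last digit; the 1% tolerance is an arbitrary choice for this illustration.

```python
# Windows rows from the table: (dataset, baseline s, optimized s, reported speedup).
# Runtimes listed in minutes are converted to seconds here.
windows_rows = [
    ("10x_visium_human_breast_cancer_block_a", 14.05 * 60, 1.18 * 60, 11.9),
    ("10x_visium_human_lymph_node", 8.27 * 60, 37.87, 13.1),
    ("10x_visium_mouse_brain_sagittal_anterior", 2.24 * 60, 15.07, 8.92),
    ("10x_visium_mouse_brain_sagittal_posterior", 5.59 * 60, 32.63, 10.3),
    ("thrane_melanoma_ST_mel1_rep2", 37.66, 4.48, 8.41),
]


def speedup(baseline_s, optimized_s):
    """Speedup factor: how many times faster the optimized run is."""
    return baseline_s / optimized_s


computed = {}
for name, baseline_s, optimized_s, reported in windows_rows:
    computed[name] = speedup(baseline_s, optimized_s)
    # Relative gap between the recomputed and the reported (rounded) value.
    gap = abs(computed[name] - reported) / reported
    status = "ok" if gap < 0.01 else "check"
    print(f"{name}: {computed[name]:.2f}x (reported {reported}x) {status}")
```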

Frequently asked questions

Speeding up BayesSpace

Why is BayesSpace slow?

BayesSpace is CPU-bound, and the stock implementation leaves performance on the table in its core numerical work, the MCMC inner loop behind spatialCluster. For example, on 10x_visium_human_lymph_node (Windows, 1 thread) the original takes 8.27 min where the AutoZyme path takes 37.87 s (13.1× faster).

How do I make BayesSpace faster?

Install AutoZyme and activate the BayesSpace patch, then keep using BayesSpace exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 13.1× faster on the benchmark datasets, with no pipeline or API changes.

Does the AutoZyme speedup change the BayesSpace output?

Slightly. The accelerated path is not bit-exact, but differences are small and bounded: the output is concordance-validated to within roughly 1.5 to 5% of the original BayesSpace result on every benchmark dataset, inside a frozen gate.

How do I install the BayesSpace speedup?

In Python, run pip install autozyme, then import autozyme and call autozyme.activate("bayesspace"). The patch applies automatically the next time you call BayesSpace::spatialCluster.