Threading & multiprocessing
Most users do not need to touch this page. AutoZyme chooses a reasonable thread count by default.
Read this page only if:
- you run on a shared server or HPC node and need to limit CPU use
- your pipeline already uses multiprocessing, joblib, future, BiocParallel, or cluster workers
- a benchmark looks slower because too many tools are fighting over the same CPU cores
Some names you may see in logs:
- BLAS: the linear algebra library used underneath PCA, matrix multiplication, and regression.
- OpenMP: a common C/C++ threading system used by compiled scientific code.
- numba: a Python compiler used by some AutoZyme Python kernels.
- worker process: a separate Python or R session created by tools like multiprocessing, joblib, parallel, or BiocParallel.
Two practical rules cover almost everything:
- Set thread counts before heavy imports when possible.
- Worker processes may need their own activate() call.
Examples on this page are given for Python first, then for R.
Set thread count
Use this when you want AutoZyme and common math libraries to use a fixed number of CPU threads.
import autozyme
autozyme.set_threads(8)
autozyme.activate("scanpy")
This is usually enough. set_threads() sets the common thread-count environment variables used by scientific Python packages.
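If a script cannot call set_threads() early enough, the same kind of limit can be applied by hand before the heavy imports. The following is a minimal sketch, not AutoZyme's own implementation: limit_threads is a hypothetical helper, and the variable list is only the set of common names mentioned on this page.

```python
import os

# Common variable names mentioned on this page; other libraries may read different ones.
THREAD_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
)


def limit_threads(n):
    """Hypothetical helper: set each variable in THREAD_VARS to n and return what was set."""
    if n < 1:
        raise ValueError("thread count must be at least 1")
    applied = {name: str(n) for name in THREAD_VARS}
    os.environ.update(applied)
    return applied


# Call this before importing numpy, scanpy, or autozyme:
# most libraries read these variables once, when they are first loaded.
applied = limit_threads(4)
```

Setting the variables after the libraries have loaded usually has no effect, which is why the call sits at the very top of the script.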
If a Python patch uses numba, numba reads its thread count very early. Set NUMBA_NUM_THREADS before importing AutoZyme if you need that exact limit:
import os
os.environ["NUMBA_NUM_THREADS"] = "4"
import autozyme
autozyme.activate("scanpy")
AutoZyme also respects the AUTOZYME_THREADS environment variable (the older spelling AUTOZYMER_THREADS is still honored as a fallback). This is mostly useful for scripted benchmark sweeps.
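A scripted sweep over AUTOZYME_THREADS might look like the sketch below. It starts a fresh interpreter for each thread count so that every run sees the variable from startup. run_with_threads is a hypothetical helper, and the child command here only echoes the variable; in a real sweep it would be your own benchmark script.

```python
import os
import subprocess
import sys


def run_with_threads(n, code):
    """Run `code` in a fresh Python interpreter with AUTOZYME_THREADS set to n."""
    env = dict(os.environ, AUTOZYME_THREADS=str(n))
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Placeholder workload: print the value the child process sees.
# A real sweep would run a script that imports autozyme and times the work.
child_code = "import os; print(os.environ['AUTOZYME_THREADS'])"

seen = {n: run_with_threads(n, child_code) for n in (1, 2, 4)}
```

Using one process per setting avoids the problem of libraries that read their thread count only once per session.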
library(autozyme)
autozyme::set_threads(8)
autozyme::activate("seurat")
For scripted runs, you can also set the common environment variables before loading packages:
Sys.setenv(OMP_NUM_THREADS = "8")
Sys.setenv(OPENBLAS_NUM_THREADS = "8")
Sys.setenv(MKL_NUM_THREADS = "8")
library(autozyme)
autozyme::activate("seurat")
You do not need to know which package uses which variable. These are the common knobs used by matrix libraries and compiled R/C++ code.
Worker & parallel processes
If your code starts worker processes, activation may not automatically carry over.
The friendly version for Python:
- Linux default multiprocessing workers usually inherit AutoZyme.
- Windows, macOS, joblib, and loky workers usually start fresh and need activate() inside the worker.
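With the default multiprocessing.Pool on Linux, this usually works:
from multiprocessing import Pool
import autozyme
# Activate once in the parent; forked workers inherit the patched state.
autozyme.activate("cell2location")
with Pool(4) as pool:
    # work_fn and items stand for your own function and inputs.
    pool.map(work_fn, items)
On Windows, macOS, spawn, joblib, or loky, activate inside each worker: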
from multiprocessing import Pool
def init_worker():
    # Runs once in every worker process as it starts.
    import autozyme
    autozyme.activate("cell2location")
# The guard is required under spawn, which re-imports this module in each worker.
if __name__ == "__main__":
    with Pool(4, initializer=init_worker) as pool:
        pool.map(work_fn, items)
The same pattern applies to joblib.Parallel(backend="loky") and concurrent.futures.ProcessPoolExecutor when they start fresh Python interpreters.
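Whether workers inherit the parent's state depends on the multiprocessing start method rather than on the operating system as such, and the default has changed over time (Python 3.14 switched the Linux default from fork to forkserver). The sketch below checks the start method at run time instead of guessing from the platform. workers_start_fresh is a hypothetical helper, not part of AutoZyme.

```python
import multiprocessing as mp


def workers_start_fresh(start_method):
    """Return True when workers do not inherit the parent's in-memory state.

    Only "fork" copies the parent process; "spawn" and "forkserver"
    workers do not see patches that were applied in the parent.
    """
    if start_method not in ("fork", "spawn", "forkserver"):
        raise ValueError(f"unknown start method: {start_method!r}")
    return start_method != "fork"


# get_start_method() reports the method multiprocessing will use here.
# Note: calling it fixes the default for the rest of the program.
method = mp.get_start_method()
needs_initializer = workers_start_fresh(method)
```

When needs_initializer is true, pass an initializer that calls activate(), as in the init_worker example above. Passing the initializer unconditionally is also a reasonable choice if a repeated activate() call is harmless in your setup.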
The friendly version for R:
- mclapply, MulticoreParam, and future::multicore usually inherit AutoZyme.
- makeCluster, SnowParam, and similar backends start new R sessions and need activate() inside the cluster.
Fork-based workers inherit active patches:
library(autozyme)
autozyme::activate("seurat")
parallel::mclapply(items, work_fn, mc.cores = 4)
This applies to parallel::mclapply, BiocParallel::MulticoreParam, and future::multicore.
Snow-style workers start separate R sessions, so activate inside the cluster:
cl <- parallel::makeCluster(4)
parallel::clusterEvalQ(cl, {
library(autozyme)
autozyme::activate("seurat")
})
parallel::parLapply(cl, items, work_fn)
parallel::stopCluster(cl)
This applies to parallel::makeCluster, BiocParallel::SnowParam, and other worker backends that start new R sessions.
Check what is in effect
autozyme.env_snapshot()    # Python
autozyme::env_snapshot()   # R
The snapshot records active patches, their upstream package versions, and platform details (AutoZyme version, language version, and OS). It does not capture thread-count environment variables. To check those, read the environment directly (os.environ in Python, Sys.getenv() in R).
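Reading the environment directly can be wrapped in a few lines. This is a minimal sketch: thread_env is a hypothetical helper, and THREAD_VARS lists only the variable names mentioned on this page.

```python
import os

# Names mentioned on this page; extend the tuple if your stack reads others.
THREAD_VARS = (
    "AUTOZYME_THREADS",
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "NUMBA_NUM_THREADS",
)


def thread_env(environ=None):
    """Map each variable in THREAD_VARS to its value, or None if unset."""
    if environ is None:
        environ = os.environ
    return {name: environ.get(name) for name in THREAD_VARS}


# Print one line per variable so the report can be pasted into a bug report or log.
for name, value in thread_env().items():
    print(f"{name}={value if value is not None else '(unset)'}")
```

The report shows what was requested through the environment. It does not prove how many threads a library actually uses at run time.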