Threading & multiprocessing

Most users do not need to touch this page. AutoZyme chooses a reasonable thread count by default.

Read this page only if:

  • you run on a shared server or HPC node and need to limit CPU use
  • your pipeline already uses multiprocessing, joblib, future, BiocParallel, or cluster workers
  • a benchmark looks slower because too many tools are fighting over the same CPU cores

Some names you may see in logs:

  • BLAS: the linear algebra library used underneath PCA, matrix multiplication, and regression.
  • OpenMP: a common C/C++ threading system used by compiled scientific code.
  • numba: a Python compiler used by some AutoZyme Python kernels.
  • worker process: a separate Python or R session created by tools like multiprocessing, joblib, parallel, or BiocParallel.

Two practical rules cover almost everything:

  1. Set thread counts before heavy imports when possible.
  2. Worker processes may need their own activate() call.

Use the Python / R toggle on any code block to switch languages — your choice is shared with the rest of the docs.

Set thread count

Use this when you want AutoZyme and common math libraries to use a fixed number of CPU threads.

import autozyme

autozyme.set_threads(8)
autozyme.activate("scanpy")

This is usually enough. It sets the common thread-count environment variables used by scientific Python packages.
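
The same limits can also be set by hand before the heavy imports. The block below is a minimal sketch, not AutoZyme's implementation. It uses the three variable names from the R example on this page, and limit_math_threads is a hypothetical helper. Which variables set_threads() itself writes is not stated here.

```python
import os

# Variable names taken from the R example on this page; whether
# autozyme.set_threads() writes exactly these is an assumption.
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_math_threads(n):
    """Hypothetical helper: cap common math-library thread pools at n."""
    if n < 1:
        raise ValueError("thread count must be at least 1")
    for name in THREAD_VARS:
        os.environ[name] = str(n)


# Call this before importing numpy, scipy, scanpy, and similar packages:
# the underlying libraries typically read these variables once, at load time.
limit_math_threads(8)
```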

If a Python patch uses numba, note that numba reads NUMBA_NUM_THREADS once, when it is first imported. Set the variable before importing AutoZyme if you need that exact limit:

import os

os.environ["NUMBA_NUM_THREADS"] = "4"

import autozyme
autozyme.activate("scanpy")

AutoZyme also respects the AUTOZYME_THREADS environment variable (the older spelling AUTOZYMER_THREADS is still honored as a fallback). This is mostly useful for scripted benchmark sweeps.
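
A benchmark sweep can set the variable per run by launching each run as a child process. This is a sketch under stated assumptions: it assumes AUTOZYME_THREADS takes a plain integer string (the value format is not documented on this page), run_with_threads is a hypothetical helper, and the child command is a stand-in for your own benchmark script.

```python
import os
import subprocess
import sys


def run_with_threads(n, argv):
    """Run argv in a child process with AUTOZYME_THREADS set to n."""
    env = dict(os.environ)            # copy, so the parent environment is untouched
    env["AUTOZYME_THREADS"] = str(n)  # assumed format: a plain integer
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


# Stand-in for a real benchmark script: the child only echoes the variable.
child = [sys.executable, "-c", "import os; print(os.environ['AUTOZYME_THREADS'])"]

for n in (1, 2, 4, 8):
    result = run_with_threads(n, child)
    print(n, "->", result.stdout.strip())
```

Starting a new process for each setting means every run reads its thread count at startup, before any heavy import.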

In R, the equivalent calls are:

library(autozyme)

autozyme::set_threads(8)
autozyme::activate("seurat")

For scripted runs, you can also set the common environment variables before loading packages:

Sys.setenv(OMP_NUM_THREADS = "8")
Sys.setenv(OPENBLAS_NUM_THREADS = "8")
Sys.setenv(MKL_NUM_THREADS = "8")

library(autozyme)
autozyme::activate("seurat")

You do not need to know which package uses which variable. These are the common knobs used by matrix libraries and compiled R/C++ code.

Worker & parallel processes

If your code starts worker processes, activation may not automatically carry over.

The friendly version for Python:

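  • Workers started by fork usually inherit AutoZyme. Fork is the multiprocessing default on Linux before Python 3.14.
  • Workers on Windows and macOS, workers started with spawn or forkserver, and joblib workers (whose default backend is loky) usually start fresh and need activate() inside the worker.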
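
For the standard library's multiprocessing, you can check at run time which case applies. The sketch below uses multiprocessing.get_start_method(). workers_need_activate is a hypothetical helper, not an AutoZyme function, and it says nothing about joblib or loky, which start fresh interpreters regardless.

```python
import multiprocessing as mp


def workers_need_activate(start_method=None):
    """Return True when new workers start from a fresh interpreter.

    Only fork copies the parent's memory, including anything already
    activated there; spawn and forkserver workers start without it.
    """
    if start_method is None:
        # Note: this call fixes the start method for the process, so make
        # any mp.set_start_method() call before reaching this point.
        start_method = mp.get_start_method()
    return start_method != "fork"


print(mp.get_start_method(), workers_need_activate())
```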

With a fork-started multiprocessing.Pool (the Linux default before Python 3.14), this usually works:

from multiprocessing import Pool
import autozyme

autozyme.activate("cell2location")

# work_fn and items stand in for your own function and inputs.
with Pool(4) as pool:
    results = pool.map(work_fn, items)

On Windows or macOS, or when using spawn, joblib, or loky, activate inside each worker:

from multiprocessing import Pool

def init_worker():
    import autozyme
    autozyme.activate("cell2location")

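# With spawn, each worker re-imports this module, so pool creation must sit
# behind the __main__ guard. work_fn and items stand in for your own code.
if __name__ == "__main__":
    with Pool(4, initializer=init_worker) as pool:
        results = pool.map(work_fn, items)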

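The same idea applies to joblib.Parallel(backend="loky") and concurrent.futures.ProcessPoolExecutor whenever they start fresh Python interpreters: activate() has to run inside each worker. ProcessPoolExecutor accepts the same initializer argument as Pool.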
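
Worker processes and library threads multiply: four workers that each start eight math threads ask for 32 threads in total. A common rule of thumb (not something this page attributes to AutoZyme) is to divide the available cores among the workers. threads_per_worker below is a hypothetical helper that sketches the arithmetic.

```python
import os


def threads_per_worker(n_workers, total_cores=None):
    """Split cores evenly across workers, never going below one thread."""
    if n_workers < 1:
        raise ValueError("need at least one worker")
    if total_cores is None:
        # os.cpu_count() can return None; fall back to a single core.
        total_cores = os.cpu_count() or 1
    return max(1, total_cores // n_workers)


# 16 cores shared by 4 workers leaves 4 threads for each worker.
print(threads_per_worker(4, total_cores=16))
```

The result could then be passed to autozyme.set_threads() inside each worker's initializer.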

The friendly version for R:

  • mclapply, MulticoreParam, and future::multicore usually inherit AutoZyme.
  • makeCluster, SnowParam, and similar backends start new R sessions and need activate() inside the cluster.

Fork-based workers inherit active patches:

library(autozyme)
autozyme::activate("seurat")

parallel::mclapply(items, work_fn, mc.cores = 4)

This applies to parallel::mclapply, BiocParallel::MulticoreParam, and future::multicore.

Snow-style workers start separate R sessions, so activate inside the cluster:

cl <- parallel::makeCluster(4)
parallel::clusterEvalQ(cl, {
  library(autozyme)
  autozyme::activate("seurat")
})

parallel::parLapply(cl, items, work_fn)
parallel::stopCluster(cl)

This applies to parallel::makeCluster, BiocParallel::SnowParam, and other worker backends that start new R sessions.

Check what is in effect

autozyme.env_snapshot()    # Python
autozyme::env_snapshot()   # R

The snapshot records active patches, their upstream package versions, and platform details (AutoZyme version, language version, and OS). It does not capture thread-count environment variables — to check those, read the environment directly (os.environ in Python, Sys.getenv() in R).
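
Reading the environment directly can look like the sketch below. thread_env is a hypothetical helper, and the list of names is simply the set of variables mentioned on this page, not an exhaustive list of what any library consults.

```python
import os

# Variables mentioned on this page; other libraries may use different names.
THREAD_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "NUMBA_NUM_THREADS",
    "AUTOZYME_THREADS",
)


def thread_env(environ=None):
    """Map each thread-count variable to its value, or None if unset."""
    if environ is None:
        environ = os.environ
    return {name: environ.get(name) for name in THREAD_VARS}


for name, value in thread_env().items():
    print(f"{name}={value if value is not None else '(unset)'}")
```

An unset variable means the library falls back to its own default, which is often every available core.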