
Speed up statsmodels

GLM fitting with statsmodels is one of the slower steps in many statistics and survival workflows. AutoZyme ships a verified, drop-in patch that runs up to 25.2× faster on the benchmark datasets and returns output within a strict, verified tolerance, with no change to how you call statsmodels.

Best speedup: 25.2×
Median speedup: 23.4×
Output equivalence: Tolerance
Best runtime: baseline 3.31 min, optimized 8.29 s
Datasets: 5
Pass rate: 7/7

Benchmark charts

Speedup distribution (log scale)
Each dot is one finalized dataset/thread run on Windows.
Datasets: glm_poisson_ood_corr_dense, glm_poisson_medium, glm_poisson_tiny, glm_poisson_large, glm_poisson_ood_xlarge
Thread sweep
Speedup across finalized thread counts on Windows
Dataset | Tier | Threads | Baseline | Optimized | Speedup | Memory
glm_poisson_ood_corr_dense | ood_large | 1 | 4.80 min | 17.95 s | 18.0× | 55 GB → 12 GB
glm_poisson_ood_corr_dense | ood_large | 4 | 3.22 min | 11.35 s | 16.9× | 55 GB → 12 GB
glm_poisson_ood_corr_dense | ood_large | 8 | 3.31 min | 8.29 s | 25.2× | 55 GB → 12 GB
glm_poisson_medium | medium | 1 | 3.01 min | 10.73 s | 16.8× | 35 GB → 7.4 GB
glm_poisson_medium | medium | 4 | 2.40 min | 9.28 s | 17.5× | 41 GB → 7.6 GB
glm_poisson_medium | medium | 8 | 2.33 min | 5.98 s | 23.4× | 35 GB → 7.4 GB
glm_poisson_tiny | small | 1 | 1.84 min | 5.16 s | 21.6× | 19 GB → 4.1 GB
glm_poisson_tiny | small | 4 | 1.26 min | 3.17 s | 23.2× | 19 GB → 3.3 GB
glm_poisson_tiny | small | 8 | 1.26 min | 5.02 s | 17.2× | 19 GB → 4.1 GB
glm_poisson_large | large | 1 | 4.53 min | 18.50 s | 15.6× | 55 GB → 12 GB
glm_poisson_large | large | 4 | 3.29 min | 10.96 s | 18.1× | 55 GB → 12 GB
glm_poisson_large | large | 8 | 3.32 min | 9.01 s | 22.0× | 55 GB → 12 GB
glm_poisson_ood_xlarge | ood_xlarge | 1 | 5.80 min | 25.19 s | 14.4× | 76 GB → 16 GB
glm_poisson_ood_xlarge | ood_xlarge | 4 | 3.92 min | 16.03 s | 14.7× | 76 GB → 16 GB
glm_poisson_ood_xlarge | ood_xlarge | 8 | 3.90 min | 13.16 s | 17.6× | 76 GB → 16 GB
Memory
Baseline vs optimized peak memory on Windows
Dataset | Tier | Baseline | Optimized | Optimized / baseline | Speedup | Threads
glm_poisson_ood_xlarge | ood_xlarge | 76 GB | 16 GB | 0.21× | 17.6× | 8
glm_poisson_large | large | 55 GB | 12 GB | 0.21× | 22.0× | 8
glm_poisson_ood_corr_dense | ood_large | 55 GB | 12 GB | 0.21× | 25.2× | 8
glm_poisson_medium | medium | 41 GB | 7.6 GB | 0.18× | 17.5× | 4
glm_poisson_tiny | small | 19 GB | 4.1 GB | 0.22× | 17.2× | 8

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

The patch targets statsmodels.genmod.generalized_linear_model.GLM.fit. The benchmarked result preserves the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: GLM, generalized linear model, regression, logistic regression, poisson regression.

Supported scope

The fast Poisson/log-link IRLS path is gated by _can_fast_poisson_irls (__init__.py:135-173) and activates only when all of the following hold:

family is exactly sm.families.Poisson with the sm.families.links.Log link (the default);
method='IRLS';
scale is None;
cov_type='nonrobust';
cov_kwds is None;
kwargs are attach_wls=False, wls_method='lstsq', tol_criterion='deviance', and rtol in (0, 0.0, None);
_offset_exposure is all zeros (no offset or exposure);
freq_weights, var_weights, iweights, and n_trials are all ones (unit weights, no binomial trials);
start_params is either None or has shape[0]==exog.shape[1];
the design matrix is full rank.

The full-rank condition is implicit: fast_minimal_wls_fit (:285-287) solves the normal equations with np.linalg.solve(wexog.T@wexog, wexog.T@wendog), and fast_glm_initialize (:210-219) sets df_model=p-1 and df_resid=n-p directly, skipping the matrix_rank SVD.

Convergence uses abs(dev[i-1]-dev[i])<=atol (atol=tol), which is mathematically identical to the np.allclose(...,rtol=0) check on the deviance criterion in upstream _check_convergence. fast_handle_constant (:179-200) assumes that the first all-ones finite column (as produced by sm.add_constant) is the intercept.

For the benchmarked default Poisson fit(), the produced params, llf, scale, converged, and n_iter match upstream within max_abs/rel_diff 1e-6 and rel_diff_llf 1e-8 (task.yaml metrics).
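To make the ingredients named in this scope concrete, the following is a rough NumPy sketch of Poisson/log-link IRLS that solves the normal equations with np.linalg.solve and stops on the absolute change in deviance. It is not AutoZyme's code: the function names poisson_irls and poisson_deviance, the starting mean, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def poisson_deviance(y, mu):
    """Poisson deviance 2*sum(y*log(y/mu) - (y - mu)), taking 0*log(0) as 0."""
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))

def poisson_irls(X, y, tol=1e-8, maxiter=100):
    """Illustrative IRLS for a Poisson GLM with log link (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    mu = (y + y.mean()) / 2.0      # assumed starting mean, strictly positive
    eta = np.log(mu)
    dev_prev = np.inf
    converged = False
    for n_iter in range(1, maxiter + 1):
        # Log link: working response z, working weights mu.
        z = eta + (y - mu) / mu
        sw = np.sqrt(mu)
        wexog = X * sw[:, None]
        wendog = z * sw
        # Normal equations; only valid when X has full column rank.
        beta = np.linalg.solve(wexog.T @ wexog, wexog.T @ wendog)
        eta = X @ beta
        mu = np.exp(eta)
        dev = poisson_deviance(y, mu)
        # Same test as np.allclose(dev, dev_prev, atol=tol, rtol=0).
        if abs(dev_prev - dev) <= tol:
            converged = True
            break
        dev_prev = dev
    return beta, dev, converged, n_iter

# Synthetic data (illustrative assumption); the intercept is the first column.
rng = np.random.default_rng(0)
n = 2_000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta_true = np.array([0.5, 0.2, -0.1, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

# The rank check below is the kind of SVD-based test the fast path is said to skip.
assert np.linalg.matrix_rank(X) == X.shape[1]

beta, dev, converged, n_iter = poisson_irls(X, y)
score = X.T @ (y - np.exp(X @ beta))   # should be near zero at the MLE
print(beta, converged, n_iter)
```

Solving the normal equations is cheaper than a least-squares solve via SVD or QR, but it gives no protection against a rank-deficient design, which is why the full-rank condition matters for this path.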

Out-of-scope behavior

Calls outside the supported scope fall back silently to the upstream statsmodels implementation.
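The gate-then-fallback pattern can be sketched in a few lines of plain Python. Everything here is hypothetical: make_dispatcher, gate, and the two stand-in fit functions are illustrative names, and the gate checks only a handful of the documented conditions.

```python
def make_dispatcher(can_use_fast_path, fast_fit, upstream_fit):
    """Return a fit function that uses fast_fit only when the gate accepts the call."""
    def fit(**settings):
        if can_use_fast_path(settings):
            return fast_fit(**settings)
        # Out of scope: no warning and no error, just the original implementation.
        return upstream_fit(**settings)
    return fit

def gate(settings):
    """Hypothetical gate covering a few of the documented fit() conditions."""
    return (
        settings.get("method", "IRLS") == "IRLS"
        and settings.get("scale") is None
        and settings.get("cov_type", "nonrobust") == "nonrobust"
        and settings.get("cov_kwds") is None
    )

# Stand-ins that just report which path ran.
fit = make_dispatcher(gate, lambda **s: "fast", lambda **s: "upstream")

print(fit())                  # default settings pass this gate
print(fit(cov_type="HC0"))    # robust covariance is out of scope
```

Because the fallback is silent, the result alone does not tell the caller which path ran; with this design, an out-of-scope call simply sees no speedup.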

Detailed speedup table (7 runs)
Dataset | Tier | Platform | Threads | Baseline | Optimized | Speedup | Memory | Concordance | Pass
glm_poisson_large | large | Windows | 8 | 3.32 min | 9.01 s | 22.0× | 55.1 → 11.6 GB | | pass
glm_poisson_medium | medium | Windows | 8 | 2.33 min | 5.98 s | 23.4× | 34.7 → 7.4 GB | | pass
glm_poisson_ood_corr_dense | ood_large | Windows | 8 | 3.31 min | 8.29 s | 25.2× | 55.1 → 11.6 GB | | pass
glm_poisson_ood_xlarge | ood_xlarge | Windows | 8 | 3.90 min | 13.16 s | 17.6× | 76.3 → 15.9 GB | | pass
glm_poisson_tiny | small | Windows | 4 | 1.26 min | 3.17 s | 23.2× | 18.7 → 3.3 GB | | pass
glm_poisson_medium | medium | macOS | 1 | 1.85 min | 3.21 s | 34.4× | 21.2 → 7.9 GB | | pass
glm_poisson_tiny | small | macOS | 1 | 57.73 s | 1.60 s | 35.6× | 17.4 → 4.4 GB | | pass

Frequently asked questions

Speeding up statsmodels
Why is statsmodels slow?

GLM.fit in statsmodels is CPU-bound, and the stock implementation leaves performance on the table in its core numerical work. On the best benchmark run (glm_poisson_ood_corr_dense, 8 threads, Windows), the original takes 3.31 min and the AutoZyme path takes 8.29 s (25.2× faster).

How do I make statsmodels faster?

Install AutoZyme and activate the statsmodels patch, then keep using statsmodels exactly as before. For calls within the supported scope, AutoZyme transparently substitutes the faster, output-validated path (up to 25.2× faster on the benchmark datasets), with no pipeline or API changes.

Does the AutoZyme speedup change the statsmodels output?

Effectively no. The output is tolerance-equivalent: held within a frozen concordance gate (up to about 0.6% drift from the original statsmodels result) on every benchmark dataset.
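If you want to check equivalence on your own data, a comparison along these lines is one option. This is a sketch under assumptions: concordance_report is a hypothetical helper, its defaults borrow the 1e-6 and 1e-8 figures quoted in the supported scope, and requiring all three differences to pass is one reading of that gate rather than its documented definition.

```python
import numpy as np

def concordance_report(ref_params, new_params, ref_llf, new_llf,
                       param_tol=1e-6, llf_rtol=1e-8):
    """Compare two fits: max absolute/relative parameter difference and relative llf difference."""
    ref = np.asarray(ref_params, dtype=float)
    new = np.asarray(new_params, dtype=float)
    tiny = np.finfo(float).tiny
    abs_diff = np.abs(new - ref)
    # Guard the relative difference against exact zeros in the reference.
    rel_diff = abs_diff / np.maximum(np.abs(ref), tiny)
    rel_llf = abs(new_llf - ref_llf) / max(abs(ref_llf), tiny)
    report = {
        "max_abs_diff": float(abs_diff.max()),
        "max_rel_diff": float(rel_diff.max()),
        "rel_diff_llf": float(rel_llf),
    }
    # Assumption: every difference must be within its tolerance.
    report["passed"] = (
        report["max_abs_diff"] <= param_tol
        and report["max_rel_diff"] <= param_tol
        and report["rel_diff_llf"] <= llf_rtol
    )
    return report

# Made-up numbers standing in for an upstream fit and a patched fit.
ref = np.array([0.5, 0.2, -0.1])
close = concordance_report(ref, ref + 1e-9, -1234.5, -1234.5)
far = concordance_report(ref, ref + 1e-3, -1234.5, -1234.4)
print(close)
print(far)
```

In practice the reference values would come from a fit run before activating the patch, and the candidate values from the same call afterwards.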

How do I install the statsmodels speedup?

From a shell, run pip install autozyme. Then, in Python, import autozyme and call autozyme.activate("statsmodels"). The patch applies automatically the next time you call statsmodels.genmod.generalized_linear_model.GLM.fit.