Speed up statsmodels Poisson GLM fits: up to 25.2× faster on Windows, near-identical output

Q: Why is statsmodels slow?

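GLM fitting in statsmodels is CPU-bound, and the stock GLM.fit path is general-purpose. For example, it computes the design matrix's rank (an SVD) during setup and runs a least-squares solve (wls_method='lstsq') on every IRLS iteration. For supported Poisson fits, the AutoZyme path uses a normal-equations solve and skips the rank computation. On the glm_poisson_ood_corr_dense benchmark (Windows, 8 threads), the original takes 3.31 min and the AutoZyme path takes 8.29 s (25.2× faster).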

Q: How do I make statsmodels faster?

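Install AutoZyme and activate the statsmodels patch, then keep using statsmodels exactly as before. For calls within the supported scope (Poisson/log-link GLM.fit with IRLS), AutoZyme transparently substitutes a faster, output-validated path that is up to 25.2× faster on the Windows benchmark datasets. No pipeline or API changes are needed, and calls outside that scope fall back silently to stock statsmodels.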

Q: Does the AutoZyme speedup change the statsmodels output?

Effectively no. The output is tolerance-equivalent: held within a frozen concordance gate (up to about 0.6% drift from the original statsmodels result) on every benchmark dataset.
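A concordance gate of this kind amounts to a tolerance comparison between the upstream result and the fast result. The sketch below is only an illustration. The helper name `within_gate` is hypothetical. Its default thresholds are the task.yaml values quoted under Supported scope: 1e-6 on params and 1e-8 relative on llf. How AutoZyme combines absolute and relative limits is an assumption here; the sketch uses `np.allclose` semantics.

```python
import numpy as np


def within_gate(ref_params, fast_params, ref_llf, fast_llf,
                params_tol=1e-6, llf_rel_tol=1e-8):
    """Hypothetical concordance check: True if the fast fit matches the reference."""
    ref_params = np.asarray(ref_params, dtype=float)
    fast_params = np.asarray(fast_params, dtype=float)
    # Refuse mismatched shapes explicitly; np.allclose would otherwise broadcast.
    if ref_params.shape != fast_params.shape:
        return False
    # Element-wise |fast - ref| <= atol + rtol * |ref| (np.allclose semantics).
    params_ok = np.allclose(fast_params, ref_params, rtol=params_tol, atol=params_tol)
    # Log-likelihood difference measured relative to |ref_llf|.
    llf_ok = abs(fast_llf - ref_llf) <= llf_rel_tol * abs(ref_llf)
    return bool(params_ok and llf_ok)
```

In practice, `ref_params`/`ref_llf` would come from an unpatched fit and `fast_params`/`fast_llf` from the patched one on the same data.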

Q: How do I install the statsmodels speedup?

Run pip install autozyme, then in Python import autozyme and call autozyme.activate("statsmodels"). The patch takes effect the next time you call statsmodels.genmod.generalized_linear_model.GLM.fit.

Benchmark charts

Speedup distribution (Windows, log scale): each dot is one finalized dataset/thread run. Per-dataset speedups range from 17.6× to 25.2×.

Thread sweep: speedup across finalized thread counts on Windows, per dataset.

Memory: baseline vs optimized peak memory on Windows.

What is accelerated

The public API stays the same. AutoZyme substitutes its optimized code only for calls inside the supported scope and leaves every other call to stock statsmodels.

The patch targets statsmodels.genmod.generalized_linear_model.GLM.fit. On the listed datasets, the benchmarked result stays within the declared output concordance gate while reducing CPU runtime.
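This page does not document how AutoZyme installs its patch. The general pattern implied by "same public API, fast path only when supported" can be sketched as a wrapper that keeps the original method's name and signature and dispatches on a scope check. All names below (`make_patched_fit`, `fast_fit`, `supported`, `Model`) are hypothetical stand-ins, not AutoZyme or statsmodels APIs.

```python
import functools


def make_patched_fit(upstream_fit, fast_fit, can_use_fast_path):
    """Return a drop-in replacement for upstream_fit (hypothetical helper)."""

    @functools.wraps(upstream_fit)  # keep the original name and docstring
    def fit(self, *args, **kwargs):
        if can_use_fast_path(self, *args, **kwargs):
            return fast_fit(self, *args, **kwargs)
        # Anything outside the supported scope runs the original code unchanged.
        return upstream_fit(self, *args, **kwargs)

    return fit


# Toy stand-ins for a model class and its optimized path (hypothetical).
class Model:
    def fit(self, method="IRLS"):
        return "upstream"


def fast_fit(self, method="IRLS"):
    return "fast"


def supported(self, method="IRLS"):
    return method == "IRLS"


Model.fit = make_patched_fit(Model.fit, fast_fit, supported)
print(Model().fit(), Model().fit(method="newton"))  # fast upstream
```

Because the wrapper forwards `*args` and `**kwargs` untouched, callers cannot tell which path ran, except through runtime.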


Supported scope


The fast Poisson/log-link IRLS path is gated by _can_fast_poisson_irls (__init__.py:135-173) and activates ONLY when all of the following hold:
- family is exactly sm.families.Poisson with the sm.families.links.Log link (the default);
- method='IRLS';
- scale is None;
- cov_type='nonrobust' and cov_kwds is None;
- kwargs attach_wls=False, wls_method='lstsq', tol_criterion='deviance', and rtol in (0, 0.0, None);
- _offset_exposure is all-zero (no offset or exposure);
- freq_weights, var_weights, iweights, and n_trials are all all-ones (unit weights, no binomial trials);
- start_params is either None or has shape[0] == exog.shape[1];
- the design matrix is FULL RANK. This condition is implicit rather than checked: fast_minimal_wls_fit (:285-287) solves the normal equations with np.linalg.solve(wexog.T@wexog, wexog.T@wendog), and fast_glm_initialize (:210-219) sets df_model=p-1 and df_resid=n-p directly, skipping the matrix_rank SVD.

Convergence uses abs(dev[i-1]-dev[i])<=atol (atol=tol), which is mathematically identical to upstream _check_convergence's np.allclose(..., rtol=0) on the deviance criterion. fast_handle_constant (:179-200) assumes that the first all-ones finite column (as produced by sm.add_constant) is the intercept. For the benchmarked default Poisson fit(), the resulting params, llf, scale, converged, and n_iter match upstream within max_abs/rel_diff 1e-6 and rel_diff_llf 1e-8 (task.yaml metrics).
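The numerical core described above consists of an IRLS loop, a weighted solve through the normal equations, and an absolute deviance-change stopping rule. It can be sketched in plain NumPy as follows. This is an illustrative sketch, not AutoZyme's code, and the function names are hypothetical. It assumes a full-rank design, no offset or exposure, and unit weights, which are the same preconditions listed for the fast path.

```python
import numpy as np


def poisson_deviance(y, mu):
    # 2 * sum(y * log(y / mu) - (y - mu)), using the convention 0 * log(0) = 0
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))


def poisson_irls(X, y, tol=1e-8, maxiter=100):
    """Poisson/log-link IRLS via the normal equations (illustrative sketch).

    Assumes a full-rank design X (n x p), no offset/exposure, and unit weights.
    Returns (params, deviance, converged, n_iter).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = (y + y.mean()) / 2.0   # same start as statsmodels' Family.starting_mu
    eta = np.log(mu)            # log link
    dev_prev = poisson_deviance(y, mu)
    params = np.zeros(X.shape[1])
    for n_iter in range(1, maxiter + 1):
        # For Poisson with log link the IRLS weights are mu and the working
        # response is eta + (y - mu) / mu.
        sw = np.sqrt(mu)
        wexog = X * sw[:, None]
        wendog = (eta + (y - mu) / mu) * sw
        # Normal equations (X'WX) b = X'Wz; this fails if X is rank deficient.
        params = np.linalg.solve(wexog.T @ wexog, wexog.T @ wendog)
        eta = X @ params
        mu = np.exp(eta)
        dev = poisson_deviance(y, mu)
        # Absolute change in deviance, equivalent to np.allclose(..., rtol=0).
        if abs(dev_prev - dev) <= tol:
            return params, dev, True, n_iter
        dev_prev = dev
    return params, dev_prev, False, maxiter
```

Forming `X'WX` squares the design's condition number, so this shortcut trades some numerical robustness for speed compared with a least-squares solve. That trade-off is why the full-rank precondition matters.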

Out-of-scope behavior

Calls outside the supported scope fall back silently to the upstream statsmodels implementation.
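A plain predicate over the fit arguments can decide between the fast path and the silent fallback. The sketch below is a simplified, hypothetical stand-in for _can_fast_poisson_irls. It identifies the family and link by name strings rather than statsmodels class instances. It takes the combined offset/exposure on the log scale (offset + log(exposure)), and it covers only some of the listed conditions.

```python
import numpy as np


def _is_all(arr, value):
    # None means "not supplied", which counts as the default (zero offset, unit weights).
    return arr is None or bool(np.all(np.asarray(arr) == value))


def can_use_fast_poisson(family, link, *, method="IRLS", scale=None,
                         cov_type="nonrobust", cov_kwds=None,
                         offset_exposure=None, freq_weights=None,
                         var_weights=None, start_params=None, n_exog=None):
    """Simplified, hypothetical scope check for part of the listed conditions.

    Full rank of the design is assumed, not checked, as in the scope above.
    """
    if family != "Poisson" or link != "Log":
        return False
    if method != "IRLS" or scale is not None:
        return False
    if cov_type != "nonrobust" or cov_kwds is not None:
        return False
    # Combined log-scale offset (offset + log(exposure)) must be all zero.
    if not _is_all(offset_exposure, 0):
        return False
    if not (_is_all(freq_weights, 1) and _is_all(var_weights, 1)):
        return False
    if start_params is not None and len(start_params) != n_exog:
        return False
    return True
```

A patched `fit()` would call a check like this first and hand any `False` case straight to the original implementation.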

Detailed speedup table (7 runs)

Dataset	Tier	Platform	Threads	Baseline time	Optimized time	Speedup	Peak memory (baseline → optimized)	Concordance	Pass
`glm_poisson_large`	large	Windows	8	3.32 min	9.01 s	22.0×	55.1 → 11.6 GB	—	pass
`glm_poisson_medium`	medium	Windows	8	2.33 min	5.98 s	23.4×	34.7 → 7.4 GB	—	pass
`glm_poisson_ood_corr_dense`	ood_large	Windows	8	3.31 min	8.29 s	25.2×	55.1 → 11.6 GB	—	pass
`glm_poisson_ood_xlarge`	ood_xlarge	Windows	8	3.90 min	13.16 s	17.6×	76.3 → 15.9 GB	—	pass
`glm_poisson_tiny`	small	Windows	4	1.26 min	3.17 s	23.2×	18.7 → 3.3 GB	—	pass
`glm_poisson_medium`	medium	macOS	1	1.85 min	3.21 s	34.4×	21.2 → 7.9 GB	—	pass
`glm_poisson_tiny`	small	macOS	1	57.73 s	1.60 s	35.6×	17.4 → 4.4 GB	—	pass
