Speed up lifelines Cox

Fitting a Cox proportional hazards model with lifelines is often one of the slower steps in statistics and survival-analysis workflows. AutoZyme ships a verified, drop-in patch that runs up to 2.12× faster on the benchmark datasets (median 1.46×), keeps output within a strict, verified tolerance, and requires no change to how you call lifelines.

Best speedup: 2.12×
Median speedup: 1.46×
Output equivalence: tolerance
Best runtime: 3.91 min baseline → 1.84 min optimized
Datasets: 5
Pass rate: 10/10

Benchmark charts

[Interactive charts; Windows platform shown]
Speedup distribution: each dot is one finalized dataset/thread run (cox_synth_8M_d100, cox_synth_1M_d30, cox_synth_4M_d50, cox_synth_4M_d90, cox_synth_1M_d150_lo).
Thread sweep: speedup across finalized thread counts. No finalized multi-thread sweep for this platform.
Memory: baseline vs. optimized peak memory, 1 thread (ratio is optimized / baseline):
cox_synth_8M_d100 (ood_xlarge): 36 GB → 48 GB, 1.33×
cox_synth_4M_d90 (large): 19 GB → 25 GB, 1.27×
cox_synth_4M_d50 (medium): 11 GB → 14 GB, 1.26×
cox_synth_1M_d150_lo (ood_large): 8.1 GB → 10 GB, 1.28×
cox_synth_1M_d30 (small): 1.8 GB → 2.3 GB, 1.25×

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets the Cox proportional hazards fit in lifelines. The benchmarked result passes the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: survival analysis, cox regression, cox proportional hazards, hazard ratio, kaplan meier, time to event.

Supported scope

The patch replaces four SemiParametricPHFitter methods (the breslow/semi-parametric Cox path used by default CoxPHFitter). It is mathematically faithful to upstream for the default unweighted, unstratified, non-left-truncated Cox fit on a clean dense numeric DataFrame with the identity (default) formula.

(1) fast_get_efron_values_batch / _efron_kernel_jit reproduces upstream _get_efron_values_batch exactly (Efron tied-time partial likelihood, per-observation weights, full d×d Hessian + gradient + log-lik every Newton step), so variance_matrix_/SE/p-values/CI inference stays valid. The kernel is invoked only when lifelines' own _BatchVsSingle().decide() chooses 'batch' (or batch_mode=True); when 'single' is chosen, the unpatched upstream _get_efron_values_single runs, which is correct. The benchmark datasets have many tied integer durations, forcing the batch path.

(2) penalizer (L1/L2 elastic-net) and strata are applied outside the kernel by the unpatched _newton_raphson_for_efron_model / _partition_by_strata_and_apply, so penalized and stratified fits remain correct (the kernel just returns per-stratum h/g/ll).

(3) fast_preprocess_dataframe stashes a contiguous float64 view and overrides X.mean/X.std with numpy equivalents using ddof=1 (matches pandas std).

(4) safe_check_pre_fitting falls back to full upstream validation whenever weights_col or entry_col is set, or when it finds any non-numeric dtype, any non-finite value, or any exception; only fully clean numeric unweighted/non-truncated frames take the cheap finite-check path.

(5) fast_predict_log_partial_hazard runs only post-fit (outside the timed window) and falls back to upstream for pandas Series input.
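To make concrete what the kernel in (1) has to reproduce, here is a minimal reference sketch of the unweighted Efron partial log-likelihood in NumPy. It is neither AutoZyme's kernel nor the lifelines implementation: the function name efron_log_partial_likelihood is hypothetical, weights, strata, the gradient, and the Hessian are left out, and the loop favors readability over speed.

```python
import numpy as np


def efron_log_partial_likelihood(X, durations, events, beta):
    """Unweighted Efron partial log-likelihood (illustrative reference only).

    X: (n, d) covariates; durations: (n,) times; events: (n,) 1 = event,
    0 = censored; beta: (d,) coefficients.
    """
    X = np.asarray(X, dtype=float)
    durations = np.asarray(durations)
    events = np.asarray(events, dtype=bool)

    scores = X @ np.asarray(beta, dtype=float)  # linear predictor x_i . beta
    risks = np.exp(scores)                      # exp(x_i . beta)

    loglik = 0.0
    for t in np.unique(durations[events]):      # distinct event times
        at_risk = durations >= t                # risk set at time t
        deaths = (durations == t) & events      # tied events at time t
        d = int(deaths.sum())
        risk_sum = risks[at_risk].sum()
        tie_sum = risks[deaths].sum()
        # Efron correction: the k-th tied event sees the risk set reduced
        # by a fraction k/d of the tied subjects' total risk.
        k = np.arange(d)
        loglik += scores[deaths].sum() - np.log(risk_sum - (k / d) * tie_sum).sum()
    return loglik
```

With beta set to zero every subject has risk 1, so the value depends only on risk-set and tie counts, which makes small cases easy to check by hand.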

Out-of-scope behavior

Inputs outside the supported scope fall back silently to the unpatched upstream lifelines code.
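The fallback rule described in item (4) of the supported scope can be pictured as a conservative eligibility check. The sketch below is an assumption about the shape of such a check, not AutoZyme's code: can_use_fast_path is a hypothetical name, and the real safe_check_pre_fitting may test more conditions.

```python
import numpy as np
import pandas as pd


def can_use_fast_path(df, weights_col=None, entry_col=None):
    """Hypothetical check: return True only for fully clean numeric frames.

    Any doubt (weights, left truncation, non-numeric dtype, non-finite value,
    or an unexpected error) means "use full upstream validation instead".
    """
    try:
        if weights_col is not None or entry_col is not None:
            return False
        if not all(pd.api.types.is_numeric_dtype(dt) for dt in df.dtypes):
            return False
        # A single vectorized finiteness check over the whole frame.
        return bool(np.isfinite(df.to_numpy(dtype=np.float64)).all())
    except Exception:
        return False
```

Returning False on any exception mirrors the page's description: the cheap path is an optimization, so failing closed costs only time, never correctness.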

Detailed speedup table (10 runs)
Dataset Tier Platform Threads Baseline Optimized Speedup Memory Concordance Pass
cox_synth_1M_d150_lo ood_large Windows 1 21.37 s 20.34 s 1.05× 8.1 → 10.3 GB pass
cox_synth_1M_d30 small Windows 1 3.64 s 2.13 s 1.71× 1.8 → 2.3 GB pass
cox_synth_4M_d50 medium Windows 1 27.47 s 19.61 s 1.40× 11.0 → 13.9 GB pass
cox_synth_4M_d90 large Windows 1 50.62 s 39.58 s 1.28× 19.4 → 24.6 GB pass
cox_synth_8M_d100 ood_xlarge Windows 1 3.91 min 1.84 min 2.12× 36.4 → 48.5 GB pass
cox_synth_1M_d150_lo ood_large macOS 1 18.84 s 12.29 s 1.53× 9.7 → 10.7 GB pass
cox_synth_1M_d30 small macOS 1 3.29 s 2.19 s 1.50× 2.4 → 2.7 GB pass
cox_synth_4M_d50 medium macOS 1 26.21 s 15.06 s 1.74× 12.5 → 12.1 GB pass
cox_synth_4M_d90 large macOS 1 51.09 s 35.71 s 1.43× 19.6 → 17.7 GB pass
cox_synth_8M_d100 ood_xlarge macOS 1 2.68 min 1.97 min 1.36× 24.2 → 23.8 GB pass
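The headline figures can be recomputed from the table rows. The short script below copies the baseline and optimized runtimes from the table (minutes converted to seconds) and derives the best and median speedup; because the published runtimes are rounded, the recomputed values may differ from the displayed ones in the last digit.

```python
from statistics import median

# (dataset, platform, baseline seconds, optimized seconds), copied from the table.
RUNS = [
    ("cox_synth_1M_d150_lo", "Windows", 21.37, 20.34),
    ("cox_synth_1M_d30", "Windows", 3.64, 2.13),
    ("cox_synth_4M_d50", "Windows", 27.47, 19.61),
    ("cox_synth_4M_d90", "Windows", 50.62, 39.58),
    ("cox_synth_8M_d100", "Windows", 3.91 * 60, 1.84 * 60),
    ("cox_synth_1M_d150_lo", "macOS", 18.84, 12.29),
    ("cox_synth_1M_d30", "macOS", 3.29, 2.19),
    ("cox_synth_4M_d50", "macOS", 26.21, 15.06),
    ("cox_synth_4M_d90", "macOS", 51.09, 35.71),
    ("cox_synth_8M_d100", "macOS", 2.68 * 60, 1.97 * 60),
]

# Speedup is baseline runtime divided by optimized runtime.
speedups = [baseline / optimized for _, _, baseline, optimized in RUNS]
best_speedup = max(speedups)
median_speedup = median(speedups)

print(f"best {best_speedup:.2f}x, median {median_speedup:.2f}x over {len(RUNS)} runs")
```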

Frequently asked questions

Speeding up lifelines Cox
Why is lifelines Cox slow?

Fitting a Cox model in lifelines is CPU-bound, and the stock implementation leaves performance on the table in its core numerical work. On the largest benchmark dataset (cox_synth_8M_d100, Windows, 1 thread), the original takes 3.91 min where the AutoZyme path takes 1.84 min (2.12× faster).

How do I make lifelines Cox faster?

Install AutoZyme and activate the lifelines patch, then keep using lifelines Cox exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 2.12× faster on the benchmark datasets, with no pipeline or API changes.

Does the AutoZyme speedup change the lifelines Cox output?

Effectively no. The output is tolerance-equivalent: held within a frozen concordance gate (up to about 0.6% drift from the original lifelines result) on every benchmark dataset.
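One way to run a similar check on your own data is to compare a summary metric from an unpatched fit and a patched fit. The sketch below assumes the gate is a relative drift on the concordance index with a 0.6% threshold; the page does not define the gate precisely, so both the function names and the default threshold are illustrative assumptions.

```python
def relative_drift(baseline: float, optimized: float) -> float:
    """Relative difference of the optimized value from the baseline value."""
    return abs(optimized - baseline) / abs(baseline)


def passes_gate(baseline: float, optimized: float, max_drift: float = 0.006) -> bool:
    """Return True when the optimized value stays within max_drift of baseline."""
    return relative_drift(baseline, optimized) <= max_drift
```

In lifelines, a fitted CoxPHFitter exposes concordance_index_, so the two inputs could be that attribute taken from a fit without the patch and a fit with it.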

How do I install the lifelines speedup?

From a shell, run pip install autozyme. Then, in Python, import autozyme and call autozyme.activate("lifelines"). The patch applies automatically the next time you fit a lifelines Cox model.
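Put together, the steps above might look like the following sketch. Only import autozyme and autozyme.activate("lifelines") come from this page; the wrapper function fit_cox_with_autozyme is a hypothetical helper, and the lifelines calls are the standard CoxPHFitter API.

```python
def fit_cox_with_autozyme(df, duration_col, event_col):
    """Activate the AutoZyme lifelines patch, then fit a default Cox model."""
    # Imported inside the function so this sketch can be defined even where
    # the packages are not installed yet.
    import autozyme
    from lifelines import CoxPHFitter

    # The only AutoZyme call named on this page; any other options are unknown.
    autozyme.activate("lifelines")

    # From here on, lifelines is used exactly as it would be without the patch.
    cph = CoxPHFitter()
    cph.fit(df, duration_col=duration_col, event_col=event_col)
    return cph
```

For a quick smoke test, lifelines ships the Rossi dataset: fit_cox_with_autozyme(load_rossi(), "week", "arrest"), with load_rossi imported from lifelines.datasets. On a dataset that small, lifelines may choose its 'single' path, in which case the supported scope says the unpatched upstream code runs.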