Speed up lifelines Cox: up to 2.12× faster, near-identical output

Benchmark charts

Speedup distribution (Windows; each dot is one finalized dataset/thread run): cox_synth_8M_d100 2.12×, cox_synth_1M_d30 1.71×, cox_synth_4M_d50 1.40×, cox_synth_4M_d90 1.28×, cox_synth_1M_d150_lo 1.05×.

Thread sweep (Windows): no finalized multi-thread sweep for this platform.

Memory (Windows): baseline vs optimized peak memory; the values are listed in the detailed speedup table.

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets the Cox proportional hazards fit in lifelines. The benchmarked result passes the declared scientific output gate while reducing CPU runtime on the listed datasets.

Supported scope

The patch replaces four SemiParametricPHFitter methods (the breslow/semi-parametric Cox path used by default CoxPHFitter). It is mathematically faithful to upstream for the default unweighted, unstratified, non-left-truncated Cox fit on a clean dense numeric DataFrame with the identity (default) formula.

(1) fast_get_efron_values_batch / _efron_kernel_jit reproduces upstream _get_efron_values_batch exactly (Efron tied-time partial likelihood, per-observation weights, full d×d Hessian + gradient + log-likelihood at every Newton step), so variance_matrix_, standard errors, p-values, and confidence intervals remain valid. The kernel is invoked only when lifelines' own _BatchVsSingle().decide() chooses 'batch' (or batch_mode=True); when 'single' is chosen, the unpatched upstream _get_efron_values_single runs instead. The benchmark datasets have many tied integer durations, which forces the batch path.

(2) The penalizer (L1/L2 elastic-net) and strata are applied outside the kernel by the unpatched _newton_raphson_for_efron_model / _partition_by_strata_and_apply, so penalized and stratified fits remain correct (the kernel just returns the per-stratum Hessian, gradient, and log-likelihood).

(3) fast_preprocess_dataframe stashes a contiguous float64 view and overrides X.mean/X.std with numpy equivalents using ddof=1 (which matches pandas std).

(4) safe_check_pre_fitting falls back to full upstream validation whenever weights_col or entry_col is set, any column has a non-numeric dtype, any value is non-finite, or any exception is raised; only fully clean numeric, unweighted, non-truncated frames take the cheap finite-check path.

(5) fast_predict_log_partial_hazard runs only post-fit (outside the timed window) and falls back to upstream for pandas Series input.
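
Item (3) hinges on the degrees-of-freedom convention: pandas' std defaults to the sample standard deviation (ddof=1), while NumPy's std defaults to ddof=0, so a NumPy replacement has to pass ddof=1 explicitly to match. The sketch below uses only the standard library to show the gap between the two conventions. The helper name std_with_ddof and the sample column are illustrative assumptions and are not part of the patch.

```python
import math
import statistics


def std_with_ddof(values, ddof):
    """Standard deviation with divisor (n - ddof), following the ddof convention."""
    n = len(values)
    mean = sum(values) / n
    squared_dev = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_dev / (n - ddof))


# Illustrative column, not benchmark data.
column = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sample_std = std_with_ddof(column, ddof=1)      # convention pandas uses by default
population_std = std_with_ddof(column, ddof=0)  # convention NumPy uses by default

# Cross-check against the standard library's sample and population estimators.
assert math.isclose(sample_std, statistics.stdev(column))
assert math.isclose(population_std, statistics.pstdev(column))

print(f"ddof=1: {sample_std:.6f}  ddof=0: {population_std:.6f}")
```

Because covariates are normalized with these statistics before fitting, using the wrong divisor would rescale every column slightly differently from upstream.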

Out-of-scope behavior

silent fallback to upstream
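
The fallback rule from item (4) of the supported scope can be sketched as a small decision function. This is a minimal illustration in plain Python, assuming the frame is represented as a dict of column lists; choose_validation_path is a hypothetical name and is not the actual safe_check_pre_fitting implementation.

```python
import math


def choose_validation_path(columns, weights_col=None, entry_col=None):
    """Return "fast" only for clean numeric, unweighted, non-truncated data.

    `columns` maps column name -> list of values (a stand-in for a DataFrame).
    Any doubt, including an unexpected exception, selects "upstream".
    """
    try:
        # Weighted or left-truncated fits always use upstream validation.
        if weights_col is not None or entry_col is not None:
            return "upstream"
        for values in columns.values():
            for value in values:
                # Non-numeric content: defer to upstream.
                if not isinstance(value, (int, float)):
                    return "upstream"
                # NaN or +/-inf: defer to upstream.
                if not math.isfinite(value):
                    return "upstream"
        return "fast"
    except Exception:
        # Silent fallback: never fail here, let upstream decide.
        return "upstream"


print(choose_validation_path({"age": [50.0, 61.0], "T": [5, 8]}))
print(choose_validation_path({"age": [50.0, float("nan")]}))
print(choose_validation_path({"age": [50.0]}, weights_col="w"))
```

The design choice is conservative: the cheap path is an allow-list, so anything unrecognized ends up in the original lifelines code rather than in the accelerated one.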

Detailed speedup table (10 runs)

Dataset	Tier	Platform	Threads	Baseline	Optimized	Speedup	Memory	Concordance	Pass
`cox_synth_1M_d150_lo`	ood_large	Windows	1	21.37 s	20.34 s	1.05×	8.1 → 10.3 GB	—	pass
`cox_synth_1M_d30`	small	Windows	1	3.64 s	2.13 s	1.71×	1.8 → 2.3 GB	—	pass
`cox_synth_4M_d50`	medium	Windows	1	27.47 s	19.61 s	1.40×	11.0 → 13.9 GB	—	pass
`cox_synth_4M_d90`	large	Windows	1	50.62 s	39.58 s	1.28×	19.4 → 24.6 GB	—	pass
`cox_synth_8M_d100`	ood_xlarge	Windows	1	3.91 min	1.84 min	2.12×	36.4 → 48.5 GB	—	pass
`cox_synth_1M_d150_lo`	ood_large	macOS	1	18.84 s	12.29 s	1.53×	9.7 → 10.7 GB	—	pass
`cox_synth_1M_d30`	small	macOS	1	3.29 s	2.19 s	1.50×	2.4 → 2.7 GB	—	pass
`cox_synth_4M_d50`	medium	macOS	1	26.21 s	15.06 s	1.74×	12.5 → 12.1 GB	—	pass
`cox_synth_4M_d90`	large	macOS	1	51.09 s	35.71 s	1.43×	19.6 → 17.7 GB	—	pass
`cox_synth_8M_d100`	ood_xlarge	macOS	1	2.68 min	1.97 min	1.36×	24.2 → 23.8 GB	—	pass
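
The Speedup column is the baseline runtime divided by the optimized runtime, and the Memory column can be read the same way as a ratio. The short script below recomputes both from the table rows as a consistency check. The tuple layout and function names are illustrative; runtimes given in minutes are converted to seconds.

```python
# Rows copied from the table above.
# (dataset, platform, baseline_s, optimized_s, reported_speedup, mem_before_gb, mem_after_gb)
RUNS = [
    ("cox_synth_1M_d150_lo", "Windows", 21.37, 20.34, 1.05, 8.1, 10.3),
    ("cox_synth_1M_d30", "Windows", 3.64, 2.13, 1.71, 1.8, 2.3),
    ("cox_synth_4M_d50", "Windows", 27.47, 19.61, 1.40, 11.0, 13.9),
    ("cox_synth_4M_d90", "Windows", 50.62, 39.58, 1.28, 19.4, 24.6),
    ("cox_synth_8M_d100", "Windows", 3.91 * 60, 1.84 * 60, 2.12, 36.4, 48.5),
    ("cox_synth_1M_d150_lo", "macOS", 18.84, 12.29, 1.53, 9.7, 10.7),
    ("cox_synth_1M_d30", "macOS", 3.29, 2.19, 1.50, 2.4, 2.7),
    ("cox_synth_4M_d50", "macOS", 26.21, 15.06, 1.74, 12.5, 12.1),
    ("cox_synth_4M_d90", "macOS", 51.09, 35.71, 1.43, 19.6, 17.7),
    ("cox_synth_8M_d100", "macOS", 2.68 * 60, 1.97 * 60, 1.36, 24.2, 23.8),
]


def speedup(baseline_s, optimized_s):
    """Speedup factor: how many times faster the optimized run is."""
    return baseline_s / optimized_s


def memory_ratio(before_gb, after_gb):
    """Peak-memory ratio: values above 1 mean the optimized run used more memory."""
    return after_gb / before_gb


for name, platform, base, opt, reported, mem_before, mem_after in RUNS:
    print(
        f"{name:22s} {platform:8s} "
        f"speedup {speedup(base, opt):.2f}x (reported {reported:.2f}x)  "
        f"memory x{memory_ratio(mem_before, mem_after):.2f}"
    )
```

Read this way, every Windows run in the table trades higher peak memory (roughly 1.26× to 1.33×) for the speedup, while the macOS runs range from roughly 0.9× to 1.1×.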

Frequently asked questions

Speeding up lifelines Cox

Why is lifelines Cox slow?

lifelines Cox is CPU-bound, and the stock implementation in lifelines leaves performance on the table in its core numerical work. On the largest benchmark dataset (cox_synth_8M_d100, Windows, 1 thread), the original takes 3.91 min and the AutoZyme path takes 1.84 min (2.12× faster).

How do I make lifelines Cox faster?

Install AutoZyme and activate the lifelines patch, then keep using lifelines Cox exactly as before. AutoZyme transparently substitutes the faster, output-validated path, which ran 1.05× to 2.12× faster on the benchmark datasets, with no pipeline or API changes.

Does the AutoZyme speedup change the lifelines Cox output?

Effectively no. The output is tolerance-equivalent: held within a frozen concordance gate (up to about 0.6% drift from the original lifelines result) on every benchmark dataset.
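
The text does not spell out how drift is measured. As a minimal sketch, assuming the gate compares concordance indices by relative difference against a 0.6% tolerance, such a check could look like the following. The function name, the tolerance constant, and the example values are illustrative assumptions, not AutoZyme's implementation or benchmark results.

```python
# "About 0.6%" comes from the text; the exact frozen value is an assumption.
GATE_TOLERANCE = 0.006


def within_concordance_gate(baseline_c, optimized_c, tolerance=GATE_TOLERANCE):
    """True if the optimized concordance drifts at most `tolerance` (relative)."""
    drift = abs(optimized_c - baseline_c) / abs(baseline_c)
    return drift <= tolerance


# Made-up concordance values, for illustration only.
print(within_concordance_gate(0.7000, 0.7010))  # relative drift of about 0.14%
print(within_concordance_gate(0.7000, 0.6900))  # relative drift of about 1.4%
```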

How do I install the lifelines speedup?

Run pip install autozyme, then in Python import autozyme and call autozyme.activate("lifelines"). The patch applies automatically the next time you call lifelines Cox.
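
Put together, a session might look like the sketch below. Only autozyme.activate("lifelines") is taken from the text above; the guard function fit_with_autozyme is a hypothetical helper, and the Rossi dataset bundled with lifelines is used purely as a small example (a dataset this small may not exercise the accelerated batch path).

```python
# Shell step, run once beforehand:
#   pip install autozyme lifelines
import importlib.util


def fit_with_autozyme():
    """Activate the patch and fit a Cox model; return None if a package is missing."""
    for package in ("autozyme", "lifelines"):
        if importlib.util.find_spec(package) is None:
            return None

    import autozyme
    autozyme.activate("lifelines")  # activation call as given in the text

    # From here on, the lifelines code is unchanged from an unpatched session.
    from lifelines import CoxPHFitter
    from lifelines.datasets import load_rossi

    df = load_rossi()
    cph = CoxPHFitter()
    cph.fit(df, duration_col="week", event_col="arrest")
    return cph


model = fit_with_autozyme()
if model is not None:
    model.print_summary()
```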