Benchmark charts
[Chart: Speedup distribution. Each dot is one finalized dataset/thread run on Windows.]
[Chart: Thread sweep. Speedup across finalized thread counts on Windows.]
[Chart: Memory. Baseline vs optimized peak memory on Windows.]
What is accelerated
This task targets the Cox model fit (CoxPHFitter) in lifelines. The benchmarked result
preserves the declared scientific output gate while reducing CPU runtime on the listed datasets.
Supported scope
The patch replaces four SemiParametricPHFitter methods (the breslow/semi-parametric Cox path used by default CoxPHFitter). It is mathematically faithful to upstream for the default unweighted, unstratified, non-left-truncated Cox fit on a clean dense numeric DataFrame with the identity (default) formula.
(1) fast_get_efron_values_batch / _efron_kernel_jit reproduces upstream _get_efron_values_batch exactly (Efron tied-time partial likelihood, per-observation weights, full d×d Hessian, gradient, and log-likelihood at every Newton step), so variance_matrix_, standard errors, p-values, and confidence intervals stay valid. The kernel is invoked only when lifelines' own _BatchVsSingle().decide() chooses 'batch' (or batch_mode=True); when 'single' is chosen, the unpatched upstream _get_efron_values_single runs and remains correct. The benchmark datasets have many tied integer durations, which forces the batch path.
(2) The penalizer (L1/L2 elastic-net) and strata are applied outside the kernel by the unpatched _newton_raphson_for_efron_model / _partition_by_strata_and_apply, so penalized and stratified fits remain correct (the kernel only returns the per-stratum Hessian, gradient, and log-likelihood).
(3) fast_preprocess_dataframe stashes a contiguous float64 view and overrides X.mean/X.std with numpy equivalents using ddof=1 (which matches the pandas std default).
(4) safe_check_pre_fitting falls back to full upstream validation whenever weights_col or entry_col is set, any dtype is non-numeric, any value is non-finite, or any exception is raised; only fully clean numeric, unweighted, non-truncated frames take the cheap finite-check path.
(5) fast_predict_log_partial_hazard runs only post-fit (outside the timed window) and falls back to upstream for pandas Series input.
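To make the quantity in item (1) concrete, here is a minimal pure-Python sketch of the unweighted Efron tied-time partial log-likelihood. It is an illustration only: the function name efron_log_likelihood is hypothetical, it is not the patched kernel or lifelines' own code, and it omits the weights, gradient, and Hessian that the real batch routine computes.

```python
import math
from collections import defaultdict


def efron_log_likelihood(durations, events, X, beta):
    """Unweighted Efron partial log-likelihood for a Cox model.

    Naive reference version (hypothetical helper, not the patched kernel):
    it recomputes each risk set from scratch and returns only the
    log-likelihood, with no gradient or Hessian.
    """
    # Linear predictor x_i . beta and its exponential for every subject.
    scores = [sum(b * x for b, x in zip(beta, row)) for row in X]
    hazards = [math.exp(s) for s in scores]

    # Group the indices of observed events (deaths) by event time.
    deaths_at = defaultdict(list)
    for i, (t, observed) in enumerate(zip(durations, events)):
        if observed:
            deaths_at[t].append(i)

    ll = 0.0
    for t, tied in deaths_at.items():
        # Risk set: every subject whose duration is at least t.
        risk_sum = sum(h for h, u in zip(hazards, durations) if u >= t)
        tie_sum = sum(hazards[i] for i in tied)
        d = len(tied)
        ll += sum(scores[i] for i in tied)
        # Efron correction: subtract a growing fraction of the tied deaths.
        for k in range(d):
            ll -= math.log(risk_sum - (k / d) * tie_sum)
    return ll


# Three subjects: two tied deaths at t=1, one death at t=2, beta = 0.
ll = efron_log_likelihood([1, 1, 2], [1, 1, 1], [[0.5], [-1.0], [2.0]], [0.0])
```

With beta = 0 every subject has relative hazard 1, so the tied pair contributes log 3 + log 2 and the last death contributes log 1 = 0, giving ll = -log 6.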
Out-of-scope behavior
silent fallback to upstream
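The fallback rule can be pictured as a guard that picks a validation path. The sketch below is an assumption-laden illustration in plain Python: choose_validation_path is a hypothetical name, rows of Python numbers stand in for a DataFrame, and the real safe_check_pre_fitting may be structured differently.

```python
import math
from numbers import Real


def choose_validation_path(rows, weights_col=None, entry_col=None):
    """Return "fast" for clean inputs and "upstream" otherwise.

    Hypothetical stand-in for the guard described above; it only decides
    which path to take and performs no fitting.
    """
    try:
        # Weighted or left-truncated fits always use upstream validation.
        if weights_col is not None or entry_col is not None:
            return "upstream"
        for row in rows:
            for value in row:
                # Any non-numeric or non-finite value disqualifies the frame.
                if not isinstance(value, Real) or not math.isfinite(value):
                    return "upstream"
        return "fast"
    except Exception:
        # Any unexpected error also falls back silently.
        return "upstream"


clean = choose_validation_path([[1.0, 2.0], [3.0, 4.0]])
weighted = choose_validation_path([[1.0, 2.0]], weights_col="w")
```

Here clean is "fast" and weighted is "upstream"; a NaN, a string value, or an input that cannot be iterated would also yield "upstream".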
Detailed speedup table (10 runs)
| Dataset | Tier | Platform | Threads | Baseline | Optimized | Speedup | Memory | Concordance | Pass |
|---|---|---|---|---|---|---|---|---|---|
| cox_synth_1M_d150_lo | ood_large | Windows | 1 | 21.37 s | 20.34 s | 1.05× | 8.1 → 10.3 GB | — | pass |
| cox_synth_1M_d30 | small | Windows | 1 | 3.64 s | 2.13 s | 1.71× | 1.8 → 2.3 GB | — | pass |
| cox_synth_4M_d50 | medium | Windows | 1 | 27.47 s | 19.61 s | 1.40× | 11.0 → 13.9 GB | — | pass |
| cox_synth_4M_d90 | large | Windows | 1 | 50.62 s | 39.58 s | 1.28× | 19.4 → 24.6 GB | — | pass |
| cox_synth_8M_d100 | ood_xlarge | Windows | 1 | 3.91 min | 1.84 min | 2.12× | 36.4 → 48.5 GB | — | pass |
| cox_synth_1M_d150_lo | ood_large | macOS | 1 | 18.84 s | 12.29 s | 1.53× | 9.7 → 10.7 GB | — | pass |
| cox_synth_1M_d30 | small | macOS | 1 | 3.29 s | 2.19 s | 1.50× | 2.4 → 2.7 GB | — | pass |
| cox_synth_4M_d50 | medium | macOS | 1 | 26.21 s | 15.06 s | 1.74× | 12.5 → 12.1 GB | — | pass |
| cox_synth_4M_d90 | large | macOS | 1 | 51.09 s | 35.71 s | 1.43× | 19.6 → 17.7 GB | — | pass |
| cox_synth_8M_d100 | ood_xlarge | macOS | 1 | 2.68 min | 1.97 min | 1.36× | 24.2 → 23.8 GB | — | pass |
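The Speedup column appears to be baseline time divided by optimized time. The short sketch below recomputes it for three rows copied from the table; the helper to_seconds and the row selection are illustrative choices, and small differences from the published figures are expected because the table shows rounded timings.

```python
def to_seconds(text):
    """Convert a table cell such as "3.64 s" or "3.91 min" to seconds."""
    value, unit = text.split()
    return float(value) * (60.0 if unit == "min" else 1.0)


# (dataset, platform, baseline, optimized, reported speedup) from the table.
rows = [
    ("cox_synth_1M_d30", "Windows", "3.64 s", "2.13 s", 1.71),
    ("cox_synth_8M_d100", "Windows", "3.91 min", "1.84 min", 2.12),
    ("cox_synth_4M_d50", "macOS", "26.21 s", "15.06 s", 1.74),
]

# Speedup = baseline runtime / optimized runtime.
recomputed = {
    (name, platform): to_seconds(baseline) / to_seconds(optimized)
    for name, platform, baseline, optimized, _ in rows
}

for name, platform, _, _, reported in rows:
    ratio = recomputed[(name, platform)]
    print(f"{name} ({platform}): {ratio:.3f} vs reported {reported:.2f}")
```

Each recomputed ratio lands within 0.01 of the reported value, which is consistent with the column being a plain runtime ratio.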
Frequently asked questions
Why is lifelines Cox slow?
lifelines Cox is CPU-bound, and the stock implementation in lifelines leaves performance on the table in its core numerical work. On the largest benchmark dataset (cox_synth_8M_d100, Windows, 1 thread) the original takes 3.91 min where the AutoZyme path takes 1.84 min (2.12× faster); the other listed runs range from 1.05× to 1.74×.
How do I make lifelines Cox faster?
Install AutoZyme and activate the lifelines patch, then keep using lifelines Cox exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 2.12× faster on the benchmark datasets, with no pipeline or API changes.
Does the AutoZyme speedup change the lifelines Cox output?
Effectively no. The output is tolerance-equivalent: held within a frozen concordance gate (up to about 0.6% drift from the original lifelines result) on every benchmark dataset.
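A tolerance gate of this kind can be sketched as a simple drift check. This is an assumption-based illustration: the function name within_concordance_gate is hypothetical, the 0.6% figure is read here as a relative drift in the concordance index, and the actual gate definition may differ.

```python
def within_concordance_gate(baseline_c, optimized_c, max_rel_drift=0.006):
    """Return True if the optimized concordance stays within the gate.

    Assumes the gate is a relative drift bound (about 0.6%) against the
    baseline concordance; this interpretation is not confirmed by the source.
    """
    drift = abs(optimized_c - baseline_c) / abs(baseline_c)
    return drift <= max_rel_drift


# Made-up concordance values, used only to exercise the check.
ok = within_concordance_gate(0.700, 0.703)
too_far = within_concordance_gate(0.700, 0.710)
```

In this example ok is True (about 0.43% drift) and too_far is False (about 1.4% drift).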
How do I install the lifelines speedup?
Run pip install autozyme in your shell, then in Python import autozyme and call autozyme.activate("lifelines"). The patch applies automatically the next time you call lifelines Cox.
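The activation step can be wrapped so that a script still runs where AutoZyme is not installed. The sketch below uses only the autozyme.activate("lifelines") call quoted above; the wrapper activate_lifelines_patch and its return value are hypothetical conveniences, not part of AutoZyme.

```python
import importlib.util


def activate_lifelines_patch():
    """Activate the AutoZyme lifelines patch when autozyme is importable.

    Returns True if the patch was activated and False if autozyme is not
    installed, in which case lifelines simply runs unpatched.
    """
    if importlib.util.find_spec("autozyme") is None:
        return False
    import autozyme  # installed beforehand with: pip install autozyme

    autozyme.activate("lifelines")
    return True


patched = activate_lifelines_patch()
print("AutoZyme lifelines patch active:", patched)
```

After this call, existing lifelines code such as CoxPHFitter().fit(...) is used unchanged.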