Benchmark charts
Speedup distribution: each dot is one finalized dataset/thread run on Windows.
Thread sweep: speedup across finalized thread counts on Windows.
Memory: baseline vs optimized peak memory on Windows.
What is accelerated
This task targets xclim. The benchmarked result preserves the declared scientific
output gate while reducing CPU runtime on the listed datasets.
Also searched as: climate indices, climate indicators, climate.
Supported scope
The fast numba two-phase per-cell scan handles:
op in {">", ">=", "gt", "ge"};
mid_date as a valid "MM-DD" string that exists in the calendar;
freq="YS" (annual, year-start);
a DataArray with a "time" dimension on a daily axis. Both noleap and Gregorian/standard calendars work: year boundaries are derived from times.year diffs, and mid offsets from month/day matching.
Within this scope the kernel reproduces upstream xclim season/season_length semantics, where beg is the index of the season start:
start = first run of `window` consecutive condition-true days whose run-start index is < mid_date;
end = first run of `window` consecutive condition-false days at/after max(start, mid_date);
length = end - beg, or (year_end - beg) when a start exists but no end is found.
Verified against the upstream xclim 0.60.0 run_length.season / first_run_before_date / first_run_after_date source: the start upper bound mid+window-1, the run_start<mid acceptance, the search_start=max(beg,mid) end search, and the size-minus-start fallback all match. Benchmarked tiers hit pct_exact=1.0 / max_abs_diff=0.
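The start/end/length rules above can be illustrated for a single grid cell and a single year. This is a minimal pure-Python sketch, not the numba kernel and not xclim's code: the names `first_run` and `season_one_year` are hypothetical, `cond` is assumed to be the already-thresholded boolean series for one year, `mid` is the day index of mid_date within that year, and returning `None` when no start exists is a placeholder choice of this sketch.

```python
def first_run(flags, window, lo, hi):
    """Start index of the first run of `window` consecutive True values.

    Counting begins at index `lo`; the run's start index must be < `hi`.
    Returns None when no such run exists.
    """
    run = 0
    for i in range(lo, len(flags)):
        run = run + 1 if flags[i] else 0
        if run >= window:
            start = i - window + 1
            # The first complete run has the smallest start; reject it if too late.
            return start if start < hi else None
    return None


def season_one_year(cond, window, mid):
    """Return (beg, end, length) for one year of daily condition flags."""
    n = len(cond)
    # Start: first run of condition-true days whose run start is before mid.
    beg = first_run(cond, window, 0, mid)
    if beg is None:
        return None, None, None  # placeholder for "no season" (assumption)
    # End: first run of condition-false days at/after max(beg, mid).
    not_cond = [not c for c in cond]
    end = first_run(not_cond, window, max(beg, mid), n)
    # Length: end - beg, or year_end - beg when no end is found.
    length = (end - beg) if end is not None else (n - beg)
    return beg, end, length


T, F = True, False
example = [F, F, T, T, T, T, F, T, F, F, F, T]
print(season_one_year(example, window=3, mid=5))  # (2, 8, 6)
```

In the example the first three-day true run starts at index 2 (before mid = 5), the first three-day false run at or after index 5 starts at index 8, so the length is 6.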
Out-of-scope behavior
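Silent, possibly wrong: inputs outside the supported scope are not rejected, and the result may be incorrect without any warning.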
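Because out-of-scope inputs are not flagged, a caller may want to check the arguments before relying on the accelerated path. The following is a minimal sketch of such a pre-check under stated assumptions: `in_supported_scope` is a hypothetical helper, not part of xclim or AutoZyme; it only covers op, mid_date, and freq (not the DataArray's time axis); and it assumes "02-29" counts as existing in a standard calendar but not in noleap.

```python
from datetime import date

SUPPORTED_OPS = {">", ">=", "gt", "ge"}


def in_supported_scope(op, mid_date, freq, calendar="standard"):
    """Return True if (op, mid_date, freq) fall inside the documented scope."""
    if op not in SUPPORTED_OPS or freq != "YS":
        return False
    if not isinstance(mid_date, str) or len(mid_date) != 5:
        return False  # must look like "MM-DD"
    try:
        month, day = (int(part) for part in mid_date.split("-"))
        # 2000 is a leap year, 2001 is not: use them to test date existence.
        probe_year = 2001 if calendar == "noleap" else 2000
        date(probe_year, month, day)
    except ValueError:
        return False
    return True


print(in_supported_scope(">", "07-01", "YS"))
print(in_supported_scope("<", "07-01", "YS"))
```

A caller could fall back to the stock xclim path whenever this check returns False.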
Detailed speedup table (9 runs)
| Dataset | Tier | Platform | Threads | Baseline | Optimized | Speedup | Peak memory (baseline → optimized) | Concordance | Pass |
|---|---|---|---|---|---|---|---|---|---|
| gregorian_smooth_550x550x50y | ood_xlarge | Windows | 1 | 5.09 min | 462 ms | 661.0× | 33.8 → 21.0 GB | — | pass |
| gregorian_smooth_700x700x30y | ood_large | Windows | 1 | 6.09 min | 435 ms | 840.9× | 37.8 → 20.4 GB | — | pass |
| synth_350x350x30y | small | Windows | 1 | 46.27 s | 126 ms | 367.4× | 9.7 → 5.5 GB | — | pass |
| synth_500x500x30y | medium | Windows | 1 | 2.66 min | 276 ms | 577.3× | 19.4 → 10.5 GB | — | pass |
| synth_700x700x30y | large | Windows | 1 | 8.33 min | 544 ms | 918.2× | 33.8 → 20.4 GB | — | pass |
| gregorian_smooth_700x700x30y | ood_large | macOS | 1 | 5.13 min | 825 ms | 375.1× | 20.0 → 20.5 GB | — | pass |
| synth_350x350x30y | small | macOS | 1 | 43.14 s | 152 ms | 288.1× | 7.5 → 5.6 GB | — | pass |
| synth_500x500x30y | medium | macOS | 1 | 1.74 min | 246 ms | 424.9× | 12.2 → 10.7 GB | — | pass |
| synth_700x700x30y | large | macOS | 1 | 5.29 min | 1.00 s | 316.3× | 14.3 → 20.4 GB | — | pass |
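The Speedup column can be sanity-checked from the Baseline and Optimized columns, assuming speedup is simply baseline time divided by optimized time. The sketch below uses hypothetical helpers (`to_seconds`, `speedup`) and the displayed, already-rounded values, so the ratio it computes is only expected to land close to the reported figure rather than match it exactly.

```python
# Conversion factors for the units that appear in the table.
UNIT_SECONDS = {"ms": 1e-3, "s": 1.0, "min": 60.0}


def to_seconds(text):
    """Convert a table cell such as '8.33 min' or '544 ms' to seconds."""
    value, unit = text.split()
    return float(value) * UNIT_SECONDS[unit]


def speedup(baseline, optimized):
    """Ratio of baseline runtime to optimized runtime."""
    return to_seconds(baseline) / to_seconds(optimized)


# synth_700x700x30y, Windows, 1 thread: reported as 918.2x.
ratio = speedup("8.33 min", "544 ms")
print(f"recomputed speedup: {ratio:.1f}x")
```

Small gaps between the recomputed and reported values are consistent with the table showing rounded runtimes.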
Frequently asked questions
Why is xclim slow?
The benchmarked xclim workload is CPU-bound, and the stock implementation leaves performance on the table in its core numerical work. On the largest benchmark dataset (synth_700x700x30y, Windows, 1 thread) the original takes 8.33 min where the AutoZyme path takes 544 ms (918.2× faster).
How do I make xclim faster?
Install AutoZyme and activate the xclim patch, then keep using xclim exactly as before. AutoZyme transparently substitutes the faster, output-validated path, which is 288.1× to 918.2× faster on the benchmark datasets, with no pipeline or API changes.
Does the AutoZyme speedup change the xclim output?
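No, within the supported scope. On every benchmark dataset the accelerated path returned bit-for-bit identical results to the original xclim implementation (pct_exact=1.0, maximum absolute difference 0), checked by a frozen concordance gate. Inputs outside the supported scope are not covered by this check.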
How do I install the xclim speedup?
From a shell, run pip install autozyme. Then, in Python, import autozyme and call autozyme.activate("xclim"). The patch applies automatically the next time you call xclim.
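The activation step can be wrapped so that a script still runs on machines where AutoZyme is not installed. This is a minimal sketch: `autozyme.activate("xclim")` is the call quoted in the answer above, while the wrapper `activate_xclim_patch` and its fallback behavior are assumptions made for illustration.

```python
import importlib.util


def activate_xclim_patch():
    """Activate the AutoZyme xclim patch if autozyme is installed.

    Returns True when the patch was activated, False when autozyme is
    missing and xclim keeps its stock implementation.
    """
    if importlib.util.find_spec("autozyme") is None:
        return False
    import autozyme

    autozyme.activate("xclim")  # call as given in the answer above
    return True


if activate_xclim_patch():
    print("AutoZyme xclim patch active")
else:
    print("autozyme not installed; using stock xclim")
```

Calling the wrapper once at program start, before any xclim computation, keeps the rest of the pipeline unchanged.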