Speed up xclim: up to 918.2× faster, identical output

Q: Why is xclim slow?

xclim is CPU-bound, and its stock implementation leaves performance on the table in its core numerical work. On the largest benchmark dataset (synth_700x700x30y, Windows, 1 thread) the original takes 8.33 min where the AutoZyme path takes 544 ms (918.2× faster).

Q: How do I make xclim faster?

Install AutoZyme and activate the xclim patch, then keep using xclim exactly as before. AutoZyme transparently substitutes a faster, output-validated path (up to 918.2× faster on the benchmark datasets) and requires no pipeline or API changes.

Q: Does the AutoZyme speedup change the xclim output?

No, within the supported scope. The accelerated path returns bit-for-bit identical results to the original xclim implementation (maximum absolute difference 0), checked by a frozen concordance gate on every benchmark dataset.
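To make "bit-for-bit identical" concrete, here is a minimal NumPy sketch of how an exact-match fraction and a maximum absolute difference could be computed for two result arrays. The function name `concordance` and the NaN handling are assumptions made for illustration; this is not AutoZyme's actual concordance gate.

```python
import numpy as np

def concordance(baseline, optimized):
    """Return (fraction of exactly equal cells, maximum absolute difference).

    Hypothetical helper: cells that are NaN in both arrays count as equal,
    and a NaN on only one side counts as an infinite difference.
    """
    a = np.asarray(baseline, dtype=float)
    b = np.asarray(optimized, dtype=float)
    if a.shape != b.shape:
        raise ValueError("arrays must have the same shape")

    both_nan = np.isnan(a) & np.isnan(b)
    one_nan = np.isnan(a) ^ np.isnan(b)

    exact = (a == b) | both_nan
    diff = np.abs(a - b)
    diff = np.where(both_nan, 0.0, np.where(one_nan, np.inf, diff))
    return float(exact.mean()), float(diff.max())

same = concordance([1.0, np.nan, 3.0], [1.0, np.nan, 3.0])
off = concordance([1.0, 2.0], [1.0, 2.5])
```

A run passes this sketch's check only when the first value is 1.0 and the second is 0.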

Q: How do I install the xclim speedup?

Run pip install autozyme in your shell, then in Python import autozyme and call autozyme.activate("xclim"). The patch applies automatically the next time you call xclim.
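The activation step can be sketched as follows, assuming `pip install autozyme` has already been run. Only `autozyme.activate("xclim")` comes from the text above; the wrapper function `activate_xclim_patch` and its fallback behaviour are hypothetical, added so that a script still runs when AutoZyme is not installed.

```python
import importlib

def activate_xclim_patch():
    """Try to activate the AutoZyme xclim patch; return True on success.

    Hypothetical wrapper: returns False instead of failing when the
    autozyme package is not installed.
    """
    try:
        autozyme = importlib.import_module("autozyme")
    except ImportError:
        return False
    autozyme.activate("xclim")  # the call described in the text above
    return True

patched = activate_xclim_patch()
print("AutoZyme xclim patch active:", patched)
```

After this point, existing xclim calls stay unchanged.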

Benchmark charts

Speedup distribution (log scale; each dot is one finalized dataset/thread run on Windows)

synth_700x700x30y: 918.2×
gregorian_smooth_700x…: 840.9×
gregorian_smooth_550x…: 661.0×
synth_500x500x30y: 577.3×
synth_350x350x30y: 367.4×

Thread sweep (speedup across finalized thread counts on Windows)

No finalized multi-thread sweep for this platform.

Memory (chart: baseline vs optimized peak memory on Windows)

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets xclim. The benchmarked result passes the declared scientific output gate while reducing CPU runtime on the listed datasets.

Supported scope

The fast path is a numba two-phase per-cell scan. It handles: op in {">", ">=", "gt", "ge"}; mid_date a valid "MM-DD" string that exists in the calendar; freq="YS" (annual, year-start); and a DataArray with a "time" dimension on a daily axis. Both noleap and Gregorian/standard calendars work: year boundaries are derived from times.year diffs, and mid offsets from month/day matching.

Within this scope the kernel faithfully reproduces upstream xclim season/season_length semantics: start = first run of `window` consecutive condition-true days whose run-start index is < mid_date; end = first run of `window` consecutive condition-false days at/after max(start, mid_date); length = end - start, or (year_end - start) when a start exists but no end is found.

Verified against the upstream xclim 0.60.0 source for run_length.season, first_run_before_date, and first_run_after_date: the start upper bound mid+window-1, the run_start<mid acceptance, the search_start=max(beg,mid) end search (beg is the season start), and the size-minus-start fallback all match. Benchmarked tiers hit pct_exact=1.0 / max_abs_diff=0.
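The start/end/length rules above can be illustrated for a single grid cell and a single year with a plain NumPy sketch. This is neither the AutoZyme kernel nor xclim's source: the function name `season_one_year` is hypothetical, `mid` is assumed to be the integer index of mid_date within the year, and returning NaN when no start (or no end) exists is an assumption, since the text above does not state it.

```python
import numpy as np

def season_one_year(cond, mid, window):
    """Return (start, end, length) for one cell-year of a boolean condition.

    cond   : 1-D array, True on days where the condition holds
    mid    : integer index of mid_date within the year (assumption)
    window : number of consecutive days that defines a run
    """
    cond = np.asarray(cond, dtype=bool)
    n = cond.size

    def first_window(mask, lo):
        # First index i >= lo with mask[i:i + window] all True, else -1.
        for i in range(lo, n - window + 1):
            if mask[i:i + window].all():
                return i
        return -1

    # Start: first run of `window` true days, accepted only if it begins before mid.
    start = first_window(cond, 0)
    if start < 0 or start >= mid:
        return (np.nan, np.nan, np.nan)  # assumption: undefined without a start

    # End: first run of `window` false days at or after max(start, mid).
    end = first_window(~cond, max(start, mid))
    if end < 0:
        # No end found: length falls back to year_end - start.
        return (float(start), np.nan, float(n - start))
    return (float(start), float(end), float(end - start))

days = [False, True, True, True, True, False, False, True, False, False, False]
result = season_one_year(days, mid=4, window=2)
```

In this example the first two-day true run starts at index 1 (before mid), the first two-day false run at or after index 4 starts at index 5, so the length is 4.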

Out-of-scope behavior

Silent, possibly wrong results.
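Because out-of-scope calls are not flagged, a caller-side guard may be worth having. The sketch below checks only the argument conditions listed in the supported scope; the function name `in_fast_path_scope` and the `calendar` parameter are hypothetical, and using a leap year to validate mid_date for standard calendars (a non-leap year for noleap) is an assumption about what "exists in the calendar" means. It does not verify that the data sit on a daily time axis.

```python
import re
from datetime import datetime

SUPPORTED_OPS = {">", ">=", "gt", "ge"}

def in_fast_path_scope(op, mid_date, freq, calendar="standard"):
    """Return True when the arguments match the documented supported scope."""
    if op not in SUPPORTED_OPS or freq != "YS":
        return False
    # mid_date must be a strict "MM-DD" string.
    if not isinstance(mid_date, str) or re.fullmatch(r"\d{2}-\d{2}", mid_date) is None:
        return False
    # Assumption: probe a leap year for standard calendars, a non-leap year for noleap.
    probe_year = 2001 if calendar == "noleap" else 2000
    try:
        datetime.strptime(f"{probe_year}-{mid_date}", "%Y-%m-%d")
    except ValueError:
        return False
    return True

ok = in_fast_path_scope(">", "07-01", "YS")
```

A caller could fall back to unpatched xclim, or raise an error, whenever this returns False.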

Detailed speedup table (9 runs)

Dataset	Tier	Platform	Threads	Baseline	Optimized	Speedup	Memory	Concordance	Pass
`gregorian_smooth_550x550x50y`	ood_xlarge	Windows	1	5.09 min	462 ms	661.0×	33.8 → 21.0 GB	—	pass
`gregorian_smooth_700x700x30y`	ood_large	Windows	1	6.09 min	435 ms	840.9×	37.8 → 20.4 GB	—	pass
`synth_350x350x30y`	small	Windows	1	46.27 s	126 ms	367.4×	9.7 → 5.5 GB	—	pass
`synth_500x500x30y`	medium	Windows	1	2.66 min	276 ms	577.3×	19.4 → 10.5 GB	—	pass
`synth_700x700x30y`	large	Windows	1	8.33 min	544 ms	918.2×	33.8 → 20.4 GB	—	pass
`gregorian_smooth_700x700x30y`	ood_large	macOS	1	5.13 min	825 ms	375.1×	20.0 → 20.5 GB	—	pass
`synth_350x350x30y`	small	macOS	1	43.14 s	152 ms	288.1×	7.5 → 5.6 GB	—	pass
`synth_500x500x30y`	medium	macOS	1	1.74 min	246 ms	424.9×	12.2 → 10.7 GB	—	pass
`synth_700x700x30y`	large	macOS	1	5.29 min	1.00 s	316.3×	14.3 → 20.4 GB	—	pass
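The Speedup column is the baseline time divided by the optimized time. A small sketch of that arithmetic, using the displayed (rounded) times, is given below; the helper names `to_seconds` and `speedup` are made up for illustration. Because the displayed times are rounded, recomputed values can differ slightly from the reported ones, which were presumably derived from unrounded measurements.

```python
# Conversion factors for the time units used in the table.
UNITS = {"ms": 1e-3, "s": 1.0, "min": 60.0}

def to_seconds(text):
    """Convert a table entry such as '8.33 min' or '544 ms' to seconds."""
    value, unit = text.split()
    return float(value) * UNITS[unit]

def speedup(baseline, optimized):
    """Ratio of baseline runtime to optimized runtime."""
    return to_seconds(baseline) / to_seconds(optimized)

# Windows, synth_700x700x30y: reported as 918.2x in the table.
large_windows = speedup("8.33 min", "544 ms")
# Windows, gregorian_smooth_550x550x50y: reported as 661.0x.
xlarge_windows = speedup("5.09 min", "462 ms")
print(f"{large_windows:.1f}x, {xlarge_windows:.1f}x")
```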
