Speed up sarsen

sarsen is one of the slower steps in many earth & atmospheric sciences workflows. AutoZyme ships a verified, drop-in patch that is up to 26.5× faster, returning bit-for-bit identical results with no change to how you call it.

Best speedup: 26.5×
Median speedup: 10.5×
Output equivalence: bit-exact
Best runtime: 7.72 min baseline → 17.50 s optimized
Datasets: 4
Pass rate: 4/4

Benchmark charts

Static evidence view of the finalized rows for the benchmarked platform (Mac).

Speedup distribution (log scale): each dot is one finalized dataset/thread run on Mac. Datasets: rome_13k, rome_10k, rome_8k, rome_grd_gamma_12k.

Thread sweep: speedup across finalized thread counts (1, 4, and full (8)) on Mac.

| Dataset | Tier | Threads | Baseline | Optimized | Speedup | Memory |
|---|---|---|---|---|---|---|
| rome_13k | large | 1 | 7.72 min | 31.78 s | 14.6× | 26 GB → 21 GB |
| rome_13k | large | 4 | 7.72 min | 17.50 s | 26.5× | 26 GB → 22 GB |
| rome_10k | medium | 1 | 1.76 min | 18.88 s | 5.52× | 25 GB → 17 GB |
| rome_10k | medium | 4 | 1.72 min | 9.98 s | 10.4× | 25 GB → 17 GB |
| rome_10k | medium | 8 | 1.75 min | 8.78 s | 11.9× | 25 GB → 18 GB |
| rome_8k | small | 1 | 57.77 s | 12.10 s | 4.80× | 20 GB → 12 GB |
| rome_8k | small | 4 | 58.39 s | 7.02 s | 8.27× | 20 GB → 12 GB |
| rome_8k | small | 8 | 57.63 s | 6.31 s | 9.19× | 20 GB → 13 GB |
| rome_grd_gamma_12k | ood_large | 1 | 6.19 min | 1.32 min | 4.70× | 26 GB → 22 GB |
| rome_grd_gamma_12k | ood_large | 4 | 6.19 min | 1.16 min | 5.34× | 26 GB → 22 GB |

Memory: baseline vs. optimized peak memory on Mac.

| Dataset | Tier | Threads | Peak memory | Optimized / baseline | Speedup |
|---|---|---|---|---|---|
| rome_13k | large | 4 | 26 GB → 22 GB | 0.85× | 26.5× |
| rome_grd_gamma_12k | ood_large | 4 | 26 GB → 22 GB | 0.87× | 5.34× |
| rome_10k | medium | 8 | 25 GB → 18 GB | 0.70× | 11.9× |
| rome_8k | small | 8 | 20 GB → 13 GB | 0.62× | 9.19× |

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets the terrain-correction path of sarsen. The benchmarked result passes the declared scientific output gate while reducing CPU runtime on every listed dataset.

Also searched as: SAR, synthetic aperture radar, terrain correction, sentinel-1, radar.

Supported scope

Verified correct (bit-exact, pass_rate 1.0, max_abs_diff 0.0) for GRD products (GroundRangeSarProduct) processed in memory with chunks=None, interp_method="nearest" (the upstream default), and correct_radiometry set to either None (GTC) or "gamma_nearest".

With "gamma_nearest", the radiometry chain still runs through the upstream do_terrain_correction, but it calls the patched simulate_acquisition, orbit, and geocoding helpers. This configuration verifies bit-exact on the ood_large dataset.

The patch replaces 9 internal helpers (transform_dem_3d, convert_to_dem_3d, slant_range_time_to_ground_range, the three OrbitPolyfitInterpolator polyval fits, the two zero_doppler Newton kernels, simulate_acquisition, GroundRangeSarProduct.interp_sar, Sentinel1SarProduct.beta_nought) and adds a process-local cache for beta_nought. The public terrain_correction wrapper only scopes xr.set_options(use_bottleneck=False) and delegates to upstream.

Verified against upstream sarsen 0.9.6.dev5+g6c5e37d1d on the Rome DEM and an S1B GRD IW/VV product.

Out-of-scope behavior

Calls outside the supported scope fall back silently to the upstream sarsen implementation.
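Because the fallback is silent, it can help to check up front whether a call's arguments match the supported scope described above. The helper below is a hypothetical sketch, not part of AutoZyme or sarsen: the function name is an assumption, and it checks only the three keyword arguments named in the scope, not the product type or the upstream version.

```python
# Hypothetical pre-flight check; this is not an AutoZyme or sarsen API.
SUPPORTED_RADIOMETRY = (None, "gamma_nearest")


def in_supported_scope(*, chunks, interp_method, correct_radiometry):
    """Return True when the arguments match the documented fast-path scope."""
    return (
        chunks is None
        and interp_method == "nearest"
        and correct_radiometry in SUPPORTED_RADIOMETRY
    )


# In scope: in-memory processing, nearest interpolation, GTC output.
print(in_supported_scope(chunks=None, interp_method="nearest", correct_radiometry=None))

# Out of scope: chunked processing would take the upstream path.
print(in_supported_scope(chunks=1024, interp_method="nearest", correct_radiometry=None))
```

Arguments are keyword-only and have no defaults on purpose, so the check never assumes that a caller's defaults happen to be in scope.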

Detailed speedup table (4 runs)

| Dataset | Tier | Platform | Threads | Baseline | Optimized | Speedup | Memory (GB) | Concordance / Pass |
|---|---|---|---|---|---|---|---|---|
| rome_10k | medium | macOS | 8 | 1.75 min | 8.78 s | 11.9× | 25.2 → 17.7 | pass |
| rome_13k | large | macOS | 4 | 7.72 min | 17.50 s | 26.5× | 26.3 → 22.4 | pass |
| rome_8k | small | macOS | 8 | 57.63 s | 6.31 s | 9.19× | 20.3 → 12.5 | pass |
| rome_grd_gamma_12k | ood_large | macOS | 4 | 6.19 min | 1.16 min | 5.34× | 25.6 → 22.3 | pass |
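The headline figures can be recomputed from the four table rows. The sketch below derives the best and median speedup and the optimized/baseline memory ratios from the reported values. Because the displayed runtimes are rounded, it uses the reported speedups directly and recomputes only the best case from its runtimes.

```python
from statistics import median

# (dataset, reported speedup, baseline GB, optimized GB), copied from the table.
runs = [
    ("rome_10k", 11.9, 25.2, 17.7),
    ("rome_13k", 26.5, 26.3, 22.4),
    ("rome_8k", 9.19, 20.3, 12.5),
    ("rome_grd_gamma_12k", 5.34, 25.6, 22.3),
]

speedups = [speedup for _, speedup, _, _ in runs]
best = max(speedups)
med = median(speedups)  # mean of the two middle values for an even count

# Best case recomputed from the displayed runtimes: 7.72 min vs 17.50 s.
best_from_times = (7.72 * 60) / 17.50

# Peak-memory ratio per dataset (values below 1 mean the patch uses less memory).
memory_ratio = {name: opt / base for name, _, base, opt in runs}

print(f"best {best}x, median {med:.3f}x, best from runtimes {best_from_times:.2f}x")
for name, ratio in memory_ratio.items():
    print(f"{name}: optimized/baseline memory {ratio:.2f}x")
```

The median of the four reported speedups is 10.545, which is displayed as 10.5×.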

Frequently asked questions

Speeding up sarsen
Why is sarsen slow?

sarsen is CPU-bound, and its stock implementation leaves performance on the table in its core numerical work. In the best benchmarked case (rome_13k, 4 threads), the original takes 7.72 min where the AutoZyme path takes 17.50 s (26.5× faster).

How do I make sarsen faster?

Install AutoZyme and activate the sarsen patch, then keep using sarsen exactly as before. AutoZyme transparently substitutes the faster, output-validated path (up to 26.5× faster on the benchmark datasets) with no pipeline or API changes.

Does the AutoZyme speedup change the sarsen output?

No. The accelerated path returns bit-for-bit identical results to the original sarsen implementation (maximum absolute difference 0), checked by a frozen concordance gate on every benchmark dataset.
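The concordance gate itself is not shown on this page. To illustrate what a maximum absolute difference of 0 and a bit-for-bit match mean for floating-point output, the following sketch compares two flat sequences of floats using only the standard library. The function names are hypothetical, and real sarsen output would be a raster array rather than a Python list.

```python
import math
import struct


def max_abs_diff(baseline, optimized):
    """Largest absolute difference; positions where both values are NaN are skipped."""
    if len(baseline) != len(optimized):
        raise ValueError("length mismatch")
    worst = 0.0
    for a, b in zip(baseline, optimized):
        if math.isnan(a) and math.isnan(b):
            continue  # matching no-data values
        if math.isnan(a) or math.isnan(b):
            return math.inf  # NaN on one side only counts as a mismatch
        worst = max(worst, abs(a - b))
    return worst


def bit_exact(baseline, optimized):
    """True when both sequences have identical IEEE 754 double encodings."""
    if len(baseline) != len(optimized):
        return False
    encode = lambda values: struct.pack(f"<{len(values)}d", *values)
    return encode(baseline) == encode(optimized)


reference = [0.0, 1.5, float("nan")]
candidate = list(reference)
print(max_abs_diff(reference, candidate), bit_exact(reference, candidate))
```

The byte-level comparison is the stricter of the two checks: 0.0 and -0.0 have a difference of zero but different encodings, so only bit_exact tells them apart.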

How do I install the sarsen speedup?

From a shell, run pip install autozyme. Then, in Python, import autozyme and call autozyme.activate("sarsen"). The patch applies automatically the next time you call sarsen.
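Put together, an end-to-end call might look like the sketch below. The autozyme.activate("sarsen") call is taken from the answer above. The sarsen calls (Sentinel1SarProduct, terrain_correction, and their keyword arguments) are assumptions based on upstream sarsen usage and should be checked against the installed version. The wrapper name and the default output path are hypothetical. Imports are deferred into the function so the module loads even where the packages are not installed.

```python
# Keyword arguments matching the documented supported scope.
IN_SCOPE_KWARGS = {
    "chunks": None,
    "interp_method": "nearest",
    "correct_radiometry": None,  # "gamma_nearest" is also listed as in scope
}


def run_patched_terrain_correction(product_path, dem_path, output_path="GTC.tif"):
    """Hypothetical wrapper: activate the patch, then call sarsen as usual."""
    import autozyme  # requires: pip install autozyme
    import sarsen

    autozyme.activate("sarsen")  # activation call as given in this FAQ

    # Assumed upstream API; verify against your sarsen version.
    product = sarsen.Sentinel1SarProduct(product_path, measurement_group="IW/VV")
    return sarsen.terrain_correction(
        product,
        dem_urlpath=dem_path,
        output_urlpath=output_path,
        **IN_SCOPE_KWARGS,
    )
```

A call would pass the path to a Sentinel-1 GRD product and a DEM raster covering the same area. Changing any of the three keyword arguments to a value outside the supported scope would route the call to the upstream path instead.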