Astropy BoxLeastSquares is one of the slower steps in many astronomy workflows. AutoZyme ships a
verified, drop-in patch that is up to 6.69× faster, returning bit-for-bit identical results with no change to how you call it.
Best speedup: 6.69×
Median speedup: 6.51×
Output equivalence: Bit-exact
Best runtime: baseline 16.68 min → optimized 2.74 min
Datasets: 5
Pass rate: 9/9
Benchmark charts
Speedup distribution
Each dot is one finalized dataset/thread run on Windows
| Dataset | Tier | Threads | Speedup | Baseline | Optimized | Memory |
|---|---|---|---|---|---|---|
| kepler11_q3_q4_months1_5_sc | ood_xlarge | 1 | 1.17× | 15.40 min | 15.66 min | 0.2 GB → 0.2 GB |
| kepler11_q3_q4_months1_5_sc | ood_xlarge | 4 | 3.17× | 21.08 min | 5.77 min | 0.2 GB → 0.3 GB |
| kepler11_q3_q4_months1_5_sc | ood_xlarge | 8 | 6.69× | 16.68 min | 2.74 min | 0.2 GB → 0.3 GB |
| kepler9_q2_full_sc | large | 1 | 1.09× | 4.51 min | 4.52 min | 0.1 GB → 0.1 GB |
| kepler9_q2_full_sc | large | 4 | 3.38× | 5.79 min | 1.46 min | 0.1 GB → 0.2 GB |
| kepler9_q2_full_sc | large | 8 | 6.51× | 4.95 min | 46.35 s | 0.1 GB → 0.2 GB |
| kepler8_q2_full_sc | ood_large | 1 | 1.17× | 4.67 min | 4.69 min | 0.1 GB → 0.1 GB |
| kepler8_q2_full_sc | ood_large | 4 | 3.36× | 5.69 min | 1.63 min | 0.1 GB → 0.2 GB |
| kepler8_q2_full_sc | ood_large | 8 | 6.30× | 5.30 min | 52.22 s | 0.1 GB → 0.2 GB |
| kepler10_q3_months1_2_sc | medium | 1 | 0.99× | 2.36 min | 2.33 min | 0.1 GB → 0.1 GB |
| kepler10_q3_months1_2_sc | medium | 4 | 2.79× | 2.29 min | 49.51 s | 0.1 GB → 0.2 GB |
| kepler10_q3_months1_2_sc | medium | 8 | 6.17× | 2.30 min | 22.38 s | 0.1 GB → 0.2 GB |
| kepler_tres2_q2_month1_sc | small | 1 | 0.99× | 29.30 s | 29.30 s | 0.1 GB → 0.1 GB |
| kepler_tres2_q2_month1_sc | small | 4 | 2.77× | 29.20 s | 10.48 s | 0.1 GB → 0.1 GB |
| kepler_tres2_q2_month1_sc | small | 8 | 5.78× | 28.73 s | 5.02 s | 0.1 GB → 0.1 GB |
The public API stays the same; AutoZyme replaces only the supported fast path.
This task targets astropy.timeseries.BoxLeastSquares.autopower in astropy. The benchmarked patch
passes the declared scientific output gate while reducing CPU runtime on the listed datasets.
Also searched as: BLS, box least squares, periodogram, transit search, exoplanet, period search.
Supported scope
The patch replaces the native kernel astropy.timeseries.periodograms.bls.methods.bls_fast, the hot function under BoxLeastSquares.autopower and BoxLeastSquares.power for method='fast'. It is an embarrassingly parallel split of the upstream bls_fast over the trial-period axis: it slices period[chunk] into n_workers disjoint contiguous ranges, calls the captured upstream _orig_bls_fast (bls_impl) on each slice with the same (t, y, ivar, duration, oversample, use_likelihood) it received, then reassembles the 7-tuple result by field-wise np.concatenate in the original period order.

Because each trial period's BLS computation in bls_impl is independent of every other period, this reproduces upstream bit-exactly (speedups_finalized.tsv shows pearson_power=1.0, q99_rel_diff_power=0.0, rel_peak_period_err=0.0, top10_peak_jaccard=1.0).

The patch receives oversample and use_likelihood as already-resolved arguments and passes them through unchanged, so it handles any objective ('likelihood' or 'snr'), any oversample >= 1, any duration array, and any period grid produced by autopower/autoperiod (any minimum_n_transit, minimum_period, maximum_period, or frequency_factor). method='slow' is not patched (only bls_fast is in targets=), so it transparently uses the unmodified upstream bls_slow.

The parallel path engages only when len(period) >= 2*16384 (32768) and the resolved worker count (_thread_count) is >= 2; otherwise (single-thread environment or small grid) it returns the original bls_fast result unchanged. Threads resolve from ZYME_THREADS, AUTOZYME_THREADS, AUTOZYMER_THREADS, or OMP_NUM_THREADS, else os.cpu_count().
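The split-and-concatenate idea described above can be illustrated with a minimal sketch. This is not AutoZyme's code: toy_kernel is a hypothetical stand-in for the upstream kernel (it only mimics the "7 arrays, one value per trial period" shape), split_over_periods is an illustrative name, and the use of a thread pool is an assumption. Threads only help in practice if the real kernel releases the GIL.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

MIN_PARALLEL_PERIODS = 2 * 16384  # grid-size threshold quoted in the scope text


def toy_kernel(t, y, ivar, period, duration, oversample, use_likelihood):
    """Hypothetical stand-in kernel: returns 7 arrays, one value per trial period.

    Each period is evaluated independently, which is the property the split relies on.
    """
    stat = np.array([np.sum(ivar * y * np.cos(2.0 * np.pi * t / p)) for p in period])
    return tuple(stat * (k + 1) for k in range(7))


def split_over_periods(kernel, t, y, ivar, period, duration, oversample,
                       use_likelihood, n_workers, min_periods=MIN_PARALLEL_PERIODS):
    """Run `kernel` on contiguous period chunks and rejoin the fields in order."""
    # Small grid or a single worker: return the unsplit result unchanged.
    if n_workers < 2 or len(period) < min_periods:
        return kernel(t, y, ivar, period, duration, oversample, use_likelihood)

    # Disjoint, contiguous, order-preserving slices of the period grid.
    chunks = np.array_split(period, n_workers)

    def run(chunk):
        # Every chunk sees the same data and the same resolved arguments.
        return kernel(t, y, ivar, chunk, duration, oversample, use_likelihood)

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(run, chunks))  # map() preserves chunk order

    # Field-wise concatenation restores the original period order.
    return tuple(np.concatenate(field) for field in zip(*parts))
```

Because no value computed for one period depends on any other period, the chunked result matches the unsplit result exactly for this toy kernel, which mirrors the argument made for bls_impl.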
Out-of-scope behavior
Calls outside the supported scope fall back silently to the upstream implementation.
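The engagement rule and thread resolution from the supported scope can be sketched as follows. The function names are hypothetical, the precedence order of the environment variables (left to right as listed) is an assumption, and so is the handling of unset or non-numeric values. The fallback worker count is passed in by the caller rather than hard-coded.

```python
import os

# Environment variables named in the scope text; checking them in this order is an assumption.
THREAD_ENV_VARS = ("ZYME_THREADS", "AUTOZYME_THREADS", "AUTOZYMER_THREADS", "OMP_NUM_THREADS")
MIN_PARALLEL_PERIODS = 2 * 16384  # 32768 trial periods


def resolve_thread_count(environ=None, fallback=1):
    """Return the first positive integer found in the thread variables, else `fallback`."""
    environ = os.environ if environ is None else environ
    for name in THREAD_ENV_VARS:
        raw = str(environ.get(name, "")).strip()
        if raw.isdigit() and int(raw) >= 1:
            return int(raw)
    return fallback


def uses_parallel_path(n_periods, n_threads):
    """Parallel path needs both a large enough grid and at least two workers."""
    return n_periods >= MIN_PARALLEL_PERIODS and n_threads >= 2
```

Under this rule, a run with one thread, or a grid shorter than 32768 periods, goes through the original bls_fast, so no speedup should be expected in those cases.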
Detailed speedup table (9 runs)

| Dataset | Tier | Platform | Threads | Baseline | Optimized | Speedup | Memory | Concordance | Pass |
|---|---|---|---|---|---|---|---|---|---|
| kepler_tres2_q2_month1_sc | small | Windows | 8 | 28.73 s | 5.02 s | 5.78× | 0.1 → 0.1 GB | — | pass |
| kepler10_q3_months1_2_sc | medium | Windows | 8 | 2.30 min | 22.38 s | 6.17× | 0.1 → 0.2 GB | — | pass |
| kepler11_q3_q4_months1_5_sc | ood_xlarge | Windows | 8 | 16.68 min | 2.74 min | 6.69× | 0.2 → 0.3 GB | — | pass |
| kepler8_q2_full_sc | ood_large | Windows | 8 | 5.30 min | 52.22 s | 6.30× | 0.1 → 0.2 GB | — | pass |
| kepler9_q2_full_sc | large | Windows | 8 | 4.95 min | 46.35 s | 6.51× | 0.1 → 0.2 GB | — | pass |
| kepler_tres2_q2_month1_sc | small | macOS | 8 | 27.77 s | 3.64 s | 7.64× | 0.1 → 0.1 GB | — | pass |
| kepler10_q3_months1_2_sc | medium | macOS | 8 | 2.58 min | 19.64 s | 7.84× | 0.1 → 0.2 GB | — | pass |
| kepler8_q2_full_sc | ood_large | macOS | 4 | 5.44 min | 1.41 min | 3.89× | 0.1 → 0.2 GB | — | pass |
| kepler9_q2_full_sc | large | macOS | 8 | 5.22 min | 39.73 s | 7.79× | 0.1 → 0.2 GB | — | pass |
Frequently asked questions
Speeding up Astropy BoxLeastSquares
Why is Astropy BoxLeastSquares slow?
Astropy BoxLeastSquares is CPU-bound: the stock bls_fast kernel processes the whole trial-period grid in a single call, whereas the AutoZyme patch splits that grid across worker threads. On the largest benchmark dataset (kepler11_q3_q4_months1_5_sc, 8 threads on Windows) the original takes 16.68 min where the AutoZyme path takes 2.74 min (6.69× faster).
How do I make Astropy BoxLeastSquares faster?
Install AutoZyme and activate the astropy patch, then keep using Astropy BoxLeastSquares exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 6.69× faster on the benchmark datasets, with no pipeline or API changes.
Does the AutoZyme speedup change the Astropy BoxLeastSquares output?
No. The accelerated path returns bit-for-bit identical results to the original astropy implementation (maximum absolute difference 0), checked by a frozen concordance gate on every benchmark dataset.
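One way to check a bit-for-bit claim like this yourself is to compare the raw bytes of each returned array. The sketch below is not the frozen concordance gate itself, and the helper names are hypothetical. It assumes each result is a sequence of array-like fields, such as the 7-tuple returned by the kernel.

```python
import numpy as np


def bit_exact(reference, candidate):
    """True when every field matches in shape, dtype, and raw bytes.

    Byte comparison treats NaNs with identical bit patterns as equal and
    distinguishes 0.0 from -0.0, which is stricter than `==` on floats.
    """
    if len(reference) != len(candidate):
        return False
    for ref, cand in zip(reference, candidate):
        ref, cand = np.asarray(ref), np.asarray(cand)
        if ref.shape != cand.shape or ref.dtype != cand.dtype:
            return False
        if ref.tobytes() != cand.tobytes():
            return False
    return True


def max_abs_diff(reference, candidate):
    """Largest absolute element-wise difference across all fields (0.0 if all are empty)."""
    return max(
        float(np.max(np.abs(np.asarray(r) - np.asarray(c)), initial=0.0))
        for r, c in zip(reference, candidate)
    )
```

To use it, run the same search once with the patch inactive and once with it active, then pass the two sets of output arrays to bit_exact.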
How do I install the astropy speedup?
Run pip install autozyme, then in Python import autozyme and call autozyme.activate("astropy"). The patch applies automatically the next time you call astropy.timeseries.BoxLeastSquares.autopower.