
Speed up ProDy

ProDy is one of the slower steps in many molecular & structural biology workflows. AutoZyme ships a verified, drop-in patch that is up to 180.7× faster, returning bit-for-bit identical results with no change to how you call it.

Best speedup: 180.7×
Median speedup: 48.3×
Output equivalence: bit-exact
Best runtime: baseline 60.02 min, optimized 19.94 s
Datasets: 5
Pass rate: 10/10

Benchmark charts

Speedup distribution (log scale): each dot is one finalized dataset/thread run on Windows, for datasets 7b0u, 6tlj, 1aon, 5gar, and 1oel.

Thread sweep: speedup across finalized thread counts on Windows. No finalized multi-thread sweep is available for this platform.

Memory: baseline vs. optimized peak memory on Windows, 1 thread.
7b0u (ood_xlarge): 24 GB → 1.4 GB, optimized / baseline 0.06×, 180.7× speedup
6tlj (ood_large): 12 GB → 0.4 GB, optimized / baseline 0.04×, 108.8× speedup
1aon (large): 8.8 GB → 0.4 GB, optimized / baseline 0.04×, 81.5× speedup
5gar (medium): 4.8 GB → 0.3 GB, optimized / baseline 0.07×, 45.3× speedup
1oel (small): 1.9 GB → 0.2 GB, optimized / baseline 0.12×, 34.7× speedup

What is accelerated

The public API stays the same; AutoZyme replaces only the supported fast path.

This task targets the ANM Hessian build and eigensolver in ProDy (prody.dynamics.anm). The benchmarked result preserves the declared scientific output gate while reducing CPU runtime on the listed datasets.

Also searched as: protein dynamics, normal mode analysis, NMA, elastic network model.

Supported scope


The patch replaces two functions, prody.dynamics.anm.ANMBase.buildHessian and prody.dynamics.anm.solveEig; calcANM and calcModes route through them.

buildHessian is general and mathematically exact. It always builds the Hessian in float64 and supports AtomGroup/Atomic inputs (via _getCoords/getCoords), raw NumPy coordinate arrays (via checkCoords), any cutoff > 0 (validated by checkENMParameters), and both scalar gamma values and callable Gamma objects. A callable is invoked as gamma(dist2, i, j), matching upstream exactly, including the squared-distance argument. It also restores a real sparse CSR Kirchhoff matrix and attaches the build coords to the sparse Hessian instance, so no global state is involved.

solveEig has a deterministic eigsh shift-invert path (sigma=-1e-8, which=LM, tol=1e-10) that handles any sparse M with a finite integer n_modes < dof, for both zeros=False and zeros=True. An internal guard defers to upstream if final_n_modes exceeds the available values.

The LOBPCG accelerated path (the fast path actually measured) is taken only for the standard ANM configuration: M sparse, reverse=False, integer n_modes < dof, zeros=False, expct_n_zeros==6, build coords attached, coords.shape[0]*3==dof, and coords.shape[0] < 12000. Its result is accepted only if it passes a residual-norm gate (max scaled residual <= 1e-3); otherwise the call falls through to the deterministic eigsh path. Benchmark tiers 1oel, 5gar, 1aon, and 6tlj (3668–9200 Cα) exercise LOBPCG; the 7b0u OOD tier (13320 Cα) is at or above the 12000 limit, so it uses the eigsh shift-invert path, not LOBPCG.
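The "try LOBPCG, accept only behind a residual gate, otherwise fall through to eigsh shift-invert" flow described above can be sketched with SciPy alone. This is a minimal illustration, not AutoZyme's code: the function names are hypothetical, the residual scaling (residual norm divided by |λ|·‖v‖) is an assumption because the exact scaling is not stated here, the LOBPCG tolerance and iteration cap are arbitrary, and the test matrix is a toy tridiagonal operator rather than an ANM Hessian.

```python
import warnings

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh, lobpcg

GATE = 1e-3  # gate value from the text; the scaling used below is an assumption


def max_scaled_residual(M, vals, vecs):
    """Largest ||M v - lambda v|| / (|lambda| * ||v||) over all eigenpairs."""
    resid = M @ vecs - vecs * vals              # one residual column per pair
    num = np.linalg.norm(resid, axis=0)
    den = np.abs(vals) * np.linalg.norm(vecs, axis=0)
    return float(np.max(num / np.maximum(den, np.finfo(float).tiny)))


def smallest_eigenpairs(M, k, seed=0):
    """Try LOBPCG first; keep its result only if it passes the residual gate."""
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal((M.shape[0], k))   # random starting block
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")         # non-convergence is handled below
        vals, vecs = lobpcg(M, x0, largest=False, tol=1e-8, maxiter=200)
    order = np.argsort(vals)
    vals, vecs = vals[order], vecs[:, order]
    if np.all(np.isfinite(vals)) and max_scaled_residual(M, vals, vecs) <= GATE:
        return vals, vecs, "lobpcg"
    # Deterministic fallback with the shift-invert settings quoted in the text.
    vals, vecs = eigsh(M, k=k, sigma=-1e-8, which="LM", tol=1e-10)
    order = np.argsort(vals)
    return vals[order], vecs[:, order], "eigsh"


# Toy symmetric positive-definite operator (1-D Laplacian), float64, sparse.
M = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(300, 300), format="csc")
vals, vecs, path = smallest_eigenpairs(M, k=4)
print(path, vals)
```

Whichever branch runs, the caller receives eigenpairs that satisfy the gate, which is the point of the design: the iterative solver is an optimization that is allowed to fail, and the shift-invert solve is the reference path.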

Out-of-scope behavior

Inputs outside the supported scope fall back silently to the upstream ProDy implementation.
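To make the routing concrete, the conditions listed under the supported scope can be written as a small predicate. This is a sketch only: lobpcg_eligible and choose_path are hypothetical names, not AutoZyme functions; the n_modes > 0 check is an added assumption; and because the scope text does not say how reverse is treated on the eigsh path, the sketch does not condition on it there.

```python
import numpy as np
import scipy.sparse as sp

MAX_LOBPCG_NODES = 12000  # node limit quoted in the supported scope


def _is_int(value):
    """True for Python or NumPy integers, excluding bool."""
    return isinstance(value, (int, np.integer)) and not isinstance(value, bool)


def lobpcg_eligible(M, coords, n_modes, zeros=False, reverse=False,
                    expct_n_zeros=6):
    """Check the standard-ANM conditions listed for the LOBPCG path."""
    if not sp.issparse(M) or reverse or zeros or expct_n_zeros != 6:
        return False
    dof = M.shape[0]
    # n_modes > 0 is an assumption of this sketch; the text only says < dof.
    if not _is_int(n_modes) or not 0 < n_modes < dof:
        return False
    if coords is None:                      # build coords must be attached
        return False
    n_nodes = np.asarray(coords).shape[0]
    return n_nodes * 3 == dof and n_nodes < MAX_LOBPCG_NODES


def choose_path(M, coords, n_modes, **kwargs):
    """Return which solver would handle the call in this sketch."""
    if lobpcg_eligible(M, coords, n_modes, **kwargs):
        return "lobpcg"
    if sp.issparse(M) and _is_int(n_modes) and n_modes < M.shape[0]:
        return "eigsh"                      # deterministic shift-invert path
    return "upstream"                       # out of scope: silent fallback


# Sizes taken from the benchmark description: 3668 and 13320 C-alpha nodes.
small = choose_path(sp.identity(3668 * 3, format="csr"), np.zeros((3668, 3)), 20)
xlarge = choose_path(sp.identity(13320 * 3, format="csr"), np.zeros((13320, 3)), 20)
dense = choose_path(np.eye(30), np.zeros((10, 3)), 5)
print(small, xlarge, dense)
```

Encoding the scope as an explicit predicate keeps the fallback silent but auditable: any call that fails a check is simply handed to the slower path.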

Detailed speedup table (10 runs)
Dataset Tier Platform Threads Baseline Optimized Speedup Memory Concordance Pass
1aon large Windows 1 13.30 min 9.79 s 81.5× 8.8 → 0.4 GB pass
1oel small Windows 1 1.27 min 2.20 s 34.7× 1.9 → 0.2 GB pass
5gar medium Windows 1 5.21 min 6.89 s 45.3× 4.8 → 0.3 GB pass
6tlj ood_large Windows 1 19.87 min 10.96 s 108.8× 11.6 → 0.4 GB pass
7b0u ood_xlarge Windows 1 60.02 min 19.94 s 180.7× 24.0 → 1.4 GB pass
1aon large macOS 1 9.46 min 6.37 s 89.1× 8.9 → 1.0 GB pass
1oel small macOS 1 59.37 s 1.62 s 36.6× 2.1 → 0.5 GB pass
5gar medium macOS 1 1.86 min 4.52 s 24.6× 5.2 → 0.8 GB pass
6tlj ood_large macOS 1 6.37 min 7.50 s 51.0× 8.5 → 1.3 GB pass
7b0u ood_xlarge macOS 1 19.67 min 25.86 s 45.6× 20.4 → 1.7 GB pass
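The headline figures can be checked against the table with a few lines of Python. The sketch below copies the ten reported speedups from the rows above; the helper name speedup is illustrative. Recomputing a ratio from the displayed runtimes gives a slightly different number than the reported one because the table values are rounded.

```python
from statistics import median

# Reported speedups, keyed by (dataset, platform), copied from the table.
speedups = {
    ("1aon", "Windows"): 81.5, ("1oel", "Windows"): 34.7,
    ("5gar", "Windows"): 45.3, ("6tlj", "Windows"): 108.8,
    ("7b0u", "Windows"): 180.7, ("1aon", "macOS"): 89.1,
    ("1oel", "macOS"): 36.6, ("5gar", "macOS"): 24.6,
    ("6tlj", "macOS"): 51.0, ("7b0u", "macOS"): 45.6,
}


def speedup(baseline_s, optimized_s):
    """Speedup is baseline runtime divided by optimized runtime."""
    return baseline_s / optimized_s


best = max(speedups.values())
med = median(speedups.values())             # mean of the 5th and 6th values
recomputed = speedup(60.02 * 60, 19.94)     # 7b0u on Windows, rounded inputs

print(f"best {best:.1f}x, median {med:.1f}x, recomputed {recomputed:.1f}x")
```

With ten runs the median is the average of the two middle values, 45.6× and 51.0×, which gives the 48.3× quoted at the top of the page.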

Frequently asked questions

Speeding up ProDy
Why is ProDy slow?

ProDy's normal mode analysis is CPU-bound, and the stock implementation leaves performance on the table in its core numerical work: building the Hessian and solving for eigenmodes. On the largest benchmark dataset (7b0u, Windows, 1 thread), the original takes 60.02 min where the AutoZyme path takes 19.94 s (180.7× faster).

How do I make ProDy faster?

Install AutoZyme and activate the ProDy patch, then keep using ProDy exactly as before. AutoZyme transparently substitutes the faster, output-validated path, up to 180.7× faster on the benchmark datasets, with no pipeline or API changes.

Does the AutoZyme speedup change the ProDy output?

No. The accelerated path returns bit-for-bit identical results to the original ProDy implementation (maximum absolute difference 0), checked by a frozen concordance gate on every benchmark dataset.

How do I install the ProDy speedup?

From a shell, run pip install autozyme. Then, in Python, import autozyme and call autozyme.activate("prody"). The patch applies automatically the next time you call ProDy.
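A minimal end-to-end sketch of those steps follows, assuming autozyme and prody are both installed and that the PDB entry can be downloaded. Only autozyme.activate("prody") comes from this page; the wrapper function name is hypothetical, the ProDy calls are the standard ANM workflow, and passing sparse=True is an assumption based on the supported scope, which describes the accelerated solver paths for sparse matrices.

```python
def run_anm_with_autozyme(pdb_id="1oel", n_modes=20):
    """Activate the patch, then run an ordinary ProDy ANM calculation."""
    # Imports are kept inside the function so that defining it does not
    # require either package to be installed.
    import autozyme
    from prody import ANM, parsePDB

    autozyme.activate("prody")                   # call documented on this page

    calphas = parsePDB(pdb_id).select("calpha")  # C-alpha atoms only
    anm = ANM(pdb_id)
    # sparse=True is an assumption of this sketch (see the lead-in).
    anm.buildHessian(calphas, cutoff=15.0, gamma=1.0, sparse=True)
    anm.calcModes(n_modes=n_modes, zeros=False)
    return anm.getEigvals()


# Example call (needs network access to fetch the structure):
# eigvals = run_anm_with_autozyme("1oel", n_modes=20)
```

Apart from the activate call, this is the same code you would write without AutoZyme, which is what "no pipeline or API changes" means in practice.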