January 24, 2023 numpy python optimisation

I was profiling some code I had written in which `np.clip` was being called a few hundred million times. The profiler was flagging some odd related functions that seemed unimportant, so I dug deeper. It turns out that, as of numpy 1.17, `np.clip` has been substantially slower than it used to be, which has been documented here. This piqued my curiosity, and after some experimentation I found two substantially faster methods, which in turn sped up my code by $50\%$!

Below is my log of what I tried; if you just want the results, skip to the summary at the end of the post.

**Note**: These tests were run using numpy version `1.23.5`. `np.clip` might have been updated since.

```
import numpy as np

VMIN = -0.5
VMAX = 0.5
# Random scalar in [-0.5, 1.5): it may exceed VMAX but never falls below VMIN
X_scalar = 2 * np.random.rand() - 0.5
```

```
np.clip(X_scalar, -0.5, 0.5)
> 11.3 µs ± 202 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```

Testing the baseline gave us a fairly decent time of $11.3\mu s$.
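The timing lines in this post have the shape of IPython `%timeit` output. To reproduce them from a plain script, something like the sketch below should work. `time_per_call` is a hypothetical helper name, and the `number`/`repeat` values are arbitrary assumptions, not what `%timeit` would choose.

```python
import timeit

import numpy as np

VMIN, VMAX = -0.5, 0.5
x_scalar = 2 * np.random.rand() - 0.5


def time_per_call(func, number=2_000, repeat=5):
    """Return the best observed time per call, in seconds, for a zero-argument callable."""
    # timeit.repeat returns one total duration per repetition; the minimum is
    # the measurement least disturbed by other activity on the machine.
    best_total = min(timeit.repeat(func, number=number, repeat=repeat))
    return best_total / number


baseline = time_per_call(lambda: np.clip(x_scalar, VMIN, VMAX))
print(f"np.clip on a scalar: {baseline * 1e6:.2f} µs per call")
```

The lambda wrapper adds a little call overhead of its own, so absolute numbers will not match `%timeit` exactly. The relative ordering of the methods is what matters.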

```
np.core.umath.maximum(np.core.umath.minimum(X_scalar, VMAX), VMIN)
> 2.25 µs ± 64.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```

However, we quickly got huge gains, roughly 5x, by calling the underlying `minimum`/`maximum` functions from `np.core.umath` directly.

```
np.max([np.min([X_scalar, VMAX]), VMIN])
> 8.74 µs ± 141 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```

Using the normal `np.max`/`np.min` functions was still faster than `np.clip`, but not nearly as fast as calling the `umath` functions directly.

```
VMIN if X_scalar < VMIN else VMAX if X_scalar > VMAX else X_scalar
> 75.2 ns ± 7.8 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
```

For scalars, it seems not much is faster than direct logic in Python. Huge gains!
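If the same clamp is needed in several places, the conditional expression above can be wrapped in a small function. This is only a sketch: `clip_scalar` is a hypothetical name, and the extra function call adds some overhead, so in a very hot loop the inline expression is likely still the faster option.

```python
def clip_scalar(x, vmin, vmax):
    """Clamp a single number to the closed interval [vmin, vmax]."""
    # Same chained conditional as above: check the lower bound first,
    # then the upper bound, otherwise return the value unchanged.
    return vmin if x < vmin else vmax if x > vmax else x


print(clip_scalar(-0.8, -0.5, 0.5))  # below the range
print(clip_scalar(0.2, -0.5, 0.5))   # inside the range
print(clip_scalar(1.3, -0.5, 0.5))   # above the range
```

Because every comparison with NaN is false, a NaN input falls through to the final branch and is returned unchanged.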

```
X_array = 2 * (np.random.rand(50000) - VMAX)
```

Again, testing the baseline:

```
np.clip(X_array, VMIN, VMAX)
> 172 µs ± 646 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```

This gave us a very slow time of $172\mu s$, which isn't good enough. Since the usual min/max functions proved fruitless in the scalar case, we skip them here.

```
np.core.umath.maximum(np.core.umath.minimum(X_array, VMAX), VMIN)
> 40.9 µs ± 388 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```

Again, the `umath` functions were faster, this time by roughly 4x.
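The array trick can also be packaged as a helper. The sketch below uses the public names `np.minimum` and `np.maximum`, which to my knowledge are the same function objects that `np.core.umath` exposes, so the speed-up should carry over. That has not been re-measured here. `clip_array` and its optional `out` buffer are hypothetical additions for illustration, not something from the original experiments.

```python
import numpy as np


def clip_array(x, vmin, vmax, out=None):
    """Clamp every element of x to [vmin, vmax] using the minimum/maximum ufuncs."""
    # First cap the values from above, then raise anything below vmin.
    # Passing the same `out` buffer to both calls avoids a second allocation.
    capped = np.minimum(x, vmax, out=out)
    return np.maximum(capped, vmin, out=out)


x_array = 2 * (np.random.rand(50_000) - 0.5)
clipped = clip_array(x_array, -0.5, 0.5)

# Sanity check against the reference implementation
print(np.array_equal(clipped, np.clip(x_array, -0.5, 0.5)))
```

Using the public names also avoids reaching into `np.core`, which, as far as I know, newer NumPy releases treat as a private namespace.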

```
np.asarray([VMIN if Xi < VMIN else VMAX if Xi > VMAX else Xi for Xi in X_array])
> 10 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

And (as expected, honestly) the raw Python variation was horribly slow.

For scalars, using `VMIN if X < VMIN else VMAX if X > VMAX else X` is 150x faster than normal `np.clip`. However, using raw Python for arrays is (expectedly) horrible. For this case, using `np.core.umath.maximum(np.core.umath.minimum(X, VMAX), VMIN)` is roughly 4x faster than `np.clip`.
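The ratios quoted here follow directly from the measured timings. As a quick sanity check, the arithmetic can be sketched as below, with every timing from this post converted to nanoseconds. The dictionary keys and the `speedups` helper are illustrative names only.

```python
# Mean timings reported in this post, converted to nanoseconds
scalar_ns = {
    "np.clip": 11_300,
    "umath min/max": 2_250,
    "np.max/np.min": 8_740,
    "python conditional": 75.2,
}
array_ns = {
    "np.clip": 172_000,
    "umath min/max": 40_900,
    "python list comprehension": 10_000_000,
}


def speedups(timings, baseline="np.clip"):
    """Return how many times faster each method is than the baseline."""
    return {name: timings[baseline] / t for name, t in timings.items()}


for label, timings in (("scalar", scalar_ns), ("array", array_ns)):
    for name, factor in speedups(timings).items():
        print(f"{label:>6} | {name:<26} {factor:8.2f}x")
```

A factor below 1 means the method is slower than `np.clip`: the list comprehension comes out at about 0.02x, i.e. roughly 58x slower.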

I got most of these ideas from numpy#14281 but wanted to verify them for myself.