Python examples demonstrating performance improvements using cython and numba

I came across an old post by jakevdp on Numba vs Cython. I thought I will revisit this topic because both Numba and Cython has matured significantly over this time period. In this post I am going to do two examples:

- Pairwise distance estimation example that Jake discusses. The intention is to see how the maturity of these projects has contributed to improvements.
- A simple cashflow payment calculation of an amortizing bond or mortgage payments. This a calculation that cannot be vectorized in a numpy sense. So the speedups would have to come from optimizing loops using tools like Numba or Cython.

In [1]:

```
import numpy as np
import numba
import cython
%load_ext cython
import pandas as pd
numba.__version__, cython.__version__, np.__version__
```

Out[1]:

('0.31.0', '0.24.1', '1.11.3')

In [2]:

```
X = np.random.random((1000, 3))
```

In [3]:

```
def pairwise_python(X):
M = X.shape[0]
N = X.shape[1]
D = np.empty((M, M), dtype=np.float)
for i in range(M):
for j in range(M):
d = 0.0
for k in range(N):
tmp = X[i, k] - X[j, k]
d += tmp * tmp
D[i, j] = np.sqrt(d)
return D
%timeit -n10 pairwise_python(X)
```

In [4]:

```
def pairwise_numpy(X):
return np.sqrt(((X[:, None, :] - X) ** 2).sum(-1))
%timeit -n10 pairwise_numpy(X)
```

In [5]:

```
pairwise_numba = numba.jit(pairwise_python)
%timeit -n10 pairwise_numba(X)
```

In [6]:

```
%%cython
import numpy as np
cimport cython
from libc.math cimport sqrt
@cython.boundscheck(False)
@cython.wraparound(False)
def pairwise_cython(double[:, ::1] X):
cdef int M = X.shape[0]
cdef int N = X.shape[1]
cdef double tmp, d
cdef double[:, ::1] D = np.empty((M, M), dtype=np.float64)
for i in range(M):
for j in range(M):
d = 0.0
for k in range(N):
tmp = X[i, k] - X[j, k]
d += tmp * tmp
D[i, j] = sqrt(d)
return np.asarray(D)
```

In [7]:

```
%timeit -n10 pairwise_cython(X)
```

In [8]:

```
df1 = pd.DataFrame({"Time (ms)": [13400,111, 9.12, 9.87], "Speedup": [1, 121, 1469, 1357]},
index=["Python", "Numpy", "Numba", "Cython"])
df2 = pd.DataFrame({"Time (ms)": [2470, 38.3, 4.04, 6.6], "Speedup": [1, 65, 611, 374]},
index=["Python", "Numpy", "Numba", "Cython"])
df = pd.concat([df1, df2], axis = 1, keys=(["2013", "2017"]))
df
```

Out[8]:

The timings are speedup number for the 2013 and 2017 runs are very different due to differences in versions and perhaps even the python version. This post is using Py35 running in Windows. The take away here is that the numpy is atleast 2 orders of magnitude faster than python. And the numba and cython snippets are about an order of magnitude faster than numpy in both the benchmarks.

I will not rush to make any claims on numba vs cython. It is unclear what kinds of optimizations is used in the `cython`

magic. I would expect the `cython`

code to be as fast as C and perhaps some tweaking will help us get there. It is really interesting how easy it is to get performance boost from `numba`

. From an ease of use point of view, `numba`

is hands down winner in this simple example.

In [9]:

```
def amortize_payments_py(B0, R, term, cpr=0.0):
smm = 1. - pow(1 - cpr/100., 1/12.)
r = R/1200.
S = np.zeros(term)
P = np.zeros(term)
I = np.zeros(term)
B = np.zeros(term)
Pr = np.zeros(term)
Bt = B0
pow_term = pow(1+r, term)
A = Bt*r*pow_term/(pow_term - 1)
for i in range(term):
n = term-i
I[i] = Bt * r
Pr[i] = smm*Bt
S[i] = A-I[i] if Bt>1e-2 else 0.
P[i] = S[i] + Pr[i]
Bt = max(Bt - P[i], 0.0)
B[i] = Bt
return S,I, Pr,P, B
```

Here is the equivalent Cython function.

In [10]:

```
%%cython
cimport cython
import numpy as np
from libc.math cimport pow
@cython.boundscheck(False)
@cython.wraparound(False)
def amortize_payments_cy(double B0,double R,int term,double cpr=0.0):
cdef double smm = 1. - pow(1 - cpr/100., 1/12.)
cdef double r = R/1200.
cdef double[:] D = np.empty(term, dtype=np.float64)
cdef double[:] S = np.empty(term, dtype=np.float64)
cdef double[:] P = np.empty(term, dtype=np.float64)
cdef double[:] I = np.empty(term, dtype=np.float64)
cdef double[:] B = np.empty(term, dtype=np.float64)
cdef double[:] Pr = np.empty(term, dtype=np.float64)
cdef double Bt = B0
cdef double pow_term = pow(1+r, term)
cdef double A = Bt*r*pow_term/(pow_term - 1.)
cdef double n = term
cdef int i=0
for i in range(term):
n = term-i
I[i] = Bt * r
Pr[i] = smm*Bt
S[i] = A-I[i] if Bt>1e-2 else 0.
P[i] = S[i] + Pr[i]
Bt = max(Bt - P[i], 0.0)
B[i] = Bt
return np.asarray(S),np.asarray(I), np.asarray(Pr),np.asarray(P), np.asarray(B)
```

Here is the Numba version

In [11]:

```
amortize_payments_nb = numba.njit(cache=True)(amortize_payments_py)
```

Let's compare the performance of the three function types.

In [12]:

```
B0 = 500000.
R = 4.0
term = 360
```

** Python**

In [17]:

```
%timeit -n1000 S,I, Pr,P, B = amortize_payments_py(B0, R, term, cpr=10)
```

** Numba **

In [18]:

```
%timeit -n1000 S,I, Pr,P, B = amortize_payments_nb(B0, R, term, cpr=10)
```

** Cython **

In [19]:

```
%timeit -n1000 S,I, Pr,P, B = amortize_payments_cy(B0, R, term, cpr=10)
```

`cython`

and `numba`

methods are orders of magnitude (roughly 15X) faster than python version. The `numba`

version is almost 3X faster than Cython. There are probably some performance cython tweaks that I am missing perhaps.

In this post, I have redone Jake's pairwise distance example to see how the same code performs 3 years forward. I also discuss a common finance example of amortizing cashflow generation. The `numba`

and `cython`

snippets are orders of magnitude faster than a pure python version. Surprisingly, `numba`

is 20% to 300% faster than `cython`

on these examples. There may very well be some `cython`

tweaks I might be missing. But nevertheless these examples show how one can easily get performance boost using `numba`

module.

python programming development pandas numpy numba

- Numpy Vs Pandas Performance Comparison
- Multi-Processing with Pandas and Dask
- Multi-Processing With Pandas
- Running ZEO as a Windows Service
- QuantLib Python Notebooks On Docker