Python examples demonstrating performance improvements using cython and numba
I came across an old post by jakevdp on Numba vs Cython. I thought I would revisit the topic because both Numba and Cython have matured significantly since then. In this post I am going to work through two examples: Jake's pairwise distance computation, and an amortizing payment calculation from finance.
import numpy as np
import numba
import cython
%load_ext cython
import pandas as pd
numba.__version__, cython.__version__, np.__version__
('0.31.0', '0.24.1', '1.11.3')
X = np.random.random((1000, 3))
def pairwise_python(X):
    # Naive triple loop over all pairs of rows, accumulating squared coordinate differences
    M = X.shape[0]
    N = X.shape[1]
    D = np.empty((M, M), dtype=np.float64)
    for i in range(M):
        for j in range(M):
            d = 0.0
            for k in range(N):
                tmp = X[i, k] - X[j, k]
                d += tmp * tmp
            D[i, j] = np.sqrt(d)
    return D
%timeit -n10 pairwise_python(X)
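As a quick sanity check (my addition, assuming SciPy is available; it is not imported in the original post), the loop version can be compared against scipy.spatial.distance.cdist:
from scipy.spatial.distance import cdist

np.allclose(pairwise_python(X), cdist(X, X))   # should be True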
def pairwise_numpy(X):
    return np.sqrt(((X[:, None, :] - X) ** 2).sum(-1))
%timeit -n10 pairwise_numpy(X)
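The one-liner above relies on NumPy broadcasting: inserting a new axis makes NumPy form every pairwise coordinate difference in one shot, at the cost of a large temporary array. A quick way to see this (my addition):
(X[:, None, :] - X).shape   # (1000, 1000, 3): one difference vector per pair of points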
pairwise_numba = numba.jit(pairwise_python)
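One detail worth remembering (my note, not something the original post calls out): Numba compiles the function the first time it is called, so a throwaway warm-up call on a tiny input keeps compilation time out of the timings.
pairwise_numba(X[:2])   # trigger JIT compilation before timing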
%timeit -n10 pairwise_numba(X)
%%cython
import numpy as np
cimport cython
from libc.math cimport sqrt
@cython.boundscheck(False)
@cython.wraparound(False)
def pairwise_cython(double[:, ::1] X):
    cdef int M = X.shape[0]
    cdef int N = X.shape[1]
    cdef double tmp, d
    cdef double[:, ::1] D = np.empty((M, M), dtype=np.float64)
    for i in range(M):
        for j in range(M):
            d = 0.0
            for k in range(N):
                tmp = X[i, k] - X[j, k]
                d += tmp * tmp
            D[i, j] = sqrt(d)
    return np.asarray(D)
%timeit -n10 pairwise_cython(X)
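Before comparing the numbers, it is worth confirming that all four implementations produce the same distance matrix (a check I added, not in the original post):
D_ref = pairwise_python(X)
all(np.allclose(D_ref, f(X)) for f in (pairwise_numpy, pairwise_numba, pairwise_cython))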
The timings from Jake's post (2013) and from this post (2017) are summarized below.
df1 = pd.DataFrame({"Time (ms)": [13400, 111, 9.12, 9.87], "Speedup": [1, 121, 1469, 1357]},
                   index=["Python", "Numpy", "Numba", "Cython"])
df2 = pd.DataFrame({"Time (ms)": [2470, 38.3, 4.04, 6.6], "Speedup": [1, 65, 611, 374]},
                   index=["Python", "Numpy", "Numba", "Cython"])
df = pd.concat([df1, df2], axis=1, keys=["2013", "2017"])
df
The timings and speedup numbers for the 2013 and 2017 runs are quite different, due to differences in library versions and perhaps even the Python version. This post uses Python 3.5 running on Windows. The takeaway is that NumPy is roughly two orders of magnitude faster than pure Python, and the Numba and Cython snippets are about an order of magnitude faster than NumPy in both benchmarks.
I will not rush to make any claims on Numba vs Cython. It is unclear what kinds of optimizations the cython magic applies under the hood. I would expect the Cython code to be about as fast as C, and perhaps some tweaking would get us there. What is really striking is how easy it is to get a performance boost from Numba: from an ease-of-use point of view, it is the hands-down winner in this simple example.
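For instance, to be sure Numba is not silently falling back to its slow object mode, the same function can be compiled with nopython mode forced on, which raises an error instead of degrading (a sketch of my own; the timings above use the plain numba.jit call):
pairwise_numba_strict = numba.jit(nopython=True)(pairwise_python)
pairwise_numba_strict(X)   # compiles to machine code on the first call, errors if it cannot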
Now let's look at one more example: an amortizing payment calculation, such as a mortgage payment schedule.
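The loop below implements the standard level payment (annuity) formula, A = B0 * r * (1 + r)^n / ((1 + r)^n - 1), with the annual CPR (conditional prepayment rate) converted to a single monthly mortality, smm = 1 - (1 - CPR/100)^(1/12). A quick check of the level payment for the parameters used later in this post (my arithmetic, not part of the original):
B0, R, term = 500000., 4.0, 360
r = R / 1200.
B0 * r * (1 + r)**term / ((1 + r)**term - 1)   # roughly 2387.08 per month for a 30-year loan at 4%
Here is the pure Python version.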
def amortize_payments_py(B0, R, term, cpr=0.0):
    smm = 1. - pow(1 - cpr/100., 1/12.)   # single monthly mortality implied by the annual CPR
    r = R/1200.                           # monthly interest rate
    S = np.zeros(term)                    # scheduled principal
    P = np.zeros(term)                    # total principal (scheduled + prepaid)
    I = np.zeros(term)                    # interest
    B = np.zeros(term)                    # remaining balance
    Pr = np.zeros(term)                   # prepaid principal
    Bt = B0
    pow_term = pow(1+r, term)
    A = Bt*r*pow_term/(pow_term - 1)      # level annuity payment
    for i in range(term):
        n = term-i
        I[i] = Bt * r
        Pr[i] = smm*Bt
        S[i] = A-I[i] if Bt>1e-2 else 0.
        P[i] = S[i] + Pr[i]
        Bt = max(Bt - P[i], 0.0)
        B[i] = Bt
    return S, I, Pr, P, B
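As a quick sanity check of the logic (my addition), the balance should be driven to zero by the end of the term, and the total principal paid should come out close to the starting balance (it can overshoot slightly in the payoff month because of the clamping above):
S, I, Pr, P, B = amortize_payments_py(500000., 4.0, 360, cpr=10)
B[-1], round(P.sum(), 2)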
Here is the equivalent Cython function.
%%cython
cimport cython
import numpy as np
from libc.math cimport pow
@cython.boundscheck(False)
@cython.wraparound(False)
def amortize_payments_cy(double B0, double R, int term, double cpr=0.0):
    cdef double smm = 1. - pow(1 - cpr/100., 1/12.)
    cdef double r = R/1200.
    cdef double[:] D = np.empty(term, dtype=np.float64)
    cdef double[:] S = np.empty(term, dtype=np.float64)
    cdef double[:] P = np.empty(term, dtype=np.float64)
    cdef double[:] I = np.empty(term, dtype=np.float64)
    cdef double[:] B = np.empty(term, dtype=np.float64)
    cdef double[:] Pr = np.empty(term, dtype=np.float64)
    cdef double Bt = B0
    cdef double pow_term = pow(1+r, term)
    cdef double A = Bt*r*pow_term/(pow_term - 1.)
    cdef double n = term
    cdef int i = 0
    for i in range(term):
        n = term-i
        I[i] = Bt * r
        Pr[i] = smm*Bt
        S[i] = A-I[i] if Bt>1e-2 else 0.
        P[i] = S[i] + Pr[i]
        Bt = max(Bt - P[i], 0.0)
        B[i] = Bt
    return np.asarray(S), np.asarray(I), np.asarray(Pr), np.asarray(P), np.asarray(B)
Here is the Numba version, JIT-compiled directly from the Python function.
amortize_payments_nb = numba.njit(cache=True)(amortize_payments_py)
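Before timing them, it is worth confirming that the three versions agree (a quick check I added, not in the original post):
res_py = amortize_payments_py(500000., 4.0, 360, 10.)
res_nb = amortize_payments_nb(500000., 4.0, 360, 10.)
res_cy = amortize_payments_cy(500000., 4.0, 360, 10.)
all(np.allclose(p, n) and np.allclose(p, c) for p, n, c in zip(res_py, res_nb, res_cy))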
Let's compare the performance of the three implementations.
B0 = 500000.
R = 4.0
term = 360
Python
%timeit -n1000 S,I, Pr,P, B = amortize_payments_py(B0, R, term, cpr=10)
Numba
%timeit -n1000 S,I, Pr,P, B = amortize_payments_nb(B0, R, term, cpr=10)
Cython
%timeit -n1000 S,I, Pr,P, B = amortize_payments_cy(B0, R, term, cpr=10)
Once again, the Cython and Numba versions are much faster than the pure Python version, roughly 15x here. The Numba version is also almost 3x faster than the Cython one; there are probably some Cython performance tweaks that I am missing.
In this post, I have redone Jake's pairwise distance example to see how the same code performs three years on, and I have also worked through a common finance example, amortizing cashflow generation. The Numba and Cython snippets range from roughly 15x to several hundred times faster than the pure Python versions. Somewhat surprisingly, Numba is 20% to 300% faster than Cython on these examples. There may well be some Cython tweaks I am missing, but these examples show how easily one can get a performance boost with the Numba module.