In this blog post, we will find out why NumPy arrays are faster than normal Python lists.
What is NumPy?
NumPy allows for efficient manipulation of homogeneous numerical data in Python. This means it is computationally fast, but the drawback is that all the data has to be of the same type. One of the main tools in NumPy is the multidimensional array, also known as the ndarray.
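To see what "homogeneous" means in practice, here is a minimal sketch: every element of an ndarray shares a single dtype, and NumPy will upcast mixed inputs to one common type.

```python
import numpy as np

# A 2-D ndarray: all six elements share one dtype
m = np.array([[1, 2, 3], [4, 5, 6]])
print(m.shape)  # (2, 3)
print(m.dtype)  # a single integer dtype for the whole array

# Mixing ints and floats upcasts everything to one type
a = np.array([1, 2, 3.5])
print(a.dtype)  # float64
```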
Nested Python lists can also be used to represent multidimensional data, so why do we need NumPy? And why is NumPy fast?
A code snippet to check this:
import numpy as np
import time

size = 1000000  # size of the arrays

L1 = range(size)
L2 = range(size)

a1 = np.arange(size)  # declare two arrays
a2 = np.arange(size)

## list
start = time.time()
result = [(x + y) for x, y in zip(L1, L2)]
print("python list took :", (time.time() - start) * 1000)

## numpy array
start = time.time()
result = a1 + a2
print("numpy array took :", (time.time() - start) * 1000)

# Run this code to see the difference
python list took : 89.81657028198242
numpy array took : 13.30113410949707
Why is this happening?
The reason: what makes NumPy so fast
NumPy is written (mostly) in C, a low-level language, which makes it very fast. It hides all this complexity behind an easy-to-use Python module.
Looping over lists in Python is slow because the language itself is dynamically typed. This means that you do not have to specify a variable's data type, but every time Python uses a variable, it has to check its data type. Even though the main Python interpreter is also written in C, this bookkeeping makes it much slower than other low-level languages.
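A quick sketch of the difference: a Python list may hold any mix of types, so each element carries its own type information that must be checked at runtime, while an ndarray fixes one dtype up front for the whole array.

```python
import numpy as np

# A list may mix types; every element is a full Python object,
# and each operation on it dispatches on the type at runtime
mixed = [1, 2.5, True]
print([type(x).__name__ for x in mixed])  # ['int', 'float', 'bool']

# An ndarray has one dtype, checked once for the whole array
a = np.array([1, 2, 3])
print(a.dtype)  # a single integer dtype, not one type tag per element
```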
NumPy arrays are densely packed arrays of a homogeneous numerical data type, so you get the benefits of locality of reference. Python lists, by contrast, are arrays of pointers to objects: even when all the elements are of the same type, each one is a separately allocated Python object scattered in memory.
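You can observe this memory difference directly. The sketch below (sizes will vary by platform) compares the raw payload of an ndarray against the list's pointers plus the per-element Python objects they point to:

```python
import sys
import numpy as np

n = 1_000_000
lst = list(range(n))
arr = np.arange(n)

# The ndarray stores raw numbers contiguously: n * itemsize bytes
print(arr.itemsize)  # bytes per element (e.g. 8 for a 64-bit int)
print(arr.nbytes)    # total payload, exactly n * itemsize

# The list stores pointers, and each int is a separate Python object
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)
print(list_bytes)    # several times larger than arr.nbytes
```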
Operations in NumPy are much faster because they can take advantage of parallelism via Single Instruction, Multiple Data (SIMD) instructions, while a traditional Python for loop cannot make use of it.
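As a sketch, one vectorized expression hands the whole loop to compiled C code, where the compiler is free to emit SIMD instructions, while the equivalent Python loop is interpreted one element at a time:

```python
import numpy as np

a = np.arange(1_000_000, dtype=np.float64)
b = np.arange(1_000_000, dtype=np.float64)

# One vectorized call: the loop runs in compiled C,
# where the compiler can use SIMD on the contiguous data
c = a * b + 1.0

# Equivalent pure-Python loop: one interpreted iteration
# per element, with no chance for SIMD
d = [x * y + 1.0 for x, y in zip(a, b)]

print(np.allclose(c, d))  # True — same result, very different speed
```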