NumPy¶

NumPy is a Python library containing many useful features for numerical and scientific computation. It is not a built-in module, but is available in the default Anaconda base environment.

NumPy can be imported using:

import numpy

You will often find that the NumPy namespace is aliased as:

import numpy as np

In this tutorial we will use this np namespace.

This tutorial will not go into the details of the vast majority of functionality in NumPy, but much more information can be found through NumPy's official documentation. We will mainly focus on NumPy's array class, which is useful for holding multi-dimensional numerical arrays.

NumPy arrays¶

The basic class for NumPy arrays is the ndarray class. They can hold arrays of any Python object, but are most generally used to hold arrays of numbers.

NumPy ndarrays have a variety of useful data attributes and methods, a few of which will be mentioned below.

Creating an array¶

Rather than using the ndarray class definition for creating new instances of NumPy arrays, the standard way is to use the array function. To create a numerical array you can pass the array function a list of values:

x = np.array([1, 2, 3])
print(type(x))
print(x)
print(x.dtype)
<class 'numpy.ndarray'>
[1 2 3]
int64

Above, we passed array a list of values and created a NumPy ndarray holding those values. The array function will automatically try and identify the number's type and as we passed a list of integers it has set the array's dtype attribute to int64 (which stands for a 64 bit integer). If any of the numbers were a float the type for all numbers would be a float:

x = np.array([1.0, 2, 3])
print(x.dtype)
float64

It is often useful to be explicit about the type of number that you want to store, and generally make sure they are floats if you are going to perform further mathematical operations on the array:

x = np.array([1, 2, 3], dtype=float)  # explicitly type the values to be floats
print(x.dtype)
float64

Warning

If you have an integer array and use the += operator to add another non-integer array to it, it will raise an error:

x = np.array([1, 2, 3])
y = np.array([3.5, 4.5, 6.5])
x += y
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-3fa90755daf3> in <module>
----> 1 x += y

TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('int64') with casting rule 'same_kind'

ndarrays have a shape attribute that returns a tuple containing the dimensions of the array and the size of each dimension:

x = np.array([1, 2, 3])
print(x.shape)
print("Number of dimension: {}".format(len(x.shape)))
print("Length of array: {}".format(x.shape[0]))
(3,)
Number of dimension: 1
Length of array: 3

Like a regular list, values within an array can be accessed using indexing, where the index starts at zero:

print(x[0])  # first value in the array
1
print(x[1])  # second value in the array
2

You can also change the contents of an array by setting the values using their index, e.g.,

x[2] = 100
print(x)
[  1   2 100]

As with a list, multiple values can be returned, or assigned, using a slice (the :):

x = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
print(x[2:4])  # get values from indexes 2 and 3
[20 30]
print(x[4::2])  # get values from index 4 to the end in steps of 2
[ 40  60  80 100]
x[0:3] = [20, 19, 18]  # re-assign the first three values
print(x)
[ 20,  19,  18,  30,  40,  50,  60,  70,  80,  90, 100]

You can assign returned parts of an array to a new variable:

y = x[4::2]
print(y)
[ 40  60  80 100]

Again, similarly to a list, if you assign a new variable name to an existing lists it just "points" to the existing list rather than making a copy, e.g.,:

x = np.array([1, 2, 3])
y = x  # assign a variable y to point to x
y[0] = 100  # change something in y
print(x)  # see the change in x
[100   2   3]

The same is true of returned slices of parts of a list. If you assigned the return slice from a list as a new variable it just "points" back to the memory in the original array, so if you alter the new array the original one will be altered too:

x = np.array([0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
y = x[4::2]
print(y)
[ 40  60  80 100]
y[0] = 1000  # change the first index of y
print(y)
[1000   60   80  100]
print(x)  # see the change effects x
[   0   10   20   30 1000   50   60   70   80   90  100]

Copying arrays¶

If you want a new array that contains the same contents as another one, but is not just a pointer, you must copy the array. The simplest way to do this is to create a new array from the old one:

x = np.array([1, 2, 3])
y = np.array(x)  # make a new array that contains a copy of x
print(y)
[1 2 3]
y[0] = 100  # change y
print(y)
[100   2   3]
print(x)  # show x in unaffected by the change to y
[1 2 3]

There are a couple of equivalent ways to do this using the copy method of a ndarray or the NumPy copy function:

y = x.copy()  # another way to copy
y = np.copy(x)  # yet another way to copy!

Pre-initialised arrays¶

You can create arrays initialised with all elements containing zero, using the zeros function:

x = np.zeros(10)  # an array of 10 zeros
print(x)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

Or, containing all ones (using the ones function):

x = np.ones(10)  # an array of 10 ones
print(x)
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]

x = 2.3964 * np.ones(3)  # an array of 3 values of 2.3964
print(x)
[2.3964 2.3964 2.3964]

# or equivalently
x = np.full(3, 2.3964)

Complex arrays¶

Arrays can contain complex numbers, e.g.,:

x = np.array([2 + 3j, 9 + 5j, -5 - 6j])

The real and imaginary components of the array can be accessed separately using the real and imag array attributes (there are equivalent real and imag functions that can also be used):

print(x.real)
[ 2.  9. -5.]
print(x.imag)
[ 3.  5. -6.]

Useful methods¶

Some other useful methods of an ndarray are:

sum: return the sum of all the values in the array
prod: return the product of all the values in the array
mean: return the mean of the values in the array
std: return the standard deviation of the values in the array
max: return the maximum value in the array
argmax: return the index of the maximum value in the array
min: return the minimum value in the array
argmin: return the index of the minimum value in the array
tolist: return the contents of the array as a Python list

x = np.array([1.0, 2.0, 3.0, 4.0])

# sum the array
print(x.sum())  # these are methods so the brackets are required
10.0

# get the product
print(x.prod())
24.0

# get the mean
print(x.mean())
2.5

# get the standard deviation
print(x.std())
1.118033988749895

# get the maximum value
print(x.max())
4.0

# get the index of the maximum value
print(x.argmax())
3

# get the minimum value
print(x.min())
1.0

# get the index of the minimum value
print(x.argmin())
0

# return the array as a list
y = x.tolist()
print(y)
[1.0, 2.0, 3.0, 4.0]
print(type(y))
<class 'list'>

There are many other methods not covered here. The majority of these methods have equivalent NumPy functions that can be used instead, and some take in arguments, which have not been covered here (i.e., they can work differently for arrays with more than one dimension).

Multidimensional arrays¶

Arrays can have any number of dimensions. You can create multi-dimensional arrays from multi-dimensional lists (in the examples here we will stick to 2D objects), e.g.,:

x = np.array([[1, 2, 3], [4, 5, 6]])  # a 2x3 2D array
print(x.shape)
(2, 3)
print(x)
(2, 3)
[[1 2 3]
 [4 5 6]]

Values can be accessed either in a list-like indexing manner:

print(x[0][2])  # get the value from the 1st row and 3rd column
3

or using a comma to separate the indices of the rows and columns, e.g.,:

print(x[0, 2])  # get the value from the 1st row and 3rd column
3

Slices can also be used, e.g.,:

print(x[1, :])  # get all values from the 2nd row
[4 5 6]

Reminder

Slices are a way of accessing multiple index values using the colon : notation, e.g., start:stop or start:stop:step, where start is the index of the first value to return, stop is the index one after that of last value to return, and step is the integer step between indices of the returned value. If start is not supplied, it defaults to 0, i.e., the start of the array, if stop is not defined it defaults to the last value in the array, and if step is not supplied it defaults to 1.

The slice built-in function can also be used to generate a slice.

You can get the transpose of an array using the T attribute, e.g.,:

y = x.T
print(y.shape)
(3, 2)
print(y)
[[1 4]
 [2 5]
 [3 6]]

The zeros and ones functions can also return multi-dimensional arrays initialised to contain 0 or 1, by specifying the shape of the required array as a tuple, e.g.,:

x = np.zeros((4, 5))  # a 4x5 2D array of zeros
print(x)
[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]

Joining arrays¶

Unlike a list, a NumPy ndarray does not have an append method to add new values to it. There are instead a variety of ways to add new values.

You can insert values into an indexed position in an array using the insert function.

x = np.array([0, 1, 2, 3])
y = np.insert(x, 0, -1)  # insert the value -1 into the 0th index of the array
print(y)
[-1  0  1  2  3]

This does not alter the original array, but instead returns a new copy containing the inserted value.

You can concatenate two arrays (of the same shape) using the concatenate function:

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
z = np.concatenate((x, y))  # pass the arrays for concatenation via a tuple
print(z)
[1 2 3 4 5 6]

There are also the hstack and vstack functions that are similar to concatenate.

Mathematical operations on an array¶

NumPy ndarrays are overloaded so that the standard mathematical operators can be applied to them.

# adding two arrays
x = np.array([5.5, 9.6, 12.2])
y = np.array([4.5, 0.4, -2.2])
z = x + y
print(z)
[10. 10. 10.]

# add a number to all values in an array
z = x + 2
print(z)
[ 7.5 11.6 14.2]

# add y to x in-place (i.e., x gets changed)
x += y
print(x)
[10. 10. 10.]

# subtract two arrays
z = x - y
[ 5.5  9.6 12.2]

# multiply two arrays (multiplication of each component individually,
# sometimes called the Hadamard product)
z = x * y
print(z)

# multiply all numbers in an array by a value
z = x * 2
print(z)
[20. 20. 20.]

# divide two arrays
z = x / y
print(z)
[ 2.22222222 25.         -4.54545455]

# raise an array to a power
z = x ** 2
print(z)
[100. 100. 100.]

Linear algebra¶

1D arrays can be thought of as vectors, and NumPy can be used to compute vector dot and cross products:

x = np.array([1, 0, 0])
y = np.array([0, 1, 0])
print(np.dot(x, y))  # vectors are orthogonal
0
z = np.cross(x, y)
print(z)
[0, 0, 1]

Note

The dot function can be used for arrays of any number of dimensions, but if you have 1D arrays (e.g., vectors) then the vdot function can be used instead (if you supply higher dimensional arrays to vdot they will get "flattened" into 1D arrays). vdot is faster than dot when used on vectors:

%timeit np.dot(x, y)
818 ns ± 39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit np.vdot(x, y)
613 ns ± 6.89 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

If the first vector supplied to vdot is complex valued then its complex conjugate is automatically used when calculating the dot product. This is not the case for dot (see also, yet another similar function inner!):

x = np.array([3 + 4j, 8 - 1j, 9 + 2j])
y = np.array([10.0, 9.0, 8.0])
print(np.dot(x, y))
(174+47j)
print(np.vdot(x, y))
(174-47j)
print(np.dot(np.conj(x), y))  # explicitly perform complex conjugation to replicate vdot
(174-47j)

The norm or magnitude of a vector can be calculated using the norm function in the NumPy linear algebra submodule linalg:

print(np.linalg.norm(z))
1.0

For square 2D arrays, you can get the diagonal elements with the diag function:

x = np.array([[1.0, 2.0], [3.0, 4.0]])
print(np.diag(x))
[1. 4.]

The inverse of the array and its determinant can be found using the inv and det methods in the linalg submodule, e.g.,:

# get the inverse
print(np.linalg.inv(x))
[[-2.   1. ]
 [ 1.5 -0.5]]

# get the determinant
print(np.linalg.det(x))
-2.0000000000000004

Matrix multiplication can be performed using the matmul function, e.g.,:

x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[4.5, 5.5], [-9.2, 6.4]])

z = np.matmul(x, y)
print(z)
[[-13.9  18.3]
 [-23.3  42.1]]

Note

There is actually an operator @ that can be used for matrix multiplication:

z = x @ y
[[-13.9  18.3]
 [-23.3  42.1]]

Matrix-vector products can be performed using the dot function, e.g., for performing a 90 degree anti-clockwise rotation:

R = np.array([[0, -1], [1, 0]])
x = np.array([0.5, 0.5])
y = np.dot(R, x)
print(y)
[-0.5  0.5]

For more advanced usage there are also the tensordot function and the powerful (but tricky) einsum function.

Range-like arrays¶

The built-in Python range function can return a list of integers, but it is often useful to have a 1D array of non-integer uniformly spaced values. There are a couple of different ways to do this using NumPy: the linspace and arange functions.

linspace takes in an initial value and a final value, and the number of points within that range (inclusive of the start and end values) at which to generate evenly spaced values:

start = 0.1
stop = 1.2
num = 12
values = np.linspace(start, stop, num)
print(len(values))
12
print(values)
[0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.  1.1 1.2]

With arange, instead of supplying the number of steps you supply the step size, so equivalently we could have done:

start = 0.1
stop = 1.2
step = 0.1
values = np.arange(start, stop, step)
print(values)
[0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.  1.1]

Note

arange is not inclusive of the final value, so if you want to include a value at your intended final values you could instead use:

values = np.arange(start, stop + step, step)
print(values)
[0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.  1.1 1.2]

If you just give arange an integer start value, and optionally a stop value, it will work just like range (i.e., using a default step size of 1), but will return a NumPy array rather than a list.

NumPy maths¶

NumPy has its own functions for many mathematical functions. Ones that are commonly used are the trigonometric functions, exponentiation and logarithm. Many of these are equivalent to, and can be used in place of, the functions in the built-in math library. The main differences is that these functions work on arrays (or regular lists), so all values in the array can have the function applied to them. NumPy also contains some constants, such as π, accessed with np.pi.

Some examples are:

x = np.arange(0, 2 * np.pi, np.pi / 2)  # values at 0, pi/2, pi and 3pi/2 rads
print(np.sin(x))  # Sine of values
[ 0.0000000e+00  1.0000000e+00  1.2246468e-16 -1.0000000e+00]

print(np.cos(x))  # Cosine of values
[ 1.0000000e+00  6.1232340e-17 -1.0000000e+00 -1.8369702e-16]

print(np.exp([1., 2., 3.]))  # e to the power of values
[ 2.71828183  7.3890561  20.08553692]

print(np.log([10, 100, 1000]))  # natural logarithm
[2.30258509 4.60517019 6.90775528]

print(np.log10([10, 100, 1000]))  # base-10 logarithm
[1. 2. 3.]

NumPy random number generation¶

NumPy can be used to generate random numbers using functions within the random submodule. These can be useful for Monte Carlo simulations.

Random numbers drawn by a computer using software are only pseudo-random, i.e., they use a seed as a starting point and they have a period (eventually they will return the same number, although in most practical applications this will never happen). By default the seed will be set using something like the computer's clock time, but it can be set (or reset) manually, for example if you explicitly wanted repeatable "random" draws, using the seed function, which takes an integer value:

from numpy.random import default_rng

rng = default_rng(98263954)
print(rng.uniform())  # a random number between 0 and 1
0.22816342984233473
print(rng.uniform())
0.33584602757208415

# reset the seed
rng = default_rng(98263954)
print(rng.uniform())  # a random number between 0 and 1
0.22816342984233473

Warning

Only set the seed if you want really want identical and repeatable random numbers. If doing simulations that rely on the random aspect being independent over multiple runs then it is best to not set a seed.

There are many probability distributions from which random numbers can be drawn, but below are a few common examples.

Uniform distribution¶

Numbers can be drawn randomly from a uniform distribution within a given interval, using the uniform function. This takes the lower and upper range of the interval and the shape of array to return (by default a single draw is returned):

from numpy.random import default_rng

rng = default_rng()  # use default seed

# draw a single random number between 0 and 5
x = rng.uniform(0, 5)
print(x)
4.853972144648509

# draw an array of 5 random numbers between 3.5 and 6.7
x = rng.uniform(3.5, 6.7, 5)
print(x)
[4.87159694 5.39695169 5.0000419  5.25914921 4.49457314]

# draw a 2x2 array of random numbers between -1 and 1
x = rng.uniform(-1, 1, (2, 2))
print(x)
[[ 0.68813313 -0.97703578]
 [ 0.25991054 -0.09875825]]

Normal distribution¶

Numbers can be drawn from a Normal (or Gaussian) distribution using the normal function, which takes the mean (the loc parameter, defaulting to 0), the standard deviation (the scale parameter, defaulting to 1) and the shape of the array to return:

from numpy.random import default_rng

rng = default_rng()

# draw a single Normally distributed random number from a distribution with a
# mean of 2 and standard deviation of 1
x = rng.normal(2, 1)
print(x)                                                     
1.5730723103489876

# draw an array of 100 random numbers between with a mean of 0 and standard
# deviation of 2
x = rng.normal(0.0, 1.0, 100)
print(x.mean())
print(x.std())
0.13988286116368978
1.0674409970150702

Random integers¶

Random integers between an upper and lower bound can be generated using the integers function:

from numpy.random import default_rng

rng = default_rng()

# generate 10 random integer values between 1 and 7 (excluding 7), i.e., dice rolls
dicerolls = rng.integers(1, 7, 10)
print(dicerolls)
[5, 1, 4, 0, 4, 0, 3, 0, 4, 2]

# generate some more
dicerolls = rng.integers(1, 7, 10000)
print("Fraction of 1's rolled: {}".format((dicerolls == 1).sum() / len(dicerols)))
Fraction of 1's rolled: 0.1712

# similar is the choice function
coinflips = rng.choice(["head", "tail"], 10)
print(coinflips)
['head' 'tail' 'tail' 'head' 'head' 'tail' 'head' 'head' 'head' 'tail']

Saving and loading data¶

Saving and loading data using NumPy is discussed in the "Reading and writing data" tutorial.