In [90]:
import matplotlib
#%matplotlib notebook
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (7, 3)
import numpy as np

Missing bits from Lecture 1

More looping

There is of course a while-loop

In [91]:
a = 3

while a > 0:
    print(a)
    a -= 1
3
2
1

More logic: if-else-elif

In [92]:
today = 'Wednesday'

if today == 'Monday':
    chance_of_party = 'slim to none'
elif today in ['Tuesday', 'Thursday', 'Sunday']:
    chance_of_party = 'poor'
elif today == 'Wednesday':
    chance_of_party = 'possible'
elif today in ['Friday', 'Saturday']:
    chance_of_party = 'likely'
else:
    raise ValueError("'{}' is not a weekday".format(today))
    
print('Current chance of party is', chance_of_party)
Current chance of party is possible

There is no switch statement in Python!

Scientific Python

Hannes Ovrén

Scientific Python Stack

Some text Image source: Fernando Perez

NumPy

The core numerical array/matrix library

  • Not part of standard Python install
  • numpy.ndarray class for N-dimensional arrays
  • Mostly C or Fortran
  • Functionality for:
    • Linear Algebra
    • Fourier transforms
    • ...

Example

It is common to alias numpy to np for less writing

In [93]:
import numpy as np

Let's create a $2 \times 3$ array

In [94]:
A = np.array([[1, 2, 3],
              [10, 20, 30]])
A
Out[94]:
array([[ 1,  2,  3],
       [10, 20, 30]])

and transpose it

In [95]:
A.T
Out[95]:
array([[ 1, 10],
       [ 2, 20],
       [ 3, 30]])

numpy.ndarray

  • N-dimensional array
    • ndarray.ndim
    • ndarray.shape
  • item type: numpy.dtype - (float64, uint8, bool, ...)
In [96]:
A = np.random.normal(size=(2,3,2))
In [97]:
print('#ndim: {}, shape: {}, #elements: {}'.format(A.ndim, A.shape, 
                                                   A.size))
print('dtype: {}, itemsize: {}'.format(A.dtype, A.itemsize))
#ndim: 3, shape: (2, 3, 2), #elements: 12
dtype: float64, itemsize: 8

Accessing elements

In [98]:
A = np.array([[1,2,3], [4,5,6], [7,8,9]])
A
Out[98]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [99]:
A[1,2]
Out[99]:
6

Access rows

In [100]:
A[0, :]
Out[100]:
array([1, 2, 3])
In [101]:
A[0]
Out[101]:
array([1, 2, 3])

Note for MATLAB-users: zero-indexed arrays!

Access columns (note the shape!)

In [102]:
c = A[:, 0]
print(c, c.shape, c.ndim)
[1 4 7] (3,) 1

Extract parts

In [103]:
A[:2, :2] # 2x2 block
Out[103]:
array([[1, 2],
       [4, 5]])
In [104]:
A[:, (0, 2)] # first and third columns
Out[104]:
array([[1, 3],
       [4, 6],
       [7, 9]])

Boolean masks

Same size mask

In [105]:
y = np.arange(6).reshape(2, 3)
y
Out[105]:
array([[0, 1, 2],
       [3, 4, 5]])
In [106]:
mask = (y % 2 == 0)
mask
Out[106]:
array([[ True, False,  True],
       [False,  True, False]], dtype=bool)
In [107]:
y[mask]
Out[107]:
array([0, 2, 4])

Boolean mask, with different shapes

Only first dimensions must match.

Extracting pixels from an image, gives pixel value "vectors"

In [108]:
image = np.random.randint(0, 255, size=(2,2,3)) # 2x2 RGB image
print(image)
[[[ 79 192 241]
  [233 243 237]]

 [[ 74  52 249]
  [149 200  89]]]
In [109]:
mask = np.array([[True, False], [False, True]]) # 2x2 mask
image[mask]
Out[109]:
array([[ 79, 192, 241],
       [149, 200,  89]])

Beware of non-numpy Boolean-masks!

In [110]:
y = np.array([[1, 2, 3], [4, 5, 6]])
py_mask = [[True, True, False], [False, False, True]]
np_mask = np.array(py_mask)
In [111]:
print(y[np_mask])
print(y[py_mask])
[1 2 6]
[4 4 2]
/home/hannes/miniconda/envs/pycourse/lib/python3.5/site-packages/ipykernel/__main__.py:2: FutureWarning: in the future, boolean array-likes will be handled as a boolean array index
  from ipykernel import kernelapp as app

What happened?

Sharing data

Most functions/methods return views of an array (sharing data)

In [112]:
A = np.array([[1, 2, 3], [4, 5, 6]])

middle_col = A[:, 1]
middle_col[:] = 99

A
Out[112]:
array([[ 1, 99,  3],
       [ 4, 99,  6]])

Use .copy() to get a copy

In [113]:
A = np.array([[1, 2, 3], [4, 5, 6]])
middle_col = A[:, 1].copy()
middle_col[:] = 99
A
Out[113]:
array([[1, 2, 3],
       [4, 5, 6]])

Sharing data: be careful!

Example: We want a function that gives a mean-removed point set

In [114]:
def normalized(pts):
    m = np.mean(pts, axis=1).reshape(-1, 1)
    pts -= m
    return pts

orig = np.random.normal(10., 1.0, size=(2, 4))
orig
Out[114]:
array([[  8.55295309,   9.59138014,  10.27300416,  10.50291701],
       [ 11.4171007 ,  11.41576679,   9.91223316,  10.06468581]])
In [115]:
normalized(orig)
Out[115]:
array([[-1.17711051, -0.13868346,  0.54294056,  0.77285341],
       [ 0.71465408,  0.71332017, -0.79021345, -0.6377608 ]])
In [116]:
orig
Out[116]:
array([[-1.17711051, -0.13868346,  0.54294056,  0.77285341],
       [ 0.71465408,  0.71332017, -0.79021345, -0.6377608 ]])

Array operations

In [117]:
A = np.array([[1, 2], [3, 4]])
B = np.ones((2, 2)) # 2x2 of ones
print(A)
print(B)
[[1 2]
 [3 4]]
[[ 1.  1.]
 [ 1.  1.]]

Addition and subtraction

In [118]:
A + B
Out[118]:
array([[ 2.,  3.],
       [ 4.,  5.]])
In [119]:
A - B
Out[119]:
array([[ 0.,  1.],
       [ 2.,  3.]])

What about multiplication?

In [120]:
print(A)
print(B)
[[1 2]
 [3 4]]
[[ 1.  1.]
 [ 1.  1.]]
In [121]:
A * B
Out[121]:
array([[ 1.,  2.],
       [ 3.,  4.]])

Operations are element-wise!

numpy.ndarray is an array, not a matrix.

For matrix multiplications use np.dot or np.ndarray.dot

In [122]:
np.dot(A, B)
Out[122]:
array([[ 3.,  3.],
       [ 7.,  7.]])
In [123]:
A.dot(B)
Out[123]:
array([[ 3.,  3.],
       [ 7.,  7.]])

Python 3.5 + NumPy 1.10: @-operator

In [124]:
A @ B
Out[124]:
array([[ 3.,  3.],
       [ 7.,  7.]])

Reshaping

In [125]:
A = np.arange(10)
print(A, A.shape)
[0 1 2 3 4 5 6 7 8 9] (10,)
In [126]:
B = A.reshape((2, 5))
print(B, B.shape)
[[0 1 2 3 4]
 [5 6 7 8 9]] (2, 5)

A and b points at the same data!

In [127]:
B[0,0] = 100

print(A)
[100   1   2   3   4   5   6   7   8   9]

Reshaping (cont.)

Automatic calculation of one dimension by setting it to -1

In [128]:
A = np.arange(24)

B = A.reshape(2, -1, 4)
print(B.shape)
(2, 3, 4)

Illegal shapes are... illegal :)

In [129]:
C = A.reshape(2, 4, 4)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-129-c08763be3d59> in <module>()
----> 1 C = A.reshape(2, 4, 4)

ValueError: total size of new array must be unchanged

Flattening arrays

  • Flatten an array from ND to 1D
In [130]:
A = np.arange(10).reshape(2,5)
In [131]:
B = A.ravel()
B
Out[131]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [132]:
C = A.flatten() # Copies!
C
Out[132]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [133]:
B[0] = 99
C[1] = 100
A
Out[133]:
array([[99,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9]])

Linear indexing with ndarray.flat iterator (no copy)

In [134]:
A.flat[::2] = 0
A
Out[134]:
array([[0, 1, 0, 3, 0],
       [5, 0, 7, 0, 9]])

Joining arrays

  • hstack, vstack, dstack
In [135]:
a = np.array([1, 2])
b = np.array([3, 4])
c = np.array([5, 6])
In [136]:
np.hstack((a, b, c))
Out[136]:
array([1, 2, 3, 4, 5, 6])
In [137]:
np.vstack((a, b, c))
Out[137]:
array([[1, 2],
       [3, 4],
       [5, 6]])
  • General function: concatenate
In [138]:
np.concatenate((a, b, c), axis=0)
Out[138]:
array([1, 2, 3, 4, 5, 6])

Shape matters

In [139]:
A = np.array([[1, 2]])
B = np.array([[3],[4]])
print(A)
print('+')
print(B)
[[1 2]]
+
[[3]
 [4]]
In [140]:
A + B
Out[140]:
array([[4, 5],
       [5, 6]])

We just witnessed the numpy broadcasting system

Broadcasting

  • Quite useful!
  • Does not copy data!
  • Example: scale all image RGB values
In [141]:
img = np.random.randint(0, 255, size=(256, 256, 3)) # 256x256 RGB image

scaled = img * np.array([0.2, 0.5, 0.3])
print(scaled.shape)
(256, 256, 3)
  • Example: Remove mean from point set
In [142]:
points = np.random.normal(loc=3, size=(3, 100)) # N 3D points
m = np.mean(points, axis=1)
print('Original mean: ', m)

shifted = points - m.reshape(3,-1)
print('Shifted mean:', np.mean(shifted, axis=1))
Original mean:  [ 2.86716784  2.93001807  2.98902105]
Shifted mean: [ -5.32907052e-16   4.26325641e-16  -3.99680289e-17]

Broadcasting rules

  • Line up starting from last dimension
  • Compatible if equal or 1 (else fail!)
  • Size 1 dimensions are "stretched" to match
  • Then perform operation
A      (3d array):  15 x 3 x 5
B      (3d array):  15 x 1 x 5
Result (3d array):  15 x 3 x 5
A      (4d array):  8 x 1 x 6 x 1
B      (3d array):      7 x 1 x 5
Result (4d array):  8 x 7 x 6 x 5

Creating arrays

  • Ranges of integers or floats
In [143]:
np.linspace(-2, 2, num=10)
Out[143]:
array([-2.        , -1.55555556, -1.11111111, -0.66666667, -0.22222222,
        0.22222222,  0.66666667,  1.11111111,  1.55555556,  2.        ])
In [144]:
np.arange(10)
Out[144]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
  • zeros, ones, eye
In [145]:
np.zeros((2, 5))
Out[145]:
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])
In [146]:
np.eye(3)
Out[146]:
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
  • ones_like, zeros_like
In [147]:
A = np.eye(4)
A
Out[147]:
array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])
In [148]:
np.ones_like(A)
Out[148]:
array([[ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.]])
  • empty, only allocates memory (very fast). Array items are whatever was in memory.
In [149]:
np.empty((2, 5))
Out[149]:
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

Converting between types

Use the ndarray.astype() method:

In [150]:
A = np.array([[1.5, 2.0, 3.7],
              [1.0, 2.0, 9.99]])
B = A.astype('uint8')
B
Out[150]:
array([[1, 2, 3],
       [1, 2, 9]], dtype=uint8)

This produces a copy of the array.

Memory layout

  • Memory is contiguous for new arrays
  • Row-major ("C-order") is default
  • Column-major ("Fortran order"/MATLAB) works as well

It is possible to change memory if required (when do you need to?)

  • np.ascontiguous()
  • np.asfortranarray() or
    A = np.array([1,2], order='F')
  • np.require() -- The most general way
  • Check array properties with ndarray.flags attribute
In [151]:
A = np.array([[1,2], [3, 4]])
b = A[:, 1]
c = np.ascontiguousarray(b)

lines = [str(x.flags).splitlines() for x in (A, b, c)]
print('{:<16s} {:^8s} {:^8s} {:^8s}'.format('', 'A', 'b','c'))
for l in zip(*lines):
    #print(l)
    vals = [s.split(':')[-1].strip()for s in l]
    vals = [v + '*' if not v == vals[0] else v for v in vals]
    flagname = l[0].split(':')[0].strip()
    print('{:<16s} {:^8s} {:^8s} {:^8s}'.format(flagname, *vals))
                    A        b        c    
C_CONTIGUOUS       True    False*    True  
F_CONTIGUOUS      False    False    True*  
OWNDATA            True    False*    True  
WRITEABLE          True     True     True  
ALIGNED            True     True     True  
UPDATEIFCOPY      False    False    False  

Randomness

Module numpy.random has most distributions avilable.

In [152]:
points = np.random.randint(-3, 3, size=(3, 10))

sigma = 0.2
noisy_points = points + np.random.normal(0, sigma, size=points.shape)

print(noisy_points)
[[ 1.29122068 -0.82561563  0.79110243  2.17405314  1.6519741   0.6800585
  -1.83964693  0.99466687 -0.4484195  -0.8430179 ]
 [ 0.20428938 -1.89918263 -3.13698162 -2.35535725 -3.05498032 -0.79690424
  -1.87530157 -1.11215585  0.87633914  1.15527529]
 [ 0.15777087 -2.83102468  1.82822541 -0.06375282  0.03015451 -2.13302273
  -1.22664891 -1.40695077  1.75335996  2.23809561]]

Random sampling

Use np.shuffle() or np.choice()

In [153]:
A = np.arange(10)

Randomly select 3 items from A without replacement

In [154]:
np.random.shuffle(A) # shuffle inplace
A[:3]
Out[154]:
array([0, 2, 6])
In [155]:
np.random.choice(A, 3, replace=False)
Out[155]:
array([0, 2, 8])

Finding matching samples

  • numpy.nonzero or numpy.flatnonzero
In [156]:
A = np.random.randint(0, 20, size=10)
indices = np.flatnonzero(A % 3 == 0)
In [157]:
A
Out[157]:
array([14, 14, 14,  5,  1,  5, 17, 10,  5,  3])
In [158]:
indices
Out[158]:
array([9])
In [159]:
A[indices]
Out[159]:
array([3])

Other functions

  • np.sum()
  • np.mean()
  • np.max()
  • np.argmax()

We can use either max() or numpy.max(), the latter is faster!

In [160]:
N = 10000
%timeit max(np.random.normal(size=N))
1000 loops, best of 3: 1.15 ms per loop
In [161]:
%timeit np.max(np.random.normal(size=N))
1000 loops, best of 3: 456 µs per loop

Plotting: matplotlib

  • Two different API
    • matplotlib: Object-oriented interface
    • matplotlib.pyplot: MATLAB-like interface
    • mix and match as you wish!

pyplot example

In [162]:
import matplotlib.pyplot as plt

x = np.linspace(0, 4*np.pi)
y = np.sin(x)
plt.figure()
plt.plot(x, y, '--')
plt.show()

Multiple plots

There is no need for MATLABs hold on.

In [164]:
plt.figure()
plt.plot(x, np.sin(x))
plt.plot(x, np.sin(2*x+0.5))
plt.show()

Legend

Create automatically using label= keyword and legend()

In [165]:
y1 = np.sin(x)
y2 = y + np.random.normal(scale=0.1, size=y1.shape)
plt.figure()
plt.plot(x, y1, label='Model')
plt.scatter(x, y2, c='red', marker='x', label='Measurements')
plt.legend(ncol=2)
plt.show()

plot() command

plot(x, y)        # default line style and color
plot(x, y, 'bo')  # blue circle markers
plot(y)           # x is index array 0..N-1
plot(y, 'r+')     # ditto, but with red plusses

Some useful keyword arguments

  • color ('r', 'red', '#ff0000', ...)
  • linestyle ('-', '--', ...)
  • linewidth
  • alpha
  • label

Subplots

In [166]:
plt.figure()
plt.subplot(2, 1, 1)
plt.plot(x, np.sin(x))
plt.subplot(2, 1, 2)
plt.plot(x, np.sin(2 * x))
plt.show()

Axis sharing

In [167]:
plt.figure(figsize=(10, 3))
ax1 = plt.subplot(1, 2, 1)
ax2 = plt.subplot(1, 2, 2, sharex=ax1)
ax1.plot(x, np.sin(x))
ax2.plot(x, 5 * np.sin(2 * (x - np.pi / 2)))
plt.show()

We can use $\LaTeX$ as well

In [168]:
plt.figure()
y = np.pi * np.sin(3 * x)
plt.plot(x, y, label='$\pi \cdot \sin(3x)$')
plt.legend()
plt.show()

Saving plots

Manually

Press the "save" button :)

From script

  • Replace show() with savefig(filename)
  • Output formats depend on backend
  • PNG, PDF, SVG, EPS, PS should be supported by most
  • dpi= option

Showing images

imshow() defaults to interpolation, but this can be turned off using interpolation='none'.

In [169]:
image = np.random.randint(0, 255, size=(8, 8))
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plt.imshow(image)
plt.subplot(1,2,2)
plt.imshow(image, interpolation='none')
plt.show()

SciPy

Mixed bag of useful things

Example modules

  • I/O (.MAT-files, etc)
  • Linear algebra
  • Signal processing
  • Image processing
  • Sparse matrices
  • Optimization
  • Interpolation
  • ...

See documentation for full list

Pandas

  • DataFrame table-like object
  • Merge, join, group data frames
  • Compute statistics
In [170]:
import pandas as pd
df = pd.read_csv('pojknamn.csv', skiprows=2, encoding='latin1', 
                 index_col='tilltalsnamn', na_values=['..']).transpose()
In [171]:
df.head()
Out[171]:
tilltalsnamn Aaron Abbas Abbe Abdallah Abdirahim Abdirahman Abdulahi Abdullahi Abdullah Abdulrahman ... Zakarias Zakariya Zakk Zander Zebastian Zeb Zeke Zion Åke Ömer
1998 NaN NaN NaN NaN NaN 15 NaN 12 NaN NaN ... NaN NaN NaN NaN 23 NaN NaN NaN NaN NaN
1999 NaN NaN NaN NaN NaN 15 NaN NaN 12 NaN ... 10 NaN NaN NaN 22 NaN NaN NaN NaN NaN
2000 12 NaN NaN NaN NaN 21 NaN 17 10 NaN ... NaN NaN NaN NaN 26 11 NaN NaN 13 NaN
2001 NaN NaN NaN NaN NaN 20 NaN 12 NaN NaN ... 12 NaN NaN NaN 29 NaN NaN NaN NaN NaN
2002 15 NaN 11 NaN NaN 16 NaN NaN NaN NaN ... 14 NaN NaN NaN 26 11 NaN NaN 10 NaN

5 rows × 824 columns

In [172]:
subset = df[['Hannes', 'Adam']]
subset.sum()
Out[172]:
tilltalsnamn
Hannes    3084
Adam      9735
dtype: float64

Pandas plotting

In [173]:
df[['Hannes', 'Andreas', 'Mikael', 'Marcus']].plot()
plt.title('Newly born male names')
Out[173]:
<matplotlib.text.Text at 0x7f3f7f299f28>

HDF5

Hierarchical Data Format v5

  • Dataset: Multidimensional arrays
  • Group: Set of Datasets
  • Supports metadata
  • Path-like identifiers: /camera1/rgb/frame1
In [174]:
import h5py
with h5py.File('hero3_atan.hdf', 'r') as f:
    print('Keys:', list(f.keys()))
    dataset = f['K']
    print(dataset)
    print(dataset.value)
Keys: ['K', 'fps', 'lgamma', 'opt_residual_mean', 'opt_residual_std', 'readout', 'size', 'wc']
<HDF5 dataset "K": shape (3, 3), type "<f8">
[[ 853.12703455    0.          988.06311256]
 [   0.          873.54956631  525.71056312]
 [   0.            0.            1.        ]]

IPython/Jupyter Notebook

  • Write math $x = \frac{1}{2}$ inline or as blocks $$ y = \sum_{i=3} x_i^2 $$

  • Mix text using markdown or HTML with code

  • Export to HTML, PDF, presentation, ...
  • These presentations are notebooks
  • Language support: Python, R, Julia, ...
  • Widgets!

SymPy: Symbolic math

Example: Derivative of $\sin(x) e^x$

In [175]:
from sympy import init_printing, symbols, exp, sin, diff
init_printing()

x = symbols('x')
diff(sin(x)*exp(x), x)
Out[175]:
$$e^{x} \sin{\left (x \right )} + e^{x} \cos{\left (x \right )}$$

OpenCV

  • Functions / classes same as C++
  • cv::Mat represented by numpy.ndarray
  • Channels in the third dimension
  • Plotting
    • OpenCV: cv2.namedWindow(), cv2.imshow(), ...
    • matplotlib: plt.imshow(), ... (highly recommended)
In [176]:
import cv2
# White square on black background
image = np.zeros((128, 128), dtype='uint8') # CV_8U
image[50:80, 50:80] = 255
image_blurred = cv2.blur(image, ksize=(11, 11))

plt.figure(figsize=(8, 3))
plt.gray()
plt.subplot(1, 2, 1)
plt.imshow(image, interpolation='none')
plt.subplot(1, 2, 2)
plt.imshow(image_blurred, interpolation='none')
plt.show()

NumPy dtype

  • Allows structured data arrays
In [177]:
dt = np.dtype([
        ('name', np.str_, 16),
        ('population', np.uint32)
    ])
print(dt)
[('name', '<U16'), ('population', '<u4')]
In [178]:
cities = np.array([
        ('Stockholm', 851155),
        ('Göteborg', 516532),
        ('Malmö', 293909),
        ('Linköping', 97428)
    ],dtype=dt)
In [179]:
np.sum(cities['population'])
Out[179]:
1759024
In [180]:
big_cities = cities[cities['population'] > 500000]
print(big_cities['name'])
['Stockholm' 'Göteborg']