Missing bits from Lecture 1¶

More looping¶

There is of course a while-loop

In [91]:

a = 3

while a > 0:
    print(a)
    a -= 1

3
2
1

More logic: `if-else-elif`¶

In [92]:

today = 'Wednesday'

if today == 'Monday':
    chance_of_party = 'slim to none'
elif today in ['Tuesday', 'Thursday', 'Sunday']:
    chance_of_party = 'poor'
elif today == 'Wednesday':
    chance_of_party = 'possible'
elif today in ['Friday', 'Saturday']:
    chance_of_party = 'likely'
else:
    raise ValueError("'{}' is not a weekday".format(today))
    
print('Current chance of party is', chance_of_party)

Current chance of party is possible

There is no switch statement in Python!

Scientific Python¶

Hannes Ovrén

Scientific Python Stack¶

Some text Image source: Fernando Perez

NumPy¶

The core numerical array/matrix library

Not part of standard Python install
numpy.ndarray class for N-dimensional arrays
Mostly C or Fortran
Functionality for:
- Linear Algebra
- Fourier transforms
- ...

Example¶

It is common to alias numpy to np for less writing

In [93]:

import numpy as np

Let's create a $2 \times 3$ array

In [94]:

A = np.array([[1, 2, 3],
              [10, 20, 30]])
A

Out[94]:

array([[ 1,  2,  3],
       [10, 20, 30]])

and transpose it

In [95]:

A.T

Out[95]:

array([[ 1, 10],
       [ 2, 20],
       [ 3, 30]])

`numpy.ndarray`¶

N-dimensional array
- ndarray.ndim
- ndarray.shape
item type: numpy.dtype - (float64, uint8, bool, ...)

In [96]:

A = np.random.normal(size=(2,3,2))

In [97]:

print('#ndim: {}, shape: {}, #elements: {}'.format(A.ndim, A.shape, 
                                                   A.size))
print('dtype: {}, itemsize: {}'.format(A.dtype, A.itemsize))

#ndim: 3, shape: (2, 3, 2), #elements: 12
dtype: float64, itemsize: 8

Accessing elements¶

In [98]:

A = np.array([[1,2,3], [4,5,6], [7,8,9]])
A

Out[98]:

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [99]:

A[1,2]

Out[99]:

Access rows

In [100]:

A[0, :]

Out[100]:

array([1, 2, 3])

In [101]:

A[0]

Out[101]:

array([1, 2, 3])

Note for MATLAB-users: zero-indexed arrays!

Access columns (note the shape!)

In [102]:

c = A[:, 0]
print(c, c.shape, c.ndim)

[1 4 7] (3,) 1

Extract parts

In [103]:

A[:2, :2] # 2x2 block

Out[103]:

array([[1, 2],
       [4, 5]])

In [104]:

A[:, (0, 2)] # first and third columns

Out[104]:

array([[1, 3],
       [4, 6],
       [7, 9]])

Boolean masks¶

Same size mask¶

In [105]:

y = np.arange(6).reshape(2, 3)
y

Out[105]:

array([[0, 1, 2],
       [3, 4, 5]])

In [106]:

mask = (y % 2 == 0)
mask

Out[106]:

array([[ True, False,  True],
       [False,  True, False]], dtype=bool)

In [107]:

y[mask]

Out[107]:

array([0, 2, 4])

Boolean mask, with different shapes¶

Only first dimensions must match.

Extracting pixels from an image, gives pixel value "vectors"

In [108]:

image = np.random.randint(0, 255, size=(2,2,3)) # 2x2 RGB image
print(image)

[[[ 79 192 241]
  [233 243 237]]

 [[ 74  52 249]
  [149 200  89]]]

In [109]:

mask = np.array([[True, False], [False, True]]) # 2x2 mask
image[mask]

Out[109]:

array([[ 79, 192, 241],
       [149, 200,  89]])

Beware of non-numpy Boolean-masks!¶

In [110]:

y = np.array([[1, 2, 3], [4, 5, 6]])
py_mask = [[True, True, False], [False, False, True]]
np_mask = np.array(py_mask)

In [111]:

print(y[np_mask])
print(y[py_mask])

[1 2 6]
[4 4 2]

/home/hannes/miniconda/envs/pycourse/lib/python3.5/site-packages/ipykernel/__main__.py:2: FutureWarning: in the future, boolean array-likes will be handled as a boolean array index
  from ipykernel import kernelapp as app

What happened?

Most functions/methods return views of an array (sharing data)

In [112]:

A = np.array([[1, 2, 3], [4, 5, 6]])

middle_col = A[:, 1]
middle_col[:] = 99

A

Out[112]:

array([[ 1, 99,  3],
       [ 4, 99,  6]])

Use .copy() to get a copy

In [113]:

A = np.array([[1, 2, 3], [4, 5, 6]])
middle_col = A[:, 1].copy()
middle_col[:] = 99
A

Out[113]:

array([[1, 2, 3],
       [4, 5, 6]])

Example: We want a function that gives a mean-removed point set

In [114]:

def normalized(pts):
    m = np.mean(pts, axis=1).reshape(-1, 1)
    pts -= m
    return pts

orig = np.random.normal(10., 1.0, size=(2, 4))
orig

Out[114]:

array([[  8.55295309,   9.59138014,  10.27300416,  10.50291701],
       [ 11.4171007 ,  11.41576679,   9.91223316,  10.06468581]])

In [115]:

normalized(orig)

Out[115]:

array([[-1.17711051, -0.13868346,  0.54294056,  0.77285341],
       [ 0.71465408,  0.71332017, -0.79021345, -0.6377608 ]])

In [116]:

orig

Out[116]:

array([[-1.17711051, -0.13868346,  0.54294056,  0.77285341],
       [ 0.71465408,  0.71332017, -0.79021345, -0.6377608 ]])

Array operations¶

In [117]:

A = np.array([[1, 2], [3, 4]])
B = np.ones((2, 2)) # 2x2 of ones
print(A)
print(B)

[[1 2]
 [3 4]]
[[ 1.  1.]
 [ 1.  1.]]

Addition and subtraction

In [118]:

A + B

Out[118]:

array([[ 2.,  3.],
       [ 4.,  5.]])

In [119]:

A - B

Out[119]:

array([[ 0.,  1.],
       [ 2.,  3.]])

What about multiplication?

In [120]:

print(A)
print(B)

[[1 2]
 [3 4]]
[[ 1.  1.]
 [ 1.  1.]]

In [121]:

A * B

Out[121]:

array([[ 1.,  2.],
       [ 3.,  4.]])

Operations are element-wise!¶

numpy.ndarray is an array, not a matrix.

For matrix multiplications use np.dot or np.ndarray.dot

In [122]:

np.dot(A, B)

Out[122]:

array([[ 3.,  3.],
       [ 7.,  7.]])

In [123]:

A.dot(B)

Out[123]:

array([[ 3.,  3.],
       [ 7.,  7.]])

Python 3.5 + NumPy 1.10: @-operator

In [124]:

A @ B

Out[124]:

array([[ 3.,  3.],
       [ 7.,  7.]])

Reshaping¶

In [125]:

A = np.arange(10)
print(A, A.shape)

[0 1 2 3 4 5 6 7 8 9] (10,)

In [126]:

B = A.reshape((2, 5))
print(B, B.shape)

[[0 1 2 3 4]
 [5 6 7 8 9]] (2, 5)

A and b points at the same data!

In [127]:

B[0,0] = 100

print(A)

[100   1   2   3   4   5   6   7   8   9]

Reshaping (cont.)¶

Automatic calculation of one dimension by setting it to -1

In [128]:

A = np.arange(24)

B = A.reshape(2, -1, 4)
print(B.shape)

(2, 3, 4)

Illegal shapes are... illegal :)

In [129]:

C = A.reshape(2, 4, 4)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-129-c08763be3d59> in <module>()
----> 1 C = A.reshape(2, 4, 4)

ValueError: total size of new array must be unchanged

Flattening arrays¶

Flatten an array from ND to 1D

In [130]:

A = np.arange(10).reshape(2,5)

In [131]:

B = A.ravel()
B

Out[131]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [132]:

C = A.flatten() # Copies!
C

Out[132]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [133]:

B[0] = 99
C[1] = 100
A

Out[133]:

array([[99,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9]])

Linear indexing with ndarray.flat iterator (no copy)

In [134]:

A.flat[::2] = 0
A

Out[134]:

array([[0, 1, 0, 3, 0],
       [5, 0, 7, 0, 9]])

Joining arrays¶

hstack, vstack, dstack

In [135]:

a = np.array([1, 2])
b = np.array([3, 4])
c = np.array([5, 6])

In [136]:

np.hstack((a, b, c))

Out[136]:

array([1, 2, 3, 4, 5, 6])

In [137]:

np.vstack((a, b, c))

Out[137]:

array([[1, 2],
       [3, 4],
       [5, 6]])

General function: concatenate

In [138]:

np.concatenate((a, b, c), axis=0)

Out[138]:

array([1, 2, 3, 4, 5, 6])

Shape matters¶

In [139]:

A = np.array([[1, 2]])
B = np.array([[3],[4]])
print(A)
print('+')
print(B)

[[1 2]]
+
[[3]
 [4]]

In [140]:

A + B

Out[140]:

array([[4, 5],
       [5, 6]])

We just witnessed the numpy broadcasting system

Broadcasting¶

Quite useful!
Does not copy data!
Example: scale all image RGB values

In [141]:

img = np.random.randint(0, 255, size=(256, 256, 3)) # 256x256 RGB image

scaled = img * np.array([0.2, 0.5, 0.3])
print(scaled.shape)

(256, 256, 3)

Example: Remove mean from point set

In [142]:

points = np.random.normal(loc=3, size=(3, 100)) # N 3D points
m = np.mean(points, axis=1)
print('Original mean: ', m)

shifted = points - m.reshape(3,-1)
print('Shifted mean:', np.mean(shifted, axis=1))

Original mean:  [ 2.86716784  2.93001807  2.98902105]
Shifted mean: [ -5.32907052e-16   4.26325641e-16  -3.99680289e-17]

Broadcasting rules¶

Line up starting from last dimension
Compatible if equal or 1 (else fail!)
Size 1 dimensions are "stretched" to match
Then perform operation

A      (3d array):  15 x 3 x 5
B      (3d array):  15 x 1 x 5
Result (3d array):  15 x 3 x 5

A      (4d array):  8 x 1 x 6 x 1
B      (3d array):      7 x 1 x 5
Result (4d array):  8 x 7 x 6 x 5

Creating arrays¶

Ranges of integers or floats

In [143]:

np.linspace(-2, 2, num=10)

Out[143]:

array([-2.        , -1.55555556, -1.11111111, -0.66666667, -0.22222222,
        0.22222222,  0.66666667,  1.11111111,  1.55555556,  2.        ])

In [144]:

np.arange(10)

Out[144]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

zeros, ones, eye

In [145]:

np.zeros((2, 5))

Out[145]:

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

In [146]:

np.eye(3)

Out[146]:

array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

ones_like, zeros_like

In [147]:

A = np.eye(4)
A

Out[147]:

array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])

In [148]:

np.ones_like(A)

Out[148]:

array([[ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.]])

empty, only allocates memory (very fast). Array items are whatever was in memory.

In [149]:

np.empty((2, 5))

Out[149]:

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

Converting between types¶

Use the ndarray.astype() method:

In [150]:

A = np.array([[1.5, 2.0, 3.7],
              [1.0, 2.0, 9.99]])
B = A.astype('uint8')
B

Out[150]:

array([[1, 2, 3],
       [1, 2, 9]], dtype=uint8)

This produces a copy of the array.

Memory layout¶

Memory is contiguous for new arrays
Row-major ("C-order") is default
Column-major ("Fortran order"/MATLAB) works as well

It is possible to change memory if required (when do you need to?)

np.ascontiguous()
np.asfortranarray() or
A = np.array([1,2], order='F')
np.require() -- The most general way
Check array properties with ndarray.flags attribute

In [151]:

A = np.array([[1,2], [3, 4]])
b = A[:, 1]
c = np.ascontiguousarray(b)

lines = [str(x.flags).splitlines() for x in (A, b, c)]
print('{:<16s} {:^8s} {:^8s} {:^8s}'.format('', 'A', 'b','c'))
for l in zip(*lines):
    #print(l)
    vals = [s.split(':')[-1].strip()for s in l]
    vals = [v + '*' if not v == vals[0] else v for v in vals]
    flagname = l[0].split(':')[0].strip()
    print('{:<16s} {:^8s} {:^8s} {:^8s}'.format(flagname, *vals))

                    A        b        c    
C_CONTIGUOUS       True    False*    True  
F_CONTIGUOUS      False    False    True*  
OWNDATA            True    False*    True  
WRITEABLE          True     True     True  
ALIGNED            True     True     True  
UPDATEIFCOPY      False    False    False

Randomness¶

Module numpy.random has most distributions avilable.

In [152]:

points = np.random.randint(-3, 3, size=(3, 10))

sigma = 0.2
noisy_points = points + np.random.normal(0, sigma, size=points.shape)

print(noisy_points)

[[ 1.29122068 -0.82561563  0.79110243  2.17405314  1.6519741   0.6800585
  -1.83964693  0.99466687 -0.4484195  -0.8430179 ]
 [ 0.20428938 -1.89918263 -3.13698162 -2.35535725 -3.05498032 -0.79690424
  -1.87530157 -1.11215585  0.87633914  1.15527529]
 [ 0.15777087 -2.83102468  1.82822541 -0.06375282  0.03015451 -2.13302273
  -1.22664891 -1.40695077  1.75335996  2.23809561]]

Random sampling¶

Use np.shuffle() or np.choice()

In [153]:

A = np.arange(10)

Randomly select 3 items from A without replacement

In [154]:

np.random.shuffle(A) # shuffle inplace
A[:3]

Out[154]:

array([0, 2, 6])

In [155]:

np.random.choice(A, 3, replace=False)

Out[155]:

array([0, 2, 8])

Finding matching samples¶

numpy.nonzero or numpy.flatnonzero

In [156]:

A = np.random.randint(0, 20, size=10)
indices = np.flatnonzero(A % 3 == 0)

In [157]:

Out[157]:

array([14, 14, 14,  5,  1,  5, 17, 10,  5,  3])

In [158]:

indices

Out[158]:

array([9])

In [159]:

A[indices]

Out[159]:

array([3])

Other functions¶

np.sum()
np.mean()
np.max()
np.argmax()

We can use either max() or numpy.max(), the latter is faster!

In [160]:

N = 10000
%timeit max(np.random.normal(size=N))

1000 loops, best of 3: 1.15 ms per loop

In [161]:

%timeit np.max(np.random.normal(size=N))

1000 loops, best of 3: 456 µs per loop

Plotting: matplotlib¶

Two different API
- matplotlib: Object-oriented interface
- matplotlib.pyplot: MATLAB-like interface
- mix and match as you wish!

`pyplot` example¶

In [162]:

import matplotlib.pyplot as plt

x = np.linspace(0, 4*np.pi)
y = np.sin(x)
plt.figure()
plt.plot(x, y, '--')
plt.show()

Multiple plots¶

There is no need for MATLABs hold on.

In [164]:

plt.figure()
plt.plot(x, np.sin(x))
plt.plot(x, np.sin(2*x+0.5))
plt.show()

Legend¶

Create automatically using label= keyword and legend()

In [165]:

y1 = np.sin(x)
y2 = y + np.random.normal(scale=0.1, size=y1.shape)
plt.figure()
plt.plot(x, y1, label='Model')
plt.scatter(x, y2, c='red', marker='x', label='Measurements')
plt.legend(ncol=2)
plt.show()

`plot()` command¶

plot(x, y)        # default line style and color
plot(x, y, 'bo')  # blue circle markers
plot(y)           # x is index array 0..N-1
plot(y, 'r+')     # ditto, but with red plusses

Some useful keyword arguments

color ('r', 'red', '#ff0000', ...)
linestyle ('-', '--', ...)
linewidth
alpha
label

Subplots¶

In [166]:

plt.figure()
plt.subplot(2, 1, 1)
plt.plot(x, np.sin(x))
plt.subplot(2, 1, 2)
plt.plot(x, np.sin(2 * x))
plt.show()

In [167]:

plt.figure(figsize=(10, 3))
ax1 = plt.subplot(1, 2, 1)
ax2 = plt.subplot(1, 2, 2, sharex=ax1)
ax1.plot(x, np.sin(x))
ax2.plot(x, 5 * np.sin(2 * (x - np.pi / 2)))
plt.show()

We can use $\LaTeX$ as well¶

In [168]:

plt.figure()
y = np.pi * np.sin(3 * x)
plt.plot(x, y, label='$\pi \cdot \sin(3x)$')
plt.legend()
plt.show()

Saving plots¶

Manually¶

Press the "save" button :)

From script¶

Replace show() with savefig(filename)
Output formats depend on backend
PNG, PDF, SVG, EPS, PS should be supported by most
dpi= option

Showing images¶

imshow() defaults to interpolation, but this can be turned off using interpolation='none'.

In [169]:

image = np.random.randint(0, 255, size=(8, 8))
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plt.imshow(image)
plt.subplot(1,2,2)
plt.imshow(image, interpolation='none')
plt.show()

SciPy¶

Mixed bag of useful things

Example modules¶

I/O (.MAT-files, etc)
Linear algebra
Signal processing
Image processing
Sparse matrices
Optimization
Interpolation
...

See documentation for full list

Pandas¶

DataFrame table-like object
Merge, join, group data frames
Compute statistics

In [170]:

import pandas as pd
df = pd.read_csv('pojknamn.csv', skiprows=2, encoding='latin1', 
                 index_col='tilltalsnamn', na_values=['..']).transpose()

In [171]:

df.head()

Out[171]:

tilltalsnamn	Aaron	Abbas	Abbe	Abdallah	Abdirahim	Abdirahman	Abdulahi	Abdullahi	Abdullah	Abdulrahman	...	Zakarias	Zakariya	Zakk	Zander	Zebastian	Zeb	Zeke	Zion	Åke	Ömer
1998	NaN	NaN	NaN	NaN	NaN	15	NaN	12	NaN	NaN	...	NaN	NaN	NaN	NaN	23	NaN	NaN	NaN	NaN	NaN
1999	NaN	NaN	NaN	NaN	NaN	15	NaN	NaN	12	NaN	...	10	NaN	NaN	NaN	22	NaN	NaN	NaN	NaN	NaN
2000	12	NaN	NaN	NaN	NaN	21	NaN	17	10	NaN	...	NaN	NaN	NaN	NaN	26	11	NaN	NaN	13	NaN
2001	NaN	NaN	NaN	NaN	NaN	20	NaN	12	NaN	NaN	...	12	NaN	NaN	NaN	29	NaN	NaN	NaN	NaN	NaN
2002	15	NaN	11	NaN	NaN	16	NaN	NaN	NaN	NaN	...	14	NaN	NaN	NaN	26	11	NaN	NaN	10	NaN

5 rows × 824 columns

In [172]:

subset = df[['Hannes', 'Adam']]
subset.sum()

Out[172]:

tilltalsnamn
Hannes    3084
Adam      9735
dtype: float64

Pandas plotting¶

In [173]:

df[['Hannes', 'Andreas', 'Mikael', 'Marcus']].plot()
plt.title('Newly born male names')

Out[173]:

<matplotlib.text.Text at 0x7f3f7f299f28>

HDF5¶

Hierarchical Data Format v5

Dataset: Multidimensional arrays
Group: Set of Datasets
Supports metadata
Path-like identifiers: /camera1/rgb/frame1

In [174]:

import h5py
with h5py.File('hero3_atan.hdf', 'r') as f:
    print('Keys:', list(f.keys()))
    dataset = f['K']
    print(dataset)
    print(dataset.value)

Keys: ['K', 'fps', 'lgamma', 'opt_residual_mean', 'opt_residual_std', 'readout', 'size', 'wc']
<HDF5 dataset "K": shape (3, 3), type "<f8">
[[ 853.12703455    0.          988.06311256]
 [   0.          873.54956631  525.71056312]
 [   0.            0.            1.        ]]

IPython/Jupyter Notebook¶

Write math $x = \frac{1}{2}$ inline or as blocks $$ y = \sum_{i=3} x_i^2 $$
Mix text using markdown or HTML with code
Export to HTML, PDF, presentation, ...
These presentations are notebooks
Language support: Python, R, Julia, ...
Widgets!

SymPy: Symbolic math¶

Example: Derivative of $\sin(x) e^x$

In [175]:

from sympy import init_printing, symbols, exp, sin, diff
init_printing()

x = symbols('x')
diff(sin(x)*exp(x), x)

Out[175]:

$$e^{x} \sin{\left (x \right )} + e^{x} \cos{\left (x \right )}$$

OpenCV¶

Functions / classes same as C++
cv::Mat represented by numpy.ndarray
Channels in the third dimension
Plotting
- OpenCV: cv2.namedWindow(), cv2.imshow(), ...
- matplotlib: plt.imshow(), ... (highly recommended)

In [176]:

import cv2
# White square on black background
image = np.zeros((128, 128), dtype='uint8') # CV_8U
image[50:80, 50:80] = 255
image_blurred = cv2.blur(image, ksize=(11, 11))

plt.figure(figsize=(8, 3))
plt.gray()
plt.subplot(1, 2, 1)
plt.imshow(image, interpolation='none')
plt.subplot(1, 2, 2)
plt.imshow(image_blurred, interpolation='none')
plt.show()

NumPy `dtype`¶

Allows structured data arrays

In [177]:

dt = np.dtype([
        ('name', np.str_, 16),
        ('population', np.uint32)
    ])
print(dt)

[('name', '<U16'), ('population', '<u4')]

In [178]:

cities = np.array([
        ('Stockholm', 851155),
        ('Göteborg', 516532),
        ('Malmö', 293909),
        ('Linköping', 97428)
    ],dtype=dt)

In [179]:

np.sum(cities['population'])

Out[179]:

In [180]:

big_cities = cities[cities['population'] > 500000]
print(big_cities['name'])

['Stockholm' 'Göteborg']