Lecture 3: Random bits of usefulness¶

Today's topics¶

More language features
Python 2 vs Python 3
Improve speed and wrapping C/C++

Some notes on style...¶

PEP8 - Style Guide for Python Code
Not enforced, but encouraged

Indent 4 spaces

Avoid long lines (>79)

Naming things¶

Classes - CapitalizedWords
```
class SomeClassWithLongName:
  pass
```

functions, methods, attributes - lower case with underscore

foo = 2

def do_something_with_foo(foo):
    pass

contants - CAPITAL_LETTERS

SOME_CONSTANT = 3.14

Documentation¶

Classes and functions can be documented using docstrings

In [2]:

def square(x):
    "Calculate the square of a number"
    return x**2

class CameraModel:
    "Camera model base class"

    def project(self, x):
        "Project 3D point to image plane"
        pass

SciPy has its own recommended format

In [3]:

class AtanCameraModel(CameraModel):
    """atan camera model

    This implements the camera model of Devernay and Faugeras ([1]_)

    References
    -----------------------
    ..  [1] F. Devernay and O. Faugeras, “Straight lines have to be straight: Au- tomatic calibration and removal of
        distortion from scenes of structured environments,” Machine Vision and Applications, vol. 13, 2001.
    """
    
    def project(self, points):
        """Project 3D points to image coordinates.

        This projects 3D points expressed in the camera coordinate system to image points.

        Parameters
        --------------------
        points : (3, N) ndarray
            3D points

        Returns
        --------------------
        image_points : (2, N) ndarray
            The world points projected to the image plane
        """
        pass

Where is my `main()` function?¶

Everything in a python file is run when imported or executed from the commandline.

To distinguish check the __name__ variable

In [4]:

%%file testmod.py
import sys

print('This will always print (once)')

def greet(name):
    print('Hello {}!'.format(name))

if __name__ == '__main__':
    name = sys.argv[1]
    greet(name)

Overwriting testmod.py

In [5]:

import testmod

This will always print (once)

In [6]:

testmod.greet('Hannes')

Hello Hannes!

In [7]:

!python3 ./testmod.py Kalle

This will always print (once)
Hello Kalle!

Numpy slicing revisisted¶

We can use the Ellipsis ... to "fill out" the required axis in a slice

In [8]:

sz = (5, 4, 3, 2)
A = np.arange(np.prod(sz)).reshape(sz)

In [9]:

A[1, :, :, 0]

Out[9]:

array([[24, 26, 28],
       [30, 32, 34],
       [36, 38, 40],
       [42, 44, 46]])

In [10]:

A[1, ..., 0]

Out[10]:

array([[24, 26, 28],
       [30, 32, 34],
       [36, 38, 40],
       [42, 44, 46]])

Structured data, tuples vs classes¶

We can store structured data in tuples, but values can only be accessed by index

In [11]:

persons = [('Eric', 72), ('Terry', 74), ('John', 76)]
for p in persons:
    print('{} is {} years old'.format(p[0], p[1]))

Eric is 72 years old
Terry is 74 years old
John is 76 years old

We could make a class, but that would mean lots of overhead for each item (memory). Instead we can use namedtuple.

In [12]:

from collections import namedtuple
Person = namedtuple('Person', ['name', 'age'])
persons = [Person(name, age) for name, age in [('Eric', 72), ('Terry', 74), ('John', 76)]]
for p in persons:
    print('{} is {} years old'.format(p.name, p.age))

Eric is 72 years old
Terry is 74 years old
John is 76 years old

`itertools` example: running experiments¶

We can avoid nesting loops by using the itertools package.

In [13]:

import itertools

def run_experiment(alpha, beta, sigma):
    print('Ran experiment with alpha={:.2f}, beta={:.2f}, sigma={:.2f}'.format(alpha, beta, sigma))

SIGMAS = (0.1, 0.2)
ALPHAS = (1, 2, 3)
BETAS = (10.0, 12.34)

for alpha, beta, sigma in itertools.product(ALPHAS, BETAS, SIGMAS):
    run_experiment(alpha, beta, sigma)

Ran experiment with alpha=1.00, beta=10.00, sigma=0.10
Ran experiment with alpha=1.00, beta=10.00, sigma=0.20
Ran experiment with alpha=1.00, beta=12.34, sigma=0.10
Ran experiment with alpha=1.00, beta=12.34, sigma=0.20
Ran experiment with alpha=2.00, beta=10.00, sigma=0.10
Ran experiment with alpha=2.00, beta=10.00, sigma=0.20
Ran experiment with alpha=2.00, beta=12.34, sigma=0.10
Ran experiment with alpha=2.00, beta=12.34, sigma=0.20
Ran experiment with alpha=3.00, beta=10.00, sigma=0.10
Ran experiment with alpha=3.00, beta=10.00, sigma=0.20
Ran experiment with alpha=3.00, beta=12.34, sigma=0.10
Ran experiment with alpha=3.00, beta=12.34, sigma=0.20

Truthiness¶

Python can interpret a range of things as being True or False.

Empty sequences: [], {}, (), "", '' $\rightarrow$ False

In [14]:

items = []

Instead of this test

In [15]:

if len(items) < 1:
    print('Error: No items')

Error: No items

you can do this

In [16]:

if not items:
    print('Error: No items')

Error: No items

Note that these are only interpreted as True or False, they are not equal to them!

In [17]:

[] == False

Out[17]:

False

In [18]:

[] is False

Out[18]:

False

In [19]:

['item1', 'item2'] == True

Out[19]:

False

In [20]:

if ['item1', 'item2']:
    print('List not empty, i.e. "True"')

List not empty, i.e. "True"

Numpy deals with truthiness differently

In [21]:

a = np.array([1, 2])
if a:
    print('a had values')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-c8a544b8886a> in <module>()
      1 a = np.array([1, 2])
----> 2 if a:
      3     print('a had values')

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In NumPy use the all() or any() methods

In [22]:

a = np.array([1, 2])
if np.all(a):
    print('All values non-zero')

All values non-zero

In [23]:

b = np.array([1, 0])
if np.all(b):
    print('All non-zero')
elif np.any(b):
    print('Some element non-zero')

Some element non-zero

Multiplication of sequences¶

If seq is a sequence, thenseq * N where N is integer, repeats seq N times.

In [24]:

[1, 2, 3] * 5

Out[24]:

[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

In [25]:

'abc' * 5

Out[25]:

'abcabcabcabcabc'

Ternary operator¶

The Python equivalent of the C++ ternary operation

condition ? val_true : val_false

val_true if condition else val_false

In [26]:

vehicle = 'bike'
num_wheels = 2 if vehicle == 'bike' else 4

print('A {} has {} wheels'.format(vehicle, num_wheels))

A bike has 2 wheels

Iteration index: `enumerate()`¶

Sometimes we do want the iteration index as well as the object while iterating. enumerate() solves this!

In [27]:

fruits = ['apple', 'banana', 'grapefruit']
for i, fruit in enumerate(fruits):
    print(i, fruit)

0 apple
1 banana
2 grapefruit

Iterating over multiple things: `zip()`¶

In [28]:

x = np.linspace(0, 2, 100)
data = [np.sin(20*x) * np.exp(x), np.exp(x)]
linestyles = ['-', '--']
labels = [r'$\sin(20x)e^x$', r'$e^x$']

plt.figure(figsize=(8,3))
for y, linestyle, label in zip(data, linestyles, labels):
    plt.plot(x, y, label=label, linestyle=linestyle)
plt.legend(loc='best')
plt.show()

Generators¶

Sometimes we want to generate a sequence of data but...

don't want to create a full list in memory
don't know how long it should be
it might be infinite in length

Solved by the generator pattern

replace return with yield

Example¶

Throw two dice until their sum is larger than some number

In [29]:

def roll_two(max_sum):
    import random
    while True:
        dice_roll = [random.randint(1, 6) for _ in range(2)]
        yield dice_roll
        
        if sum(dice_roll) > max_sum:
            break

In [30]:

for roll in roll_two(8):
    print("Roll: {} + {} = {}".format(*roll, sum(roll)))

Roll: 3 + 5 = 8
Roll: 2 + 2 = 4
Roll: 6 + 5 = 11

Generator shorthand¶

List comprehensions with () instead of []

In [31]:

square_list = [x**2 for x in range(10)]

In [32]:

square_gen = (x**2 for x in range(10))

In [33]:

square_list

Out[33]:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [34]:

square_gen

Out[34]:

<generator object <genexpr> at 0x7fe462ba7780>

In [35]:

list(square_gen)

Out[35]:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Generator: Frames from video¶

This generator produces video frames until the VideoCapture object reports that there are no more frames to have.

In [36]:

def video(path):
    vc = cv2.VideoCapture(path)
    while True:
        ret, im = vc.read()
        if ret:
            yield im
        else:
            break

In [37]:

plt.figure()
VIDEO_PATH = '/home/hannes/Datasets/gopro-gyro-dataset/walk.MP4'
for i, frame in enumerate(video(VIDEO_PATH)):
    plt.subplot(2, 2, i + 1)
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    plt.imshow(frame_rgb, interpolation='none')
    if i >= 3:
        break
plt.show()

Python 2 vs Python 3¶

Not backwards compatible
Options:
1. Target only py2 or py3
2. Target both, but separate sources/branches
3. Target both, single source
from __future__ import foo helps a lot!

Differences in short¶

integer division¶

[Python 2] 5 / 2 $\rightarrow$ 2
[Python 3] 5 / 2 $\rightarrow$ 2.5

To get Python 3 behaviour in Python 2:

from __future__ import division

If you want integer division use // (same result in both)

[Python 2/3] 5 // 2 $\rightarrow$ 2

`print`¶

[Python 2] print is a statement
```
print "something"
```
[Python 3] print is a function
```
print("something")
```
To get Python 3 behaviour in Python 2
```
from __future__ import print_function
```

Strings, unicode¶

[Python 2] has both "normal" strings and unicode strings
- s = "a string"
- s = u"a unicode 文字列"
[Python 3] Only unicode strings (no need for u"...")
- s = "a unicode 文字列"

`range()`¶

[Python 2]
- range() returns a list
- xrange() returns an iterator (less memory)
[Python 3]
- range() returns an iterator

In general: a lot of things that returned a list in Python 2 returns an iterator object in 3.

3 vs 2: what should I do?¶

If possible, use Python 3 (and let Python 2 die...)
If you can't, try to at least use Python 2.7
If you release code, consider supporting both
- Porting 3 to 2 or 2 to 3, separate source, or
- Single source compatible with both 3 and 2.7 (e.g. using from __future__ import)
Lot of documentation exist on how to handle the 2/3 split

Modules and Packages¶

single python source file mymodule.py

package directory

mypackage/
   __init__.py
   submodule.py

</pre>
import mypackage 

import mypackage.submodule

(CPython) binary modules, mymodule.so, mymodule.dll, mymodule.dylib

Module search order: `sys.path`¶

In [38]:

import sys
sys.path

Out[38]:

['',
 '/home/hannes/miniconda/envs/pycourse/lib/python35.zip',
 '/home/hannes/miniconda/envs/pycourse/lib/python3.5',
 '/home/hannes/miniconda/envs/pycourse/lib/python3.5/plat-linux',
 '/home/hannes/miniconda/envs/pycourse/lib/python3.5/lib-dynload',
 '/home/hannes/miniconda/envs/pycourse/lib/python3.5/site-packages/setuptools-18.4-py3.5.egg',
 '/home/hannes/miniconda/envs/pycourse/lib/python3.5/site-packages',
 '/home/hannes/miniconda/envs/pycourse/lib/python3.5/site-packages/IPython/extensions',
 '/home/hannes/.ipython']

Instantitated from

Current script directory
$PYTHONPATH environment variable
Installation default (e.g. /usr/lib/python3.5/site-packages/)

Virtual environments¶

Separate packages from system (i.e. no administrator rights necessary)
Separate python interpreters

virtualenv¶

The most commonly used method for virtual environments

conda¶

Aimed at scientists
virtualenv + package manager

Threading¶

Easy
threading module

Some important issues...

Threading example: Listing primes¶

In [39]:

def primes_naive(N):
    found = [2]
    x = 3
    while len(found) < N:
        for p in found:
            if x % p == 0:
                break
        else:
            found.append(x)
        x += 2
    return found

In [40]:

def run_serial(num_primes, repeats):
    for _ in range(repeats):
        primes_naive(num_primes)

%timeit primes_naive(5000)
%timeit run_serial(5000, 2)

1 loops, best of 3: 1.75 s per loop
1 loops, best of 3: 3.25 s per loop

In [41]:

from threading import Thread
def runthreaded(num_primes, repeats):
    threads = [Thread(target=primes_naive, args=(num_primes,)) for _ in range(repeats)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

In [42]:

%timeit runthreaded(5000, 2)

1 loops, best of 3: 2.98 s per loop

No speedup at all?

The dreaded GIL¶

GIL = Global Interpreter Lock
prevents Python code to run in parallell
extensions can (and do) disable the lock

`multiprocessing` to the rescue!¶

In [43]:

from multiprocessing import Pool

def run_multiproc(num_primes, repeat):
    with Pool(4) as pool:
        for _ in range(repeat):
            pool.apply_async(primes_naive, (num_primes, ))
        pool.close()
        pool.join()

In [44]:

%timeit run_multiproc(5000, 2)

1 loops, best of 3: 1.65 s per loop

multiprocessing spawns multiple python interpreters and then uses various IPC mechanisms to pass data back and forth.

Describing objects¶

obj.__str__() - what print and friends produce
obj.__repr__() - what is written in the REPL

In [45]:

class Robber:
    def __init__(self, name):
        self.name = name
    
    def __repr__(self):
        return '<Robber: {}>'.format(self.name)
    
    def __str__(self):
        return self.name

In [46]:

robber = Robber('Roger')
robber

Out[46]:

<Robber: Roger>

In [47]:

print('Release {}!'.format(robber))

Release Roger!

Class properties¶

Allows "getter" and "setter" functionality to class attributes without x.get_foo(), x.set_foo()

Example: Read-only property¶

In [48]:

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    
    @property
    def norm(self):
        from math import sqrt
        return sqrt(self.x ** 2 + self.y ** 2)

In [49]:

p = Point(1, 2)
p.norm

Out[49]:

2.23606797749979

Setter-getter example: Robber language¶

Consonants: k $\rightarrow$ kok
vowels unaltered
"Hello" $\rightarrow$ "Hohelollolo"

In [50]:

class Robber:
    def __init__(self, name):
        self._name = name
    
    @property
    def name(self):
        return ''.join([c if c in 'aeiouyåäö' else c + 'o' + c.lower() 
                        for c in self._name])
    
    @name.setter
    def name(self, name):
        self._name = name

In [51]:

robber = Robber('Roderick') # famous pick-pocket
robber.name = "Brian"
print('Release {}!'.format(robber.name))

Release Bobrorianon!

`@classmethod`¶

In [52]:

class Dog:
    def __init__(self, name, color):
        self.name = name
        self.color = color
    
    def fetch(self, what):
        print('{} fetched {}!'.format(self.name, what))
    
    @classmethod
    def from_name(cls, name):
        instance = cls(name, 'unknown')
        return instance

In [53]:

dog = Dog.from_name('Barry')
dog.fetch('the stick')

Barry fetched the stick!

In [54]:

class Cat(Dog):
    def fetch(self, what):
        print('{} says "fetch {} yourself!"'.format(self.name, what))

In [55]:

cat = Cat.from_name('Catbert')
cat.fetch('the stick')

Catbert says "fetch the stick yourself!"

@staticmethod decorator¶

Create a class attribute that does not take a self or class argument.

In [56]:

class Robber:
    @staticmethod
    def translate(word):
        return ''.join([c if c in 'aeiouyåäö' else c + 'o' + c.lower() 
                        for c in word])
    
    def __init__(self, name):
        self.name = name
        
    @property
    def name(self):
        return Robber.translate(self._name)
    
    @name.setter
    def name(self, name):
        self._name = name

In [57]:

Robber.translate('Hjärttransplantation')

Out[57]:

'Hohjojärortottotroranonsospoplolanontotatotionon'

Decorators in general¶

@staticmethod, @classmethod, and @property are examples of a decorator.

Decorators "decorate" a function by returning a new function.

Decorators are common in many libraries.

In [58]:

import time

def measure_time(func):
    def inner(*args, **kwargs):
        t0 = time.time()
        result = func(*args, **kwargs)
        elapsed = time.time() - t0
        print('Elapsed:', elapsed)
        return result
    return inner

@measure_time
def slow_func(a, b):
    time.sleep(1.01)
    return a + b

In [59]:

slow_func(12, 34)

Elapsed: 1.0110876560211182

Out[59]:

Speeding up¶

There are a few ways to get better performance

Write extensions in (e.g.) C
Numba
Other Python implementations, e.g. PyPy

Numba¶

Numba provides easy JIT-compilation of Python code

In [60]:

import numba
primes_numba = numba.jit(primes_naive)

In [61]:

%timeit primes_naive(200)

100 loops, best of 3: 2.16 ms per loop

In [62]:

%timeit primes_numba(200)

The slowest run took 482.22 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 251 µs per loop

Or use jit() as a decorator

In [63]:

@numba.jit
def quick_sum(N):
    y = 0
    for x in range(N):
        y += x
    return y

Compiling extensions¶

There are numerous ways to write/compile Python extensions in C (or C++):

CPython API (+ NumPy C-API)
boost::python
Cython

Cython¶

Cython defines a new language that

is a superset of Python
allows type declaration
allows disabling of features like range checks on loops
... thus allows you to shoot yourself in the foot
makes it "easy" to wrap existing C/C++ code

In [64]:

%%cython
from cpython cimport array

def primes_cython(int N):
    cdef array.array primes = array.array('i', range(N))
    primes.data.as_ints[0] = 2
    cdef int num_found = 1
    cdef int x = 3
    while num_found < N:
        for i in range(num_found):
            if x % primes.data.as_ints[i] == 0:
                break
        else:
            primes[num_found] = x
            num_found += 1
        x += 2
    return list(primes)

In [65]:

%timeit primes_naive(200)

100 loops, best of 3: 2.23 ms per loop

In [66]:

%timeit primes_cython(200)

10000 loops, best of 3: 140 µs per loop

In [67]:

%timeit primes_numba(200)

1000 loops, best of 3: 247 µs per loop

Wrapping C++ with Cython¶

Let's wrap a simple Rectangle class using Cython

In [68]:

%%file rect.h
namespace shapes {
    class Rectangle {
    public:
        int x0, y0, x1, y1;
        Rectangle(int x0, int y0, int x1, int y1);
        ~Rectangle();
        int getLength();
        int getArea();
    };
}

Overwriting rect.h

In [69]:

%%file rect.cpp
#include "rect.h"

namespace shapes {

    Rectangle::Rectangle(int X0, int Y0, int X1, int Y1) {
        x0 = X0;
        y0 = Y0;
        x1 = X1;
        y1 = Y1;
    }

    Rectangle::~Rectangle() { }

    int Rectangle::getLength() {
        return (x1 - x0);
    }

    int Rectangle::getArea() {
        return (x1 - x0) * (y1 - y0);
    }
}

Overwriting rect.cpp

Compile the C++ code to a static library

In [70]:

%%sh
g++ -c rect.cpp
ar rvs librect.a rect.o 
#g++ -shared -Wl,-soname,librect.so -o librect.so rect.o

r - rect.o

In [71]:

%%cython --cplus -I. -L. -l rect

cdef extern from "rect.h" namespace "shapes":
    cdef cppclass CPP_Rectangle "shapes::Rectangle":
        CPP_Rectangle(int, int, int, int) except +
        int x0, y0, x1, y1
        int getLength()
        int getArea()

cdef class Rectangle:
    cdef CPP_Rectangle *thisptr # wrapped C++ instance
    
    def __cinit__(self, int x0, int y0, int x1, int y1):
        self.thisptr = new CPP_Rectangle(x0, y0, x1, y1)
        
    def __dealloc__(self):
        del self.thisptr
        
    @property
    def length(self):
        return self.thisptr.getLength()
    
    @property
    def area(self):
        return self.thisptr.getArea()
    
    def __repr__(self):
        return '<Rectangle {}, {}, {}, {}>'.format(self.thisptr.x0, self.thisptr.y0, self.thisptr.x1, self.thisptr.y1)

Let's use the wrapped class

In [72]:

rec = Rectangle(1, 2, 3, 40)
print(rec, 'with area', rec.area)

<Rectangle 1, 2, 3, 40> with area 76