Basic Python

This is just to give you a glimpse of what Python can do. We cover only the subset of features we think will be useful for doing analysis. We also leave links in various places in case you want to study further on your own.

For a more complete picture, look at the official Python documentation or a book like Think Python.

In this tutorial we will be using IPython. To execute the current cell and go to the next cell, press Shift+Enter. If things go wrong, you can restart the kernel by either clicking Kernel in the top bar and choosing Restart, or pressing Ctrl+M+. (press that DOT symbol too).

Hello World

In [1]:
#press shift+enter to execute this
print 'Hello world'
Hello world
In [2]:
#IPython automatically shows the representation of
#the return value of the last command
1+1
Out[2]:
2

Data Type(Usual Stuff)

In [3]:
x = 1 #integer
y = 2.0 #float
t = True #boolean (False)
s = 'hello' #string
s2 = "world" #double quotes works too
#there are also triple quotes for multi-line strings (google "python triple quotes")
n=None #None is Python's null-like value
In [4]:
print x+y #you can refer to previously assigned variable.
3.0
In [5]:
s+' '+s2
Out[5]:
'hello world'
In [6]:
#boolean operations
x>1 and (y>=3 or not t) and not s=='hello' and n is None
Out[6]:
False
In [7]:
#Bonus: The only language I know that can do this
0 < x < 10
Out[7]:
True

Bonus: String formatting

One of the best implementations. There are a couple of ways to do string formatting in Python. This is the one I personally like.

In [8]:
'x is %d. y is %f'%(x,y)
Out[8]:
'x is 1. y is 2.000000'
In [9]:
#even more advanced stuff
#locals() returns a dictionary of
#local variables, which you can then use
#in formatting by name
'x is %(x)d. y is %(y)f'%locals() #easier to read
Out[9]:
'x is 1. y is 2.000000'
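
Since there are a couple of ways to format strings, here is a small sketch of the other common one, str.format. It is written to run on both Python 2.7 and Python 3 (print is called with a single argument); the variable names by_position and by_name are just for illustration.

```python
# Same values as in the cells above
x = 1
y = 2.0

# Positional fields; .6f matches what %f prints by default
by_position = 'x is {0}. y is {1:.6f}'.format(x, y)

# Named fields, similar in spirit to the %(name)d form
by_name = 'x is {x}. y is {y:.2f}'.format(x=x, y=y)

print(by_position)  # x is 1. y is 2.000000
print(by_name)      # x is 1. y is 2.00
```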

List, Set, Tuple, Dictionary, Generator

List

Think of it as std::vector++

In [10]:
l = [1, 2, 3, 4, 5, 6, 7]
print l #[1, 2, 3, 4, 5, 6, 7]
print l[2] #3
print len(l) # list length
print l[-1] #7 negative index works from the back (-1)
l2 = [] #want an empty list?
print l2
[1, 2, 3, 4, 5, 6, 7]
3
7
7
[]
In [11]:
#elements don't need to have the same type
#but this is not recommended. You will just get confused
bad_list = ['dog','cat',1,1.234]
In [12]:
l[1] = 10 #assignment
l
Out[12]:
[1, 10, 3, 4, 5, 6, 7]
In [13]:
l.append(999) #append list
l
Out[13]:
[1, 10, 3, 4, 5, 6, 7, 999]
In [14]:
l.sort() #sort in place
l
Out[14]:
[1, 3, 4, 5, 6, 7, 10, 999]
In [15]:
#searching a list is O(N); use a set for O(1) average lookups
10 in l
Out[15]:
True
In [16]:
11 not in l
Out[16]:
True
In [17]:
#useful list function
range(10) #build it all in memory
Out[17]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [18]:
#list comprehension
#we will get to for loop later but for simple one
#list comprehension is much more readable
my_list = [2*x for x in l]
print my_list
my_list = [ (2*x,x) for x in range(10)]
print my_list
my_list = [3*x for x in range(10) if x%2==0]
print my_list
[2, 6, 8, 10, 12, 14, 20, 1998]
[(0, 0), (2, 1), (4, 2), (6, 3), (8, 4), (10, 5), (12, 6), (14, 7), (16, 8), (18, 9)]
[0, 6, 12, 18, 24]
In [19]:
#This might come in handy
[1]*10
Out[19]:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
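
Lists also support slicing, which comes up again later when we copy a list. A minimal sketch, assuming a made-up list called letters; it runs on both Python 2.7 and Python 3.

```python
letters = ['a', 'b', 'c', 'd', 'e']  # hypothetical example data

first_two = letters[:2]      # from the start up to (not including) index 2
middle = letters[1:4]        # indices 1, 2 and 3
last_two = letters[-2:]      # negative indices count from the back
every_other = letters[::2]   # third number is the step
backwards = letters[::-1]    # step of -1 reverses
copy_of = letters[:]         # a new list with the same elements

print(first_two)             # ['a', 'b']
print(backwards)             # ['e', 'd', 'c', 'b', 'a']
print(copy_of is letters)    # False
```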

Bonus: Python Autocomplete

In [20]:
#in this cell try my_ and press tab
#IPython knows about local variables and
#can do autocomplete (remember locals()?)
In [21]:
#try typing len(<TAB> here
#python can give you documentation/function signature etc.

Tuple

Think of it as an immutable list

In [22]:
tu = (1,2,3) #tuple immutable list
print tu
tu2 = tuple(l) #convert list to tuple
print tu2
tu3 = 4,5,6 #parentheses are actually optional but make it more readable
print tu3
(1, 2, 3)
(1, 3, 4, 5, 6, 7, 10, 999)
(4, 5, 6)
In [23]:
#access
tu[1]
#you can't assign to it
Out[23]:
2
In [24]:
#tuple expansion
x, y, z = tu
print x #1
print y #2
print z #3
print x, y, z #you can use tuple in print statement too

x, y, z = 10, 20, 30#parenthesis is actually optional
print z, y, x #any order
1
2
3
1 2 3
30 20 10
In [25]:
#useful for returning multiple values
def f(x,y):
    return x+y, x-y #parenthesis is implied
a, b = f(10,5)
print a #15
print b #5
print a, b #works too
15
5
15 5

Dictionary

Think of it as std::map - ish. It's actually a hash table. There is also OrderedDict if you also care about ordering.

In [26]:
d = {'a':1, 'b':10, 'c':100}
print d #{'a': 1, 'c': 100, 'b': 10}
d2 = dict(a=2, b=20, c=200) #using named argument
print d2 #{'a': 2, 'c': 200, 'b': 20}
d3 = dict([('a', 3),('b', 30),('c', 300)]) #list of tuples
print d3 #{'a': 3, 'c': 300, 'b': 30}
d4 = {x:2*x for x in range(10)}#comprehension (key doesn't have to be string)
print d4 #{0: 0, 1: 2, 2: 4, 3: 6, 4: 8, 5: 10, 6: 12, 7: 14, 8: 16, 9: 18}
d5 = {} #empty dict
print d5 #{}
{'a': 1, 'c': 100, 'b': 10}
{'a': 2, 'c': 200, 'b': 20}
{'a': 3, 'c': 300, 'b': 30}
{0: 0, 1: 2, 2: 4, 3: 6, 4: 8, 5: 10, 6: 12, 7: 14, 8: 16, 9: 18}
{}
In [27]:
print d['a'] #access
print len(d) #count element
d['d'] = 1000#insert
print d #{'a': 1, 'c': 100, 'b': 10, 'd': 1000}
del d['c']#remove
print d #{'a': 1, 'b': 10, 'd': 1000}
print 'c' in d #keyexists?
1
3
{'a': 1, 'c': 100, 'b': 10, 'd': 1000}
{'a': 1, 'b': 10, 'd': 1000}
False
In [28]:
#use dictionary in comprehension
#d.items() returns a list of (key, value) tuples
#k,v in d.items() does tuple expansion in disguise
new_d = {k:2*v for k,v in d.items()}
print new_d #{'a': 2, 'b': 20, 'd': 2000}
{'a': 2, 'b': 20, 'd': 2000}
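
Two more dictionary things that tend to be useful: get() with a default, and the OrderedDict mentioned above. This is only a sketch with made-up data (prices, od); it runs on both Python 2.7 and Python 3.

```python
from collections import OrderedDict

prices = {'apple': 3, 'pear': 5}  # hypothetical example data

# get() returns a default instead of raising KeyError
missing = prices.get('kiwi', 0)
print(missing)  # 0

# OrderedDict remembers the order in which keys were inserted
od = OrderedDict()
od['first'] = 1
od['second'] = 2
od['third'] = 3
keys_in_order = list(od.keys())
print(keys_in_order)  # ['first', 'second', 'third']
```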

Set

Like a dictionary with keys only; it's a hash table too. Very good at membership testing: O(1) on average instead of O(N) for a list.

In [29]:
s = {1,2,3,4,5,6}
print s
s2 = set([4, 5, 6, 7, 8, 9, 9, 9, 9]) #from a list
#duplicated element is ignored
print s2
set([1, 2, 3, 4, 5, 6])
set([4, 5, 6, 7, 8, 9])
In [30]:
#membership test; this is O(1) on average
print 3 in s
print 10 in s
print 11 not in s
True
False
True
In [31]:
print s | s2 #union
print s & s2 #intersection
print s - s2 #difference
print s ^ s2 #symmetric difference
print {2,3,4} <= s #subset
print s >= {2,3,4} #superset
set([1, 2, 3, 4, 5, 6, 7, 8, 9])
set([4, 5, 6])
set([1, 2, 3])
set([1, 2, 3, 7, 8, 9])
True
True
In [32]:
#insert several elements at once (use s.add(x) for a single one)
s.update([10,11,12])
print s
print 11 in s
set([1, 2, 3, 4, 5, 6, 10, 11, 12])
True
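
For single elements there are add, discard and remove. A small sketch with a made-up set called seen; it runs on both Python 2.7 and Python 3.

```python
seen = set()       # note: {} is an empty dict, not an empty set
seen.add(3)
seen.add(3)        # adding a duplicate changes nothing
seen.add(7)

seen.discard(100)  # discard is silent if the element is missing

try:
    seen.remove(100)  # remove raises KeyError if the element is missing
    removed = True
except KeyError:
    removed = False

# a common trick: drop duplicates from a list
unique = sorted(set([3, 1, 3, 2, 1]))

print(sorted(seen))  # [3, 7]
print(removed)       # False
print(unique)        # [1, 2, 3]
```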

Control Flow

if else elif

Indentation in Python is meaningful. There is "NO" bracket scoping in Python.

The recommended indentation is 4 spaces, not a tab. Tabs work but are not recommended; set your text editor to use soft tabs. PEP8 lists all the recommended style conventions: spacing, commas, indentation, new lines, comments etc. Fun read.

In [33]:
x = 20
if x>10: #colon
    print 'greater than 10'#don't forget the indentation
elif x>5: #parentheses are not really needed
    print 'greater than 5'
else:
    print 'not greater than 5'
x+=1#continue your execution with lower indentation
print x
greater than 10
21
In [34]:
#shorthand if
y = 'oh yes' if x>100 else 'oh no' #no colon
print y
oh no
In [35]:
#since indentation defines blocks, a block can't be left empty
if x>10:
    print 'yes'
else:
    pass #pass keyword means do nothing
x+=1
print x
yes
22
In [36]:
#why is there no bracket??
from __future__ import braces #easter egg
  File "<ipython-input-36-3845971342cc>", line 2
    from __future__ import braces #easter egg
SyntaxError: not a chance

For loop, While loop, Generator, Iterable

There is actually no for(i=0;i<10;i++) in Python. Instead, a for loop walks over an iterable; a list is one example of an iterable.

In [37]:
#iterate over list
for i in range(5): #not recommended use xrange instead
    print i#again indentation is meaningful
0
1
2
3
4

Generator

In the previous example we used range. But range is evaluated right away and puts the whole list [0, 1, 2, 3, 4] in memory.

This is bad if you loop over a large range of numbers. for i in range(100000) will put 100000 numbers into memory first. This is very inefficient since you use each of them only once.

To fix this we use a generator instead. As far as we are concerned, generators are objects that spit out a value only when asked and don't keep their previous states, which means no access by index and no going backward. You can read more about them on the Python wiki, or just google the Python yield keyword.

Long story short, just use for i in xrange(5) instead. (Strictly speaking, xrange returns a lazy sequence object rather than a generator, but it saves memory in the same way.)

In [38]:
#Lazy programming
#save memory
for i in xrange(5):
    print i
0
1
2
3
4
In [39]:
#looping the list
l = ['a','b','c']
for x in l:
    print x
a
b
c
In [40]:
#you can build your own generator too
l = [1,2,3,4]
#(2*y for y in l) is a generator that spits out 2*y
#for every element in l
#not really a good way to write it but just to show it
for x in (2*y for y in l):#notice the parentheses
    print x
2
4
6
8
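
The generator expression above is the short form. The long form is a function that uses the yield keyword. Here is a minimal sketch; countdown is a made-up name, and the code runs on both Python 2.7 and Python 3.

```python
def countdown(n):
    """Yield n, n-1, ..., 1, producing one value per request."""
    while n > 0:
        yield n   # hand one value back and pause here
        n -= 1

gen = countdown(3)
first = next(gen)   # ask for a single value
rest = list(gen)    # consume whatever is left
again = list(gen)   # the generator is exhausted now

print(first)  # 3
print(rest)   # [2, 1]
print(again)  # []
```

Notice that once the values are consumed they are gone; you have to call countdown again to start over.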
In [41]:
#if you need index
for i,x in enumerate(l):
    print i,x
0 1
1 2
2 3
3 4
In [42]:
#looping dictionary
d = {'a':1,'b':10,'c':100}
#items() returns a list of (key, value) tuples
#k,v in d.items() is tuple expansion in disguise
for k,v in d.items():
    print k,v
a 1
c 100
b 10
In [43]:
#looping over multiple list together
lx = [1,2,3]
ly = [x+1 for x in lx]
print lx,ly
for x,y in zip(lx,ly): #there is also itertools.izip, which returns a lazy iterator
    print x,y
[1, 2, 3] [2, 3, 4]
1 2
2 3
3 4
In [44]:
#for completeness, here is the while loop
x = 0
while x<5:
    print x
    x+=1
0
1
2
3
4

See Also

For more complex looping you can look at itertools
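
To give a taste of itertools, here is a small sketch using three of its functions: chain, islice and product. The variable names are just for illustration, and the code runs on both Python 2.7 and Python 3.

```python
import itertools

# chain: loop over several iterables as if they were one
chained = list(itertools.chain([1, 2], [3, 4]))

# count is an endless generator; islice takes only the first 3 values
first_three = list(itertools.islice(itertools.count(10), 3))

# product: every combination, like a nested for loop
pairs = list(itertools.product('ab', [1, 2]))

print(chained)      # [1, 2, 3, 4]
print(first_three)  # [10, 11, 12]
print(pairs)        # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]
```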

Function

Functions in Python are first-class objects (except in a very few cases).

In [45]:
def f(x, y): #remember the colon
    print 'x =',x #again indentation
    print 'y =',y
    return x+y
f(10,20)
x = 10
y = 20
Out[45]:
30
In [46]:
#Python is a dynamically typed language
#specifically it uses duck typing (wikipedia it. Fun read.)
#this means as long as the arguments support the operations used
#Python doesn't care
f('hello','world')
x = hello
y = world
Out[46]:
'helloworld'
In [47]:
#you can pass it by name too
#this is useful since you can't always remember the order
#of the arguments
f(y='y',x='x') # notice i put y before x
x = x
y = y
Out[47]:
'xy'
In [48]:
#default/keyword arguments
def g(x, y, z='hey'):
    #one of the most useful functions
    print locals()#returns a dictionary of all local variables
g(10,20)
g(10,20,30)#can do it positionally
{'y': 20, 'x': 10, 'z': 'hey'}
{'y': 20, 'x': 10, 'z': 30}
In [49]:
g(10,z='ZZZZ',y='YYYY') #or using keyword
{'y': 'YYYY', 'x': 10, 'z': 'ZZZZ'}
In [50]:
def myfunc(x,y,z, long_keyword="111000"):
    return None
In [51]:
#IPython knows about keyword argument names; try typing this
#myfunc(x, y, z, lon<TAB>

Be careful

In [52]:
#at some point in your programming lifetime
#you might be tempted to put a mutable object like a list
#as a default argument. Just don't
def f(x,y,z=[]): #Don't do this
    pass
def f(x,y,z=None):
    z = [] if z is None else z

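It has to do with when default values are evaluated: only once, when the function is defined, so the same list is shared by every call. If you wonder why, you can read “Least Astonishment” in Python: The Mutable Default Argument.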
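
To see the problem in action, here is a minimal sketch. The functions append_bad and append_good are made-up names for illustration; the code runs on both Python 2.7 and Python 3.

```python
def append_bad(item, bucket=[]):  # the default list is created once, at def time
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):
    if bucket is None:
        bucket = []               # a fresh list on every call
    bucket.append(item)
    return bucket

# list(...) takes a snapshot, since append_bad keeps returning the same object
bad_first = list(append_bad(1))
bad_second = list(append_bad(2))  # the 1 from the previous call is still there

good_first = append_good(1)
good_second = append_good(2)

print(bad_first)    # [1]
print(bad_second)   # [1, 2]
print(good_first)   # [1]
print(good_second)  # [2]
```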

Bonus

This might come in handy

In [53]:
#arbitrary number of arguments, like C's va_arg
def h(x,y,*arg,**kwd):
    print locals()
h(10,20,30,40,50,custom_kwd='hey')
{'y': 20, 'x': 10, 'kwd': {'custom_kwd': 'hey'}, 'arg': (30, 40, 50)}
In [54]:
#Bonus: more cool stuff.
#argument expansion
def g(x, y, z):
    print locals()
t = (1,2,3)
g(*t)
{'y': 2, 'x': 1, 'z': 3}
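
The same trick works for keyword arguments with ** and a dictionary. A small sketch; this g returns its arguments as a tuple instead of printing locals(), and kw is a made-up name. It runs on both Python 2.7 and Python 3.

```python
def g(x, y, z):
    return (x, y, z)

kw = {'z': 3, 'x': 1, 'y': 2}   # keys must match the argument names
from_dict = g(**kw)             # same as g(x=1, y=2, z=3)

# positional and keyword expansion can be mixed
mixed = g(10, **{'y': 20, 'z': 30})

print(from_dict)  # (1, 2, 3)
print(mixed)      # (10, 20, 30)
```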
In [55]:
#If you know lambda calculus
f = lambda x: x+1
f(3)
Out[55]:
4
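
Where lambda really pays off is when you pass a small function to another function. A sketch with made-up names (words, apply_twice); it runs on both Python 2.7 and Python 3.

```python
words = ['banana', 'fig', 'cherry']  # hypothetical example data

# sort by length instead of alphabetically; ties keep their original order
by_length = sorted(words, key=lambda w: len(w))

# functions are objects, so they can be passed around like any other value
def apply_twice(func, value):
    return func(func(value))

result = apply_twice(lambda x: x + 1, 3)

print(by_length)  # ['fig', 'banana', 'cherry']
print(result)     # 5
```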

Classes, Object etc.

Think of a variable holding an object as a pointer to that object in C. This will answer so many questions about whether we are passing by reference or by value, and whether something is a copy or an assignment. Internally, it actually is a C pointer to a struct.

In [56]:
#define a class
class MyClass:
    x = 1 #you can define a field like this
    
    #the first argument is called self. It refers to the object itself
    #think of it as the this keyword in C++
    def __init__(self, y): #constructor
        self.y = y #or define it here
    
    def do_this(self, z):
        return self.x + self.y + z
In [57]:
a = MyClass(10)
print a.do_this(100)
111
In [58]:
#press a.<TAB> here for IPython autocomplete
In [59]:
#you can even add field to it
a.z = 'haha'
print a.z
haha
In [60]:
#remember when I said think of it as C pointer??
b = a
b.x = 999 #change b
print a.x #printing a.x not b.x
999
In [61]:
#you may think you won't encounter it but...
a = [1,2,3]
b = a
b[1]=10
print a
[1, 10, 3]
In [62]:
#shallow copy is easy
a = [1,2,3]
b = a[:] #slicing the whole list creates a new list
b[1] = 10
print a, b
[1, 2, 3] [1, 10, 3]
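
A shallow copy only copies the outer list. If the list contains other lists, the inner ones are still shared. A minimal sketch using copy.deepcopy from the standard library; the variable names are made up and the code runs on both Python 2.7 and Python 3.

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = nested[:]            # new outer list, same inner lists
deep = copy.deepcopy(nested)   # everything is copied recursively

shallow[0][0] = 99             # changes the inner list shared with nested

print(nested)  # [[99, 2], [3, 4]]
print(deep)    # [[1, 2], [3, 4]]
```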

Inheritance

Python supports multiple inheritance, which is often used for mixins. You can read more about it in the official Python documentation. We won't need it in our tutorial. The basic syntax is the following.

In [63]:
class Parent:
    x = 10
    y = 20
    
class Child(Parent):
    x = 30
    z = 50

p = Parent()
c = Child()
print p.x
print c.x
10
30
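
Methods are inherited and overridden the same way as fields. Here is a sketch that also shows a tiny mixin. All the class names (Animal, Dog, LoudMixin, LoudDog) are made up for illustration; the classes inherit from object so the code behaves the same on Python 2.7 and Python 3.

```python
class Animal(object):
    def __init__(self, name):
        self.name = name

    def speak(self):
        return self.name + ' makes a sound'

class Dog(Animal):
    def speak(self):  # override the parent method
        base = Animal.speak(self)  # call the parent version explicitly
        return base + ': woof'

class LoudMixin(object):
    # a mixin adds behaviour and relies on the other parent for speak()
    def shout(self):
        return self.speak().upper()

class LoudDog(LoudMixin, Dog):  # multiple inheritance
    pass

rex = LoudDog('rex')
print(rex.speak())  # rex makes a sound: woof
print(rex.shout())  # REX MAKES A SOUND: WOOF
```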

Bonus

Everything in python is actually an Object.

This includes string, integer, functions etc. I really mean it.

In [64]:
#this might come in handy
'_'.join(["let's",'join','this'])
Out[64]:
"let's_join_this"
In [65]:
x =1 
x.real
Out[65]:
1
In [66]:
def f(x,y):
    return x+y
f.func_code.co_varnames #This is used in writing fitting library
Out[66]:
('x', 'y')

Introspection

In [67]:
#bonus introspection
a = MyClass(10)
dir(a)
Out[67]:
['__doc__', '__init__', '__module__', 'do_this', 'x', 'y']

Class Method(Static method-ish)

If you need a class method (aka a static-ish method), there is the classmethod decorator. A decorator is actually just a higher-order function in disguise: a function that takes a function and returns a function. But we won't go into details here.
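
Just so you have seen the syntax once, here is a minimal sketch of classmethod and its cousin staticmethod. The Point class and its methods are made up for illustration; the code runs on both Python 2.7 and Python 3.

```python
class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def origin(cls):
        # receives the class itself (cls) instead of an instance
        return cls(0, 0)

    @staticmethod
    def distance_sq(p, q):
        # receives neither the class nor an instance
        return (p.x - q.x) ** 2 + (p.y - q.y) ** 2

o = Point.origin()  # called on the class, no instance needed
d = Point.distance_sq(o, Point(3, 4))
print(d)  # 25
```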

Modularizing your code: import

You normally want to put these imports at the top of your file.

In [68]:
import math #import module called math
In [69]:
print math.exp(2)
7.38905609893
In [70]:
from math import exp #import exp from math module
print exp(3)
20.0855369232
In [71]:
#not recommended but
from math import * #import every symbol in that module
sin(100)
Out[71]:
-0.5063656411097588
In [72]:
#if you hate typing
#import it into a symbol
import math as m
m.exp(10)
Out[72]:
22026.465794806718

Writing Your Own Module

For basic stuff, just write your code in a file named my_module.py and put it in the same directory as the file that wants to load my_module. Then you can do

import my_module

All functions and variables you declared in my_module.py will be in there. Python has a pretty advanced module system. You can read about it in the Python documentation on modules.

Bonus: Module search path

See The Module Search Path. Basically, what it says is that Python will look for a .py file or a directory with the same name as the module you are trying to import, in the following order.

  1. current directory
  2. stuff in PYTHONPATH environment variable
  3. site-packages directory (those from python setup.py install)

The path is stored in sys.path, which you can manipulate at runtime

import sys
print sys.path
sys.path.append(path_to_your_library)
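
Putting the last two sections together, here is a self-contained sketch that writes a tiny module into a temporary directory, adds that directory to sys.path and imports it. The module name my_demo_module and its contents are made up for illustration; the code runs on both Python 2.7 and Python 3.

```python
import os
import sys
import tempfile

# write a small module file into a fresh temporary directory
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, 'my_demo_module.py')
with open(module_path, 'w') as fh:
    fh.write('GREETING = "hello"\n')
    fh.write('def double(x):\n')
    fh.write('    return 2 * x\n')

# make the directory visible to import, then import by file name (no .py)
sys.path.append(module_dir)
import my_demo_module

print(my_demo_module.GREETING)    # hello
print(my_demo_module.double(21))  # 42
```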

Persistence: Read/Write File.

Python supports introspection, so it can dump an object to a file. The most popular way to do this is pickle. We won't go into details about it, but if you need it just read about it. If you need to read and write a raw file, look at the open function in the Python documentation.
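
For the impatient, here is a minimal sketch of both: a pickle round trip and plain text file reading and writing. The file names and the data dictionary are made up, and everything goes into a temporary directory. It runs on both Python 2.7 and Python 3.

```python
import os
import pickle
import tempfile

data = {'name': 'run1', 'values': [1, 2, 3]}  # hypothetical example data
tmp_dir = tempfile.mkdtemp()

# pickle: dump an object to a file and load it back (note the binary modes)
pickle_path = os.path.join(tmp_dir, 'data.pkl')
with open(pickle_path, 'wb') as fh:
    pickle.dump(data, fh)
with open(pickle_path, 'rb') as fh:
    restored = pickle.load(fh)

# raw text file: write two lines, then read them back
text_path = os.path.join(tmp_dir, 'notes.txt')
with open(text_path, 'w') as fh:  # with closes the file for you
    fh.write('first line\nsecond line\n')
with open(text_path) as fh:
    lines = [line.strip() for line in fh]  # a file is iterable line by line

print(restored == data)  # True
print(lines)             # ['first line', 'second line']
```

Only unpickle files you trust; loading a pickle can execute arbitrary code.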

Exception: raise, try except finally

In [73]:
def f(x):
    #we need x to be >0
    if x<0:
        raise RuntimeError('Dude!! watch what you are doing')
    return 10
In [74]:
f(-1)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-74-512b55eb878d> in <module>()
----> 1 f(-1)

<ipython-input-73-34d8ced2632b> in f(x)
      2     #we need x to be >0
      3     if x<0:
----> 4         raise RuntimeError('Dude!! watch what you are doing')
      5     return 10

RuntimeError: Dude!! watch what you are doing
In [75]:
#we can catch it with a try except block
try:
    f(-1)
except RuntimeError as e:
    print e
Dude!! watch what you are doing
In [76]:
try:
    f(-1)
except RuntimeError as e:
    print 'hoooooooooooo'
    print e
    raise #you can reraise it
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-76-4d944379e2d6> in <module>()
      1 try:
----> 2     f(-1)
      3 except RuntimeError as e:
      4     print 'hoooooooooooo'
      5     print e

<ipython-input-73-34d8ced2632b> in f(x)
      2     #we need x to be >0
      3     if x<0:
----> 4         raise RuntimeError('Dude!! watch what you are doing')
      5     return 10

RuntimeError: Dude!! watch what you are doing
hoooooooooooo
Dude!! watch what you are doing
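
The finally clause was not shown above, so here is a small sketch. It also defines a custom exception class. The names NegativeInputError, safe_sqrt and log are made up for illustration; the code runs on both Python 2.7 and Python 3.

```python
class NegativeInputError(ValueError):
    """A custom exception; inheriting from a built-in one is enough."""
    pass

log = []  # collects messages so we can see the order of execution

def safe_sqrt(x):
    try:
        if x < 0:
            raise NegativeInputError('x must be >= 0, got %r' % x)
        return x ** 0.5
    except NegativeInputError as e:
        log.append('error: %s' % e)
        return None
    finally:
        # runs no matter what, even after a return
        log.append('done with %r' % x)

ok = safe_sqrt(4)
bad = safe_sqrt(-1)

print(ok)   # 2.0
print(bad)  # None
for message in log:
    print(message)
```

The loop prints "done with 4", then "error: x must be >= 0, got -1", then "done with -1", which shows that finally runs on both the normal and the error path.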