Reading ROOT file and plotting Stuff

In this tutorial we will teach you how to read ROOT data file efficiently using root_numpy, some basic numpy that you will be using a lot, and how to make pretty plots with ease using matplotlib. For this tutorial, you will find inumpy very useful.

Reading ROOT file

We have our data stored in data directory

In [1]:
from root_numpy import root2rec
import numpy as np
In [2]:
#if yo do not have inumpy install. Install it by doing uncomment the following line
#execute it and restart kernel(CTRL-M+.)
#%install_ext https://raw.github.com/piti118/inumpy/master/inumpy.py
In [3]:
%load_ext inumpy
%pylab inline
Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline].
For more information, type 'help(pylab)'.
In [4]:
ls data
B0B0bar-BtoDpi-all.root  cc-BtoDpi-all.root

This is how you read root files. This will read the whole file into your memory(this allows for fast plotting/replotting).

In [5]:
bb = root2rec('data/B0B0*.root')
cc = root2rec('data/cc*.root')
data = root2rec('data/*.root')

There are many way you can specify how root2rec should read your file: you can pass array for filenames, you can tell it to read specific branch only to save memory. You can read them all in root_numpy documentation.

Accessing Data

If you want to see what's in there bb.dtype.names

inumpy provieds auto complete for recarray you can try. You can never remember what's the field names are.

In [6]:
print bb.dtype.names
('nTracks', 'R2All', 'thrustMagAll', 'sphericityAll', 'mcLen', 'mcLund', 'mothIdx', 'dauLen', 'dauIdx', 'nB', 'BCosSphr', 'BCosThetaS', 'BCosThetaT', 'BCosThrust', 'BLegendreP0', 'BLegendreP2', 'BPFlow0', 'BPFlow1', 'BPFlow2', 'BPFlow3', 'BPFlow4', 'BPFlow5', 'BPFlow6', 'BPFlow7', 'BPFlow8', 'BR2ROE', 'BSphr', 'BSphrROE', 'BThrust', 'BThrustROE', 'BpostFitDeltaE', 'BpostFitMes', 'BpreFitDeltaE', 'BpreFitMes', 'BLund', 'BMCIdx', 'Bd1Lund', 'Bd1Idx', 'Bd2Lund', 'Bd2Idx', 'nD', 'DMass', 'DMassErr', 'DLund', 'DMCIdx', 'Dd1Lund', 'Dd1Idx', 'Dd2Lund', 'Dd2Idx', 'Dd3Lund', 'Dd3Idx', 'nK', 'KLund', 'KMCIdx', 'npi', 'piLund', 'piMCIdx')
In [7]:
#type bb.<TAB> here

By Element

Most basic stuff.

In [8]:
print data.R2All[0]
0.225378
In [9]:
#some of them are array
print data.mcLund[0]
print data.mcLund[0][0]
[   11   -11 70553  -511   511   413   -14    13  -411   211   411   111
  -211  -211   321  -211   211   211   211  -321    22    22  -211    13
   -14  2112  2112  -211  -211   211]
11

Important information about plotting

You should visit matplotlib gallery. Normally what I do when I want to plot something is to find something that resemble what I'm trying to plot and look at their code.

In [10]:
#you can use loadpy to load the code to try out
#%loadpy http://matplotlib.org/mpl_examples/pylab_examples/integral_demo.py

Basic Histogram

We usually want to make some plots from data.

In [11]:
hist(data.R2All); #super basic one
Out[11]:
(array([  29,  132,  382,  725,  932, 1151, 1150, 1184, 1044,  911]),
 array([ 0.01700407,  0.0652969 ,  0.11358973,  0.16188256,  0.21017539,
        0.25846822,  0.30676106,  0.35505389,  0.40334672,  0.45163955,
        0.49993238]),
 <a list of 10 Patch objects>)
In [12]:
#try out hist(<TAB> here
#We can't remember all those arguments name right
In [13]:
#some basic stuff to do with plotting
hist(data.R2All, bins=100, histtype='step', color='green')
title('R2ALL')
xlabel('Some label');
ylabel(r'$\theta \alpha \beta \Upsilon$', fontsize=20)#note the "r" infornt of string this prevent backslash escape (ex \n);
#setting limit
xlim(-1,1)# xlim(xmin=yournumber) works too
#if you want y limit use ylim()
grid(True)
#yscale('log') # if you want log scale

Overlaying histograms

In [14]:
#you can plot them at the same time and have the binning determined simultaneously
hist([data.R2All, bb.R2All, cc.R2All], bins=100, histtype='step', label=['data','bb','cc'])
legend(loc='upper left')
Out[14]:
<matplotlib.legend.Legend at 0x10405e0d0>
In [15]:
#You can plot them one at a time but becareful of binning

#this kinda work but automatic binning will make result hard to interpret.
#notice where red line go over blue line....
#since multiple automatic binning may not get the binsize/edges align.
hist(data.R2All, bins=100, histtype='step', label='data')
hist(bb.R2All, bins=100, histtype='step', label='bb')
hist(cc.R2All, bins=100, histtype='step', label='cc');
legend(fancybox=True, loc='upper left'); #Patented round corner
Out[15]:
<matplotlib.legend.Legend at 0x10a77c250>
In [16]:
#fix the binning #now they are align
h, e, p = hist(data.R2All, bins=100, histtype='step', label='data')
hist(bb.R2All, bins=e, histtype='step', label='bb') #notice bins=e
hist(cc.R2All, bins=e, histtype='step', label='cc');

Plotting Array Type

Some of the data are store in an array. We need to merge them first before plotting.

In [17]:
#for example: mass for every D candidate
#notice all the brackets
data.DMass[:30] #some of them has two.
Out[17]:
array([[ 1.86307085], [ 1.8671999], [ 1.8658011], [ 1.89872372],
       [ 1.8583039], [ 1.86692679], [ 1.86692894], [ 1.87697983],
       [ 1.87069893], [ 1.85590565], [ 1.87297702], [ 1.87504053],
       [ 1.87785125], [ 1.87464869], [ 1.86368835], [ 1.87517965],
       [ 1.86459327], [ 1.87235868], [ 1.8515023], [ 1.86251462],
       [ 1.90835404], [ 1.83234668  1.87968004], [ 1.86203623],
       [ 1.86056828], [ 1.85830772], [ 1.87834036], [ 1.86842954],
       [ 1.87176836], [ 1.85812378], [ 1.86374021]], dtype=object)
In [18]:
#np.hstack flatten our array (there np.flatten but it does something else)
np.hstack(data.DMass)
Out[18]:
array([ 1.86307085,  1.8671999 ,  1.8658011 , ...,  1.83166409,
        1.83930147,  1.87473261], dtype=float32)
In [19]:
hist(np.hstack(data.DMass), bins=100, histtype='stepfilled', alpha=0.5);

Making Simple Cut.

numpy support boolean indexing. Let me show you what that means.

In [20]:
#this gives array of boolean
bb.nTracks < 10
Out[20]:
array([ True, False, False, ..., False,  True, False], dtype=bool)
In [21]:
#those booleans can be used to indicate which one we want
print bb.R2All.size
print bb.R2All[bb.nTracks<10].size
#combining cuts
print bb.R2All[(bb.nTracks>5) & (bb.nTracks<10)].size #you need the parentheses
print bb.R2All[(bb.nTracks<5) | (bb.nTracks>10)].size
3219
1621
1591
1052
In [22]:
#you can store it in a variable to save time
less_track = bb.nTracks < 10
hist([bb.R2All, bb.R2All[less_track]], bins=100, histtype='step', label=['all','<10']);
legend();

Using cuts on Array field

In [23]:
#dmass of every event that has R2All < 0.25
hist(np.hstack(data.DMass[data.R2All<0.25]), bins=100, histtype='step'); #Select before stack;

The other way around

If you want to plot R2All of event where at least one D candidate has Mass within (1.86,1.88). This is a little more complicated. Since each element of R2All is a numpy array, we need to write a function and map them over DMass field. Here is one way to do it.

In [24]:
@np.vectorize
def any_in_DMass(x):
    return np.any((x>1.86) & (x<1.88))

#np.vectorize is a function that returns a function.
#What this does is that it turns your function in to a vectorize version
#Putting it plainly
#g = np.vectorize(f)
#This makes g a fast(and more featureful) version of
#def g(x):
#    ret = np.array(len(x))
#    for i,this_x in enumerate(x):
#        ret[i]=f(this_x)
#    return ret
#
#The decorator is just a short hand:
#@np.vectorize
#def f(x):
#    return something
#
#is equivalent to
#f = np.vectorize(f)
#
In [25]:
#you can use it like this
data_good_DMass = any_in_DMass(data.DMass)
data_good_DMass
Out[25]:
array([ True,  True,  True, ..., False, False,  True], dtype=bool)
In [26]:
hist([data.R2All[data_good_DMass], data.R2All], 
    bins=100, histtype='step');

Saving Them to your favorite format with savefig.

In [27]:
hist([bb.R2All, cc.R2All], bins=100, histtype='stepfilled', alpha=0.5, label=['bb','cc'])
title('Now they are aligned');
legend(loc='upper left');
savefig('bb_cc_r2all_comparison.pdf', bbox_inches='tight'); #bbox tight get rid of all the paddings

Making Them bigger

In [28]:
figure(figsize=(8,6), ) #creates a new figure
hist([bb.R2All, cc.R2All], bins=100, histtype='stepfilled', alpha=0.5, label=['bb','cc'])
legend(loc='upper left');

Showing multiple figure from the same cell

In [29]:
#if you are tired to typing figsize every time
#you can set rc param http://matplotlib.org/users/customizing.html
rcParams['figure.figsize']=(8,6) #now everything after this will be big size

hist([bb.R2All, cc.R2All], bins=100, histtype='stepfilled', label=['bb','cc'], stacked=True)
legend(loc='upper left');
figure()#create new figure
hist([bb.nTracks, cc.nTracks], bins=15, histtype='step', alpha=0.5, label=['bb','cc'])
legend(loc='upper left');
#click/double click to hide the sidebar
Out[29]:
<matplotlib.legend.Legend at 0x10dc5acd0>

Subplots

In [30]:
#ROOT/matlab style and some nifty trick to save typing
mystyle= dict(bins=100, histtype='step')
subplot(221)
hist(bb.R2All, **mystyle)#python keyword argument expansion
subplot(222)
hist(bb.nTracks)
subplot(223)
hist(cc.R2All, **mystyle)
subplot(224)
hist(cc.nTracks);
#subplots_adjust(hspace=0.3) #adjust the spacing like this;
In [31]:
#My favorite way
fig, [[topleft, topright],[bottomleft, bottomright]] = subplots(2,2, figsize=(8,6))
#it returns axes which you can draw stuff on it
topleft.hist(bb.R2All)
bottomleft.hist(cc.R2All)
topright.hist(bb.nTracks)
bottomright.hist(cc.nTracks);

Example of some useful plotting style

In [32]:
# line plot with dots
x = np.linspace(-1, 1, 20)
y = x**2
plot(x,y, marker='^', ls='-', label='line'); #ls stands for linestyle;
plot(x,x, marker='o', ls=':', label='dotted line')
plot(x,-x, marker='x', ls='.', label='noline')
leg = legend(numpoints=1)
leg.get_frame().set_alpha(0.5)

error bar

In [33]:
h, e = np.histogram(bb.R2All, bins=30)
err = np.sqrt(h)
x = (e[1:]+e[:-1])/2.0
errorbar(x, h, err, ls='none', marker='o' ) #ls stands for linestyle;

fill between

In [34]:
h, e = np.histogram(bb.R2All, bins=50)
err = np.sqrt(h)
x = (e[1:]+e[:-1])/2.0
plot(x, h, ls='--') 
fill_between(x, h-err, h+err, alpha=0.5, color='green')
Out[34]:
<matplotlib.collections.PolyCollection at 0x10df18990>

Black and White trick

Sometimes you need your plot to work on black and white paper.

In [35]:
#use hatch
h,e,p = hist([bb.R2All], bins=50, hatch='xx', histtype='step', color='black', label='bb')
#repeat more xxxxxx if you need more frequency
h,e,p = hist([cc.R2All], bins=e, hatch='||--', histtype='step', color='black', label='cc')
#you can also mix hatches
legend(loc='upper left')
Out[35]:
<matplotlib.legend.Legend at 0x108744b50>

2D histogram

see http://www.scipy.org/Cookbook/Matplotlib/Show_colormaps for your favorite colormap

In [36]:
subplot(121)
hist2d(bb.R2All,bb.thrustMagAll, bins=30);
title('bb')
colorbar()
xlabel('R2All')
ylabel('thrust')
subplot(122)
hist2d(cc.R2All,cc.thrustMagAll, bins=30, cmap='hot_r');
title('cc')
xlabel('R2All')
ylabel('thrust')
colorbar()
Out[36]:
<matplotlib.colorbar.Colorbar instance at 0x10e1c53b0>

Density plot

In [37]:
#call it Bayes probability if you like
bb_h ,ex, ey = histogram2d(bb.R2All,bb.thrustMagAll, bins=70)
cc_h ,ex, ey = histogram2d(cc.R2All,cc.thrustMagAll, bins=[ex,ey])
density = bb_h/(cc_h+bb_h)
density[isnan(density)] = 0. #getrid of nan
mid = lambda x: (x[1:]+x[:-1])/2 #function to convert edges to mid points
pcolor(mid(ex), mid(ey), density, cmap='gray_r')
xlabel('R2All')
ylabel('thrust')
colorbar()
-c:4: RuntimeWarning: invalid value encountered in divide
Out[37]:
<matplotlib.colorbar.Colorbar instance at 0x10e9e4e18>

Scatter and Bubble Plot

In [38]:
scatter(bb.R2All,bb.thrustMagAll, marker='.')
Out[38]:
<matplotlib.collections.PathCollection at 0x10f228a10>
In [39]:
hs = np.hstack
scatter(hs(bb.BCosSphr), hs(bb.BLegendreP2), c=hs(bb.BCosThetaT), s=hs(bb.BPFlow0)*100, alpha=0.7 )
colorbar()
xlabel('cosSphr')
ylabel('LegendreP2')
#See only big blue/red appear on the top
#This shows legendre correlates with size variable(Flow0)
#and legendre correlates with |CosThetT|
Out[39]:
<matplotlib.text.Text at 0x10f427950>

Move the legend away from the plot

See second answer from stackoverflow's How to put the legend out of the plot

In [40]:
x = np.arange(10)

figure()
ax=gca()

for i in xrange(5):
    line, = ax.plot(x, i * x, label='$y = %ix$'%i)

# Shink current axis's height by 10% on the bottom
box = ax.get_position()
ax.set_position([box.x0, box.y0 + box.height * 0.1,
                 box.width, box.height * 0.9])

# Put a legend below current axis
ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05),
        ncol=5)
Out[40]:
<matplotlib.legend.Legend at 0x10ffb0550>

Tips

Again the best way to figure out how to plot your stuff is to look at gallery and find one that resemble what's in your mind

Note on importing pylab and pyplot

%pylab magic already import matplotlib for us. But if you want to import it manually there are two ways either

Advance Stuff I just pull off gallery

In [41]:
#%loadpy http://matplotlib.org/mpl_examples/axes_grid/scatter_hist.py
In [42]:
import numpy as np
import matplotlib.pyplot as plt

# the random data
x = np.random.randn(1000)
y = np.random.randn(1000)


fig = plt.figure(1, figsize=(5.5,5.5))

from mpl_toolkits.axes_grid1 import make_axes_locatable

# the scatter plot:
axScatter = plt.subplot(111)
axScatter.scatter(x, y)
axScatter.set_aspect(1.)

# create new axes on the right and on the top of the current axes
# The first argument of the new_vertical(new_horizontal) method is
# the height (width) of the axes to be created in inches.
divider = make_axes_locatable(axScatter)
axHistx = divider.append_axes("top", 1.2, pad=0.1, sharex=axScatter)
axHisty = divider.append_axes("right", 1.2, pad=0.1, sharey=axScatter)

# make some labels invisible
plt.setp(axHistx.get_xticklabels() + axHisty.get_yticklabels(),
         visible=False)

# now determine nice limits by hand:
binwidth = 0.25
xymax = np.max( [np.max(np.fabs(x)), np.max(np.fabs(y))] )
lim = ( int(xymax/binwidth) + 1) * binwidth

bins = np.arange(-lim, lim + binwidth, binwidth)
axHistx.hist(x, bins=bins)
axHisty.hist(y, bins=bins, orientation='horizontal')

# the xaxis of axHistx and yaxis of axHisty are shared with axScatter,
# thus there is no need to manually adjust the xlim and ylim of these
# axis.

#axHistx.axis["bottom"].major_ticklabels.set_visible(False)
for tl in axHistx.get_xticklabels():
    tl.set_visible(False)
axHistx.set_yticks([0, 50, 100])

#axHisty.axis["left"].major_ticklabels.set_visible(False)
for tl in axHisty.get_yticklabels():
    tl.set_visible(False)
axHisty.set_xticks([0, 50, 100])

plt.draw()
plt.show()
In [43]:
#%loadpy http://matplotlib.org/mpl_examples/mplot3d/contour3d_demo3.py
In [44]:
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
from matplotlib import cm

fig = plt.figure()
ax = fig.gca(projection='3d')
X, Y, Z = axes3d.get_test_data(0.05)
ax.plot_surface(X, Y, Z, rstride=8, cstride=8, alpha=0.3)
cset = ax.contour(X, Y, Z, zdir='z', offset=-100, cmap=cm.coolwarm)
cset = ax.contour(X, Y, Z, zdir='x', offset=-40, cmap=cm.coolwarm)
cset = ax.contour(X, Y, Z, zdir='y', offset=40, cmap=cm.coolwarm)

ax.set_xlabel('X')
ax.set_xlim(-40, 40)
ax.set_ylabel('Y')
ax.set_ylim(-40, 40)
ax.set_zlabel('Z')
ax.set_zlim(-100, 100)

plt.show()