A lot of time in our analysis we need to extract observable by fitting to a shape function. In this tutorial we will show you how to fit this in Python. The tools we will focus in this tutorial is iminuit, and probfit.They are heavily influenced by ROOFIT and ROOT's MINUIT(and also PyMinuit).
The basic idea of fitting is quite simple
We will start this tutorial with minimizer. Minuit is hands down(for me) the best minimizer there is for HEP. There are couple wrapper for Minuit in Python. The one we will show here is iminuit. It's relatively new. It has all the function I use but might not have your favorite feature; but, you are welcome to implement it(I'll point you in the right direction).
iminuit has its own quick start tutorial that givens you an overview of its feature and hard core tutorial that teach you how to complex stuff like using cython, make your own costfunction fast, and even parallel computing. We will be showing here basic feature of iminuit. If you need to do advance stuff take a look at those tutorials.
from iminuit import Minuit, describe
%load_ext inumpy
#define a function to minimize
#making x,y correlates on purpose
def f(x,y,z):
return (x-2)**2 + (y-x)**2 + (z-4)**2
#iminuit relies on python introspection to read function signature
describe(f) # one of the most useful function from iminuit
#notice here that it automatically knows about x,y,z
#no RooRealVar etc. needed here
m = Minuit(f)
#it warns about every little thing that might go wrong
#the most frequently asked question is Does my fit converge
#also look at your console it print progress if you use print_level=2
m.migrad();
#bonus: see that link with a plus sign on the top left corner?
#clicking it will give you a latex table that you can copy paste
#to your paper/beamer slide;
print m.values
print m.errors
print m.fval
m.print_param()
#correlation matrix
#Only Chrome/safari gets the vertical writing mode right.
m.print_matrix()
More details on this in the tip section at the end. Basically return value of migrad tells you a bunch of fit status.
print m.migrad_ok()
print m.matrix_accurate()
m.minos(); #do m.minos('x') if you need just 1 of them
m.merrors
m.draw_mncontour('x','y') # you can get the raw value using m.mncontour or m.mncontour_grid;
m.draw_profile('x') # this is 1d evaluation not minos scan;
m = Minuit(f, x=2, y=4, fix_y=True, limit_z=(-10,10), error_z=0.1)
m.migrad();
You could write your always own cost function(see iminuit hardcore tutorial for example). But why should you. probfit provide convenience functor for you to build a simple cost function like UnbinnedLH, BinnedLH, BinnedChi2, Chi2Regression.
Let's try to fit a simple gaussian with unbinned likelihood.
from math import exp, pi, sqrt
from probfit import UnbinnedLH
from iminuit import Minuit
seed(0)
gdata = randn(1000)
hist(gdata,bins=100, histtype='step');
#you can define your pdf manually like this
def my_gauss(x, mu, sigma):
return exp(-0.5*(x-mu)**2/sigma**2)/(sqrt(2*pi)*sigma)
#probfit use the same describe magic as iminuit
#Build your favorite cost function like this
#Notice no RooRealVar etc. It use introspection to
#find parameters
#the only requirement is that the first argument is
#in independent variable the rest are parameters
ulh = UnbinnedLH(my_gauss, gdata)
describe(ulh)
m = Minuit(ulh, mu=0.2, sigma=1.5)
m.set_up(0.5) #remember up is 0.5 for likelihood and 1 for chi^2
ulh.show(m)
m.migrad();
ulh.show(m)
probfit comes with a bunch of builtin functions so you don't have to write your own pdf. If you can't find your favorite pdf there is nothing preventing you from doing:
def my_secret_pdf(x,y,z):
return secret_formula(x,y,z)
But, it's better if you fork our project, implement it and submit a pull request.
from probfit import gaussian
ulh = UnbinnedLH(gaussian, gdata)
describe(ulh)
m = Minuit(ulh, mean=0.2, sigma =0.3)
m.migrad()
ulh.show(m)
Just a $\chi^2$ with symmetric poisson error assumption. Currently doesn't support unextended fit yet(easy fix, anyone wanna do it?). But, binned likelihood support both extended and unextended fit.
from probfit import Extended, BinnedChi2
seed(0)
gdata = randn(10000)
mypdf = Extended(gaussian)
describe(mypdf) # just basically N*gaussian(x,mean,sigma)
bx2 = BinnedChi2(mypdf, gdata, bound=(-3,3))#create cost function
bx2.show(args={'mean':1.0, 'sigma':1.0, 'N':10000}) #another way to draw it
m = Minuit(bx2, mean=1.0, sigma=1.0, N=1000.)
m.migrad()
bx2.show(m)
Poisson binned log likelihood with minimum subtractacted(aka likelihood ratio).
from probfit import Extended, BinnedLH
seed(0)
gdata = randn(10000)
mypdf = gaussian
describe(mypdf) # just basically N*gaussian(x,mean,sigma)
blh = BinnedLH(mypdf, gdata, bound=(-3,3))#create cost function
#it can also do extended one if you pass it an extended pdf and pass extended=True to BinnedLH
blh.show(args={'mean':1.0, 'sigma':1.0})
m = Minuit(blh, mean=1.0, sigma=1)
m.set_up(0.5)
m.migrad()
blh.show(m)
Some time you just want a simple line fit as opposed to fitting pdf.
from probfit import Chi2Regression
x = linspace(-10,10,30)
y = 3*x**2 +2*x + 1
#add some noise
y = y+randn(30)*10
errorbar(x,y,10, fmt='b.')
#there is a poly2 builtin but just to remind you that you can do this
def my_poly(x, a, b, c):
return a*x**2+ b*x+ c
err = np.array([10]*30)
x2reg= Chi2Regression(my_poly, x, y, error=err)
x2reg.draw(args={'a':1,'b':2,'c':3})
m = Minuit(x2reg, a=1, b=2, c=3)
m.migrad()
x2reg.show(m)
Remeber the D mass?? Let's try to fit relativistic Breit-Wigner to it.
from root_numpy import root2rec
data = root2rec('data/*.root')
bb = root2rec('data/B*.root')
cc = root2rec('data/cc*.root')
hs = np.hstack
hist([hs(data.DMass), hs(bb.DMass), hs(cc.DMass)], bins=50, histtype='step');
First lets fit bb's DMass alone with a Breit-Wigner.
from probfit import UnbinnedLH, draw_compare_hist,\
vector_apply, Normalized, rtv_breitwigner,\
linear, rename, AddPdfNorm
from iminuit import Minuit
bb_dmass = hs(bb.DMass)
print bb.DMass.size
#you can compare them like this
draw_compare_hist(rtv_breitwigner, {'m':1.87, 'gamma':0.01}, bb_dmass, normed=True);
#Our DMass is a truncated one so we need to have it normalized properly
#this is easy with Normalized functor which normalized your pdf
#this might seems a bit unusual if you never done functinal programming
#but normalize just wrap the function around and return a new function
signalpdf = Normalized(rtv_breitwigner,(1.83,1.91))
ulh = UnbinnedLH(signalpdf, bb_dmass)
m = Minuit(ulh, m=1.875, gamma=0.01)#I shift it on purpose
m.set_up(0.5)
ulh.show(m) #you can see it before the fit begins;
%timeit -n1 -r1 m.migrad();
ulh.show(m) #looks good
m.minos();#do m.minos('m') if you need just 1 parameter
m.print_param()
Now let's add the background and fit it. Looks like a job for linear + breitwigner.
bound = (1.83,1.91)
bgpdf = Normalized(linear,bound)
describe(bgpdf)
#remember our breit wigner also has m argument which means different thing
describe(rtv_breitwigner)
#renaming is easy
signalpdf = Normalized(rename(rtv_breitwigner,['x','mass','gamma']),(1.83,1.91))
describe(signalpdf)
#now we can add them
total_pdf = AddPdfNorm(signalpdf,bgpdf)
#if you want to just directly add them up with out the factor(eg. adding extended pdf)
#use AddPdf
describe(total_pdf)
ulh = UnbinnedLH(total_pdf, np.hstack(data.DMass))
m = Minuit(ulh, mass=1.87, gamma=0.01, m=0., c=1, f_0=0.7)
ulh.show(m)
m.migrad();
ulh.show(m, parts=True)
m.print_matrix()
There is nothing preventing you from doing something like this:
def mypdf(x,mass, gamma, m, c, f_0):
return brietwigner(x, mass, gamma) + f_0*(m*x+c)/normalization
ulh=UnbinnedLH(mypdf, data)
m=Minuit(ulh, **initial_values)
m.migrad()
If your PDF is more complicated than what you can do with AddPDF and AddPDFNorm, it might be easier to write it out manually.
Let try to simultaneous fit 2 gaussians with the same width but different mean.
from probfit import SimultaneousFit
data1 = randn(10000)
data2 = randn(10000)+10
hist([data1,data2], histtype='step', bins=100);
#note here that they share the same sigma
g1 = rename(gaussian, ['x','mu1','sigma'])
g2 = rename(gaussian, ['x','mu2','sigma'])
print describe(g1)
print describe(g2)
#make two likelihood and them up
ulh1 = UnbinnedLH(g1,data1)
ulh2 = UnbinnedLH(g2,data2)
sim = SimultaneousFit(ulh1,ulh2)
print describe(sim) #note the sigma merge
sim.draw(args=(0.5, 1.5, 10.5))
m = Minuit(sim, mu1=0.5, sigma=1.5, mu2=10.5)
m.migrad();
sim.show(m)
Again there is nothing preventing you from doing
def my_cost_function(mu1, mu2, sigma):
return ulh1(mu1,sigma)+ulh2(mu2.sigma)
m=Minuit(my_cost_function, **initial_values) m.migrad()
If your cost function is more complex than adding them together this is the to do it.
This is invert CDF implementation (not accept reject). Large overhead but fast element-wise. Anyone want to signup for accept/reject?
from probfit import gen_toy
toy = gen_toy(total_pdf, 1000, (1.83,1.91), mass=1.87, gamma=0.01, c=1.045, m=-0.43, f_0=0.5, quiet=False)
hist(toy, bins=100, histtype='step');
ulh = UnbinnedLH(total_pdf, toy)
m = Minuit(ulh, mass=1.87, gamma=0.01, c=1.045, m=-0.43, f_0=0.5)
m.migrad();
ulh.show(m)
A lot of time you want to generate toy from fitted parameters. Retyping/Repeating yourself is not the best use of time. Use python dictionary expansion.
m.fitarg
toy = gen_toy(total_pdf, 1000, (1.83,1.91), quiet=False, **m.fitarg)#note the double star
using np.save
np.save('mytoy.npy', toy)
loaded_toy = np.load('mytoy.npy')
hist(loaded_toy, bins=100, histtype='step');
I won't cover it but just something you may find useful.
Skip this part if you don't have cython
A much more comprehensive example of how to use cython is provided in iminuit hard core tutorial. We will show a simple example here.
%load_ext cythonmagic
%%cython
from libc.math cimport sqrt
cimport cython
cdef double pi = 3.14159265358979323846264338327
@cython.embedsignature #you need this or experimental @cython.binding.
cpdef double cython_bw(double x, double m, double gamma):
cdef double mm = m*m
cdef double xm = x*x-mm
cdef double gg = gamma*gamma
cdef double s = sqrt(mm*(mm+gg))
cdef double N = (2*sqrt(2)/pi)*m*gamma*s/sqrt(mm+s)
return N/(xm*xm+mm*gg)
cython_bw(1,2,3)
from probfit import describe
describe(cython_bw)
ulh = UnbinnedLH(cython_bw, bb_dmass)
m = Minuit(ulh, m=1.875, gamma = 0.01)
ulh.show(m)
%timeit -r1 -n1 m.migrad()
See iminuit hardcore tutorial.
status, param = m.migrad()
print status.has_covariance
print status.is_valid
#or an umbrella call
m.migrad_ok()
#minos
results = m.minos('m')
print results['m']
print results['m'].upper_valid
print results['m'].lower_valid
Some time we want to resue fitting argument.
m.fitarg
old_fitarg = m.fitarg
ulh2 = UnbinnedLH(cython_bw, bb_dmass)
m2 = Minuit(ulh2, **old_fitarg)
m2.print_param()
#since fitarg is just a dictionary you can dump it via pickle
import pickle
out = open('my_fitarg.pck','w')
pickle.dump(old_fitarg,out)
out.close()
fin = open('my_fitarg.pck','r')
loaded_fitarg = pickle.load(fin)
fin.close()
print loaded_fitarg
m3 = Minuit(ulh2, **loaded_fitarg)
m3.migrad();