Fitting

A lot of time in our analysis we need to extract observable by fitting to a shape function. In this tutorial we will show you how to fit this in Python. The tools we will focus in this tutorial is iminuit, and probfit.They are heavily influenced by ROOFIT and ROOT's MINUIT(and also PyMinuit).

The basic idea of fitting is quite simple

We have PDF and data.
We build our cost function.
We use minimizer to find shape parameters and one of those is our physics observable.

Minimizer

We will start this tutorial with minimizer. Minuit is hands down(for me) the best minimizer there is for HEP. There are couple wrapper for Minuit in Python. The one we will show here is iminuit. It's relatively new. It has all the function I use but might not have your favorite feature; but, you are welcome to implement it(I'll point you in the right direction).

iminuit has its own quick start tutorial that givens you an overview of its feature and hard core tutorial that teach you how to complex stuff like using cython, make your own costfunction fast, and even parallel computing. We will be showing here basic feature of iminuit. If you need to do advance stuff take a look at those tutorials.

In [1]:

from iminuit import Minuit, describe

In [2]:

%load_ext inumpy

In [3]:

#define a function to minimize
#making x,y correlates on purpose
def f(x,y,z):
    return (x-2)**2 + (y-x)**2 + (z-4)**2

In [4]:

#iminuit relies on python introspection to read function signature
describe(f) # one of the most useful function from iminuit

Out[4]:

['x', 'y', 'z']

In [5]:

#notice here that it automatically knows about x,y,z
#no RooRealVar etc. needed here
m = Minuit(f)
#it warns about every little thing that might go wrong

-c:3: InitialParamWarning: errordef is not given. Default to 1.
-c:3: InitialParamWarning: Parameter x does not have initial value. Assume 0.
-c:3: InitialParamWarning: Parameter x is floating but does not have initial step size. Assume 1.
-c:3: InitialParamWarning: Parameter y does not have initial value. Assume 0.
-c:3: InitialParamWarning: Parameter y is floating but does not have initial step size. Assume 1.
-c:3: InitialParamWarning: Parameter z does not have initial value. Assume 0.
-c:3: InitialParamWarning: Parameter z is floating but does not have initial step size. Assume 1.

In [6]:

#the most frequently asked question is Does my fit converge
#also look at your console it print progress if you use print_level=2
m.migrad();
#bonus: see that link with a plus sign on the top left corner?
#clicking it will give you a latex table that you can copy paste
#to your paper/beamer slide;

FCN = 6.57752842029e-23	NFCN = 52	NCALLS = 52
EDM = 6.57730573206e-23	GOAL EDM = 1e-05	UP = 1.0

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	x	2.000000e+00	1.000000e+00	0.000000e+00	0.000000e+00
2	y	2.000000e+00	1.414214e+00	0.000000e+00	0.000000e+00
3	z	4.000000e+00	1.000000e+00	0.000000e+00	0.000000e+00

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & x & 2.000e+00 & 1.000e+00 &  &  &  &  & \\
\hline
2 & y & 2.000e+00 & 1.414e+00 &  &  &  &  & \\
\hline
3 & z & 4.000e+00 & 1.000e+00 &  &  &  &  & \\
\hline
\end{tabular}

Accessing Value/Error

In [7]:

print m.values
print m.errors
print m.fval

{'y': 1.999999999999814, 'x': 1.999999999997652, 'z': 3.999999999992544}
{'y': 1.414213562372341, 'x': 0.9999999999995464, 'z': 0.9999999999998791}
6.57752842029e-23

In [8]:

m.print_param()

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	x	2.000000e+00	1.000000e+00	0.000000e+00	0.000000e+00
2	y	2.000000e+00	1.414214e+00	0.000000e+00	0.000000e+00
3	z	4.000000e+00	1.000000e+00	0.000000e+00	0.000000e+00

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & x & 2.000e+00 & 1.000e+00 &  &  &  &  & \\
\hline
2 & y & 2.000e+00 & 1.414e+00 &  &  &  &  & \\
\hline
3 & z & 4.000e+00 & 1.000e+00 &  &  &  &  & \\
\hline
\end{tabular}

In [9]:

#correlation matrix
#Only Chrome/safari gets the vertical writing mode right.
m.print_matrix()

+	x	y	z
x	1.00	0.71	-0.00
y	0.71	1.00	-0.00
z	-0.00	-0.00	1.00

            %\usepackage[table]{xcolor} % include this for color
%\documentclass[xcolor=table]{beamer} % for beamer
\begin{tabular}{|c|c|c|c|}
\hline
 & x & y & z\\
\hline
x & \cellcolor[RGB]{255,117,117} 1.00 & \cellcolor[RGB]{228,157,137} 0.71 & \cellcolor[RGB]{163,254,186} -0.00\\
\hline
y & \cellcolor[RGB]{228,157,137} 0.71 & \cellcolor[RGB]{255,117,117} 1.00 & \cellcolor[RGB]{163,254,186} -0.00\\
\hline
z & \cellcolor[RGB]{163,254,186} -0.00 & \cellcolor[RGB]{163,254,186} -0.00 & \cellcolor[RGB]{255,117,117} 1.00\\
\hline
\end{tabular}

Checking convergence

More details on this in the tip section at the end. Basically return value of migrad tells you a bunch of fit status.

In [10]:

print m.migrad_ok()
print m.matrix_accurate()

True
True

In [11]:

m.minos(); #do m.minos('x') if you need just 1 of them

Minos status for x: VALID

Error	-0.999999999998	1.0
Valid	True	True
At Limit	False	False
Max FCN	False	False
New Min	False	False

Minos status for y: VALID

Error	-1.41421356237	1.41421356237
Valid	True	True
At Limit	False	False
Max FCN	False	False
New Min	False	False

Minos status for z: VALID

Error	-0.999999999993	1.00000000001
Valid	True	True
At Limit	False	False
Max FCN	False	False
New Min	False	False

Out[11]:

{'x': {'lower_new_min': False, 'upper': 1.0000000000023477, 'lower': -0.9999999999976521, 'at_lower_limit': False, 'min': 1.999999999997652, 'at_lower_max_fcn': False, 'is_valid': True, 'upper_new_min': False, 'at_upper_limit': False, 'lower_valid': True, 'upper_valid': True, 'at_upper_max_fcn': False, 'nfcn': 18L},
 'y': {'lower_new_min': False, 'upper': 1.414213562373281, 'lower': -1.4142135623729089, 'at_lower_limit': False, 'min': 1.999999999999814, 'at_lower_max_fcn': False, 'is_valid': True, 'upper_new_min': False, 'at_upper_limit': False, 'lower_valid': True, 'upper_valid': True, 'at_upper_max_fcn': False, 'nfcn': 28L},
 'z': {'lower_new_min': False, 'upper': 1.000000000007456, 'lower': -0.9999999999925441, 'at_lower_limit': False, 'min': 3.999999999992544, 'at_lower_max_fcn': False, 'is_valid': True, 'upper_new_min': False, 'at_upper_limit': False, 'lower_valid': True, 'upper_valid': True, 'at_upper_max_fcn': False, 'nfcn': 18L}}

In [12]:

m.merrors

Out[12]:

{('x', -1.0): -0.9999999999976521,
 ('x', 1.0): 1.0000000000023477,
 ('y', -1.0): -1.4142135623729089,
 ('y', 1.0): 1.414213562373281,
 ('z', -1.0): -0.9999999999925441,
 ('z', 1.0): 1.000000000007456}

Contour and Profile(Scan)

In [13]:

m.draw_mncontour('x','y') # you can get the raw value using m.mncontour  or m.mncontour_grid;

In [14]:

m.draw_profile('x') # this is 1d evaluation not minos scan;

Initial value, Limit, initial error, fixing

In [15]:

m = Minuit(f, x=2, y=4, fix_y=True, limit_z=(-10,10), error_z=0.1)

-c:1: InitialParamWarning: errordef is not given. Default to 1.
-c:1: InitialParamWarning: Parameter x is floating but does not have initial step size. Assume 1.
-c:1: InitialParamWarning: Parameter z does not have initial value. Assume 0.

In [16]:

m.migrad();

FCN = 2.00000011701	NFCN = 30	NCALLS = 30
EDM = 1.17011093083e-07	GOAL EDM = 1e-05	UP = 1.0

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	x	3.000070e+00	7.071068e-01	0.000000e+00	0.000000e+00
2	y	4.000000e+00	1.000000e+00	0.000000e+00	0.000000e+00			FIXED
3	z	3.999673e+00	9.980094e-01	0.000000e+00	0.000000e+00	-10.0	10.0

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & x & 3.000e+00 & 7.071e-01 &  &  &  &  & \\
\hline
2 & y & 4.000e+00 & 1.000e+00 &  &  &  &  & FIXED\\
\hline
3 & z & 4.000e+00 & 9.980e-01 &  &  & -1.000e+01 & 1.000e+01 & \\
\hline
\end{tabular}

Building cost fuction using probfit

You could write your always own cost function(see iminuit hardcore tutorial for example). But why should you. probfit provide convenience functor for you to build a simple cost function like UnbinnedLH, BinnedLH, BinnedChi2, Chi2Regression.

Let's try to fit a simple gaussian with unbinned likelihood.

In [17]:

from math import exp, pi, sqrt
from probfit import UnbinnedLH
from iminuit import Minuit

In [18]:

seed(0)
gdata = randn(1000)
hist(gdata,bins=100, histtype='step');

In [19]:

#you can define your pdf manually like this
def my_gauss(x, mu, sigma):
    return exp(-0.5*(x-mu)**2/sigma**2)/(sqrt(2*pi)*sigma)

In [20]:

#probfit use the same describe magic as iminuit
#Build your favorite cost function like this
#Notice no RooRealVar etc. It use introspection to
#find parameters
#the only requirement is that the first argument is
#in independent variable the rest are parameters
ulh = UnbinnedLH(my_gauss, gdata)
describe(ulh)

Out[20]:

['mu', 'sigma']

In [21]:

m = Minuit(ulh, mu=0.2, sigma=1.5)
m.set_up(0.5) #remember up is 0.5 for likelihood and 1 for chi^2
ulh.show(m)

-c:1: InitialParamWarning: Parameter mu is floating but does not have initial step size. Assume 1.
-c:1: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.

In [22]:

m.migrad();
ulh.show(m)

FCN = 1405.88688927	NFCN = 67	NCALLS = 67
EDM = 7.65781084211e-07	GOAL EDM = 5e-06	UP = 0.5

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	mu	-4.528735e-02	3.121221e-02	0.000000e+00	0.000000e+00
2	sigma	9.870165e-01	2.206980e-02	0.000000e+00	0.000000e+00

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & $\mu$ & -4.529e-02 & 3.121e-02 &  &  &  &  & \\
\hline
2 & $\sigma$ & 9.870e-01 & 2.207e-02 &  &  &  &  & \\
\hline
\end{tabular}

Another way to fit gaussian

probfit comes with a bunch of builtin functions so you don't have to write your own pdf. If you can't find your favorite pdf there is nothing preventing you from doing:

def my_secret_pdf(x,y,z): return secret_formula(x,y,z)

But, it's better if you fork our project, implement it and submit a pull request.

In [23]:

from probfit import gaussian

In [24]:

ulh = UnbinnedLH(gaussian, gdata)
describe(ulh)

Out[24]:

['mean', 'sigma']

In [25]:

m = Minuit(ulh, mean=0.2, sigma =0.3)
m.migrad()
ulh.show(m)

-c:1: InitialParamWarning: Parameter mean is floating but does not have initial step size. Assume 1.
-c:1: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.

FCN = 1405.8868885	NFCN = 84	NCALLS = 84
EDM = 2.0619807636e-10	GOAL EDM = 5e-06	UP = 0.5

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	mean	-4.525667e-02	3.121275e-02	0.000000e+00	0.000000e+00
2	sigma	9.870336e-01	2.207075e-02	0.000000e+00	0.000000e+00

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & mean & -4.526e-02 & 3.121e-02 &  &  &  &  & \\
\hline
2 & $\sigma$ & 9.870e-01 & 2.207e-02 &  &  &  &  & \\
\hline
\end{tabular}

Other Cost functions

Binned $\chi^2$

Just a $\chi^2$ with symmetric poisson error assumption. Currently doesn't support unextended fit yet(easy fix, anyone wanna do it?). But, binned likelihood support both extended and unextended fit.

In [26]:

from probfit import Extended, BinnedChi2
seed(0)
gdata = randn(10000)

In [27]:

mypdf = Extended(gaussian)
describe(mypdf) # just basically N*gaussian(x,mean,sigma)

Out[27]:

['x', 'mean', 'sigma', 'N']

In [28]:

bx2 = BinnedChi2(mypdf, gdata, bound=(-3,3))#create cost function
bx2.show(args={'mean':1.0, 'sigma':1.0, 'N':10000}) #another way to draw it

In [29]:

m = Minuit(bx2, mean=1.0, sigma=1.0, N=1000.)
m.migrad()
bx2.show(m)

-c:1: InitialParamWarning: Parameter mean is floating but does not have initial step size. Assume 1.
-c:1: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.
-c:1: InitialParamWarning: Parameter N is floating but does not have initial step size. Assume 1.

FCN = 33.9840496564	NFCN = 87	NCALLS = 87
EDM = 2.21724517031e-07	GOAL EDM = 1e-05	UP = 1.0

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	mean	-1.856158e-02	1.003145e-02	0.000000e+00	0.000000e+00
2	sigma	9.883410e-01	7.437716e-03	0.000000e+00	0.000000e+00
3	N	9.970810e+03	9.970433e+01	0.000000e+00	0.000000e+00

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & mean & -1.856e-02 & 1.003e-02 &  &  &  &  & \\
\hline
2 & $\sigma$ & 9.883e-01 & 7.438e-03 &  &  &  &  & \\
\hline
3 & N & 9.971e+03 & 9.970e+01 &  &  &  &  & \\
\hline
\end{tabular}

Binned Likelihood

Poisson binned log likelihood with minimum subtractacted(aka likelihood ratio).

In [30]:

from probfit import Extended, BinnedLH
seed(0)
gdata = randn(10000)

In [31]:

mypdf = gaussian
describe(mypdf) # just basically N*gaussian(x,mean,sigma)

Out[31]:

['x', 'mean', 'sigma']

In [32]:

blh = BinnedLH(mypdf, gdata, bound=(-3,3))#create cost function
#it can also do extended one if you pass it an extended pdf and pass extended=True to BinnedLH
blh.show(args={'mean':1.0, 'sigma':1.0})

In [33]:

m = Minuit(blh, mean=1.0, sigma=1)
m.set_up(0.5)
m.migrad()
blh.show(m)

-c:1: InitialParamWarning: Parameter mean is floating but does not have initial step size. Assume 1.
-c:1: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.

FCN = 16.9210948316	NFCN = 59	NCALLS = 59
EDM = 1.35282725541e-06	GOAL EDM = 5e-06	UP = 0.5

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	mean	-1.944627e-02	1.005891e-02	0.000000e+00	0.000000e+00
2	sigma	9.895137e-01	7.536312e-03	0.000000e+00	0.000000e+00

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & mean & -1.945e-02 & 1.006e-02 &  &  &  &  & \\
\hline
2 & $\sigma$ & 9.895e-01 & 7.536e-03 &  &  &  &  & \\
\hline
\end{tabular}

$\chi^2$ Regression

Some time you just want a simple line fit as opposed to fitting pdf.

In [34]:

from probfit import Chi2Regression

In [35]:

x = linspace(-10,10,30)
y = 3*x**2 +2*x + 1
#add some noise
y = y+randn(30)*10
errorbar(x,y,10, fmt='b.')

Out[35]:

<Container object of 3 artists>

In [36]:

#there is a poly2 builtin but just to remind you that you can do this
def my_poly(x, a, b, c):
    return a*x**2+ b*x+ c

In [37]:

err = np.array([10]*30)
x2reg= Chi2Regression(my_poly, x, y, error=err)
x2reg.draw(args={'a':1,'b':2,'c':3})

In [38]:

m = Minuit(x2reg, a=1, b=2, c=3)
m.migrad()
x2reg.show(m)

-c:1: InitialParamWarning: Parameter a is floating but does not have initial step size. Assume 1.
-c:1: InitialParamWarning: Parameter b is floating but does not have initial step size. Assume 1.
-c:1: InitialParamWarning: Parameter c is floating but does not have initial step size. Assume 1.

FCN = 34.2023676505	NFCN = 52	NCALLS = 52
EDM = 3.89079131844e-07	GOAL EDM = 1e-05	UP = 1.0

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	a	3.011252e+00	5.738228e-02	0.000000e+00	0.000000e+00
2	b	2.121267e+00	3.058568e-01	0.000000e+00	0.000000e+00
3	c	2.134416e+00	2.741159e+00	0.000000e+00	0.000000e+00

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & a & 3.011e+00 & 5.738e-02 &  &  &  &  & \\
\hline
2 & b & 2.121e+00 & 3.059e-01 &  &  &  &  & \\
\hline
3 & c & 2.134e+00 & 2.741e+00 &  &  &  &  & \\
\hline
\end{tabular}

Let's do some physics

Remeber the D mass?? Let's try to fit relativistic Breit-Wigner to it.

In [39]:

from root_numpy import root2rec

In [40]:

data = root2rec('data/*.root')
bb = root2rec('data/B*.root')
cc = root2rec('data/cc*.root')

In [41]:

hs = np.hstack
hist([hs(data.DMass), hs(bb.DMass), hs(cc.DMass)], bins=50, histtype='step');

Simple fit

First lets fit bb's DMass alone with a Breit-Wigner.

In [42]:

from probfit import UnbinnedLH, draw_compare_hist,\
                    vector_apply, Normalized, rtv_breitwigner,\
                    linear, rename, AddPdfNorm
from iminuit import Minuit

In [43]:

bb_dmass = hs(bb.DMass)
print bb.DMass.size

In [44]:

#you can compare them like this
draw_compare_hist(rtv_breitwigner, {'m':1.87, 'gamma':0.01}, bb_dmass, normed=True);

In [45]:

#Our DMass is a truncated one so we need to have it normalized properly
#this is easy with Normalized functor which normalized your pdf
#this might seems a bit unusual if you never done functinal programming
#but normalize just wrap the function around and return a new function
signalpdf = Normalized(rtv_breitwigner,(1.83,1.91))

In [46]:

ulh = UnbinnedLH(signalpdf, bb_dmass)

In [47]:

m = Minuit(ulh, m=1.875, gamma=0.01)#I shift it on purpose
m.set_up(0.5)
ulh.show(m) #you can see it before the fit begins;

-c:1: InitialParamWarning: Parameter m is floating but does not have initial step size. Assume 1.
-c:1: InitialParamWarning: Parameter gamma is floating but does not have initial step size. Assume 1.

In [48]:

%timeit -n1 -r1 m.migrad();

-c:257: SmallIntegralWarning: (1.8630708456039429, 1.875, -0.05716159358979792)

FCN = -10458.6241946	NFCN = 53	NCALLS = 53
EDM = 6.52822663213e-07	GOAL EDM = 5e-06	UP = 0.5

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	m	1.869087e+00	1.344898e-04	0.000000e+00	0.000000e+00
2	gamma	1.151186e-02	3.003198e-04	0.000000e+00	0.000000e+00

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & m & 1.869e+00 & 1.345e-04 &  &  &  &  & \\
\hline
2 & $\gamma$ & 1.151e-02 & 3.003e-04 &  &  &  &  & \\
\hline
\end{tabular}

1 loops, best of 1: 142 ms per loop

In [49]:

ulh.show(m) #looks good

In [50]:

m.minos();#do m.minos('m') if you need just 1 parameter

Minos status for m: VALID

Error	-0.000134384231326	0.000134627933923
Valid	True	True
At Limit	False	False
Max FCN	False	False
New Min	False	False

Minos status for gamma: VALID

Error	-0.000295920358877	0.000304715369101
Valid	True	True
At Limit	False	False
Max FCN	False	False
New Min	False	False

Out[50]:

{'gamma': {'lower_new_min': False, 'upper': 0.00030471536910134004, 'lower': -0.00029592035887724555, 'at_lower_limit': False, 'min': 0.01151186448673394, 'at_lower_max_fcn': False, 'is_valid': True, 'upper_new_min': False, 'at_upper_limit': False, 'lower_valid': True, 'upper_valid': True, 'at_upper_max_fcn': False, 'nfcn': 12L},
 'm': {'lower_new_min': False, 'upper': 0.00013462793392262257, 'lower': -0.00013438423132646268, 'at_lower_limit': False, 'min': 1.8690871087182035, 'at_lower_max_fcn': False, 'is_valid': True, 'upper_new_min': False, 'at_upper_limit': False, 'lower_valid': True, 'upper_valid': True, 'at_upper_max_fcn': False, 'nfcn': 14L}}

In [51]:

m.print_param()

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	m	1.869087e+00	1.344898e-04	-1.343842e-04	1.346279e-04
2	gamma	1.151186e-02	3.003198e-04	-2.959204e-04	3.047154e-04

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & m & 1.869e+00 & 1.345e-04 & -1.344e-04 & 1.346e-04 &  &  & \\
\hline
2 & $\gamma$ & 1.151e-02 & 3.003e-04 & -2.959e-04 & 3.047e-04 &  &  & \\
\hline
\end{tabular}

More ComplexPDF

Now let's add the background and fit it. Looks like a job for linear + breitwigner.

In [52]:

bound = (1.83,1.91)
bgpdf = Normalized(linear,bound)

In [53]:

describe(bgpdf)

Out[53]:

['x', 'm', 'c']

In [54]:

#remember our breit wigner also has m argument which means different thing
describe(rtv_breitwigner)

Out[54]:

['x', 'm', 'gamma']

In [55]:

#renaming is easy
signalpdf = Normalized(rename(rtv_breitwigner,['x','mass','gamma']),(1.83,1.91))

In [56]:

describe(signalpdf)

Out[56]:

['x', 'mass', 'gamma']

In [57]:

#now we can add them
total_pdf = AddPdfNorm(signalpdf,bgpdf)
#if you want to just directly add them up with out the factor(eg. adding extended pdf)
#use AddPdf
describe(total_pdf)

Out[57]:

['x', 'mass', 'gamma', 'm', 'c', 'f_0']

In [58]:

ulh = UnbinnedLH(total_pdf, np.hstack(data.DMass))
m = Minuit(ulh, mass=1.87, gamma=0.01, m=0., c=1, f_0=0.7)
ulh.show(m)

-c:2: InitialParamWarning: Parameter mass is floating but does not have initial step size. Assume 1.
-c:2: InitialParamWarning: Parameter gamma is floating but does not have initial step size. Assume 1.
-c:2: InitialParamWarning: Parameter m is floating but does not have initial step size. Assume 1.
-c:2: InitialParamWarning: Parameter c is floating but does not have initial step size. Assume 1.
-c:2: InitialParamWarning: Parameter f_0 is floating but does not have initial step size. Assume 1.

In [59]:

m.migrad();

-c:1: SmallIntegralWarning: (1.8630708456039429, 1.87, -0.09079225177349604)
-c:1: SmallIntegralWarning: (1.8630708456039429, 0.0, -0.007922517734960222)

FCN = -21505.9979204	NFCN = 248	NCALLS = 248
EDM = 4.16787410874e-09	GOAL EDM = 5e-06	UP = 0.5

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	mass	1.869398e+00	1.414321e-04	0.000000e+00	0.000000e+00
2	gamma	1.081224e-02	3.908496e-04	0.000000e+00	0.000000e+00
3	m	-4.677901e-01	3.999174e-02	0.000000e+00	0.000000e+00
4	c	1.041123e+00	8.896113e-02	0.000000e+00	0.000000e+00
5	f_0	5.347971e-01	1.279253e-02	0.000000e+00	0.000000e+00

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & mass & 1.869e+00 & 1.414e-04 &  &  &  &  & \\
\hline
2 & $\gamma$ & 1.081e-02 & 3.908e-04 &  &  &  &  & \\
\hline
3 & m & -4.678e-01 & 3.999e-02 &  &  &  &  & \\
\hline
4 & c & 1.041e+00 & 8.896e-02 &  &  &  &  & \\
\hline
5 & $f_{0}$ & 5.348e-01 & 1.279e-02 &  &  &  &  & \\
\hline
\end{tabular}

In [60]:

ulh.show(m, parts=True)

In [61]:

m.print_matrix()

+	mass	gamma	m	c	f_0
mass	1.00	-0.02	-0.05	-0.05	-0.02
gamma	-0.02	1.00	-0.01	-0.01	0.61
m	-0.05	-0.01	1.00	-0.85	-0.00
c	-0.05	-0.01	-0.85	1.00	-0.00
f_0	-0.02	0.61	-0.00	-0.00	1.00

            %\usepackage[table]{xcolor} % include this for color
%\documentclass[xcolor=table]{beamer} % for beamer
\begin{tabular}{|c|c|c|c|c|c|}
\hline
 & mass & $\gamma$ & m & c & $f_{0}$\\
\hline
mass & \cellcolor[RGB]{255,117,117} 1.00 & \cellcolor[RGB]{165,252,185} -0.02 & \cellcolor[RGB]{168,247,182} -0.05 & \cellcolor[RGB]{168,247,183} -0.05 & \cellcolor[RGB]{165,251,185} -0.02\\
\hline
$\gamma$ & \cellcolor[RGB]{165,252,185} -0.02 & \cellcolor[RGB]{255,117,117} 1.00 & \cellcolor[RGB]{163,253,186} -0.01 & \cellcolor[RGB]{163,253,186} -0.01 & \cellcolor[RGB]{219,171,144} 0.61\\
\hline
m & \cellcolor[RGB]{168,247,182} -0.05 & \cellcolor[RGB]{163,253,186} -0.01 & \cellcolor[RGB]{255,117,117} 1.00 & \cellcolor[RGB]{241,137,127} -0.85 & \cellcolor[RGB]{163,253,186} -0.00\\
\hline
c & \cellcolor[RGB]{168,247,183} -0.05 & \cellcolor[RGB]{163,253,186} -0.01 & \cellcolor[RGB]{241,137,127} -0.85 & \cellcolor[RGB]{255,117,117} 1.00 & \cellcolor[RGB]{163,254,186} -0.00\\
\hline
$f_{0}$ & \cellcolor[RGB]{165,251,185} -0.02 & \cellcolor[RGB]{219,171,144} 0.61 & \cellcolor[RGB]{163,253,186} -0.00 & \cellcolor[RGB]{163,254,186} -0.00 & \cellcolor[RGB]{255,117,117} 1.00\\
\hline
\end{tabular}

Note on complex PDF

There is nothing preventing you from doing something like this: def mypdf(x,mass, gamma, m, c, f_0): return brietwigner(x, mass, gamma) + f_0*(m*x+c)/normalization ulh=UnbinnedLH(mypdf, data) m=Minuit(ulh, **initial_values) m.migrad()

If your PDF is more complicated than what you can do with AddPDF and AddPDFNorm, it might be easier to write it out manually.

Simultaneous Fit

Let try to simultaneous fit 2 gaussians with the same width but different mean.

In [64]:

from probfit import SimultaneousFit

In [65]:

data1 = randn(10000)
data2 = randn(10000)+10
hist([data1,data2], histtype='step', bins=100);

In [66]:

#note here that they share the same sigma
g1 = rename(gaussian, ['x','mu1','sigma'])
g2 = rename(gaussian, ['x','mu2','sigma'])
print describe(g1)
print describe(g2)

['x', 'mu1', 'sigma']
['x', 'mu2', 'sigma']

In [67]:

#make two likelihood and them up
ulh1 = UnbinnedLH(g1,data1)
ulh2 = UnbinnedLH(g2,data2)
sim = SimultaneousFit(ulh1,ulh2)
print describe(sim) #note the sigma merge

['mu1', 'sigma', 'mu2']

In [68]:

sim.draw(args=(0.5, 1.5, 10.5))

In [69]:

m = Minuit(sim, mu1=0.5, sigma=1.5, mu2=10.5)

-c:1: InitialParamWarning: errordef is not given. Default to 1.
-c:1: InitialParamWarning: Parameter mu1 is floating but does not have initial step size. Assume 1.
-c:1: InitialParamWarning: Parameter sigma is floating but does not have initial step size. Assume 1.
-c:1: InitialParamWarning: Parameter mu2 is floating but does not have initial step size. Assume 1.

In [70]:

m.migrad();

FCN = 28276.585769	NFCN = 81	NCALLS = 81
EDM = 1.53288894679e-08	GOAL EDM = 1e-05	UP = 1.0

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	mu1	1.023630e-02	1.407006e-02	0.000000e+00	0.000000e+00
2	sigma	9.949032e-01	7.035022e-03	0.000000e+00	0.000000e+00
3	mu2	9.994690e+00	1.407006e-02	0.000000e+00	0.000000e+00

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & mu1 & 1.024e-02 & 1.407e-02 &  &  &  &  & \\
\hline
2 & $\sigma$ & 9.949e-01 & 7.035e-03 &  &  &  &  & \\
\hline
3 & mu2 & 9.995e+00 & 1.407e-02 &  &  &  &  & \\
\hline
\end{tabular}

In [71]:

sim.show(m)

Note on simultaneous fit

Again there is nothing preventing you from doing def my_cost_function(mu1, mu2, sigma): return ulh1(mu1,sigma)+ulh2(mu2.sigma)

m=Minuit(my_cost_function, **initial_values) m.migrad()

If your cost function is more complex than adding them together this is the to do it.

Toy generation.

This is invert CDF implementation (not accept reject). Large overhead but fast element-wise. Anyone want to signup for accept/reject?

In [72]:

from probfit import gen_toy

In [73]:

toy = gen_toy(total_pdf, 1000, (1.83,1.91), mass=1.87, gamma=0.01, c=1.045, m=-0.43, f_0=0.5, quiet=False)

['x', 'mass', 'gamma', 'm', 'c', 'f_0']

In [74]:

hist(toy, bins=100, histtype='step');

In [75]:

ulh = UnbinnedLH(total_pdf, toy)
m = Minuit(ulh, mass=1.87, gamma=0.01, c=1.045, m=-0.43, f_0=0.5)
m.migrad();
ulh.show(m)

-c:2: InitialParamWarning: Parameter mass is floating but does not have initial step size. Assume 1.
-c:2: InitialParamWarning: Parameter gamma is floating but does not have initial step size. Assume 1.
-c:2: InitialParamWarning: Parameter m is floating but does not have initial step size. Assume 1.
-c:2: InitialParamWarning: Parameter c is floating but does not have initial step size. Assume 1.
-c:2: InitialParamWarning: Parameter f_0 is floating but does not have initial step size. Assume 1.

FCN = -2756.9986626	NFCN = 472	NCALLS = 472
EDM = 3.10277679189e-07	GOAL EDM = 5e-06	UP = 0.5

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	mass	1.870038e+00	3.294864e-04	0.000000e+00	0.000000e+00
2	gamma	9.423832e-03	9.778754e-04	0.000000e+00	0.000000e+00
3	m	4.031586e+03	1.414212e+00	0.000000e+00	0.000000e+00
4	c	6.188177e+03	1.414212e+00	0.000000e+00	0.000000e+00
5	f_0	5.332493e-01	3.463870e-02	0.000000e+00	0.000000e+00

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & mass & 1.870e+00 & 3.295e-04 &  &  &  &  & \\
\hline
2 & $\gamma$ & 9.424e-03 & 9.779e-04 &  &  &  &  & \\
\hline
3 & m & 4.032e+03 & 1.414e+00 &  &  &  &  & \\
\hline
4 & c & 6.188e+03 & 1.414e+00 &  &  &  &  & \\
\hline
5 & $f_{0}$ & 5.332e-01 & 3.464e-02 &  &  &  &  & \\
\hline
\end{tabular}

Tips

A lot of time you want to generate toy from fitted parameters. Retyping/Repeating yourself is not the best use of time. Use python dictionary expansion.

In [76]:

m.fitarg

Out[76]:

{'c': 6188.177096551795,
 'error_c': 1.414212148161654,
 'error_f_0': 0.03463870349452257,
 'error_gamma': 0.0009778753714724175,
 'error_m': 1.4142121450350984,
 'error_mass': 0.0003294864038653029,
 'f_0': 0.533249323216673,
 'fix_c': False,
 'fix_f_0': False,
 'fix_gamma': False,
 'fix_m': False,
 'fix_mass': False,
 'gamma': 0.009423831866177658,
 'limit_c': None,
 'limit_f_0': None,
 'limit_gamma': None,
 'limit_m': None,
 'limit_mass': None,
 'm': 4031.5859291021125,
 'mass': 1.8700381598984956}

In [77]:

toy = gen_toy(total_pdf, 1000, (1.83,1.91), quiet=False, **m.fitarg)#note the double star

['x', 'mass', 'gamma', 'm', 'c', 'f_0']

Saving/Loading your toy

using np.save

In [78]:

np.save('mytoy.npy', toy)

In [79]:

loaded_toy = np.load('mytoy.npy')
hist(loaded_toy, bins=100, histtype='step');

Recipe

I won't cover it but just something you may find useful.

Using Cython to write pdf for speed
Checking convergence programatically
Saving and reusing fit argument

Using Cython

Skip this part if you don't have cython

A much more comprehensive example of how to use cython is provided in iminuit hard core tutorial. We will show a simple example here.

In [80]:

%load_ext cythonmagic

In [81]:

%%cython
from libc.math cimport sqrt
cimport cython

cdef double pi = 3.14159265358979323846264338327

@cython.embedsignature #you need this or experimental @cython.binding.
cpdef double cython_bw(double x, double m, double gamma):
    cdef double mm = m*m
    cdef double xm = x*x-mm
    cdef double gg = gamma*gamma
    cdef double s = sqrt(mm*(mm+gg))
    cdef double N = (2*sqrt(2)/pi)*m*gamma*s/sqrt(mm+s)
    return N/(xm*xm+mm*gg)

In [82]:

cython_bw(1,2,3)

Out[82]:

0.2585302502852219

In [83]:

from probfit import describe
describe(cython_bw)

Out[83]:

['x', 'm', 'gamma']

In [84]:

ulh = UnbinnedLH(cython_bw, bb_dmass)

In [85]:

m = Minuit(ulh, m=1.875, gamma = 0.01)
ulh.show(m)

-c:1: InitialParamWarning: Parameter m is floating but does not have initial step size. Assume 1.
-c:1: InitialParamWarning: Parameter gamma is floating but does not have initial step size. Assume 1.

In [86]:

%timeit -r1 -n1 m.migrad()

FCN = -10170.1470088	NFCN = 56	NCALLS = 56
EDM = 9.31305324182e-07	GOAL EDM = 5e-06	UP = 0.5

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	m	1.869105e+00	1.236187e-04	0.000000e+00	0.000000e+00
2	gamma	9.609371e-03	2.250473e-04	0.000000e+00	0.000000e+00

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & m & 1.869e+00 & 1.236e-04 &  &  &  &  & \\
\hline
2 & $\gamma$ & 9.609e-03 & 2.250e-04 &  &  &  &  & \\
\hline
\end{tabular}

1 loops, best of 1: 68.8 ms per loop

Building Cost Functions Manually

See iminuit hardcore tutorial.

Checking convergence programatically

In [87]:

status, param = m.migrad()

FCN = -10170.1470097	NFCN = 10	NCALLS = 66
EDM = 4.27228260843e-13	GOAL EDM = 5e-06	UP = 0.5

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	m	1.869105e+00	1.236193e-04	0.000000e+00	0.000000e+00
2	gamma	9.609357e-03	2.250544e-04	0.000000e+00	0.000000e+00

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & m & 1.869e+00 & 1.236e-04 &  &  &  &  & \\
\hline
2 & $\gamma$ & 9.609e-03 & 2.251e-04 &  &  &  &  & \\
\hline
\end{tabular}

In [88]:

print status.has_covariance
print status.is_valid

True
True

In [89]:

#or an umbrella call
m.migrad_ok()

Out[89]:

True

In [90]:

#minos
results = m.minos('m')
print results['m']
print results['m'].upper_valid
print results['m'].lower_valid

Minos status for m: VALID

Error	-0.000123642360121	0.000123614309471
Valid	True	True
At Limit	False	False
Max FCN	False	False
New Min	False	False

{'lower_new_min': False, 'upper': 0.0001236143094705069, 'lower': -0.00012364236012110086, 'at_lower_limit': False, 'min': 1.8691050836421013, 'at_lower_max_fcn': False, 'is_valid': True, 'upper_new_min': False, 'at_upper_limit': False, 'lower_valid': True, 'upper_valid': True, 'at_upper_max_fcn': False, 'nfcn': 12L}
True
True

Saving/Reuse fit argument

Some time we want to resue fitting argument.

In [91]:

m.fitarg

Out[91]:

{'error_gamma': 0.0002250543924548983,
 'error_m': 0.000123619287240136,
 'fix_gamma': False,
 'fix_m': False,
 'gamma': 0.00960935659431392,
 'limit_gamma': None,
 'limit_m': None,
 'm': 1.8691050836421013}

In [92]:

old_fitarg = m.fitarg
ulh2 = UnbinnedLH(cython_bw, bb_dmass)
m2 = Minuit(ulh2, **old_fitarg)

In [93]:

m2.print_param()

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	m	1.869105e+00	1.236193e-04	0.000000e+00	0.000000e+00
2	gamma	9.609357e-03	2.250544e-04	0.000000e+00	0.000000e+00

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & m & 1.869e+00 & 1.236e-04 &  &  &  &  & \\
\hline
2 & $\gamma$ & 9.609e-03 & 2.251e-04 &  &  &  &  & \\
\hline
\end{tabular}

In [94]:

#since fitarg is just a dictionary you can dump it via pickle
import pickle
out = open('my_fitarg.pck','w')
pickle.dump(old_fitarg,out)
out.close()

In [95]:

fin = open('my_fitarg.pck','r')
loaded_fitarg = pickle.load(fin)
fin.close()

In [96]:

print loaded_fitarg

{'fix_m': False, 'm': 1.8691050836421013, 'limit_m': None, 'error_m': 0.000123619287240136, 'fix_gamma': False, 'error_gamma': 0.0002250543924548983, 'limit_gamma': None, 'gamma': 0.00960935659431392}

In [97]:

m3 = Minuit(ulh2, **loaded_fitarg)

In [98]:

m3.migrad();

FCN = -10170.1470097	NFCN = 29	NCALLS = 29
EDM = 1.23520523748e-10	GOAL EDM = 5e-06	UP = 0.5

Valid	Valid Param	Accurate Covar	PosDef	Made PosDef
True	True	True	True	False
Hesse Fail	HasCov	Above EDM		Reach calllim
False	True	False		False

+	Name	Value	Parab Error	Minos Error-	Minos Error+	Limit-	Limit+	FIXED
1	m	1.869105e+00	1.236185e-04	0.000000e+00	0.000000e+00
2	gamma	9.609347e-03	2.250465e-04	0.000000e+00	0.000000e+00

            \begin{tabular}{|c|r|r|r|r|r|r|r|c|}
\hline
 & Name & Value & Para Error & Error+ & Error- & Limit+ & Limit- & FIXED\\
\hline
1 & m & 1.869e+00 & 1.236e-04 &  &  &  &  & \\
\hline
2 & $\gamma$ & 9.609e-03 & 2.250e-04 &  &  &  &  & \\
\hline
\end{tabular}