The ChiSquare "Miracle"

This is an illustration of how the ChiSquare minimum (and hence Minuit) obtains uncertainties on the estimates (in fact a full covariance matrix between fitting parameters).

Converted to (scrollable) html-slides as follows:
jupyter nbconvert --to slides --SlidesExporter.reveal_scroll=True TheChiSquareMiracle_original.ipynb


Authors:

  • Troels C. Petersen (Niels Bohr Institute)

Date:

  • 16-11-2022 (latest update)
In [1]:
import numpy as np
import matplotlib.pyplot as plt
from iminuit import Minuit
import sys
from scipy import stats
In [2]:
sys.path.append('../../External_Functions')
from ExternalFunctions import Chi2Regression
from ExternalFunctions import nice_string_output, add_text_to_ax

Generating and plotting data:

In [3]:
r = np.random # Random generator
r.seed(42)    # Set a random seed (but a fixed one)

Npoints = 10000     # Number of random points produced
x_all = r.normal(loc=0.2, scale=1.1, size=Npoints)

Nbins = 100
xmin, xmax = -5, 5
binwidth = (xmax-xmin)/Nbins
In [4]:
fig, ax = plt.subplots(figsize=(14, 6))
hist = ax.hist(x_all, bins=Nbins, range=(xmin, xmax), histtype='step', \
               linewidth=2, color='blue', label='Data')
ax.set(xlabel="Value", ylabel="Frequency", title="")
ax.legend(loc='upper left', fontsize=20);

Fitting data:

In [5]:
Minuit.print_level = 1      # Printing result of the fit

# Define Gaussian PDF:
def func_gauss(x, N, mu, sigma):
    z = (x - mu) / sigma
    return N * binwidth / np.sqrt(2 * np.pi) / sigma * np.exp(-0.5 * z**2)

# Turning histogram data into x, y, and sigma_y values
# for all non-zero entries (not considered in Chi2 fit):
counts, bin_edges = np.histogram(x_all, bins=Nbins, range=(xmin, xmax), \
                                 density=False)
x = 0.5*(bin_edges[:-1] + bin_edges[1:])[counts>0]
y = counts[counts>0]
sy = np.sqrt(y)            # We'll talk more about this choice in the course!
In [6]:
# Using implemented Chi2 calculation
chi2_object = Chi2Regression(func_gauss, x, y, sy)
chi2_object.errordef = Minuit.LEAST_SQUARES  # == 1

# Defining own Chi2 calculation:
def chi2_owncalc(N, mu, sigma) :
    y_fit = func_gauss(x, N, mu, sigma)
    chi2 = np.sum(((y - y_fit) / sy)**2)
    return chi2
chi2_owncalc.errordef = Minuit.LEAST_SQUARES  # == 1

minuit_fit = Minuit(chi2_owncalc, N=1000, mu=0.0, sigma=1.0)     # Setup the fit
minuit_fit.migrad()                                              # Run the fit
Out[6]:
Migrad
FCN = 63.52 Nfcn = 77
EDM = 2.66e-07 (Goal: 0.0002)
Valid Minimum No Parameters at limit
Below EDM threshold (goal x 10) Below call limit
Covariance Hesse ok Accurate Pos. def. Not forced
Name Value Hesse Error Minos Error- Minos Error+ Limit- Limit+ Fixed
0 N 9.94e3 0.10e3
1 mu 0.200 0.011
2 sigma 1.096 0.008
N mu sigma
N 9.95e+03 -0.00034 0.00447 (0.006)
mu -0.00034 0.000122 1.98e-07 (0.002)
sigma 0.00447 (0.006) 1.98e-07 (0.002) 6.4e-05
In [7]:
# Print fit result (Notice: Incorrect number of decimals!):
for name in minuit_fit.parameters :
    value, error = minuit_fit.values[name], minuit_fit.errors[name]
    print(f"Fit value: {name} = {value:.4f} +/- {error:.4f}")

Chi2_value = minuit_fit.fval           # The minimisation (here Chi2) value
Ndof = len(x) - len(minuit_fit.values[:])
Chi2_prob = stats.chi2.sf(Chi2_value, Ndof)
print(f"\nProb(Chi2 = {Chi2_value:.1f}, Ndof = {Ndof:d}) = {Chi2_prob:6.4f}")
Fit value: N = 9943.1350 +/- 99.7506
Fit value: mu = 0.1999 +/- 0.0110
Fit value: sigma = 1.0958 +/- 0.0080

Prob(Chi2 = 63.5, Ndof = 77) = 0.8649
In [8]:
# Produce the points for drawing the fit:
x_fit = np.linspace(xmin, xmax, 500)
y_fit = func_gauss(x_fit, minuit_fit.values[0], minuit_fit.values[1], minuit_fit.values[2])
In [9]:
# Produce a nice writeup of the fitting result in the figure:
d = {'Entries':  Npoints,
     'Chi2':     Chi2_value,
     'ndf':      Ndof,
     'Prob':     Chi2_prob,
    }
for name in minuit_fit.parameters:
    d[name] = [minuit_fit.values[name], minuit_fit.errors[name]]
text = nice_string_output(d, extra_spacing=2, decimals=3)
In [10]:
# Produce figure with histogram (with error bars) and fit overlayed:
fig2, ax2 = plt.subplots(figsize=(14, 6))
ax2.errorbar(x, y, sy, fmt='.', color='blue', label='Data')
ax2.plot(x_fit, y_fit, '-', color='red', label='Fit')
ax2.set(xlabel="Value", ylabel="Frequency", title="")
ax2.legend(loc='upper left', fontsize=24)
add_text_to_ax(0.68, 0.95, text, ax2, fontsize=16)
fig2.tight_layout()

How does Minuit determine the uncertainties?

In [11]:
# Comparing fit result to the non-fit estimate of the mean and its uncertainty:
mu = np.mean(x_all)
error_mu = np.std(x_all) / np.sqrt(len(x_all))
print(f"The mean of x_all was = {mu:.3f} +- {error_mu:.3f}")
print(f"Sigma(mean) from the CALCULATION is:   {error_mu:.5f}")
The mean of x_all was = 0.198 +- 0.011
Sigma(mean) from the CALCULATION is:   0.01104
In [12]:
print(f"Sigma(mean) from the CHISQUARE FIT is: {minuit_fit.errors[1]:.5f}")
Sigma(mean) from the CHISQUARE FIT is: 0.01104
In [13]:
print(f"HOW THE H*** DID MINUIT/CHISQUARE 'KNOW' THIS?!?")
HOW THE H*** DID MINUIT/CHISQUARE 'KNOW' THIS?!?
In [14]:
# Exploring the value of the Chi2 as a function of the value of the mean:
x_chi2 = np.linspace(mu-2*error_mu, mu+2*error_mu, 100)
y_chi2 = np.zeros(len(x_chi2))
for i in range(len(y_chi2)) :
    y_chi2[i] = chi2_owncalc(minuit_fit.values[0], \
                             x_chi2[i], minuit_fit.values[2])

chi2min = np.min(y_chi2)
print(f"  The minimal Chi2 value is: {chi2min:5.2f}")
  The minimal Chi2 value is: 63.52
In [15]:
fig3, ax3 = plt.subplots(figsize=(14, 6))
ax3.plot(x_chi2, y_chi2, '-', color='blue', label='Chi2 value')
ax3.set(xlabel="Values of the mean estimate", ylabel="Chi2 value", title="")
ax3.legend(loc='upper right', fontsize=24);
In [16]:
fig3, ax3 = plt.subplots(figsize=(14, 6))
ax3.plot(x_chi2, y_chi2, '-', color='blue', label='Chi2 value')
ax3.set(xlabel="Values of the mean estimate", ylabel="Chi2 value", title="")
ax3.legend(loc='upper right', fontsize=24)

# Draw horizontal lines at minimum Chi2 and minimum Chi2+1:
plt.axhline(y=chi2min); plt.axhline(y=chi2min+1)

# Draw vertical lines where the Chi2 reaches the minimum Chi2+1:
x_chi2min = minuit_fit.values[1]
dx        = minuit_fit.errors[1]    # X-value at Chi2+1
plt.plot([x_chi2min-dx, x_chi2min-dx], [chi2min, chi2min+2], 'r', linewidth=2)
plt.plot([x_chi2min,    x_chi2min],    [chi2min, chi2min+2], 'r', linewidth=2)
plt.plot([x_chi2min+dx, x_chi2min+dx], [chi2min, chi2min+2], 'r', linewidth=2)
plt.show()
In [17]:
! jupyter nbconvert --to slides --SlidesExporter.reveal_scroll=True TheChiSquareMiracle_original.ipynb
[NbConvertApp] Converting notebook TheChiSquareMiracle_original.ipynb to slides
[NbConvertApp] Writing 439792 bytes to TheChiSquareMiracle_original.slides.html