Applied Statistics - Project 1


Project description:
Background description:
The ATLAS detector at CERN's new LHC accelerator is a very large and complex detector consisting of several layers of different detector types, each performing a series of tasks, which in combination will allow the ATLAS collaboration to explore the proton collisions from the LHC. In order to test, characterize and optimize these detectors, all parts of the detector were put together as a "slice" of the ATLAS detector and subjected to an extensive testbeam program. One of the main problems of the testbeam was that the composition of the beams in terms of particle types is not easily controlled by the accelerator operators. The beam contains both electrons and pions, so for each particle it has to be determined in situ which of the two it is. And that is now the task before you...


Layout of the setup (not showing the Cherenkov detector) and a photo of the actual setup at CERN's H8 testbeam facility.

The data:
The testbeam data for this project was taken at a beam energy of 2 GeV. Single particles were sent through the detector layers, and for each particle passing into the detector slice, the following numbers (among 100s of others!) were recorded:
- Cherenkov counts (1 number). The Cherenkov counter responds to particles that travel faster than the speed of light IN THAT MEDIUM; electrons will thus tend to give a larger signal.
- Transition Radiation Tracker (2 numbers, both integer). Number of hits for tracking in the TRT (first) and number of High Threshold (HT) hits (second) for identifying electrons, since these have a higher chance of creating such a hit.
- ElectroMagnetic Calorimeter (4 numbers). This type of detector stops particles interacting electromagnetically (i.e. electrons), which thus tend to deposit their energy AS EARLY as possible. The four numbers correspond to the energy deposit (in GeV) in each of the 4 layers of the ATLAS EM calorimeter.
- Hadronic Calorimeter (3 numbers). Much thicker than the EM calorimeter, this detector stops all particles except muons with more than 3-4 GeV in energy. It gives no signal if no particle reaches the detector. Electrons hardly ever reach this detector.
- Muon detector (1 number). If a muon passes through, this detector gives a higher signal than otherwise. However, the coverage of the muon detector used was very small, and thus not all muons give a signal.

Note:
The Cherenkov counter is not part of the ATLAS detector, but simply put into the beam in order to make the task of identifying particles easier.

For an illustration of particle interactions:


Suggestive project questions:
Generally, this project is about separating electrons and pions (and determining how well this can be done), followed by a few questions characterizing the detector response to each type of particle.
Below are some leading questions, some/most of which your project should cover, but you do not have to follow them blindly (I've put "Optional" on those that are not essential).
Your report should be very short (3 pages), explaining how you analysed this data, and what your conclusions were. Possibly a few figures and a table of efficiencies, rejection factors, sample sizes, etc. should/could be included.
  1. Start by considering the data, and get a feel for the typical range of each variable. Plot the variables! From these plots, form a rough idea of how electrons and pions are distributed in the variables above, and of how to select each of them.
    As described above, the three detectors, Cherenkov, TRT and ECal are each capable of separating electrons and pions. As they are INDEPENDENT (three separate detectors), they may be used to cross check each other, and this is what you should use!

    For each of these three detector systems, find the one variable that seems to separate electrons and pions best. For example, start with the Cherenkov, which is only a single number, and assume/guess that the large peak at low values is mainly from pions, while the upper tail is from electrons.
    Now plot the TRT and ECal distributions, first when the Cherenkov selects a pion and then when it selects an electron. This should give you a hint of how to separate pions and electrons using the TRT and ECal.

    Hint: Variables from a single detector are sometimes more powerful when combined into ratios. For the TRT this may be somewhat obvious, but for the LAr calorimeter (the ECal) it is not as simple. While one may simply use the second layer, involving this layer in a ratio may enhance the separation power.
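
    A minimal Python sketch of this first look, assuming each event has already been read into a dictionary. The field names (cherenkov, n_trt, n_ht, ecal), the helper names and the particular ECal ratio (layer 2 over the summed deposit) are illustrative assumptions, not part of the provided ROOT macro.

```python
def ht_fraction(n_trt, n_ht):
    """Fraction of TRT hits that are High Threshold hits (0.0 if there are no TRT hits)."""
    return n_ht / n_trt if n_trt > 0 else 0.0


def ecal_layer2_fraction(ecal):
    """Energy in the second ECal layer divided by the summed deposit (one possible ratio)."""
    total = sum(ecal)
    return ecal[1] / total if total > 0 else 0.0


def split_by_cherenkov(events, cherenkov_cut):
    """Split events into (pion-like, electron-like) lists using the Cherenkov signal only."""
    pion_like = [ev for ev in events if ev["cherenkov"] < cherenkov_cut]
    electron_like = [ev for ev in events if ev["cherenkov"] >= cherenkov_cut]
    return pion_like, electron_like


def histogram(values, lo, hi, n_bins):
    """Count values into n_bins equal-width bins on [lo, hi]; values outside are dropped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            # The upper edge itself is placed in the last bin.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts
```

    Histogramming ht_fraction or ecal_layer2_fraction separately for the two lists returned by split_by_cherenkov gives the Cherenkov-tagged distributions asked for above. The Cherenkov cut value itself has to be read off your own plot.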

  2. Next you should try to see, if you can make a selection, which gives you a fairly large and clean electron and pion sample, respectively. The question is, how can you know how clean your sample is and how efficient your selection is?
    This can actually be measured in the data itself, using the fact that there are three independent detectors. For example, start by making an electron and a pion selection using two of the three variables, and plot the third variable for each of these selections. Now you can directly see, how electrons and pions will distribute themselves in this third variable. Are you worried, that there are pions in your electron sample, and vice versa? Well, there will probably be, but so few, that it won't matter, at least not to begin with.
    To see why, assume that for each detector 80% of electrons pass your requirement, but 10% of pions do as well. Assuming equal numbers of electrons and pions (which is not really the case), two detector cuts should give you a sample with purity 0.8*0.8 / (0.8*0.8 + 0.1*0.1) = 98.5%.

    Now, with these samples based on cuts on the two other detectors, ask what fraction of electrons and what fraction of pions pass your electron selection in the third detector. The fraction of electrons that are not selected as electrons will be your TYPE I errors, denoted alpha, while the fraction of pions that do pass the cut will be your TYPE II errors, denoted beta.
    Measure these for each of the two cuts in the three detector types, and ask yourself if they are "reasonable", i.e. something like in the example above. If not, then you should perhaps consider adjusting your cuts.
    By now, you should for each detector have 6 numbers:
    - The electron cut value above which you accept an electron.
    - The efficiency (i.e. 1-alpha) for electrons of this cut.
    - The fake rate (i.e. beta) for pions of this cut.
    - The pion cut value below which you accept a pion.
    - The efficiency (i.e. 1-alpha) for pions of this cut.
    - The fake rate (i.e. beta) for electrons of this cut.
    Put these 18 numbers into a nice table in your report. They sum up what you have done, and how well it works!
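
    As a sketch of how the electron-cut half of one table row could be filled in, the Python below takes the values of one detector variable for reference electrons and reference pions (selected with the two OTHER detectors) and returns the efficiency and fake rate of an "accept as electron if value >= cut" requirement. The function names and the simple binomial uncertainty are assumptions made for illustration; the pion cut can be handled the same way with the inequality reversed.

```python
import math


def pass_fraction(values, cut):
    """Return (fraction, binomial uncertainty) of values at or above the cut."""
    n = len(values)
    if n == 0:
        raise ValueError("reference sample is empty")
    p = sum(1 for v in values if v >= cut) / n
    return p, math.sqrt(p * (1.0 - p) / n)


def electron_cut_performance(electron_ref, pion_ref, cut):
    """Efficiency (1 - alpha) and fake rate (beta) of 'accept electron if value >= cut'."""
    efficiency, efficiency_err = pass_fraction(electron_ref, cut)
    fake_rate, fake_rate_err = pass_fraction(pion_ref, cut)
    return {
        "cut": cut,
        "efficiency": efficiency,      # 1 - alpha, measured on reference electrons
        "efficiency_err": efficiency_err,
        "fake_rate": fake_rate,        # beta, measured on reference pions
        "fake_rate_err": fake_rate_err,
    }
```

    Note that the binomial uncertainty goes to zero when the fraction is exactly 0 or 1, so it should not be trusted for very small reference samples.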

  3. Given the efficiencies and fake rates of each cut, try to combine these (again assuming that they are independent) into knowledge of your sample purities and also the total number of electrons and pions in the whole sample. Does the sum of the estimated numbers of electrons and pions actually match the total number of particles? This is a good cross check!
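
    One possible way to do this (a sketch, not the only approach): the number of events passing the electron selection and the number passing the pion selection are each a mix of true electrons and true pions, which gives two linear equations in the two unknown totals. The function and argument names below are hypothetical; for a selection using several independent detectors, the efficiencies and fake rates passed in would be the products of the per-detector values.

```python
def estimate_totals(n_e_sel, n_pi_sel, eff_e, fake_pi_as_e, eff_pi, fake_e_as_pi):
    """Solve for the true numbers of electrons and pions in the whole sample.

    n_e_sel  = eff_e        * N_e + fake_pi_as_e * N_pi   (events passing the electron cut)
    n_pi_sel = fake_e_as_pi * N_e + eff_pi       * N_pi   (events passing the pion cut)
    """
    det = eff_e * eff_pi - fake_pi_as_e * fake_e_as_pi
    if det == 0:
        raise ValueError("the two selections do not separate the particle types")
    n_electrons = (eff_pi * n_e_sel - fake_pi_as_e * n_pi_sel) / det
    n_pions = (eff_e * n_pi_sel - fake_e_as_pi * n_e_sel) / det
    return n_electrons, n_pions


def electron_sample_purity(n_electrons, n_e_sel, eff_e):
    """Fraction of the electron-selected events that are true electrons."""
    return eff_e * n_electrons / n_e_sel
```

    The cross check is then to compare n_electrons + n_pions with the number of particles in the data file, which this calculation never uses.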

  4. If the number of pions was suddenly 1000 times that of electrons, would you still be able to get a sample of fairly pure electrons? And if so, what would the efficiency for these electrons be?
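
    The purity formula from question 2 generalizes directly to unequal beam composition. The sketch below reuses the illustrative 80% / 10% numbers from that example (they are NOT measured values), assumes every detector has the same efficiency and fake rate, and uses a hypothetical function name.

```python
def electron_purity(efficiency, fake_rate, n_detectors, pions_per_electron=1.0):
    """Purity of an electron sample after cutting on n_detectors independent detectors."""
    signal = efficiency ** n_detectors
    background = pions_per_electron * fake_rate ** n_detectors
    return signal / (signal + background)


# Equal numbers of electrons and pions, two detectors: the 98.5% example.
purity_equal = electron_purity(0.8, 0.1, n_detectors=2)

# 1000 pions per electron: two cuts give about 6% purity, three cuts about 34%.
purity_two = electron_purity(0.8, 0.1, n_detectors=2, pions_per_electron=1000)
purity_three = electron_purity(0.8, 0.1, n_detectors=3, pions_per_electron=1000)
```

    With these example numbers even three cuts are not enough, so the cuts would have to be tightened, trading electron efficiency for a lower pion fake rate.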

  5. (Optional) The three detectors, Cherenkov, TRT and ECal are each capable of separating electrons and pions. But which one is best? Try to get a ROC-curve for each of the three detectors (i.e. by varying the selection cut and recalculating the alpha and beta), and compare these.
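
    A sketch of the scan, assuming you already have the values of one detector variable for reference electrons and reference pions (selected with the other two detectors) and that electrons sit at HIGHER values. The function name is illustrative.

```python
def roc_points(electron_ref, pion_ref, cuts):
    """Return a list of (cut, alpha, beta) for 'accept electron if value >= cut'."""
    points = []
    for cut in cuts:
        # alpha: fraction of reference electrons rejected by the cut (TYPE I error).
        alpha = sum(1 for v in electron_ref if v < cut) / len(electron_ref)
        # beta: fraction of reference pions accepted by the cut (TYPE II error).
        beta = sum(1 for v in pion_ref if v >= cut) / len(pion_ref)
        points.append((cut, alpha, beta))
    return points
```

    Plotting beta against 1 - alpha for each detector on the same axes shows directly which detector gives the lowest fake rate at a given electron efficiency.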

  6. One of the purposes of the testbeam was to measure the response of the TRT detector to electrons and pions specifically. Omitting the TRT from the particle selection, how good is the pion rejection in the TRT when a 90% electron efficiency is required? That is, when alpha is 0.10, how low a value for beta do you get?
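
    A sketch of how the working point could be found, assuming the TRT variable (for example the HT fraction) has been computed for electrons and pions selected with the Cherenkov and ECal only. The cut is placed so that AT LEAST the target fraction of reference electrons lies at or above it; the function name is hypothetical.

```python
import math


def beta_at_efficiency(electron_ref, pion_ref, target_efficiency=0.90):
    """Return (cut, efficiency, beta) for the loosest cut keeping >= target_efficiency."""
    ordered = sorted(electron_ref)
    n = len(ordered)
    # Number of electrons that must pass; the small offset guards against float rounding.
    n_keep = max(1, math.ceil(target_efficiency * n - 1e-9))
    cut = ordered[n - n_keep]
    efficiency = sum(1 for v in electron_ref if v >= cut) / n
    beta = sum(1 for v in pion_ref if v >= cut) / len(pion_ref)
    return cut, efficiency, beta
```

    Because the TRT variable is built from integer hit counts, many electrons can share the same value, so the efficiency actually obtained may be somewhat above 90%.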

  7. Try to select pions using three different (mutually exclusive) techniques:
    a) Passing only a hadronic calorimeter requirement (which is that the sum of the three HCal values is above some minimum energy).
    b) Passing only Cherenkov AND ECal requirements.
    c) Passing both a) and b).
    Try to measure the HT probability (i.e. fraction of High-Threshold hits) for each of these pion samples. Do they agree with each other?
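
    The bookkeeping for the three mutually exclusive samples can be sketched as below. The two selection functions are passed in by the caller, since the actual requirements are yours to define; the event field names (n_trt, n_ht) and the binomial uncertainty are assumptions made for illustration.

```python
import math


def exclusive_pion_samples(events, passes_hcal, passes_cher_ecal):
    """Split events into (HCal only, Cherenkov+ECal only, both) pion samples."""
    only_hcal, only_cher_ecal, both = [], [], []
    for ev in events:
        a, b = passes_hcal(ev), passes_cher_ecal(ev)
        if a and b:
            both.append(ev)
        elif a:
            only_hcal.append(ev)
        elif b:
            only_cher_ecal.append(ev)
    return only_hcal, only_cher_ecal, both


def ht_probability(events):
    """HT probability = (all HT hits) / (all TRT hits), with a binomial uncertainty."""
    n_trt = sum(ev["n_trt"] for ev in events)
    n_ht = sum(ev["n_ht"] for ev in events)
    if n_trt == 0:
        raise ValueError("no TRT hits in this sample")
    p = n_ht / n_trt
    return p, math.sqrt(p * (1.0 - p) / n_trt)
```

    Comparing the three probabilities in units of their uncertainties tells you whether the samples agree.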

  8. (Optional) Try to take a sample of events where there are exactly 33 TRT hits, and plot the number of HT hits. Given that there are both electrons and pions in the sample (each with a different HT probability), this should be a combination of two Poisson distributions. Try to fit the number of HT hits with two Poisson distributions combined. Do they fit the data? And what is then the probability for an HT hit for pions and electrons (including errors)?
    These probabilities could also be measured simply by selecting pure samples of pions and electrons using other detectors, and then looking at the HT probability in these. Which is more precise?
    Given your knowledge of the sample purities, you might add a systematic uncertainty (and correction?) to the latter estimate.

    Finally, assuming that the number of TRT hits is independent of particle type, the double Poisson fit can actually be used to estimate the fraction of pions and electrons in the sample. Do this, and compare it to your answer in 3). Do they agree?
    Perhaps retry with other values for the number of TRT hits...
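
    To make the shape of the double Poisson fit concrete, here is a dependency-free sketch. Note that it swaps in a simple EM (expectation-maximization) iteration in place of a general-purpose likelihood fitter such as the ones in ROOT, so it returns only the best-fit values and NO uncertainties; those would have to come from a proper likelihood fit. All names are illustrative, and both means must stay above zero for the logarithm to be defined.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of observing k counts for mean mu (mu > 0)."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def fit_two_poisson(ht_counts, mu_pion, mu_electron, pion_fraction=0.5, n_iter=200):
    """EM fit of f*Poisson(mu_pion) + (1-f)*Poisson(mu_electron) to a list of HT counts."""
    n = len(ht_counts)
    for _ in range(n_iter):
        # E-step: probability that each event belongs to the pion component.
        resp = []
        for k in ht_counts:
            w_pi = pion_fraction * poisson_pmf(k, mu_pion)
            w_e = (1.0 - pion_fraction) * poisson_pmf(k, mu_electron)
            resp.append(w_pi / (w_pi + w_e))
        # M-step: update the fraction and the two means from the weighted events.
        sum_r = sum(resp)
        pion_fraction = sum_r / n
        mu_pion = sum(r * k for r, k in zip(resp, ht_counts)) / sum_r
        mu_electron = sum((1.0 - r) * k for r, k in zip(resp, ht_counts)) / (n - sum_r)
    return pion_fraction, mu_pion, mu_electron
```

    For events with exactly 33 TRT hits, the HT probabilities follow as mu_pion / 33 and mu_electron / 33, and pion_fraction is the estimate to compare with 3). The starting values for the two means should be taken from the two bumps in your own plot.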

Data sets:
Below are the files needed for starting the project. The ROOT macro will loop over the data files, but it is for you to write the analysis part.
  • ROOT macro for reading the data.
  • Data in ASCII format and ROOT ntuple format.
  • Quick ROOT macro for getting initial idea of the data.

PRL LaTeX template:
The project should be written in Physical Review Letters style (or something close to it, if you don't like LaTeX), and below you can find the files needed (they also work with pdftex, except that the figures need to be converted into .pdf or .png):
  • PRL Latex template.
  • Test figure 1.
  • Test figure 2 (wide).
  • Result using current template.

Comments:
Enjoy, have fun, and throw yourself without worries at the data.


    Last updated 19th of September 2011.