// ----------------------------------------------------------------------------------- //
/*
   ROOT macro for testing Raphael Weldon's dice.

   Walter Frank Raphael Weldon DSc FRS (15 March 1860 - 13 April 1906), generally
   called Raphael Weldon, was an English evolutionary biologist and a founder of
   biometry. He was the joint founding editor of Biometrika, with Francis Galton
   and Karl Pearson.
     http://en.wikipedia.org/wiki/Walter_Frank_Raphael_Weldon

   Weldon and his two colleagues (Francis Galton and Karl Pearson) were interested
   in statistics (to say the least!), and in order to do a simple hypothesis test,
   Weldon rolled 12 dice 26306 times (i.e. 315672 throws), which he wrote about in
   a letter to Galton dated the 2nd of February 1894. There were actually four
   data sets, as follows:

   I:   12 dice were rolled 26306 times, and the number of 5s and 6s was counted,
        with the following result:
          0     1     2     3     4     5     6     7     8     9    10    11    12
        185  1149  3265  5475  6114  5194  3067  1331   403   105    14     4     0

   II:  7006 of the 26306 experiments were performed by a clerk, deemed by Galton
        to be "reliable and accurate", yielding:
          0     1     2     3     4     5     6     7     8     9    10    11    12
         45   327   886  1475  1571  1404   787   367   112    29     2     1     0

   III: In this subset of the data, 4096 of the rolls were scrutinized, with only
        a 6 counting as a success:
          0     1     2     3     4     5     6     7     8     9    10    11    12
        447  1145  1181   796   380   115    24     8     0     0     0     0     0

   IV:  Finally, a subset of 4096 rolls was considered, counting 4s, 5s and 6s as
        successes:
          0     1     2     3     4     5     6     7     8     9    10    11    12
          0     7    60   198   430   731   948   847   536   257    71    11     0

   References:
     Kemp, A.W. and Kemp, C.D. (1991), "Weldon's dice data revisited",
     American Statistician, 45, 216-222.

   Author: Troels C. Petersen (NBI)
   Email:  petersen@nbi.dk
   Date:   29th of September 2011
*/
// ----------------------------------------------------------------------------------- //

// ----------------------------------------------------------------------------------- //
double sqr(double a) {
// ----------------------------------------------------------------------------------- //
  return a*a;
}

// ----------------------------------------------------------------------------------- //
void WeldonsDices() {
// ----------------------------------------------------------------------------------- //

  gROOT->Reset();

  // Setting of general plotting style:
  gStyle->SetCanvasColor(0);
  gStyle->SetFillColor(0);

  // Setting what is to be shown in the statistics box:
  gStyle->SetOptStat("e");
  gStyle->SetOptFit(1111);

  // Random numbers from the Mersenne-Twister:
  TRandom3 r;
  r.SetSeed(0);

  // ------------------------------------------------------------------ //
  // Data:
  // ------------------------------------------------------------------ //

  const int Noutcome = 13;
  int outcome[Noutcome];
  int data1[Noutcome] = {185, 1149, 3265, 5475, 6114, 5194, 3067, 1331, 403, 105, 14,  4, 0};
  int data2[Noutcome] = { 45,  327,  886, 1475, 1571, 1404,  787,  367, 112,  29,  2,  1, 0};
  int data3[Noutcome] = {447, 1145, 1181,  796,  380,  115,   24,    8,   0,   0,  0,  0, 0};
  int data4[Noutcome] = {  0,    7,   60,  198,  430,  731,  948,  847, 536, 257, 71, 11, 0};

  printf("  N  data1 data2 data3 data4 \n");
  for (int i=0; i < Noutcome; i++) {
    outcome[i] = i;
    printf(" %2d: %5d %5d %5d %5d \n", i, data1[i], data2[i], data3[i], data4[i]);
  }

  // ------------------------------------------------------------------ //
  // Data analysis:
  // ------------------------------------------------------------------ //

  // This is up to you!!!
}
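// ----------------------------------------------------------------------------------- //
/* Editor's sketch (not part of the original macro, and only one possible approach)
   for questions 1-2 below: comparing a dataset to the binomial prediction and
   scanning the Chi2 as a function of the per-die success probability p. The
   function names (binomialExpected, chi2ForP) and the treatment of low-statistics
   bins are illustrative choices, not prescriptions. */
// ----------------------------------------------------------------------------------- //

// Expected number of experiments (out of Ntotal) with k successes among 12 dice,
// given per-die success probability p: Ntotal * Binom(12,k) * p^k * (1-p)^(12-k).
double binomialExpected(int k, double p, int Ntotal) {
  return Ntotal * TMath::Binomial(12, k) * TMath::Power(p, k) * TMath::Power(1.0-p, 12-k);
}

// Pearson Chi2 between observed counts and the binomial prediction for a given p.
// Bins with an expectation below 1 are skipped here to keep the Chi2 well behaved;
// merging them into a tail bin would be an equally valid choice.
double chi2ForP(const int* data, double p, int Ntotal, int& nbinsUsed) {
  double chi2 = 0.0;
  nbinsUsed = 0;
  for (int k = 0; k < 13; k++) {
    double expected = binomialExpected(k, p, Ntotal);
    if (expected < 1.0) continue;
    chi2 += sqr(data[k] - expected) / expected;   // (obs - exp)^2 / exp
    nbinsUsed++;
  }
  return chi2;
}

// Example use (inside WeldonsDices, where data1 is in scope), for the naive
// p = 2/6; since the total is fixed, the number of degrees of freedom is
// nbinsUsed - 1 (subtract one more if p is fitted):
//   int nb; double chi2 = chi2ForP(data1, 2.0/6.0, 26306, nb);
//   double prob = TMath::Prob(chi2, nb - 1);
// Scanning p in small steps around 2/6 and plotting Chi2 vs. p (e.g. in a TGraph)
// gives the optimal p at the minimum, and the +-1 sigma uncertainty where the
// Chi2 has risen by 1.0 above that minimum.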
//----------------------------------------------------------------------------------
/*
   Questions:
   ----------
   1) Consider first just the first (and largest) dataset. Plot the data along
      with the "naive" prediction (for fair dice, the chance of a 5 or a 6 is
      p = 2/6), and calculate the Chi2 (one illustrative sketch of this appears
      just above this comment block). What is the probability of this Chi2,
      given the number of degrees of freedom? Is this very likely?

   2) Next, consider an alternative hypothesis with a "non-naive" value of the
      probability of a 5 or a 6, and find the optimal value of this probability
      by scanning through the possible values and calculating the Chi2 for each
      of them. Fitting (the region around the minimum of) this Chi2 curve, what
      is the uncertainty on this probability? And how probable is the Chi2 of
      this new hypothesis?

   3) Repeat the above scan, but now calculating the likelihood (or rather
      -2*log(llh)). Does this improve the measurement of the probability?
      (A sketch of such a scan follows after this comment block.)

   4) Do the same exercise for the other three datasets. Do they all show the
      same behavior, and are such large statistics needed?

   Advanced questions:
   -------------------
   1) Assuming that the dice rolled all had the same probabilities associated
      with each value, do a "global fit" of all four datasets, finding the
      probabilities which best explain all four datasets.
*/
//----------------------------------------------------------------------------------
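// ----------------------------------------------------------------------------------- //
/* Editor's sketch (not part of the original macro) for question 3 above: scanning
   -2*log(likelihood) as a function of the per-die success probability p, using
   the multinomial likelihood of the observed counts. Constant terms (the
   multinomial coefficient) are dropped, since they do not move the minimum. The
   function name (minus2LogL) and the scan range in the example are illustrative
   choices. */
// ----------------------------------------------------------------------------------- //
double minus2LogL(const int* data, double p) {
  double m2lnL = 0.0;
  for (int k = 0; k < 13; k++) {
    if (data[k] == 0) continue;                  // empty bins contribute nothing
    double Pk = TMath::Binomial(12, k) * TMath::Power(p, k) * TMath::Power(1.0-p, 12-k);
    m2lnL += -2.0 * data[k] * TMath::Log(Pk);    // -2 ln L = -2 * sum_k n_k * ln P_k(p)
  }
  return m2lnL;
}

// Example scan (inside WeldonsDices, where data1 is in scope): the minimum gives
// the best-fit p, and the points where -2lnL has risen by 1.0 above the minimum
// give the approximate +-1 sigma interval:
//   for (double p = 0.32; p < 0.35; p += 0.0001)
//     printf("%8.5f  %12.4f \n", p, minus2LogL(data1, p));
//
// For the "global fit" of the advanced question, one would (under the assumption
// of common per-face probabilities) sum the -2lnL contributions of all four
// datasets, with each dataset's success probability built from the same per-face
// probabilities: p5+p6 for datasets I and II, p6 for III, and p4+p5+p6 for IV.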