// ----------------------------------------------------------------------------------- //
/*
   ROOT macro for testing Raphael Weldon's dice.

   Walter Frank Raphael Weldon DSc FRS (15 March 1860 - 13 April 1906), generally
   called Raphael Weldon, was an English evolutionary biologist and a founder of
   biometry. He was the joint founding editor of Biometrika, with Francis Galton
   and Karl Pearson.
     http://en.wikipedia.org/wiki/Walter_Frank_Raphael_Weldon

   Weldon and his two colleagues (Francis Galton and Karl Pearson) were interested
   in statistics (to say the least!), and in order to do a simple hypothesis test,
   Weldon rolled 12 dice 26306 times (i.e. 315672 throws), which he wrote about in
   a letter to Galton dated the 2nd of February 1894. There were actually four
   data sets, as follows:

   I:   12 dice were rolled 26306 times, and the number of 5s and 6s was counted,
        with the following result:
          0     1     2     3     4     5     6     7     8     9    10    11    12
        185  1149  3265  5475  6114  5194  3067  1331   403   105    14     4     0

   II:  7006 of the 26306 experiments were performed by a clerk, deemed by Galton
        to be "reliable and accurate", yielding:
          0     1     2     3     4     5     6     7     8     9    10    11    12
         45   327   886  1475  1571  1404   787   367   112    29     2     1     0

   III: In this subset of the data, 4096 of the rolls were scrutinized, with only
        a 6 counting as a success:
          0     1     2     3     4     5     6     7     8     9    10    11    12
        447  1145  1181   796   380   115    24     8     0     0     0     0     0

   IV:  Finally, a subset of 4096 rolls was considered, counting 4s, 5s and 6s as
        successes:
          0     1     2     3     4     5     6     7     8     9    10    11    12
          0     7    60   198   430   731   948   847   536   257    71    11     0

   References:
     Kemp, A.W. and Kemp, C.D. (1991), "Weldon's dice data revisited",
     American Statistician, 45, 216-222.

   Author: Troels C. Petersen (NBI)
   Email:  petersen@nbi.dk
   Date:   29th of September 2011
*/
// ----------------------------------------------------------------------------------- //

// ----------------------------------------------------------------------------------- //
double sqr(double a) {
// ----------------------------------------------------------------------------------- //
  return a*a;
}

// ----------------------------------------------------------------------------------- //
void WeldonsDices() {
// ----------------------------------------------------------------------------------- //

  gROOT->Reset();

  // Setting of general plotting style:
  gStyle->SetCanvasColor(0);
  gStyle->SetFillColor(0);

  // Setting what is to be shown in the statistics box:
  gStyle->SetOptStat("e");
  gStyle->SetOptFit(1111);

  // Random numbers from the Mersenne-Twister:
  TRandom3 r;
  r.SetSeed(0);

  // ------------------------------------------------------------------ //
  // Data:
  // ------------------------------------------------------------------ //

  const int Noutcome = 13;
  int outcome[Noutcome];
  int data1[Noutcome] = {185, 1149, 3265, 5475, 6114, 5194, 3067, 1331, 403, 105, 14,  4, 0};
  int data2[Noutcome] = { 45,  327,  886, 1475, 1571, 1404,  787,  367, 112,  29,  2,  1, 0};
  int data3[Noutcome] = {447, 1145, 1181,  796,  380,  115,   24,    8,   0,   0,  0,  0, 0};
  int data4[Noutcome] = {  0,    7,   60,  198,  430,  731,  948,  847, 536, 257, 71, 11, 0};

  printf("  N  data1 data2 data3 data4 \n");
  for (int i=0; i < Noutcome; i++) {
    outcome[i] = i;
    printf(" %2d: %5d %5d %5d %5d \n", i, data1[i], data2[i], data3[i], data4[i]);
  }

  // ------------------------------------------------------------------ //
  // Data analysis:
  // ------------------------------------------------------------------ //

  // This is up to you!!!
}
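// ----------------------------------------------------------------------------------- //
/* Editor's sketch (not part of the original macro, and only one possible approach)
   for questions 1-2 below: comparing a dataset to the binomial prediction and
   scanning the Chi2 as a function of the per-die success probability p. The
   function names (binomialExpected, chi2ForP) and the treatment of low-statistics
   bins are illustrative choices, not prescriptions. */
// ----------------------------------------------------------------------------------- //

// Expected number of experiments (out of Ntotal) with k successes among 12 dice,
// given per-die success probability p: Ntotal * Binom(12,k) * p^k * (1-p)^(12-k).
double binomialExpected(int k, double p, int Ntotal) {
  return Ntotal * TMath::Binomial(12, k) * TMath::Power(p, k) * TMath::Power(1.0-p, 12-k);
}

// Pearson Chi2 between observed counts and the binomial prediction for a given p.
// Bins with an expectation below 1 are skipped here to keep the Chi2 well behaved;
// merging them into a tail bin would be an equally valid choice.
double chi2ForP(const int* data, double p, int Ntotal, int& nbinsUsed) {
  double chi2 = 0.0;
  nbinsUsed = 0;
  for (int k = 0; k < 13; k++) {
    double expected = binomialExpected(k, p, Ntotal);
    if (expected < 1.0) continue;
    chi2 += sqr(data[k] - expected) / expected;   // (obs - exp)^2 / exp
    nbinsUsed++;
  }
  return chi2;
}

// Example use (inside WeldonsDices, where data1 is in scope), for the naive
// p = 2/6; since the total is fixed, the number of degrees of freedom is
// nbinsUsed - 1 (subtract one more if p is fitted):
//   int nb; double chi2 = chi2ForP(data1, 2.0/6.0, 26306, nb);
//   double prob = TMath::Prob(chi2, nb - 1);
// Scanning p in small steps around 2/6 and plotting Chi2 vs. p (e.g. in a TGraph)
// gives the optimal p at the minimum, and the +-1 sigma uncertainty where the
// Chi2 has risen by 1.0 above that minimum.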
//----------------------------------------------------------------------------------
/*
   Questions:
   ----------
   1) Consider first just the first (and largest) dataset. Plot the data along
      with the "naive" prediction (for fair dice, the chance of a 5 or a 6 is
      p = 2/6), and calculate the Chi2 (one illustrative sketch of this appears
      just above this comment block). What is the probability of this Chi2,
      given the number of degrees of freedom? Is this very likely?

   2) Next, consider an alternative hypothesis with a "non-naive" value of the
      probability of a 5 or a 6, and find the optimal value of this probability
      by scanning through the possible values and calculating the Chi2 for each
      of them. Fitting (the region around the minimum of) this Chi2 curve, what
      is the uncertainty on this probability? And how probable is the Chi2 of
      this new hypothesis?

   3) Repeat the above scan, but now calculating the likelihood (or rather
      -2*log(llh)). Does this improve the measurement of the probability?
      (A sketch of such a scan follows after this comment block.)

   4) Do the same exercise for the other three datasets. Do they all show the
      same behavior, and are such large statistics needed?

   Advanced questions:
   -------------------
   1) Assuming that the dice rolled all had the same probabilities associated
      with each value, do a "global fit" of all four datasets, finding the
      probabilities which best explain all four datasets.
*/
//----------------------------------------------------------------------------------
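// ----------------------------------------------------------------------------------- //
/* Editor's sketch (not part of the original macro) for question 3 above: scanning
   -2*log(likelihood) as a function of the per-die success probability p, using
   the multinomial likelihood of the observed counts. Constant terms (the
   multinomial coefficient) are dropped, since they do not move the minimum. The
   function name (minus2LogL) and the scan range in the example are illustrative
   choices. */
// ----------------------------------------------------------------------------------- //
double minus2LogL(const int* data, double p) {
  double m2lnL = 0.0;
  for (int k = 0; k < 13; k++) {
    if (data[k] == 0) continue;                  // empty bins contribute nothing
    double Pk = TMath::Binomial(12, k) * TMath::Power(p, k) * TMath::Power(1.0-p, 12-k);
    m2lnL += -2.0 * data[k] * TMath::Log(Pk);    // -2 ln L = -2 * sum_k n_k * ln P_k(p)
  }
  return m2lnL;
}

// Example scan (inside WeldonsDices, where data1 is in scope): the minimum gives
// the best-fit p, and the points where -2lnL has risen by 1.0 above the minimum
// give the approximate +-1 sigma interval:
//   for (double p = 0.32; p < 0.35; p += 0.0001)
//     printf("%8.5f  %12.4f \n", p, minus2LogL(data1, p));
//
// For the "global fit" of the advanced question, one would (under the assumption
// of common per-face probabilities) sum the -2lnL contributions of all four
// datasets, with each dataset's success probability built from the same per-face
// probabilities: p5+p6 for datasets I and II, p6 for III, and p4+p5+p6 for IV.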