#!/usr/bin/env python

# ----------------------------------------------------------------------------------- #
#  Python script testing randomness of single digits (i.e. [0-9]).
#  
#  Given a series of seemingly random digits, how would you go about testing whether
#  they are truly random? This exercise explores the many tests that such data can be
#  submitted to.
#  
#  Author: Troels C. Petersen (NBI)
#  Email:  petersen@nbi.dk
#  Date:   7th of December 2017
#  
# ----------------------------------------------------------------------------------- #

# Load modules here
from __future__ import print_function, division        # Python 2/3 compatible print() and true division
import numpy as np                                     # Matlab like syntax for linear algebra and functions
import matplotlib.pyplot as plt                        # Plots and figures like you know them from Matlab


# ----------------------------------------------------------------------------------- #
# Reading data files:
# ----------------------------------------------------------------------------------- #

# Write extensive output
verbose = True

# Define list of input files
infiles = ["./data_RandomDigits2017A.txt"]
#infiles = ["./data_RandomDigits2017B.txt"]
#infiles = ["./data_RandomDigits2017C.txt"]
#infiles = ["./data_RandomDigits2017D.txt"]
#infiles = ["./data_RandomDigits2017E.txt"]

# List containing all digits
numbers = []

# Loop over the input files, opening each one in read mode
for ifile in infiles : 
    with open( ifile, "r" ) as current_file : 
        # Extract current file info:
        # Loop through each line in the file, loop through each character in the line,
        # keep only the digit characters (skipping whitespace and anything else) and
        # convert each to an integer. Finally add the result to the numbers list.
        numbers += [int(char) for line in current_file for char in line.strip() if char.isdigit()]

numbers = np.array(numbers)

# Print out your digits (50 per line), to see everything works correctly
if verbose : 
    for i in range( 0, len(numbers), 50 ) : 
        # The slice is clipped at the end of the array, so a shorter last line is printed too
        print( ' '.join(map(str, numbers[i:i+50])))


print()
print("The total number of digits is ", len(numbers))


# ----------------------------------------------------------------------------------- #
# Your analysis:
# ----------------------------------------------------------------------------------- #


# ----------------------------------------------------------------------------------- #
# 
# First look at the random digits. Do you see any patterns? Probably not, mostly
# because there are many numbers, and patterns of this kind are not easily visible to
# the human eye. So, you will have to work a bit...  with statistical tests.
# 
# Before even looking at programming, think about what statistical tests you could
# submit the samples to. Then consider how to actually carry out these tests. We will
# try in class to compile and discuss a list, before we start working on it!
# 
# Naturally, I've been so "kind" as to provide five data samples for 2017! One is from
# a random number generator, one is a series of digits from pi, two are pseudo-random
# numbers that contain elements of non-randomness, and finally one is from the 50 (or
# so) digits that (almost) all of you put into the questionnaire! Your job is to use
# statistical tests to try to find out which one is human, which two are from
# mathematics/computers, and which two are almost random, with "human intervention".
#
# 
# Questions:
# ----------
# 1) Is each digit represented roughly equally many times? Try to count/plot the
#    frequency of each digit, and ask yourself what the chance is that they come
#    from a uniform distribution.
# 
# 2) Are there as many even as odd digits? How about low (0-4) vs. high (5-9) digits?
#    And do people have a tendency to choose an even digit after an odd and vice versa?
#    And similarly for low/high digits.
# 
# 3) Are people "afraid" of putting the same digit twice in a row? And how about three
#    or four identical digits in a row? How many would you expect to have of these, and
#    how many do you observe?
# 
# 4) Try to count which digit follows which. That should be 100 counts (10 x 10 pairs)
#    in total. If the digits were truly random, what would you expect then? Is that
#    what you observe? Can you test this for example with a Chi-Square (think about
#    how many entries you expect in each bin) and/or a likelihood?
#    NOTE: Here is actually a case, where it would not be too hard to simulate the
#          process and find out which distribution of likelihood values to expect,
#          if the distribution was truly random.
# 
# 5) Do you, through the above process, find out which sample you contributed to,
#    and which are truly random and pseudo-random? How certain are you for each test?
# 
# Advanced:
# ---------
# 6) Can you tell from 50 digits alone if the numbers are truly random or produced by
#    a human? If so, would you be able to tell who in class is good at producing them?
# 
# 7) The "DieHard Tests" are a series of tests, which have been considered a good basis
#    for testing randomness: https://en.wikipedia.org/wiki/Diehard_tests
#    Consider them, and see if you can find/implement and use some of these.
#    You may want to check:
#     - https://github.com/reubenhwk/diehard
#     - http://webhome.phy.duke.edu/~rgb/General/dieharder.php
# ----------------------------------------------------------------------------------- #
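

# ----------------------------------------------------------------------------------- #
# Starting-point sketch (not a reference solution):
# ----------------------------------------------------------------------------------- #
# 
# Questions 1, 3 and 4 above can be approached along the following lines. This is only
# a minimal sketch under stated assumptions: the helper names below are made up for
# illustration, the null hypothesis is that every digit is independent and uniform on
# [0-9], and only the Chi2 value and the number of degrees of freedom are returned.
# If SciPy is installed, a p-value follows from scipy.stats.chi2.sf(chi2, ndof).
# NOTE: Overlapping pairs are not independent, so for question 4 the Chi2 with 99
#       degrees of freedom is only approximate. Simulating random digit series (as
#       suggested in the question) gives the distribution to compare against.

import numpy as np

def digit_frequency_chi2( digits ) : 
    """Question 1: Chi2 of the ten digit counts against a uniform expectation."""
    digits = np.asarray( digits, dtype=int )
    counts = np.bincount( digits, minlength=10 )       # Occurrences of 0, 1, ..., 9
    expected = len(digits) / 10.0                      # Same expectation in every bin
    chi2 = float( np.sum( (counts - expected)**2 / expected ) )
    return chi2, 9                                     # 10 bins, total fixed: Ndof = 9

def count_repeats( digits, length ) : 
    """Question 3: observed and expected number of 'length' identical digits in a row."""
    digits = np.asarray( digits, dtype=int )
    n_windows = len(digits) - length + 1               # Overlapping windows of this length
    if n_windows <= 0 : 
        return 0, 0.0
    same = np.ones( n_windows, dtype=bool )
    for offset in range( 1, length ) : 
        # Each digit in the window must equal the first digit of the window
        same &= ( digits[offset:offset+n_windows] == digits[:n_windows] )
    expected = n_windows * 0.1**(length - 1)           # Assumes independent uniform digits
    return int( same.sum() ), expected

def digit_pair_chi2( digits ) : 
    """Question 4: 10x10 table of (digit, next digit) counts and its Chi2 vs. uniform."""
    digits = np.asarray( digits, dtype=int )
    if len(digits) < 2 : 
        raise ValueError("Need at least two digits to form a pair")
    pair_index = 10 * digits[:-1] + digits[1:]         # Encode each pair as 0..99
    counts = np.bincount( pair_index, minlength=100 ).reshape( 10, 10 )
    expected = (len(digits) - 1) / 100.0               # Same expectation in every bin
    chi2 = float( np.sum( (counts - expected)**2 / expected ) )
    return counts, chi2, 99                            # 100 bins, total fixed: Ndof = 99

# Possible usage on the digits read above (uncomment to try):
# chi2_freq, ndof_freq = digit_frequency_chi2( numbers )
# print("Digit frequencies:  Chi2 = {:.1f} for Ndof = {:d}".format(chi2_freq, ndof_freq))
# for length in [2, 3, 4] : 
#     n_obs, n_exp = count_repeats( numbers, length )
#     print("{:d} identical in a row: observed {:d}, expected {:.1f}".format(length, n_obs, n_exp))
# pair_counts, chi2_pair, ndof_pair = digit_pair_chi2( numbers )
# print("Digit pairs:        Chi2 = {:.1f} for Ndof = {:d}".format(chi2_pair, ndof_pair))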