#!/usr/bin/env python
# ----------------------------------------------------------------------------------- #
#
#  Python script testing randomness of single digits (i.e. [0-9]).
#
#  Given a series of seemingly random digits, how would you go about testing whether
#  they are truly random? This exercise explores the many tests that such data can be
#  subjected to.
#
#  Author: Troels C. Petersen (NBI)
#  Email:  petersen@nbi.dk
#  Date:   7th of December 2017
#
# ----------------------------------------------------------------------------------- #

# Load modules here
from __future__ import print_function, division
import numpy as np                  # Matlab-like syntax for linear algebra and functions
import matplotlib.pyplot as plt     # Plots and figures like you know them from Matlab

# ----------------------------------------------------------------------------------- #
# Reading data files:
# ----------------------------------------------------------------------------------- #

# Write extensive output
verbose = True

# Define list of input files
infiles = ["./data_RandomDigits2017A.txt"]
#infiles = ["./data_RandomDigits2017B.txt"]
#infiles = ["./data_RandomDigits2017C.txt"]
#infiles = ["./data_RandomDigits2017D.txt"]
#infiles = ["./data_RandomDigits2017E.txt"]

# List containing all digits
numbers = []

# Loop over the input files and open each of them in read mode
for ifile in infiles :
    with open( ifile, "r" ) as current_file :
        # Extract current file info:
        # Loop through each line in the file, loop through each character in the line,
        # demand character is not empty ("") and convert the result to an integer.
        # Finally add result to the numbers list
        # (compare with "!=": "is not" tests object identity, not string equality)
        numbers += [int(char) for line in current_file for char in line.strip() if char != ""]

numbers = np.array(numbers)

# Print out your digits, to see everything works correctly
if verbose :
    for i, _ in enumerate( numbers ) :
        # At the end of every block of 50 digits, print that block (indices i-49 to i)
        if (i % 50 == 49):
            print( ' '.join(map(str, numbers[i-49:i+1])))
    print()

print("The total number of digits is", len(numbers))

# ----------------------------------------------------------------------------------- #
# Your analysis:
# ----------------------------------------------------------------------------------- #




# ----------------------------------------------------------------------------------- #
#
# First look at the random digits, and see if you can spot any patterns. Probably not,
# mostly because there are many numbers, and patterns of this kind are not that visible
# to the human eye. So, you will have to work a bit... with statistical tests.
#
# Before even looking at programming, think about what statistical tests you could
# subject the samples to. Then consider how to actually carry out these tests. We will
# try in class to compile and discuss a list, before we start working on it!
#
# Naturally, I've been "kind" enough to provide five data samples for 2017! One is from
# a random number generator, one is a series of digits from pi, two are pseudo-random
# numbers, which contain elements of non-randomness, and finally one is from the 50 (or
# so) digits that (almost) all of you put into the questionnaire! Your job is to use
# statistical tests to try to find out which one is human, which two are from
# mathematics/computers, and which two are almost random, with "human intervention".
#
#
# Questions:
# ----------
#  1) Is each digit represented roughly equally many times? Try to count/plot the
#     frequency of each digit, and ask yourself what the chance is that they come
#     from a uniform distribution.
#
#  2) Are there as many even as odd digits? How about low (0-4) vs. high (5-9) digits?
#     And do people have a tendency to choose an even digit after an odd and vice versa?
#     And similarly for low/high digits.
#
#  3) Are people "afraid" of putting the same digit twice in a row? And how about three
#     or four identical digits in a row? How many would you expect to have of these, and
#     how many do you observe?
#
#  4) Try to count which digit follows which digit. That gives 100 counts in total (one
#     for each of the 10 x 10 combinations). If the digits were truly random, what would
#     you expect then? Is that what you observe? Can you test this for example with a
#     Chi-Square (think about how many entries you expect in each bin) and/or a
#     likelihood?
#     NOTE: Here is actually a case where it would not be too hard to simulate the
#           process and find out which distribution of likelihood values to expect,
#           if the distribution was truly random.
#
#  5) Throughout the above process, can you find out which sample you contributed to,
#     and which are truly random and pseudo-random? How certain are you for each test?
#
# Advanced:
# ---------
#  6) Can you tell from 50 digits alone whether the numbers are truly random or produced
#     by a human? And if so, would you be able to tell who in class is good at producing
#     them?
#
#  7) The "DieHard Tests" are a series of tests, which have been considered a good basis
#     for testing randomness: https://en.wikipedia.org/wiki/Diehard_tests
#     Consider them, and see if you can find/implement and use some of these.
#     You may want to check:
#     - https://github.com/reubenhwk/diehard
#     - http://webhome.phy.duke.edu/~rgb/General/dieharder.php
# ----------------------------------------------------------------------------------- #
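
# ----------------------------------------------------------------------------------- #
# Possible starting point for questions 1 and 3 (a sketch only, not the course solution)
# ----------------------------------------------------------------------------------- #
#
# Questions 1 and 3 ask how likely the digit frequencies are under a uniform
# distribution, and how often a digit repeats its neighbour. The sketch below shows one
# way this could be set up with the standard library alone. The helper names (chi2_sf,
# digit_uniformity_chi2, count_repeats) are made up for this illustration. It assumes
# each of the 10 digits is expected N/10 times (so 9 degrees of freedom, since only the
# total is fixed), and that a truly random sequence repeats the previous digit with
# probability 1/10.

import math
from collections import Counter


def chi2_sf(x, ndof):
    """Upper-tail probability P(chi2 >= x) for ndof degrees of freedom.

    Uses the recurrence Q(a+1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a+1) for the
    regularised upper incomplete gamma function, starting from Q(1/2, z) = erfc(sqrt(z))
    for odd ndof and Q(1, z) = exp(-z) for even ndof, with z = x/2.
    """
    z = 0.5 * x
    if ndof % 2 == 1:
        a, prob = 0.5, math.erfc(math.sqrt(z))
    else:
        a, prob = 1.0, math.exp(-z)
    while a < 0.5 * ndof:
        prob += z**a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return prob


def digit_uniformity_chi2(digits):
    """Chi-square of the observed digit counts against a flat expectation of N/10."""
    counts = Counter(int(d) for d in digits)
    expected = len(digits) / 10.0
    chi2 = sum((counts.get(d, 0) - expected)**2 / expected for d in range(10))
    ndof = 9    # 10 bins minus 1, because the total number of digits is fixed
    return chi2, ndof, chi2_sf(chi2, ndof)


def count_repeats(digits):
    """Observed and expected number of positions where a digit equals its predecessor."""
    observed = sum(1 for a, b in zip(digits[:-1], digits[1:]) if a == b)
    expected = (len(digits) - 1) / 10.0
    return observed, expected


# Possible usage on the digits read in above (left commented out on purpose):
# chi2, ndof, prob = digit_uniformity_chi2(numbers)
# print("Chi2 = {:.1f}  Ndof = {:d}  Prob = {:.4f}".format(chi2, ndof, prob))
# print("Repeated digits (observed, expected):", count_repeats(numbers))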