Book a Demo!
CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutPoliciesSign UpSign In
afeiguin
GitHub Repository: afeiguin/comp-phys
Path: blob/master/09_01_random_numbers.ipynb
374 views
Kernel: Python 3

Random sequences

Imagine a bingo draw, where numbered balls are picked randomly. If you want to reproduce this process with a computer, you will find that it is not as easy as you might think. Computing is completely deterministic by nature, and reproducing or simulating naturally random processes is a particularly delicate matter.

The problem is how to use a computer to generate random numbers. In fact, this is impossible! We can program a computer to generate a sequence of numbers, following certain law. Although the output values of this sequence might look random (according to some rules that we will discuss in this section), the existence of a deterministic law behind them is telling us that preciselly, these are not random numbers at all!

We define a sequence of numbers {r1,r2,...rn}\{r_1,r_2,...r_n\} as “random” if there are no correlations among the numbers in the sequence. A random sequence can have a distribution, i. e. the probability of a number to appear in the sequence would correspond to some distribution. If the distribution is uniform, all numbers are equally probable to appear. Mathematically, the likehood of a number to occur is described by a distribution function P(r)P(r). This means that the probability of finding rir_i in the interval [r,r+dr][r,r+dr] is given by P(r)drP(r)dr. The usual random number generators provided by compilers or libraries generate a uniform distribution between 0 and 1, that means P(r)=1P(r)=1. Ideally this numbers have equal probability, and it is independent of the previous one. In other words, a random sequence should satify:

  • unpredictability,

  • independence,

  • without pattern.

In a computer, the sequences are “pseudo-random” because knowing a number rmr_m and the preceeding rir_i, we can predict the next one rm+1r_{m+1}. This is evident in the correlations. Some sophisticated psudo-random number generators do a good job hiding this fact from our eyes, although if you look hard enough, you will eventually figure it out.

Pseudo-random number generators

In this section we will simply review some possible alternatives for pseudo-random number generators. These go from the simplest “congruential” or “power residue” algorithm to more sophisticated ones that can be found in the literature. We will limit ourselves to understand this simple example. We want to generate a sequence {ri}\{r_i\} over an interval [0,M1][0,M-1]. You multiply the previous random number rir_i by a constant aa, add on another constant cc, take the modulus by MM, and then keep just the fractional part, the reminder, as the next random number ri+1r_{i+1}: ri+1=(ari+c)modM=remainder(ari+cM).r_{i+1}=(ar_i+c)\mathrm{mod}{M}=\mathrm{remainder}\left(\frac{ar_i+c}{M}\right). The value r1r_1 has to be supplied by the user, and it is called the “seed” of the sequence. The sequence will be uniquely pre-determined by the seed.

As an example, let us pick c=1c=1,a=4a=4,M=9M=9 and r1=3r_1=3. We obtain the sequence:

r1=3,r2=(4×3+1)mod9=4,r3=(4×4+1)mod9=8,r4=(4×8+1)mod9=6,,7,2,0,1,5,3,...\begin{aligned} r_1 &=& & & 3,\\ r_2 &=& (4\times 3+1)\mathrm{mod}{9}&=&4,\\ r_3 &=& (4\times 4+1)\mathrm{mod}{9}&=&8,\\ r_4 &=& (4\times 8+1)\mathrm{mod}{9}&=&6,\\ & & & &, 7, 2, 0, 1, 5, 3, ...\end{aligned}

We get a sequence of length M=9M=9 after which the entire sequence repeats. This means that the “period” of the sequence is M1M-1. If we want the numbers in the range [0,1][0,1] we would divide these values by MM. This algorithm is extremely simple and portable, and it’s particularly suitable for simple applications. As we have seen, the longer the MM, the longer the “period” of the sequence. Using large integer raises the problem of protability. Most processors use 32-bit representation for integers (some 64). This limits the largest possible integer that can be used. However, there are ways to work around this issue.

A common way to generate independent sequences is to use the internal clock of the computer to generate a relatively random seed.

Period

The period of a general linear congruantial generator (LCG) is at most MM, and for some choices of factor a much less than that. The LCG will have a full period for all seed values if and only if:

  • MM and the offset cc are relatively prime,

  • aa - 1 is divisible by all prime factors of MM

  • a1a - 1 is divisible by 4 if MM is divisible by 4.

These three requirements are referred to as the Hull-Dobell Theorem. While LCGs are capable of producing pseudorandom numbers which can pass formal tests for randomness, this is extremely sensitive to the choice of the parameters cc, MM, and aa.

Historically, poor choices had led to ineffective implementations of LCGs. A particularly illustrative example of this is RANDU, which was widely used in the early 1970s and led to many results which are currently being questioned because of the use of this poor LCG.

For more on random number generators read Knuth, Numerical Recipes, and Wikipedia.

Testing for randomness and uniformity

Since the random numbers are generated according to a deterministic rule, the numbers in the sequence must be correlated in a certain way. This can affect a simulation that assumes random events.Therefore it is wise to test a generator before you use it.

Moments

The kthk-th moment of a distribution is defined as: xk=1Ni=1Nxik.\langle x^k \rangle=\frac{1}{N} \sum_{i=1}^N x_i^k. If the numbers are distributed with a uniform probability distribution P(x)P(x), then ([random_moment]) must correspond to the moment of PP: 01xkP(x)dx1k+1.\int _0^1 {x^kP(x)dx} \sim \frac{1}{k+1}. If this holds for your generator, then you know that the distribution is uniform. If the deviation from this varies as 1/(N)1/\sqrt(N), then you also know that the distribution is random.

Autocorrelation

Another simple test determines the near-neighbor correlation in your random sequence by taking the sume of products at a “distance” mm: C(m)=xixi+m=1Ni=1Nxixi+m.C(m)=\langle x_i x_{i+m} \rangle = \frac{1}{N} \sum_{i=1}^N {x_ix_{i+m}}.

If your random numbers are distributed with a joint probability distribution P(xi,xi+m)P(x_i,x_{i+m}) and are independent and uniform, then ([random_corr]) must compare to the integral: 01dx01dyP(x,y)xy.\int _0^1 {dx \int _0^1{dy P(x,y)xy}}.

Visual test

If you plot the points (r2i,r2i+1)(r_{2i},r_{2i+1}) and you notice some sort of regularity, the sequence is not random. If the points are random, they should uniforminly fill a square with no discernible pattern.

Statistical errors

If you run a simulation using a pesudo-random generator, you may expect your measurements (of some quantity, and observable for instance) to be distributed according to some statistics. If your results do not satisfy these statistics, then you should pick another generator. If you know a good random-number generator (you usually try to avoid this kind because they are time-consumming), it is a good idea to compare your results for a short run, to see if they agree or not.

Random generator wish list

  • Efficient (you’ll need lots of them)

  • Long cycle: many techniques require a lot of generated random deviates, and you don’t want them to repeat any earlier than necessary.

  • Repeatable (no fair starting with a clock input—must be able to debug).

  • Resumeable: must be able to recall location in the sequence

  • Splittable: would like to be able to start parallel jobs with different sequences.

  • No correlations

http://www.drdobbs.com/testing-random-number-generators/184403185?pgno=1

%matplotlib inline import numpy as np from matplotlib import pyplot N = 50002 r = np.zeros(N+1) # This is my arbitrary pick (large, and *probably* primes -- haven't checked) bigint = 6537169170218971 coef = 52361276312121 # This the infamous RANDU #coef = 65539 #bigint = 2**31 #coef = 112 #bigint=2555 seed = 1. r[0] = seed for i in range(1,N+1): r[i] = (coef*r[i-1])%bigint r1 = np.zeros(N//2) r2 = np.zeros(N//2) for i in range(0,N,2): r1[i//2] = float(r[i])/float(bigint) r2[i//2] = float(r[i+1])/float(bigint) pyplot.plot(r1,r2,marker='o',linestyle='None');
Image in a Jupyter notebook
x = np.random.random(N+1) x1 = np.zeros(N//2) x2 = np.zeros(N//2) for i in range(0,N,2): x1[i//2] = x[i] x2[i//2] = x[i+1] pyplot.plot(x1,x2,marker='o',linestyle='None');
Image in a Jupyter notebook
from mpl_toolkits.mplot3d import Axes3D r1 = np.zeros(N//3) r2 = np.zeros(N//3) r3 = np.zeros(N//3) for i in range(0,N-1,3): r1[i//3] = r[i]/float(bigint) r2[i//3] = r[i+1]/float(bigint) r3[i//3] = r[i+2]/float(bigint) fig = pyplot.figure() ax = Axes3D(fig) ax.view_init(elev=10., azim=60) ax.scatter(r1,r2,r3,marker="o");
Image in a Jupyter notebook