Probability. Robert P. Dobrow
the random experiment using random numbers on the computer. One iteration of the experiment is called a “trial.”
2 Determine success: Based on the outcome of the trial, determine whether or not the event occurs. If yes, call that a “success.”
3 Replication: Repeat the aforementioned two steps many times. The proportion of successful trials is the simulated estimate of
Monte Carlo simulation is intuitive and matches up with our sense of how probabilities “should” behave. We give a theoretical justification for the method in Chapter 10, where we study limit theorems and the law of large numbers.
Here is a most simple, even trivial, starting example.
Example 1.32 Consider simulating the probability that an ideal fair coin comes up heads. One could do a physical simulation by just flipping a coin many times and taking the proportion of heads to estimate .Using a computer, choose the number of trials (the larger the better) and type the R command> sample(0:1, n, replace = T)The command samples with replacement from times such that outcomes are equally likely. Let 0 represent tails and 1 represent heads. The output is a sequence of ones and zeros corresponding to heads and tails. The average, or mean, of the list is precisely the proportion of ones. To simulate , type> mean(sample(0:1, n, replace = T))Repeat the command several times (use the up arrow key). These give repeated Monte Carlo estimates of the desired probability. Observe the accuracy in the estimate with one million trials:> mean(sample(0:1, 1000000, replace = T)) [1] 0.500376 > mean(sample(0:1, 1000000, replace = T)) [1] 0.499869 > mean(sample(0:1, 1000000, replace = T)) [1] 0.498946 > mean(sample(0:1, 1000000, replace = T)) [1] 0.500115
The R script CoinFlip.R simulates a familiar probability—the probability of getting three heads in three coin tosses.
R: SIMULATING THE PROBABILITY OF THREE HEADS IN THREE COIN TOSSES
# CoinFlip.R # Trial > trial <- sample(0:1, 3, replace = TRUE) # Success > if (sum(trial) == 3) 1 else 0 # Replication > n <- 10000 # Number of repetitions > simlist <- numeric(n) # Initialize vector > for (i in 1:n) { trial <- sample(0:1, 3, replace = TRUE) success <- if (sum(trial) == 3) 1 else 0 simlist[i] <- success } > mean(simlist) # Proportion of trials with 3 heads [1] 0.1293
The script is divided into three parts to illustrate (i) coding the trial, (ii) determining success, and (iii) implementing the replication.
To simulate three coin flips, use the sample
command. Again letting 1 represent heads and 0 represent tails, the command
> trial <- sample(0:1, 3, replace = TRUE)
chooses a head or tails three times. The three results are stored as a three-element list (called a vector in R) in the variable trial
.
After flipping three coins, the routine must decide whether or not they are all heads. This is done by summing the outcomes. The sum will equal three if and only if all flips are heads. This is checked with the command
> if (sum(trial) == 3) 1 else 0
which returns a 1 for success, and 0, otherwise.
For the actual simulation, the commands are repeated simlist
. This vector will consist of
Finally, after repeating simlist
. Given a list of zeros and ones, the average, or mean, of the list is precisely the proportion of ones in the list. The command mean(simlist)
finds this average giving the simulated probability of getting three heads.
Run the script via the script file or R supplement to see that the resulting estimate is fairly close to the exact solution
Reproducibility. If you and a classmate both ran the code above exactly as presented, you would get different estimates of the probability via the simulation. This is due to having different random numbers in your simulations. Some readers may know that computer random number generators are really only pseudorandom. The computer is using an algorithm that generates numbers which behave like random numbers would. It turns out that this is actually a plus when writing code or doing simulations in the sense that you can make the computer regenerate the same random numbers by “setting a seed.” You can think of a seed as telling the computer where to start the algorithm for generating the random numbers. Then, if anyone else runs the code (including the seed), they would get the same result that you did with that seed (provided they have the same software and version). In R this is accomplished with the command
> set.seed(360)
where any number can be used as the input. We will use a seed of 360 unless otherwise specified and will not show this command in the text (though it is shown in the supplements and scripts), but it is used before every script where random numbers are generated. Have fun and set seeds using numbers you enjoy!
The reason that setting seeds is important is that this makes the work reproducible. Reproducibility means that others can take the code and obtain the same results, lending credibility to the work. While writing simulations, you should aim to make sure all your code is reproducible.
Example 1.33 The script Divisible356.R simulates the divisibility problem in Example 1.30 that a random integer from is divisible by 3, 5, or 6. The problem, and the resulting code, is more complex.The function simdivis() simulates one trial. Inside the function, the expression num%%x==0 checks whether num is divisible by . The if statement checks whether num is divisible by 3, 5, or 6, returning 1 if it is, and 0, otherwise.After defining the function, typing simdivis() will simulate one trial. By repeatedly typing simdivis() on your computer, you get a feel for how this random experiment behaves over repeated trials.In this script, instead of writing a loop, we use the replicate command. This powerful R command is an alternative to writing loops for simple expressions. The syntax is replicate(n,expr) . The expression expr is repeated times creating an -element vector. Thus, the result of typing> simlist <- replicate(1000, simdivis())is a vector of 1000 ones and zeros stored in the variable simlist corresponding to success or failure in the divisibility experiment. The average mean(simlist) gives the simulated probability.Play with this script. Based on 1000 trials, you might guess that the true probability is between 0.45 and 0.49. Increase the number of trials to 10,000 and the estimates are roughly between 0.46 and 0.48. At 100,000, the estimates become even more precise between 0.465 and 0.468.We can actually quantify this increase in precision in Monte Carlo simulation as gets large. But that is a topic that will have to wait until Chapter 11.
R: SIMULATING THE DIVISIBILITY PROBABILITY
# Divisible356.R # simdivis() simulates one trial > simdivis <- function() { num <- sample(1:1000,1) if (num%%3==0 || num%%5==0 || num%%6==0) 1 else 0 } > simlist <- replicate(10000, simdivis()) mean(simlist) [1] 0.4707