Probability with R. Jane M. Horgan
All errors in the first edition have hopefully been corrected. I apologize in advance for any new errors that may escape my notice in this edition; should they arise, they will be corrected on the companion website.
Jane M. Horgan
Dublin City University
Ireland
2019
Preface to the First Edition
This book is offered as a first introduction to probability and its application to computer disciplines. It has grown from a one‐semester course delivered over the past several years to students reading for a degree in computing at Dublin City University. Students of computing seem able to think happily about Database, Computer Architecture, Language Design, Software Engineering, and Operating Systems, but then freeze up when it comes to “Probability” and wonder what it might have to do with computing. Convincing undergraduates of the relevance of probability to computing is one of the objectives of this book.
One reason for writing this book has been my inability to find a good text in which probability is applied to problems in computing at the appropriate level. Most existing texts on probability seem to be overly rigorous and too mathematical for the typical computing student. While some computing students may be adept at mathematics, there are many who resist the subject. In this book, we have largely replaced the mathematical approach to probability with one of simulation and experimentation, taking advantage of the powerful graphical and simulation facilities of the statistical system R, which is freely available and downloadable from the web. The text is designed for students who have taken a first course in mathematics, involving just a little calculus, as is usual in most degree courses in computing. Mathematical derivations in the main text are kept to a minimum: when we think it necessary, algebraic details are provided in the appendices. To emphasize our commitment to the simulation and experimentation approach, we have chosen to incorporate instructions in R throughout the text, rather than relegate them to an appendix.
Features of the book which distinguish it from other texts in probability include:
• R is used not only as a tool for calculation and data analysis, but also to illustrate the concepts of probability, to simulate distributions, and to explore by experimentation different scenarios in decision‐making. The R books currently available skim over the concepts of probability, and concentrate on using R for statistical inference and modelling.
• Recognizing that the student better understands definitions, generalizations and abstractions after seeing the applications, we introduce and illustrate almost all new ideas with real, computer‐related examples, covering a wide range of computer science applications.
• Although we have addressed computer scientists in the first instance, we believe that this book should also be suitable for students of engineering and mathematics.
The book has five parts in all, starting in Part I with an introduction to R. This presents the procedures of R needed to summarize and provide graphical displays of statistical data. An introduction to programming in R is also included. Not meant to be a manual, this part is intended only to get the student started. As we progress, more procedures of R are introduced as the need arises.
Part II sets the foundations of probability, and introduces the functions available in R for examining them. R is used not only for calculating probabilities involving unwieldy computations but also for obtaining probabilities through simulation. Probability events and sample spaces are illustrated with the usual gambling experiments, as well as inspection of integrated‐circuit chips, and observation of randomness in computer programming. A discussion of the “Intel Chip Fiasco” leads on to the “balls and bins” problem, which in turn is applied to assigning jobs to processors. It is shown how Bayes' Theorem has important applications in modern‐day computer science such as machine learning and machine translation. Methods to assess reliability of a computer containing many systems, which in turn contain many components, are considered.
Part III deals with discrete random variables. Nearly every chapter opens with a sequence of examples, designed to motivate the detail that follows. Techniques are developed for examining discrete variables by simulation in R. The objective is to enable students to approximate parameters even when they lack the mathematical knowledge to derive them exactly. The Bernoulli, geometric, binomial, hypergeometric and Poisson distributions are each dealt with in a similar fashion, beginning with a set of examples with different parameters and using the graphical facilities in R to examine their distributions. Limiting distributions are exhibited through simulation, and the students use R to obtain rules of thumb to establish when these approximations are valid. R is also used to design single‐ and double‐sampling inspection schemes.
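As a flavour of this approach, the following is a minimal sketch (not taken from the main text) of approximating the mean of a geometric distribution by simulation. The success probability, sample size, seed and variable names are assumptions chosen purely for illustration.

```r
set.seed(1)                      # fixed seed so the run is reproducible
p <- 0.2                         # assumed success probability (illustrative)
n <- 100000                      # assumed number of simulated experiments

# rgeom() returns the number of failures before the first success,
# so add 1 to count the trials up to and including that success.
trials <- rgeom(n, prob = p) + 1

sim_mean   <- mean(trials)       # simulated approximation of the mean
exact_mean <- 1 / p              # exact mean of the geometric, for comparison

cat("simulated:", sim_mean, " exact:", exact_mean, "\n")
```

With a large number of simulated experiments, the simulated mean should lie close to the exact value 1/p, which is the kind of check a student can make without deriving the formula.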
Part IV deals with continuous random variables. The exponential distribution is introduced as the waiting time between Poisson occurrences, and the graphical facilities of R illustrate the models. The Markov memoryless property is simulated using R. Some applications of the exponential distribution are investigated, notably in the areas of reliability and queues. R is used to model response times with varying traffic intensities. We have examined models for server queue lengths without using any of the formulae typical in a traditional approach. The normal distribution and some of its applications are discussed. It is shown how R can be used both to illustrate limiting distributions and as a set of statistical tables.
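The memoryless property mentioned above states that, for an exponential waiting time X, P(X > s + t | X > s) = P(X > t). A minimal sketch of how it might be checked by simulation follows; the rate, the time points, the seed and the variable names are illustrative assumptions, not values from the main text.

```r
set.seed(2)                      # fixed seed so the run is reproducible
rate <- 0.5                      # assumed rate of the exponential (illustrative)
x <- rexp(200000, rate = rate)   # simulated waiting times

s0 <- 1                          # time already waited (assumed)
t0 <- 2                          # additional waiting time (assumed)

# Unconditional probability of waiting longer than t0
p_uncond <- mean(x > t0)

# Conditional probability of waiting a further t0, given a wait of s0 so far
p_cond <- mean(x[x > s0] > s0 + t0)

# Exact value exp(-rate * t0), for comparison
exact <- exp(-rate * t0)

cat("P(X > t):", p_uncond, " P(X > s + t | X > s):", p_cond,
    " exact:", exact, "\n")
```

The two simulated proportions should agree closely with each other and with the exact value, illustrating that the time already waited does not affect the distribution of the remaining wait.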
Part V addresses the problem of obtaining probability bounds on the runtime of new algorithms when the distribution is unknown. Here the Markov and Chebyshev inequalities provide estimates of probability when the information about the random variable is limited.
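For reference, the two inequalities can be stated in their standard forms as follows. The notation (X for the random variable, \mu for its mean, \sigma for its standard deviation, and constants a and k) is assumed here for illustration and may differ from that used in the main text.

```latex
% Markov's inequality: X non-negative with mean E[X], and a > 0
P(X \ge a) \le \frac{E[X]}{a}

% Chebyshev's inequality: X with mean \mu and standard deviation \sigma, and k > 0
P(|X - \mu| \ge k\sigma) \le \frac{1}{k^{2}}
```

Markov's inequality needs only the mean of a non-negative variable such as a runtime, while Chebyshev's inequality also uses the variance and in return bounds deviations on both sides of the mean.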
The exercises and projects at the end of each chapter are an integral part of the exposition. Many of them require the use of R, in some cases to perform routine calculations and in others to conduct experiments on simulated data. The projects are designed to improve the students' understanding of how probability interacts with computing.
This is a self‐contained book with no need for ancillary material, other than, of course, the programming language R. There is a freely downloadable manual (Venables, W.N., Smith, D.M., and the R Development Core Team (2004). An Introduction to R: A Programming Environment for Data Analysis and Graphics, Version 2.6.2).
Students should be encouraged to use this in conjunction with the text. One of the big attractions of R is that it is open source. Most other systems, such as Matlab and Mathematica, require a license; apart from the expense, this makes access more difficult, and the student more likely not to use them.
Jane M. Horgan
Dublin City University 2008
Acknowledgments
The generous contributions of James Power from Maynooth University and Charlie Daly from Dublin City University have much improved this second edition. I am deeply appreciative of their advice and help, and their provision of many relevant examples in areas of computing that have evolved since the first edition was published. They also read much of the new material and supplied valuable feedback. As I write this, we are in a state of shock at the sudden and untimely death of our close friend and colleague James Power. Maynooth University, Computer Science in Ireland and the Academic World generally are greatly diminished by his departure. It is difficult to accept his absence. We had so much more to learn from him.
I owe a huge debt of gratitude to Helen Fallon, Deputy Librarian