Applied Biostatistics for the Health Sciences. Richard J. Rossi
Figure 2.24 Three binomial distributions: (a) n=25,p=0.1; (b) n=25,p=0.5; (c) n=25,p=0.9.
Because the computations for the probabilities associated with a binomial random variable are tedious, it is best to use a statistical computing package such as MINITAB for computing binomial probabilities.
Example 2.31
Hair loss is a common side effect of chemotherapy. Suppose that there is an 80% chance that an individual will lose their hair during or after receiving chemotherapy. Let X be the number of individuals who retain their hair during or after receiving chemotherapy. If 10 individuals are selected at random, use the MINITAB output given in Table 2.10 to determine
Table 2.10 The Binomial Distribution for n = 10 Trials and p = 0.20
Binomial with n = 10 and p = 0.2

| x | P( X = x ) |
|---|---|
| 0 | 0.107374 |
| 1 | 0.268435 |
| 2 | 0.301990 |
| 3 | 0.201327 |
| 4 | 0.088080 |
| 5 | 0.026424 |
| 6 | 0.005505 |
| 7 | 0.000786 |
| 8 | 0.000074 |
| 9 | 0.000004 |
| 10 | 0.000000 |

1 the probability that exactly seven will retain their hair (i.e., X = 7),
2 the probability that between four and eight (inclusive) will retain their hair (i.e., 4≤X≤8),
3 the probability that at most three will retain their hair (i.e., X≤3),
4 the probability that at least six will retain their hair (i.e., X≥6),
5 the most likely number of patients to retain their hair (i.e., the mode).
Solutions
Based on the MINITAB output in Table 2.10, the probability that
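1 exactly seven will retain their hair (i.e., X = 7) is P(X = 7) = 0.000786,
2 between four and eight (inclusive) will retain their hair (i.e., 4≤X≤8) is P(4≤X≤8) = 0.088080 + 0.026424 + 0.005505 + 0.000786 + 0.000074 = 0.120869,
3 at most three will retain their hair (i.e., X≤3) is P(X≤3) = 0.107374 + 0.268435 + 0.301990 + 0.201327 = 0.879126,
4 at least six will retain their hair (i.e., X≥6) is P(X≥6) = 0.005505 + 0.000786 + 0.000074 + 0.000004 + 0.000000 = 0.006369,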
5 the most likely number of patients to retain their hair is X = 2.
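For readers without access to MINITAB, the entries of Table 2.10 and the five answers above can be reproduced from the binomial probability formula P(X = x) = C(n, x)⋅p^x⋅(1−p)^(n−x). The following is a minimal Python sketch using only the standard library; the function name `binomial_pmf` and the variable names are illustrative choices and do not come from the text.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# "Success" here is retaining hair, so p = 1 - 0.80 = 0.20.
n, p = 10, 0.20
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

p_exactly_7 = pmf[7]                # P(X = 7)
p_4_to_8 = sum(pmf[4:9])            # P(4 <= X <= 8), slice covers x = 4..8
p_at_most_3 = sum(pmf[:4])          # P(X <= 3), slice covers x = 0..3
p_at_least_6 = sum(pmf[6:])         # P(X >= 6), slice covers x = 6..10
mode = max(range(n + 1), key=lambda x: pmf[x])  # value of x with the largest probability

for x, prob in enumerate(pmf):
    print(f"{x:2d}  {prob:.6f}")
print(f"P(X = 7)       = {p_exactly_7:.6f}")
print(f"P(4 <= X <= 8) = {p_4_to_8:.6f}")
print(f"P(X <= 3)      = {p_at_most_3:.6f}")
print(f"P(X >= 6)      = {p_at_least_6:.6f}")
print(f"mode           = {mode}")
```

Because the sums are taken over unrounded probabilities, the last digit can differ slightly from a sum of the rounded table entries.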
The mean of a binomial random variable based on n trials and probability of success p is μ=np and the standard deviation is σ=√(n⋅p⋅(1−p)). The mean of a binomial is the expected number of successes in n trials, and the values of a binomial random variable are concentrated near its mean. The standard deviation measures the spread about the mean and is largest when p = 0.5; as p moves away from 0.5 toward 0 or 1, the variability of a binomial random variable decreases. Furthermore, when np and n(1−p) are both greater than 5, the Empirical Rules apply and
roughly 68% of the binomial distribution lies between the values closest to np−√(n⋅p⋅(1−p)) and np+√(n⋅p⋅(1−p)),
roughly 95% of the binomial distribution lies between the values closest to np−2√(n⋅p⋅(1−p)) and np+2√(n⋅p⋅(1−p)),
roughly 99% of the binomial distribution lies between the values closest to np−3√(n⋅p⋅(1−p)) and np+3√(n⋅p⋅(1−p)).
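The mean, the standard deviation (the square root of n⋅p⋅(1−p)), and the three approximate ranges can be computed directly. The sketch below is one possible way to do so in Python; the helper `binomial_summary` is a hypothetical name introduced here for illustration, and the check that np and n(1−p) both exceed 5 follows the condition stated above.

```python
from math import sqrt

def binomial_summary(n, p):
    """Return the mean, standard deviation, whether the rule-of-thumb
    condition holds, and the mean +/- k*sd ranges for k = 1, 2, 3."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    rule_applies = n * p > 5 and n * (1 - p) > 5
    ranges = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}
    return mean, sd, rule_applies, ranges

# The three distributions of Figure 2.24: n = 25 with p = 0.1, 0.5, 0.9.
for p in (0.1, 0.5, 0.9):
    mean, sd, rule_applies, ranges = binomial_summary(25, p)
    print(f"p = {p}: mean = {mean:.2f}, sd = {sd:.2f}, rules apply: {rule_applies}")
    if rule_applies:
        for k, (low, high) in ranges.items():
            print(f"   mean ± {k} sd: {low:.2f} to {high:.2f}")
```

For n = 25 the standard deviation is largest at p = 0.5, and only that case satisfies the condition that np and n(1−p) are both greater than 5.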
Example 2.32
Suppose the relapse rate within 3 months of treatment at a drug rehabilitation clinic is known to be 40%. If the clinic has 25 patients, then the mean number of patients to relapse within 3 months is μ=25⋅0.40=10 and the standard deviation is σ=√(25⋅0.40⋅(1−0.40))≈2.45. Now, since np=25(0.4)=10 and n(1−p)=25(0.6)=15 are both greater than 5, the Empirical Rules apply. Because μ−2σ≈5.1 and μ+2σ≈14.9, roughly 95% of the time between 5 and 15 patients will relapse within 3 months of treatment. Using MINITAB, the actual percentage of a binomial distribution with n = 25 and p = 0.40 falling between 5 and 15 is 98%.
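The comparison between the approximate 95% figure and the exact binomial percentage in Example 2.32 can be checked without MINITAB. The following is a minimal Python sketch under the assumption that "the values closest to" μ±2σ means rounding each endpoint to the nearest integer; the variable names are illustrative.

```python
from math import comb, sqrt

n, p = 25, 0.40
mean = n * p                      # expected number of relapses
sd = sqrt(n * p * (1 - p))        # standard deviation of the count

# Integer values closest to mean - 2*sd and mean + 2*sd.
low = round(mean - 2 * sd)
high = round(mean + 2 * sd)

# Exact binomial probability that the count falls in [low, high].
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(low, high + 1))

print(f"mean = {mean:.2f}, sd = {sd:.2f}")
print(f"range: {low} to {high}")
print(f"exact probability in range = {exact:.4f}")
```

The exact probability comes out somewhat above the rule-of-thumb 95%, which is consistent with the rule being only an approximation for a discrete distribution.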
An important restriction in the setting for a binomial random variable is that the probability of success remains constant over the n trials. In many biomedical studies, the probability of success will be different for each individual in the experiment because the individuals are different. For example, in a study of the survival of patients having suffered heart attacks, the probability of survival will be influenced by many factors including severity of heart attack, delay in treatment, age, and ability to change diet and lifestyle following a heart attack. Because each individual is different, the probability of survival is not going to be constant over the n individuals in the study, and hence, the binomial probability model does not apply.
2.4.2 The Normal Probability Model
The choice of a probability model for continuous variables is generally based on historical data rather than a particular set of conditions. Just as there are many discrete probability models, there are also many different probability models that can be used to model the distribution of a continuous variable. The most commonly used continuous probability model in statistics is the normal probability model.
The normal probability model is often used to model distributions that are expected to be unimodal and symmetric, and the normal probability model forms the foundation for many of the classical statistical methods used in biostatistics. Moreover, the distribution of many natural phenomena can be modeled very well with the normal distribution. For example, the weights, heights, and IQs of adults are often modeled with normal distributions.
Several properties of a normal distribution are listed below.
PROPERTIES OF A NORMAL DISTRIBUTION
A normal distribution
is a bell- or mound-shaped distribution.
is completely characterized by its