Applied Biostatistics for the Health Sciences. Richard J. Rossi
correlations to consider are those between the response variable and each of the explanatory variables.
Finally, correlation should not be confused with causation. A causal relationship exists when changing the value of X directly causes a change in the value of Y or vice versa. The correlation coefficient only measures the tendency for the value of Y to increase or decrease linearly with the values of X. Thus, a high correlation between X and Y does not necessarily indicate that changes in X will cause changes in Y. For example, there is a positive correlation between the number of times an individual on a diet weighs themselves in a week and their weight loss. Clearly, the number of times an individual weighs themselves does not cause a change in their weight. Causal relationships must be supported by sound logical and scientific reasoning. With the proper use of scientific principles and well-designed experiments, high correlations can often be used as evidence supporting a causal relationship.
2.3 Probability
In a data-based biomedical study, a random sample will be selected from the target population, and a well-designed sampling plan requires knowing the chance of drawing a particular observation or set of observations. For example, it might be important to know the chance of drawing a female individual or an individual between the ages of 30 and 60. In other studies, it might be important to determine the likelihood that a particular genetic trait will be passed from the parents to their offspring.
A probability is a number between 0 and 1 that measures how likely it is for an event to occur. Probabilities are associated with tasks or experiments whose outcomes cannot be determined without actually carrying out the task. A task whose outcome cannot be predetermined is called a random experiment or a chance experiment. For example, prior to treatment it cannot be determined whether chemotherapy will improve a cancer patient’s health. Thus, a chemotherapy treatment can be treated as a chance experiment before chemotherapy is started. Similarly, when drawing a random sample from the target population, the actual values in the sample will not be known until the sample is collected. Hence, drawing a random sample from the target population is a chance experiment.
Because statistical inferences are based on a sample from the population rather than a census of the population, the statistical inferences will have a degree of uncertainty associated with them. The measures of reliability for statistical inferences drawn from a sample are based on the underlying probabilities associated with the target population.
In a chance experiment, the actual outcome of the experiment cannot be predetermined, but it is important for the experimenter to identify all of the possible outcomes of the experiment before it is carried out. The set of all possible outcomes of a chance experiment is called the sample space and will be denoted by S. A subcollection of the outcomes in the sample space is called an event, and the probability of an event measures how likely the event is. An event is said to occur when a chance experiment is carried out and the chance experiment results in one of the outcomes in the event. For example, in a chance experiment consisting of randomly selecting an adult from a well-defined population, if A is the event that an individual between the ages of 30 and 60 is selected, then the event A will occur if and only if the age of the individual selected is between 30 and 60; if the age of the individual is not between 30 and 60, then the event A will not occur.
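The idea that an event "occurs" exactly when the observed outcome belongs to it can be sketched in a few lines of Python. Because the set of possible adult ages is large, the sketch describes event A by a membership rule instead of listing its outcomes. It assumes ages are recorded as whole numbers and that "between 30 and 60" includes both endpoints (the text does not say); the function names `in_event_a` and `event_occurs` are hypothetical.

```python
def in_event_a(age: int) -> bool:
    """Event A: the selected adult's age is between 30 and 60.

    Assumption (not stated in the text): both endpoints, 30 and 60, belong to A.
    """
    return 30 <= age <= 60


def event_occurs(outcome: int, event) -> bool:
    """An event occurs exactly when the observed outcome is one of the event's outcomes."""
    return event(outcome)


# Hypothetical ages observed in three separate runs of the chance experiment.
for observed_age in (25, 42, 60):
    status = "occurs" if event_occurs(observed_age, in_event_a) else "does not occur"
    print(f"age {observed_age}: event A {status}")
```

Before the experiment is run, the outcome is unknown; afterward, checking whether event A occurred is simply a membership test on the observed outcome.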
Probabilities are often used to determine the most likely outcome of a chance experiment and to assess how likely it is for an observed data set to support a research hypothesis. The probability of an event A is denoted by P(A), and the probability of an event is always a number between 0 and 1. Probabilities near 0 indicate that an event is unlikely to occur, and probabilities near 1 indicate that an event is likely to occur. Probabilities are sometimes expressed as percentages, in which case the percentage is simply the probability of the event times 100. When probabilities are expressed as percentages, they will be between 0% and 100%.
Example 2.19
Suppose an individual is to be drawn at random and their blood type is identified. Prior to drawing a blood sample and typing it, an individual’s blood type is unknown, and thus, this can be treated as a chance experiment. The four possible blood types are O, A, B, and AB, and hence, the sample space is S={O,A,B,AB}. Furthermore, according to the American Red Cross, the probabilities of each blood type are
Thus, if a person is drawn at random, the probability that the person will have blood type AB is 0.04.
The probabilities associated with a chance experiment and a sample space S must satisfy the following four properties known as the Axioms of Probability.
THE AXIOMS OF PROBABILITY
Probabilities are always greater than or equal to 0. That is, P(A)≥0 for any event A.
The probability of the sample space is 1. That is, P(S)=1, which means when the experiment is carried out it will result in one of the outcomes in S.
The probability of every event is between 0 and 1. That is, 0≤P(A)≤1 for every event A.
When two events have no outcomes in common, the probability that at least one of the two events occurs is the sum of their probabilities. That is, P(A or B)=P(A)+P(B) when A and B have no outcomes in common.
Events that have no outcomes in common are called disjoint events or mutually exclusive events. If A and B are disjoint events, then the probability of the event “A and B” is 0. That is, it is impossible for the events to occur simultaneously.
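For a finite sample space with known outcome probabilities, the axioms can be checked mechanically by examining every event. The Python sketch below uses a fair six-sided die as a hypothetical example (each face assumed to have probability 1/6) and exact fractions to avoid rounding error. It computes the probability of an event by adding the probabilities of its outcomes; the helper names `prob`, `all_events`, and `satisfies_axioms` are illustrative, not part of any library.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical chance experiment: one roll of a fair six-sided die.
S = {1, 2, 3, 4, 5, 6}
P_outcome = {face: Fraction(1, 6) for face in S}


def prob(event):
    """P(event): the sum of the probabilities of the outcomes in the event."""
    return sum((P_outcome[o] for o in event), Fraction(0))


def all_events(sample_space):
    """Every subcollection (subset) of the outcomes in a finite sample space."""
    outcomes = sorted(sample_space)
    return [set(c) for c in chain.from_iterable(
        combinations(outcomes, r) for r in range(len(outcomes) + 1))]


def satisfies_axioms(sample_space):
    """Check the four axioms over every event of a finite sample space."""
    events = all_events(sample_space)
    nonnegative = all(prob(E) >= 0 for E in events)          # P(A) >= 0
    certain = prob(sample_space) == 1                         # P(S) = 1
    bounded = all(0 <= prob(E) <= 1 for E in events)         # 0 <= P(A) <= 1
    additive = all(prob(E | F) == prob(E) + prob(F)          # disjoint events add
                   for E in events for F in events if not (E & F))
    return nonnegative and certain and bounded and additive


A = {1, 2}  # a 1 or a 2 is rolled
B = {5, 6}  # a 5 or a 6 is rolled; A and B are disjoint

print(satisfies_axioms(S))  # True
print(prob(A | B))          # 2/3
print(prob(A & B))          # 0: disjoint events cannot occur together
```

Enumerating all 2^6 = 64 events is practical only for small sample spaces; it is meant to make the axioms concrete, not to serve as a general-purpose checker.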
2.3.1 Basic Probability Rules
Determining the probabilities associated with complex real-life events often requires a great deal of information and an extensive scientific understanding of the structure of the chance experiment being studied. In fact, even when the sample space and event are easily identified, the determination of the probability of an event can be an extremely difficult task. For example, in studying the side effects of a drug, the possible side effects can generally be anticipated and the sample space will be known. However, because humans react differently to drugs, the probabilities of the occurrence of the side effects are generally unknown. The probabilities of the side effects are often estimated in clinical trials.
The following basic probability rules are often useful in determining the probability of an event.
1 When the outcomes of a random experiment are equally likely to occur, the probability of an event A is the number of outcomes in A divided by the number of outcomes in S. That is, P(A)=(number of outcomes in A)/(number of outcomes in S).
2 For every event A, the probability of A is the sum of the probabilities of the outcomes making up A. That is, when an event A consists of the outcomes O1,O2,…,Ok, the probability of the event A is P(A)=P(O1)+P(O2)+⋯+P(Ok).
3 For any two events A and B, the probability that event A or event B (or both) occurs is P(A or B)=P(A)+P(B)−P(A and B).
4 The probability that the event A does not occur is 1 minus the probability that the event A does occur. That is, P(not A)=1−P(A).
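The four rules can be illustrated on a small chance experiment whose outcome probabilities are known exactly. The Python sketch below assumes a fair six-sided die (a hypothetical example, so the six outcomes are equally likely) and uses exact fractions; the function names are illustrative only.

```python
from fractions import Fraction

# Hypothetical chance experiment: one roll of a fair six-sided die,
# so the six outcomes are equally likely.
S = {1, 2, 3, 4, 5, 6}
P_outcome = {face: Fraction(1, 6) for face in S}


def p_equally_likely(event):
    """Rule 1: (number of outcomes in A) / (number of outcomes in S)."""
    return Fraction(len(event), len(S))


def p_sum_of_outcomes(event):
    """Rule 2: add the probabilities of the outcomes that make up A."""
    return sum((P_outcome[o] for o in event), Fraction(0))


def p_union(a, b):
    """Rule 3: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_sum_of_outcomes(a) + p_sum_of_outcomes(b) - p_sum_of_outcomes(a & b)


def p_complement(a):
    """Rule 4: P(not A) = 1 - P(A)."""
    return 1 - p_sum_of_outcomes(a)


A = {2, 4, 6}  # an even number is rolled
B = {4, 5, 6}  # a number greater than 3 is rolled

print(p_equally_likely(A))   # 1/2
print(p_sum_of_outcomes(A))  # 1/2
print(p_union(A, B))         # 2/3, since "A or B" is {2, 4, 5, 6}
print(p_complement({6}))     # 5/6: a six is not rolled
```

Subtracting P(A and B) in Rule 3 matters here because A and B share the outcomes 4 and 6; simply adding P(A)+P(B)=1 would count those outcomes twice. When A and B are disjoint, P(A and B)=0 and Rule 3 reduces to the additivity axiom.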
Example 2.20
Table 2.8 gives a breakdown of the pool of 242 volunteers for a university study on rapid eye movement (REM). Use Table 2.8 to determine the probability that