Biostatistics Decoded. A. Gouveia Oliveira

the population mean has value close to the value of the sample mean. Those are the reasons why we cannot assume that the value of the population mean is the same as the sample mean. So an important conclusion is that one must never, ever draw conclusions about a population based on the value of sample means. Sample means only describe the sample, never the population.

      Actually, if we went around taking some kind of interval‐based measurements (e.g. length, weight, concentration) from samples of any type of biological materials and plotted them in a histogram, we would find this shape almost everywhere. This pattern is so repetitive that it has been compared to familiar shapes, like bells or Napoleon hats.

      In other circumstances, outside the world of mathematics, people would say that we have here some kind of natural phenomenon. It seems as if some law, of physics or whatever, dictates the rules that variation must follow. This would imply that the variation we observe in everyday life is not chaotic in nature, but actually ruled by some universal law. If this were true, and if we knew what that law says, perhaps we could understand why, and especially how, variation appears.

Graph depicts the frequency distributions of some biological variables.

      So, what would be the nature of that law, and is it known already? Yes, it is, and it is actually very easy to understand how it works. Let us conduct a little experiment to see if we can create something whose values have a bell‐shaped distribution.

      We can, and the result is also presented in Figure 1.24. We simply write down all the possible combinations of values of the four equal variables and see in each case what the value of the fifth variable is. If all four variables have value 1, then the fifth variable will have value 4. If three variables have value 1 and one has value 2, then the fifth variable will have value 5. This may occur in four different ways – either the first variable has the value 2, or the second, or the third, or the fourth. If two variables have the value 1 and two have the value 2, then the sum will be 6, and this may occur in six different ways. If one variable has value 1 and three have value 2, then the result will be 7, and this may occur in four different ways. Finally, if all four variables have value 2, the result will be 8, and this can occur in only one way.
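      The counting described above can be checked by brute force. The following is a minimal sketch, assuming each of the four variables takes the values 1 and 2 with equal probability; the names `values`, `n_variables` and `ways` are chosen here for illustration and do not come from the text.

```python
from collections import Counter
from itertools import product

# Assumption: each of the four variables takes the value 1 or 2,
# and both values are equally likely.
values = (1, 2)
n_variables = 4

# Enumerate all 2**4 = 16 equally likely combinations and tally their sums.
ways = Counter(sum(combo) for combo in product(values, repeat=n_variables))

n_combinations = len(values) ** n_variables
for total in sorted(ways):
    # Columns: sum, number of ways to obtain it, probability of that sum.
    print(total, ways[total], ways[total] / n_combinations)
```

      The tallies for the sums 4 to 8 come out as 1, 4, 6, 4 and 1, the same counts obtained by listing the combinations by hand.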

An illustration of the origin of the normal distribution.

      If we repeat the experiment with not four, but a much larger number of variables, the variable that results from adding all those variables will have not just five different values, but many more. Consequently, the graph will be smoother and more bell‐shaped. The same will happen if we add variables taking more than two values.
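      To see the effect of adding more variables, or variables with more than two values, the exact distribution of the sum can be built up one variable at a time. This is only a sketch under the assumption that every value of each variable is equally likely; the helper names `sum_distribution` and `text_histogram` are hypothetical.

```python
from fractions import Fraction

def sum_distribution(values, n_variables):
    """Exact distribution of the sum of n independent variables, each taking
    every entry of `values` with equal probability (an assumption)."""
    p = Fraction(1, len(values))
    dist = {0: Fraction(1)}  # the sum of zero variables is 0 with certainty
    for _ in range(n_variables):
        new_dist = {}
        for partial, prob in dist.items():
            for v in values:
                # Add one more variable to every partial sum seen so far.
                new_dist[partial + v] = new_dist.get(partial + v, 0) + prob * p
        dist = new_dist
    return dist

def text_histogram(dist, width=50):
    """Print one bar per possible sum, scaled so the tallest bar has `width` marks."""
    peak = max(dist.values())
    for total in sorted(dist):
        print(f"{total:3d} " + "#" * round(width * dist[total] / peak))

text_histogram(sum_distribution((1, 2), 20))     # 20 two-valued variables
text_histogram(sum_distribution((1, 2, 3), 10))  # 10 three-valued variables
```

      With 20 two‐valued variables the sum already has 21 possible values, and the printed bars trace a recognizably bell‐shaped outline.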

      As the number of variables grows without limit, the variable resulting from adding them can take an unlimited number of values, and the graph of its probability distribution approaches a perfectly smooth curve. This curve is called the normal curve. It is also called the Gaussian curve, after the German mathematician Carl Friedrich Gauss, who described it.

      What was presented in the previous section is known as the central limit theorem. This theorem simply states that the sum of a large number of independent variables with identical distribution has approximately a normal distribution, whatever the distribution of the individual variables, provided their variance is finite. The central limit theorem plays a major role in statistical theory, and the following experiment illustrates how the theorem operates.
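      For readers who prefer symbols, one common way of writing the theorem is given below. The notation is introduced here and is not the book's: X_1, ..., X_n are independent variables with the same distribution, with mean \mu and finite standard deviation \sigma, and S_n is their sum.

```latex
S_n = X_1 + X_2 + \cdots + X_n, \qquad
\frac{S_n - n\mu}{\sigma\sqrt{n}} \;\xrightarrow{d}\; N(0, 1)
\quad \text{as } n \to \infty
```

      In words, once the sum is centered on its mean n\mu and divided by its standard deviation \sigma\sqrt{n}, its distribution gets closer and closer to the standard normal distribution as more variables are added.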

      With a computer, we generated random numbers between 0 and 1, obtaining observations from two continuous variables with the same distribution. The variables had a uniform probability distribution, which is a probability distribution where all values occur with exactly the same probability.

Graphs depict the frequency distribution of sums of identical variables with uniform distribution.

      Notice that the more variables we add together, the more the shape of the frequency distribution approaches the normal curve. The fit is already fair for the sum of four variables. This result is a consequence of the central limit theorem.
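      An experiment of this kind can be reproduced with a few lines of code. The sketch below is not the author's program: the sample size, the seed and the function names are assumptions made for illustration. It draws random numbers between 0 and 1, adds them in groups, and reports the fraction of sums lying within one standard deviation of the mean, which is about 0.68 for a normal distribution and about 0.58 for a single uniform variable.

```python
import random
import statistics

def simulate_sums(n_variables, n_samples=20_000, seed=1):
    """Return n_samples sums, each of n_variables uniform random numbers in [0, 1)."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    return [sum(rng.random() for _ in range(n_variables)) for _ in range(n_samples)]

def fraction_within_one_sd(sums):
    """Fraction of observations no further than one standard deviation from the mean."""
    mean = statistics.fmean(sums)
    sd = statistics.pstdev(sums)
    return sum(abs(s - mean) <= sd for s in sums) / len(sums)

for k in (1, 2, 4, 8):
    sums = simulate_sums(k)
    # Columns: number of variables added, mean of the sums, fraction within 1 SD.
    print(k, round(statistics.fmean(sums), 3), round(fraction_within_one_sd(sums), 3))
```

      As more variables are added, the reported fraction moves from the value expected for a uniform distribution toward the value expected for a normal distribution, which is the numerical counterpart of what the histograms show.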

      The normal distribution has many interesting properties, but we will present just a few of them. They are very simple to understand and, occasionally, we will have to call on them further on in this book.

      First property. The normal curve is a function solely of the mean and the variance. In other words, given only a mean and a variance of a normal distribution, we can find all the values of the distribution and plot its curve using the equation of the normal curve (technically,

