Biostatistics Decoded. A. Gouveia Oliveira
It is not so widely recognized that biostatistics is of critical importance in the decision‐making process. Clinical practice is largely involved in taking actions to prevent, correct, remedy, or cure diseases. But before each action is taken, a decision must be made as to whether an action is required and which action will benefit the patient most. This is, of course, the most difficult part of clinical practice, simply because people can choose between alternative actions only if they can predict the likely outcome of each one. In other words, to make decisions about the care of a patient, a clinician needs to be able to predict the future, and this is precisely where the central role of biostatistics in decision making lies.
Actually, biostatistics can be thought of as the science that allows us to predict the future. How is this magic accomplished? Simply by considering that, for any given individual, the expectation is that his or her features and outcomes are the same, on average, as those of the population to which the individual belongs. Therefore, once we know the average features of a given population, we are able to make a reasonable prediction of the features of each individual belonging to that population.
Let us take a further look at how biostatistics allows us to predict the future using, as an example, personal data from a nationwide survey of some 45 000 people in the population. The survey estimated that 27% of the population suffers from chronic venous insufficiency (CVI) of the lower limbs. With this information we can predict, for each member of the population, knowing nothing else, that such a person has a 27% chance of suffering from CVI. We can further refine our prediction about that person if we know more about the population. Figure 1.1 shows the prevalence of CVI by sex and by age group. With this information we can predict, for example, for a 30‐year‐old woman, that she has a 40% chance of having CVI and that in, say, 30 years she will have a 60% chance of suffering from CVI.
Figure 1.1 Using statistics for predictions. Age‐ and sex‐specific prevalence rates of chronic venous insufficiency.
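The prediction rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's method: `STRATUM_PREVALENCE` and `predict_cvi_probability` are hypothetical names, the (sex, age) keys are assumptions because the survey's actual age groups are not listed here, and only the three prevalence figures quoted in the text are used. When nothing is known about a person, or their stratum is missing from the table, the sketch falls back to the overall prevalence.

```python
# Overall prevalence of CVI quoted in the text.
OVERALL_PREVALENCE = 0.27

# Hypothetical stratum table: only the two stratum values quoted in the text.
# The (sex, age) keys are illustrative assumptions, not the survey's age groups.
STRATUM_PREVALENCE = {
    ("female", 30): 0.40,
    ("female", 60): 0.60,
}


def predict_cvi_probability(sex=None, age=None):
    """Predict the probability of CVI for one individual.

    Uses the prevalence of the individual's stratum when it is in the table;
    otherwise (including when nothing is known) returns the overall prevalence.
    """
    return STRATUM_PREVALENCE.get((sex, age), OVERALL_PREVALENCE)


print(predict_cvi_probability())               # 0.27 (nothing known about the person)
print(predict_cvi_probability("female", 30))   # 0.4
print(predict_cvi_probability("female", 60))   # 0.6
```

A fuller version would fall back to the next most specific stratum (for example, sex alone) before resorting to the overall figure.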
Therefore, the key to prediction is knowing the characteristics of individuals, and of disease and treatment outcomes, in the population. So we need to study, measure, and evaluate populations. However, this is not easily accomplished. The problem is that, in practice, most populations of interest to biomedical research have no material existence. Patient populations are very dynamic entities. For example, the populations of patients with acute myocardial infarction, with flu, or with bacterial pneumonia are changing at every instant, because new cases are entering the population all the time, while patients who recover or die are leaving it. Therefore, at any given instant there is one population of patients, but in practice there is no way to identify and evaluate each and every one of its members. Populations have no actual physical existence; they are only conceptual.
So, if we cannot study the whole population, what can we do? Well, the most we can do is study, measure, and evaluate a sample of the population. We may then use the observations made in the sample to estimate what the population is like. This is what biostatistics is about: sampling. Biostatistics studies the sampling process and the phenomena associated with sampling, and by doing so it gives us a method for studying populations that are immaterial. Knowledge of the features and outcomes of a conceptual population allows us to predict the features and future behavior of an individual known to belong to that population, making it possible for the health professional to make informed decisions.
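To make the idea of estimating a population from a sample concrete, here is a small simulation, offered as a sketch under stated assumptions. The conceptual population is modeled as people who each have CVI with an assumed probability of 0.27, a sample of 1000 people is drawn with a fixed seed, and the sample proportion is reported with an approximate 95% confidence interval based on the normal approximation. The function name `estimate_proportion` is hypothetical, and the normal-approximation interval is only one common choice.

```python
import math
import random


def estimate_proportion(sample):
    """Estimate a population proportion from a list of 0/1 observations.

    Returns the sample proportion and an approximate 95% confidence interval
    computed with the normal approximation (p_hat +/- 1.96 * SE).
    """
    n = len(sample)
    p_hat = sum(sample) / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    return p_hat, (p_hat - 1.96 * se, p_hat + 1.96 * se)


# Assumption for illustration: each member of the simulated population
# has CVI with probability 0.27 (the survey figure quoted earlier).
TRUE_PREVALENCE = 0.27
rng = random.Random(2024)  # fixed seed so the sketch is reproducible
sample = [1 if rng.random() < TRUE_PREVALENCE else 0 for _ in range(1000)]

p_hat, (low, high) = estimate_proportion(sample)
print(f"estimated prevalence: {p_hat:.3f} (95% CI {low:.3f} to {high:.3f})")
```

Rerunning with a different seed gives a slightly different estimate each time; that sample-to-sample variation is exactly the phenomenon biostatistics studies.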
Biostatistics is involved not only in helping to build knowledge and to make individual predictions, but also in measurement. Material things have weight and volume and are usually measured with laboratory equipment, but what about things that we know exist but have no weight or volume and cannot be seen, such as pain? One important area of research in biostatistics concerns methods for developing and evaluating instruments to measure virtually anything we can think of. This includes not only things that we know exist but cannot observe directly, like pain or anxiety, but also things that are only conceptual and have no real existence in the physical world, such as quality of life or beliefs about medications.
In summary, biostatistics not only makes an enormous contribution to our knowledge in the biosciences, it also provides us with methods to measure things that may not even exist in the physical world, in populations that are only conceptual, so that we can predict the future and make the best decisions.
This dual role of biostatistics corresponds to its application in clinical research and in basic science research. In clinical research, the main purpose of biostatistics is to determine the characteristics of defined populations, and the main concern is obtaining correct values for those characteristics. In basic science research, biostatistics is mainly used to take measurement error into account, through the analysis of the variability of replicate measurements, and to control the effect of factors that may influence measurement error.
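As an illustration of analyzing the variability of replicate measurements, the sketch below summarizes repeated readings of the same sample by their mean, sample standard deviation, and coefficient of variation. The readings are made-up numbers and `replicate_variability` is a hypothetical helper; the coefficient of variation is one common way of expressing measurement error relative to the size of the measurement, not necessarily the method used later in the book.

```python
import statistics


def replicate_variability(replicates):
    """Summarize replicate measurements of the same quantity.

    Returns the mean, the sample standard deviation (an estimate of the
    measurement error), and the coefficient of variation (SD / mean).
    """
    mean = statistics.mean(replicates)
    sd = statistics.stdev(replicates)  # needs at least two replicates
    return mean, sd, sd / mean


# Hypothetical triplicate readings of one sample (illustrative numbers only).
readings = [4.8, 5.1, 5.0]
mean, sd, cv = replicate_variability(readings)
print(f"mean={mean:.2f}, SD={sd:.3f}, CV={cv:.1%}")
```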
1.2 Scales of Measurement
Biostatistical methods require that everything be measured. It is of great importance to select and identify the scale used to measure each study variable, or attribute, because the scale determines the statistical methods that will be used in the analysis. There are only four scales of measurement.
The simplest scale is the binary scale, which has only two values. Patient sex (female, male) is an example of an attribute measured on a binary scale. Everything that has a yes/no answer (e.g. obesity, previous myocardial infarction, family history of hypertension) is measured on a binary scale. Very often the values of a binary scale are not numbers but terms, which is why the binary scale is also a nominal scale. However, the values of any binary attribute can readily be converted to 0 and 1. For example, the attribute sex, with values female and male, can be converted to the attribute female sex, with values 0 meaning no and 1 meaning yes.
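The recoding of sex into female sex can be written as a simple lookup. In this sketch the helper name `to_female_sex` and the lowercase string values are assumptions; any value other than female or male is rejected rather than silently coded. The last line also shows a useful property of 0/1 coding: the mean of a 0/1 attribute is the proportion of yes values.

```python
def to_female_sex(sex):
    """Recode the attribute sex (female/male) as female sex (1 = yes, 0 = no)."""
    codes = {"female": 1, "male": 0}
    if sex not in codes:
        raise ValueError(f"unexpected value for sex: {sex!r}")
    return codes[sex]


patients = ["female", "male", "female", "female"]
female_sex = [to_female_sex(s) for s in patients]
print(female_sex)                          # [1, 0, 1, 1]
print(sum(female_sex) / len(female_sex))   # 0.75, the proportion of women
```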
Next in complexity is the categorical scale. This is simply a nominal scale with more than two values. In common with the binary scale, the values in the categorical scale are usually terms, not numbers, and the order of those terms is arbitrary: the first term in the list of values is not necessarily smaller than the second. Arithmetic operations with categorical scales are meaningless, even if the values are numeric. Examples of attributes measured on a categorical scale are profession, ethnicity, and blood type.
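The following sketch illustrates why arithmetic on a categorical scale is meaningless even when the values are coded as numbers. The list of blood types is invented for illustration, and the two numeric codings are arbitrary by design: they produce different means for the same data, whereas the frequency of each category and the most common category do not depend on the coding at all.

```python
from collections import Counter

# Invented data for illustration.
blood_types = ["A", "O", "B", "O", "AB", "A", "O"]

# Frequencies and the most common value (the mode) are meaningful summaries.
counts = Counter(blood_types)
print(counts.most_common(1))  # [('O', 3)]

# Numeric codes are arbitrary labels: two equally valid codings give
# different "means" for the same patients, so the mean has no meaning here.
coding_1 = {"A": 1, "B": 2, "AB": 3, "O": 4}
coding_2 = {"O": 1, "A": 2, "B": 3, "AB": 4}
mean_1 = sum(coding_1[b] for b in blood_types) / len(blood_types)
mean_2 = sum(coding_2[b] for b in blood_types) / len(blood_types)
print(mean_1, mean_2)
```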
It is important to note that, in a given person, an attribute can have only a single value. However, we sometimes see categorical attributes that seem to take several values for the same person. Consider, for example, an attribute called cardiovascular risk factors with values arterial hypertension, hypercholesterolemia, diabetes mellitus, obesity, and tabagism. Obviously, a person can have more than one risk factor, and such an attribute is called a multi‐valued attribute. This attribute, however, is just a compact presentation of a set of related attributes grouped under a heading, a layout commonly used in data forms. For analysis, these attributes must be converted into binary attributes. In the example, cardiovascular risk factors is the heading, while arterial hypertension, hypercholesterolemia, diabetes mellitus, obesity, and tabagism are binary variables that take the values 0 and 1.
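Converting the multi-valued attribute into binary attributes can be sketched as follows. The list `RISK_FACTORS` mirrors the example in the text, while the function name `expand_risk_factors` and the choice of passing the reported risk factors as a set of strings are assumptions for illustration.

```python
# The values listed under the heading "cardiovascular risk factors".
RISK_FACTORS = [
    "arterial hypertension",
    "hypercholesterolemia",
    "diabetes mellitus",
    "obesity",
    "tabagism",
]


def expand_risk_factors(reported):
    """Convert one patient's multi-valued risk factor entry into
    one binary (0/1) attribute per risk factor."""
    unknown = set(reported) - set(RISK_FACTORS)
    if unknown:
        raise ValueError(f"unrecognized risk factors: {sorted(unknown)}")
    return {factor: int(factor in reported) for factor in RISK_FACTORS}


patient = expand_risk_factors({"obesity", "arterial hypertension"})
print(patient)
```

Every patient ends up with the same five binary attributes, so the data can be analyzed with the methods for binary scales.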
When values can be ordered, we have an ordinal scale. An ordinal scale may have any number of values, the values may be terms or numbers, and the values must have a natural order. An example of an ordinal scale is the staging of a tumor (stage I, II, III, IV). There is a natural order of the values, since stage II is more invasive than stage I and less invasive than stage III. However, one cannot say that the difference, either biological or clinical, between stage I and stage II is larger or smaller than the difference between stage II and stage III. On ordinal scales, arithmetic operations are meaningless.
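An ordinal attribute such as tumor stage can be represented so that comparisons work while nothing implies equal spacing between stages. This sketch uses Python's `IntEnum` for a hypothetical `TumorStage` type; the integer values only encode the order of the stages. `statistics.median_low` is used instead of `statistics.median` so that, with an even number of patients, the result is always an observed stage rather than an average of two stages, which would have no meaning on this scale.

```python
import statistics
from enum import IntEnum


class TumorStage(IntEnum):
    """Ordinal scale: the integer values encode order only, not distance."""
    I = 1
    II = 2
    III = 3
    IV = 4


stages = [TumorStage.II, TumorStage.I, TumorStage.IV, TumorStage.II, TumorStage.III]

# Comparisons are meaningful: stage II is more advanced than stage I.
print(TumorStage.II > TumorStage.I)  # True

# Order-based summaries such as the median are meaningful.
print(TumorStage(statistics.median_low(stages)).name)  # II

# Python would happily compute TumorStage.III - TumorStage.II, but such
# differences (and means) are not meaningful, because the spacing between
# stages is not defined on an ordinal scale.
```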