Applied Biostatistics for the Health Sciences. Richard J. Rossi

itself. That is, complete information on the target population is required to answer the research question, and because a sample is only a subset of the target population, it can only provide information about the answer. For this reason, statistics is often referred to as “the science of describing populations in the presence of uncertainty.”

      The first thing a biostatistician generally must do is to take the research question and determine a particular set of characteristics of the target population that are related to the research question being studied. A biostatistician then must determine the relevant statistical questions about these population characteristics that will provide answers or the best information about the research questions. A characteristic of the target population that can be summarized numerically is called a parameter. For example, in a study of the body mass index (BMI) of teenagers, the average BMI value for the target population is a parameter, as is the percentage of teenagers having a BMI value less than 25. The parameters of the target population are based on the information about the entire population, and hence, their values will be unknown to the researcher.

      Once a representative sample is obtained, any quantity computed from the information in the sample and known values is called a statistic. Because any estimate of the unknown parameters will be based only on the information in the sample, the estimates are also statistics. Statements about the parameters of the population made by extrapolating from the sample information (i.e., statistics) are called statistical inferences, and the statistical methods a biostatistician uses for making inferences need to be based on sound statistical and scientific reasoning. Furthermore, statistical inferences are meaningful only when they are based on data that are truly representative of the target population. Statistics that are computed from a sample are often used for estimating the unknown values of the parameters of interest, for testing claims about the unknown parameters, and for modeling the unknown parameters.
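
      The distinction between a parameter and a statistic can be illustrated with a minimal Python sketch. The population below is simulated: the 10,000 BMI values, the normal distribution used to generate them, and the sample size of 200 are all assumptions made for illustration, not data from any study. In practice the full population is not available, which is exactly why the parameter values are unknown.

```python
import random
import statistics

# Hypothetical target population: BMI values for 10,000 teenagers.
# The values are simulated purely for illustration; they are not real data.
rng = random.Random(42)
population_bmi = [rng.gauss(22.0, 3.5) for _ in range(10_000)]

# Parameters: computed from every unit in the target population.
param_mean_bmi = statistics.mean(population_bmi)
param_pct_below_25 = 100 * sum(b < 25 for b in population_bmi) / len(population_bmi)

# Statistics: computed only from the units observed in a sample.
sample_bmi = rng.sample(population_bmi, k=200)
stat_mean_bmi = statistics.mean(sample_bmi)
stat_pct_below_25 = 100 * sum(b < 25 for b in sample_bmi) / len(sample_bmi)

print(f"parameter: mean BMI = {param_mean_bmi:.2f}, % below 25 = {param_pct_below_25:.1f}")
print(f"statistic: mean BMI = {stat_mean_bmi:.2f}, % below 25 = {stat_pct_below_25:.1f}")
```

      The two statistics will typically be close to, but not equal to, the corresponding parameters, because they use only the 200 sampled values.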

      1.2.1 The Basic Biostatistical Terminology

      In developing the statistical protocol to be used in a research study, biostatisticians use the basic terminology listed below.

       The target population is the population that is being studied in the research project.

       The units of a target population are the objects on which the measurements will be taken. When the units of the population are human beings, they are referred to as subjects or individuals.

       A subpopulation of the target population is a well-defined subset of the population units.

       A parameter is a numerical measure of a characteristic of the target population.

       A sample is a subset of the target population units. A census is a sample consisting of the entire set of population units.

       The sample size is the number of units observed in the sample.

       A random sample is a sample that is chosen according to a sampling plan where the probability of each possible sample that can be drawn from the target population is known.

       A statistic is any value that is computed using only the sample observations and known values.

       A cohort is a group of subjects having similar characteristics.

       A variable is a characteristic that will be recorded or measured on a unit in the target population.

       A response variable or outcome variable is the variable in a research study that is of primary interest or the variable that is being modeled. The response variable is also sometimes called the dependent variable.

       An explanatory variable is a variable that is used to explain or is believed to cause changes in the response variable. The explanatory variables are also called independent variables or predictor variables.

       A treatment is any experimental condition that is applied to the units.

       A placebo is an inert or inactive treatment that is applied to the units.

       A statistical inference is an estimate, conclusion, or generalization made about the target population from the information contained in an observed sample.

       A statistical model is a mathematical formula that relates the response variable to the explanatory variables.
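
      The definitions of a random sample and a census can be made concrete with a small sketch. It assumes one particular sampling plan, simple random sampling, in which every possible sample of a given size is equally likely; this is only one plan that satisfies the definition of a random sample given above. The five-unit population and its labels are hypothetical.

```python
from itertools import combinations
from math import comb

# Hypothetical target population of five units, labeled for illustration.
population = ["A", "B", "C", "D", "E"]
sample_size = 2

# Under simple random sampling, every possible sample of the given size
# is equally likely, so the probability of each sample is known in advance.
possible_samples = list(combinations(population, sample_size))
probability_of_each = 1 / comb(len(population), sample_size)

for s in possible_samples:
    print(s, probability_of_each)

# A census is the one sample that contains every population unit.
census = list(combinations(population, len(population)))
print(census)
```

      Enumerating the samples is feasible only for very small populations; the sketch is meant to show what "the probability of each possible sample is known" means, not how samples are drawn in practice.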

      The difference between a parameter and a statistic is one of the most misunderstood and abused concepts in statistics. Researchers who lack a basic understanding of statistics often use the two terms interchangeably, which is incorrect. A number is a parameter if it was computed from the entire set of units in the target population, and a statistic if it was computed from a sample of those units. The distinction matters because a parameter provides the answer to a statistical research question, while a statistic provides only information about the answer, and that information carries a degree of uncertainty.

      Example 1.1

      In a study designed to determine the percentage of obese adults in the United States, the BMI of 500 adults was measured at several hospitals across the country. The resulting percentage of the 500 adults classified as obese was 24%.

      In this study, the target population is adults in the United States, the 500 adults constitute a sample of that population, the parameter of interest is the percentage of obese adults in the United States, and 24% is a statistic because it was computed from the sample, not from the target population.
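
      The uncertainty attached to a statistic such as the 24% in Example 1.1 can be seen in a short simulation. Everything about the population below is assumed for illustration: its size of 100,000 adults and its obesity percentage of 26% are hypothetical values, not figures from the study, and the helper name sample_percentage is invented for this sketch. Only the sample size of 500 comes from the example.

```python
import random

# Assumed for illustration only: a population of 100,000 adults in which
# exactly 26% are obese. These values are hypothetical, not from the study.
rng = random.Random(0)
population_size = 100_000
num_obese = 26_000
population = [True] * num_obese + [False] * (population_size - num_obese)

# Parameter: computed from every unit in the population, so it is one fixed number.
parameter_pct = 100 * sum(population) / len(population)


def sample_percentage(pop, n, rng):
    """Return the percentage of obese adults in one random sample of size n."""
    sample = rng.sample(pop, k=n)
    return 100 * sum(sample) / n


# Statistic: recomputed for each of 1,000 samples of 500 adults, so it varies.
statistics_pct = [sample_percentage(population, 500, rng) for _ in range(1000)]

print(f"parameter: {parameter_pct:.1f}%")
print(f"statistics range from {min(statistics_pct):.1f}% to {max(statistics_pct):.1f}%")
```

      The parameter stays the same no matter how many samples are drawn, while the statistic changes from one sample to the next. A single study observes only one of these sample values.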

      In designing a biomedical research study, the statistical protocol used in the study is usually determined by the research team in conjunction with the biostatistician. The statistical protocol should include the identification of the target population, the units in the population, the response variable and explanatory variables, the parameters of interest, the treatments or subpopulations being studied, the sample size, and models that will be fit to the observed data.

      Example 1.2

      In a study investigating the average survival time for stage IV melanoma patients receiving two different doses of interferon, n = 150 patients will be monitored. The age, sex, race, and tumor thickness of each patient will be recorded along with the time they survived after being diagnosed with stage IV melanoma. For this study, determine the following components of the statistical protocol:

      a. the target population,

      b. the units of the target population,

      c. the response variable,

      d. the explanatory variables,

      e. the parameter of interest,

      f. the treatments,

      g. the sample size.

       Solutions

      a. The target population in this study is individuals diagnosed with stage IV melanoma.

      b. Units of the target population are the individuals diagnosed with stage IV melanoma.

      c.

