Population Genetics. Matthew B. Hamilton
In actual populations, a parameter has a true value. For the allele frequency p, knowing this true value would require examining the genotype of every individual and counting all A and a alleles to determine their frequency in the population. This task is impractical or impossible in most cases. Instead, we rely on an estimate of allele frequency, p̂, obtained from a sample of individuals from the population. Sampling leads to some uncertainty in parameter estimates because repeating the sampling and estimation process would likely yield a somewhat different estimate each time. Quantifying this uncertainty is important to determine whether repeated sampling might change a parameter estimate by a little or by a lot. When dealing with parameters, we expect that p + q = 1 exactly if there are only two alleles with allele frequencies p and q. However, when dealing with estimates, we might say the two allele frequency estimates should sum to approximately one (p̂ + q̂ ≈ 1) since each allele frequency is estimated with some error. The more uncertain the estimates p̂ and q̂, the less we should be surprised to find that their sum does not equal the expected value of one.
Parameter: A variable or constant appearing in a mathematical expression; a value (usually unknown) used to represent a certain population characteristic; any factor that defines a system and determines or limits its performance.
Estimate: An indication of the value of an unknown quantity based on observed data; an approximation of a true score, parameter, or value; a statistical estimate of the value of a parameter.
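The sampling uncertainty described above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the true allele frequency (0.6), the sample sizes, and the function names are all hypothetical, and each of the 2N alleles in a sample of N diploid individuals is assumed to be drawn independently from the population.

```python
import random
import statistics


def estimate_allele_frequency(true_p, n_individuals, rng):
    """Return one estimate of the frequency of allele A.

    Assumption: each of the 2N alleles carried by N diploid individuals
    is an independent draw that is A with probability true_p.
    """
    n_alleles = 2 * n_individuals
    count_A = sum(1 for _ in range(n_alleles) if rng.random() < true_p)
    return count_A / n_alleles


def repeated_estimates(true_p, n_individuals, n_replicates, seed=1):
    """Repeat the sampling and estimation process n_replicates times."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        estimate_allele_frequency(true_p, n_individuals, rng)
        for _ in range(n_replicates)
    ]


if __name__ == "__main__":
    true_p = 0.6  # hypothetical parameter value, unknown in practice
    for n in (10, 100, 1000):
        estimates = repeated_estimates(true_p, n, n_replicates=500)
        print(
            f"N = {n:4d}  mean estimate = {statistics.mean(estimates):.3f}  "
            f"spread (SD) = {statistics.stdev(estimates):.3f}"
        )
```

Under these assumptions, the spread of the estimates should shrink as the number of sampled individuals grows, which is one way to see whether repeated sampling would change an estimate by a little or by a lot.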
It could be said that statistics sits at the intersection of theoretical and empirical population genetics. Parameters and parameter estimates are fundamentally different things. Estimation requires effort to understand sampling variation and quantify sources of error and bias in samples and estimates. The distinction between parameters and estimates is critical when comparing actual populations with expectations to test hypotheses. When large, random samples can be taken, estimates are likely to have minimal errors. However, there are many cases where estimates have a great deal of uncertainty, which limits the ability to evaluate expectations. There are also instances where very different processes may produce very similar expected results. In such cases, it may be difficult or impossible to distinguish the different potential causes of a pattern due to the approximate nature of estimates. While this book focuses mostly on parameters, it is useful to bear in mind that testing or comparing expectations requires the use of parameter estimates and statistics that quantify sampling error. The Appendix provides a review of some basic statistics that are used in the text.
Inductive and deductive reasoning
Population genetics employs both inductive and deductive reasoning in an effort to understand the biological processes operating in actual populations as well as to elucidate the general processes that cause population genetic phenomena. The inductive approach to population genetics involves assembling measures of genetic variation (parameter estimates) from various populations to build up evidence that can be used to identify the underlying processes that produced the observed patterns. This approach is logically identical to that used by Isaac Newton, who used knowledge of how objects fall to the surface of the Earth as well as knowledge of the movement of planets to arrive at the general principles of gravity. Application of inductive reasoning requires detailed familiarity with the various empirical data types in population genetics, such as DNA sequences, along with the results of studies that report observed patterns of genetic variation. From this accumulated empirical information, it is then possible to draw more general conclusions about the qualities and quantities of genetic variation in populations. Model organisms like D. melanogaster and Arabidopsis thaliana play a large role in population genetic conclusions reached by inductive reasoning. Because model organisms receive a large amount of scientific effort, for example, to completely sequence and annotate their genomes, a great deal of genetic data has accumulated for these species. Based on this evidence, many inferences have been made about population genetic processes. Although model organisms are very rich sources of empirical information, the number of model species is limited by definition, so any generalizations drawn from them may not apply universally to all species.
Deductive reasoning: Using general principles to reach conclusions about specific instances.
Inductive reasoning: Utilizing the knowledge of specific instances or cases to arrive at general principles.
The study of population genetics can also be approached using deductive reasoning. The actions of general processes such as genetic drift, mutation, and natural selection are represented by parameters in the mathematical equations that make up population genetic models. These models can then be used to make predictions about the quantity of genetic variation and patterns of genetic variation in space and time. Such population genetic models make general predictions about things like rates of change in allele frequency, the eventual equilibrium of allele or genotype frequencies, and the net outcome of several processes operating at the same time. These predictions are very general in that they apply to any population of any species since the predictions arose from general principles in the first place. At the same time, such general predictions may not be directly applicable to a specific population because the general principles and assumptions used to make the prediction are not specific enough to match an actual population.
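As one hedged illustration of how a model turns general principles into predictions, the sketch below uses a simple two-way mutation model that is not taken from the text above. It assumes allele A mutates to a at rate mu per generation and a mutates to A at rate nu, with no other processes acting. The function names and the rate values are hypothetical and chosen only for illustration.

```python
def next_frequency(p, mu, nu):
    """Frequency of allele A after one generation of two-way mutation.

    Assumption: A -> a occurs at rate mu and a -> A at rate nu,
    and no other process changes allele frequency.
    """
    return p * (1.0 - mu) + (1.0 - p) * nu


def iterate(p0, mu, nu, generations):
    """Apply the one-generation prediction repeatedly, starting from p0."""
    p = p0
    for _ in range(generations):
        p = next_frequency(p, mu, nu)
    return p


def equilibrium(mu, nu):
    """Frequency at which next_frequency(p) == p, found by solving for p."""
    return nu / (mu + nu)


if __name__ == "__main__":
    mu, nu = 1e-3, 2e-3  # hypothetical mutation rates for illustration
    for p0 in (0.1, 0.9):
        p_final = iterate(p0, mu, nu, generations=10_000)
        print(f"start p = {p0:.1f}  after 10,000 generations p = {p_final:.4f}")
    print(f"predicted equilibrium p = {equilibrium(mu, nu):.4f}")
```

The model makes both kinds of prediction mentioned above: a rate of change in allele frequency per generation and an eventual equilibrium that does not depend on the starting frequency. Whether such a prediction applies to an actual population depends on how well the stated assumptions match that population.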
Historically, the field of population genetics has developed from an interplay between arguments and evidence developed using both inductive and deductive reasoning approaches. Nonetheless, most of the major ideas in population genetics can be first approached with deductive reasoning by learning and understanding the expectations that arise from the principles of Mendelian heredity. This book stresses the process of deductive reasoning to arrive at these fundamental predictions. Empirical evidence related to expectations is included to illustrate predictions and to demonstrate hypothesis tests that result from expectations. Because the body of empirical results in population genetics is very large, readers should resist the temptation to generalize too much from the limited number of empirical studies that are presented. Detailed reviews of particular areas of population genetics, many of which are cited, are a better source for comprehensive summaries of empirical studies.
In the next chapter, we will start by building expectations for the frequencies of diploid genotypes based on the foundation of particulate inheritance: that alleles are passed unaltered from parents to offspring. There is ample support for particulate inheritance from both molecular biology, which identifies DNA as the hereditary molecule, and from allele and genotype frequencies that can be observed in actual populations. The general principle of particulate inheritance has been used to formulate a wide array of expectations about allele and genotype frequencies in populations.
1.2 Theory and assumptions
What Is a Theory and What Are Assumptions?
How Can Theories Be Useful with So Many Assumptions?
In colloquial usage, the word theory refers to something that is known with uncertainty, or a quantity that is approximate. On a day you are running late leaving work, you might say, “In theory, I am supposed to depart at 6:00 pm.” In science, theory has a very different meaning. Theory is the accumulation of expectations and observations that have withstood tests and critical scrutiny and are accepted by at least some practitioners of a scientific field. Theory is the collection of all of the expectations developed for specific cases or individual biological processes that together form a more comprehensive set of general principles. The combination