Evolution by the Numbers. James Wynn
paper has been considerably revised, and much new material has been added.
Foreword
Variation, Evolution, Heredity, and Mathematics in the 21st Century
All three non-adaptive forces of evolution—mutation, recombination, and random drift—are stochastic… and can generally only be understood in probabilistic terms. It is well-known that most biologists abhor things mathematical, but the quantitative details really do matter.
—Michael Lynch
In the twenty-first century, the concepts of variation, evolution, and heredity have influenced science, technology, and the public imagination in ways that could never have been imagined by their developers. Evolution has been embraced by the public to explain complex transformations in everything from organisms to economies, and in the process, has divided the public spheres on issues as diverse as religion, science, and public policy. Variation and heredity have similarly become part of our modern social and cultural awareness. As our capabilities to modify genes in plants and animals grow, so do the difficulties of our deliberations over whether and to what extent we should bioengineer our way to a better world.
Though we easily recognize how these ideas influence our social and cultural landscape, most of us rarely consider how they transform science. This task falls to historians, philosophers, and sociologists of science. However, even scholars in these fields have not considered all of the consequences of these notions. One of the effects that has not been explored is the impact of these ideas on the development of argument in the biological sciences. This book examines how the concepts of variation, evolution, and heredity, introduced by Charles Darwin and Gregor Mendel, transformed argument in the biological sciences by encouraging the growth of mathematical argumentation.
Mathematics and Modern Investigations of Variation, Evolution, and Heredity
Unlike scientific ideas, which regularly filter into the public’s awareness, mathematical aspects of scientific argument tend to develop quietly and anonymously. Despite their low profile, they are ubiquitous in modern investigations of variation, evolution, and heredity. By going behind the scenes of current research in these fields, it is possible to illustrate just how important they are.
The extent to which these research fields have come to rely on mathematics is nothing short of extraordinary. At the dawn of the twentieth century, very few researchers investigating these phenomena would have been employing mathematical methods or arguments. However, in the twenty-first century, these methods pervade their work. This pervasiveness is evidenced by the spectacular growth in the last fifty years of mathematical fields of study related to these phenomena, including: population genetics, molecular genetics, biostatistics, bioinformatics, computational biology, and quantitative genetics. The ubiquity of mathematics is also evident in the range of subjects that are being examined using quantitative methods. According to Alan Templeton, a professor of population genetics at Washington University in St. Louis, mathematical models are currently being used in a variety of research areas, including “wildlife conservation projects, research assessing what it means to be human, and investigations tracking the historical development of disease.”
One publicly salient application of mathematics to the study of variation, evolution, and heredity has been the use of DNA to track the historical migrations of human populations as they spread out of Africa. This subject has been the focus of attention in a number of works in the popular media, including books such as Steve Olson’s Mapping Human History and Bryan Sykes’s The Seven Daughters of Eve, websites like Wikipedia’s “Human Evolutionary Genetics,” and televised specials like PBS’s Journey of Man. In all of these media, however, the role of mathematics in the science is invisible. Closer scrutiny of these popularizations, though, offers a sense of the true extent to which mathematics contributes to the science that captures the public’s imagination.
In the television documentary Journey of Man, for example, English geneticist Dr. Spencer Wells travels the world tracing the hereditary path of our human ancestry by following the physical route by which it migrated out of Africa. As is the case for most popularizations of science, the main focus of the documentary is on the human story. Though it gets second billing, science does appear throughout the documentary. Before Wells leaves on his trip, for example, he visits his geneticist mentor, Luca Cavalli-Sforza, to talk about the foundations of research into human genetic variation. He also takes breaks in his travels to explain key scientific points, such as what scientists know about gene change over time and how it helps them establish relationships between modern humans and their progenitors.
However, the mathematical work that makes identifying these relationships possible receives only the briefest of acknowledgements. Interspersed throughout the documentary are visual images of peaked line graphs on computer screens. In addition, Wells makes brief reference to “the clear data” that has sent him to Africa in search of the Kalahari bushman whose genetic heritage represents the starting point of the human journey of geographic expansion and genetic diversification. The obliqueness and briefness with which the documentary treats the contributions of mathematical argument create the impression that it played almost no substantive role in understanding the spread of our ancestors. In reality, however, Wells’s trip would not have had a scientifically supportable itinerary without quantitative data and mathematical methods for managing, comparing, and analyzing that data.
For example, to establish the chain of genetic ancestry from fixed mutations in the Y chromosome—which Wells uses as the scientific basis for his travels—thousands of blood samples taken in the field would first have to be processed. In the initial phases, chemical and physical procedures would be used to extract and precipitate DNA. Once the DNA had been extracted, it would be “unzipped,” bonded to other known bits of DNA, and run through a process of electrophoresis where it would be separated out and read by a laser.
Once the DNA was scanned and identified, mathematics would take on its essential role in the science. The information, read by laser from the DNA, would be stored in a database whose architecture would not be possible without the use of complex mathematical algorithms. Then this information would be compared to other samples in large databases, again using sophisticated algorithms. To determine the general relatedness of the sample of DNA to a population group, researchers would apply formulae to calculate the probability of the DNA’s belonging to a particular group based on the absence or presence of certain genetic markers. Finally, scientists would establish the place, say for a Kalahari bushman’s Y chromosome, in the larger sequence of genetic diversification among human populations with another set of formulae. These formulae would be used to detect the absence or presence of key mutations in the bushman genome and to compare them to the mutations present or absent in other human populations.
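To give a concrete sense of what such a probability calculation can look like, the following is a minimal sketch in Python and not the method used by Wells or his colleagues. It assumes a simple Bayesian model in which each marker is treated as independent within a population. The marker names, population labels, and frequencies are all invented for illustration and are not real genetic data.

```python
from math import prod

# Hypothetical marker frequencies: the fraction of individuals in each
# reference population who carry each marker. These values are invented
# for illustration and do not describe any real population.
MARKER_FREQUENCIES = {
    "population_A": {"M1": 0.90, "M2": 0.10, "M3": 0.50},
    "population_B": {"M1": 0.20, "M2": 0.80, "M3": 0.50},
}


def group_probabilities(sample, frequencies, priors=None):
    """Return the probability that a sample belongs to each group.

    `sample` maps a marker name to True (present) or False (absent).
    Markers are assumed to be independent within each group, which is
    a simplifying assumption made for this sketch.
    """
    groups = list(frequencies)
    if priors is None:
        # With no other information, treat every group as equally likely.
        priors = {group: 1.0 / len(groups) for group in groups}

    unnormalized = {}
    for group in groups:
        # Likelihood of the observed presence/absence pattern in this group.
        likelihood = prod(
            frequencies[group][marker] if present
            else 1.0 - frequencies[group][marker]
            for marker, present in sample.items()
        )
        unnormalized[group] = priors[group] * likelihood

    # Normalize so the probabilities across groups sum to one.
    total = sum(unnormalized.values())
    return {group: value / total for group, value in unnormalized.items()}


# A sample that carries M1 and M3 but lacks M2.
sample = {"M1": True, "M2": False, "M3": True}
posterior = group_probabilities(sample, MARKER_FREQUENCIES)
```

In this invented example the sample carries the marker common in population A and lacks the marker common in population B, so the calculation assigns most of the probability to population A. The third marker, equally frequent in both groups, contributes nothing to the decision, which illustrates why only markers that differ between populations are informative.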
Because of long-term efforts to gather genetic data and improve the speed of its analysis, scientists now have more information relevant to investigating variation, evolution, and heredity than ever before. The extent of this data and the type of inquiries it supports mean that research such as the kind popularized by Wells cannot be conducted without mathematics. Its necessity is evidenced by the emergence and coalescence of a number of mathematical subfields in modern biology (such as bioinformatics, molecular genetics, population genetics, biostatistics, and statistical genetics) dedicated to meeting the needs of a quantitative science (Templeton). Researchers in bioinformatics, for example, devote their efforts to developing databases, algorithms, and statistical and computational techniques for analyzing and managing massive data sets. With the complete sequencing of the genomes of humans and other important organisms, the amount of genetic data that needs to be organized and synthesized has grown. The human genome, for example, has between twenty and twenty-five thousand genes and other functional elements, with an estimated three billion base pairs. To collect this data set, the institutions working on the Human Genome Project collaboratively sequenced genes for fifteen years. Computer scientists in bioinformatics employ their mathematical skills to develop more powerful algorithms for ensuring that data on this scale can be properly stored and retrieved for scientific research.
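The scale of the storage problem can be made concrete with a little arithmetic. The sketch below, in Python, is an illustration only and does not describe how any actual genome database is built. It assumes the simplest possible encoding, in which each of the four bases (A, C, G, T) is stored in two bits, so that four bases fit in one byte. The function names are hypothetical.

```python
# Two bits are enough to distinguish four bases.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def pack(sequence):
    """Pack a DNA string into bytes, four bases per byte."""
    packed = bytearray((len(sequence) + 3) // 4)
    for i, base in enumerate(sequence):
        # Place each base in its two-bit slot within the byte.
        packed[i // 4] |= BASE_TO_BITS[base] << (2 * (i % 4))
    return bytes(packed)


def unpack(packed, length):
    """Recover the first `length` bases from packed bytes."""
    return "".join(
        BITS_TO_BASE[(packed[i // 4] >> (2 * (i % 4))) & 0b11]
        for i in range(length)
    )


# Round trip on a short example sequence.
example = "GATTACA"
restored = unpack(pack(example), len(example))

# Storage needed for roughly three billion bases at two bits per base.
GENOME_BASE_PAIRS = 3_000_000_000
bytes_needed = GENOME_BASE_PAIRS * 2 // 8
```

Under this assumption, three billion bases occupy 750 million bytes for a single genome, before any annotation, quality scores, or indexing are added. Comparing many such genomes against one another is what makes efficient storage and retrieval algorithms a research problem in their own right.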
Whereas some biomathematical researchers devote their talents to managing data, others use their mathematical skills to develop formulae to pose and solve important questions about variation,