An Introduction to Text Mining. Gabe Ignatow
handle research misconduct, case summaries involving research misconduct that make for sobering reading, and free tools for detecting plagiarism.
The Davis–Madsen ethics scenarios on the Ethicist Blog from the Academy of Management: “Ethics in Research Scenarios: What Would YOU Do?” (http://ethicist.aom.org/2013/02/ethics-in-research-scenarios-what-would-you-do)
Developing a Research Proposal
Consider the ethical dimensions of the research proposal or proposals you are developing. Does your research make use of human subjects? Are the data in the public or private domain? And do your data contain information that can be used to identify individual research participants?
Further Reading
Israel, M. (2014). Research ethics and integrity for social scientists: Beyond regulatory compliance. Thousand Oaks, CA: Sage.
O’Leary, Z. (2014). The essential guide to doing your research project. Thousand Oaks, CA: Sage.
4 The Philosophy and Logic of Text Mining
Learning Objectives
The goals of Chapter 4 are to help you to do the following:
1 Define major philosophy of social science concepts that are relevant to the practice of text mining.
2 Recognize the interdependence of philosophical assumptions and decisions about methodology.
3 Summarize what is meant by the “two cultures” in academia.
4 Position your own research project in terms of debates over positivism and postpositivism.
Introduction
You may be tempted to skim over or even skip this chapter entirely, and it is certainly possible to make use of the more technical later chapters of this textbook without giving much thought to epistemology, ontology, metatheory, or inferential logic. But if you are in the early stages of a text mining research project, you would do well to read this chapter carefully. As we discussed in the Preface to this textbook, just as the foundations of a house must be properly designed and built if the house is to last, the philosophical foundations of your research project should be as solidly constructed as possible. Text mining research often involves making strong inferences about groups of people based on the texts they produce. Researchers working with these tools frequently claim to know something about the language people use that those same people do not themselves know; justifying such claims is not a simple matter. Several academic fields are relevant to questions about when researchers are justified in using digital texts to make inferences about social groups. These fields include the philosophy of science (Curd, Cover, & Pincock, 2013), the philosophy of technology (Kaplan, 2009), and science and technology studies (STS; Kleinman & Moore, 2014).
A historical example may be in order. A century ago, the lie detector (polygraph machine) was, like text mining technologies today, a revolutionary technology with social implications that could not have been predicted. As with text mining technologies, it was claimed that lie detector technology would allow scientists to extend their powers of perception and even know what people were thinking. Just as a lie detector can potentially reveal things about individuals that they themselves do not know or would prefer not to reveal, text mining tools can potentially reveal what members of a group or community are thinking and feeling. But is it true that lie detectors can reveal whether people are attempting to deceive? What do data produced by lie detectors mean? How should these data be used? Lie detector technology itself does not provide answers to these questions. Instead, it took decades for individual scientists and scientific, legal, and criminal justice institutions to sort out what lie detectors can and cannot accomplish and how the data they produce could be used ethically (see Alder, 2007; Bunn, 2012). And even today there is often disagreement about the results of polygraph tests. In the same way, scientific institutions and public and private sector organizations are still in the early stages of sorting out what kinds of conclusions can be drawn from text mining research. This sorting-out process involves not only technical discussions but also philosophical discussions about knowledge, facts, and language.
The philosophy of social science is one of the main fields in which researchers debate how socially sensitive research technologies such as polygraphs and text mining tools can and should be used. It is an academic research area that lies at the intersection of philosophy and contemporary social science. Philosophers of social science develop and critique concepts that are foundational to the practice of social science research (Howell, 2013). They critically analyze epistemological assumptions in social research, which are assumptions about the nature of knowledge. They also analyze ontological assumptions, which are assumptions about the nature of reality, and metatheoretical assumptions, which are assumptions about the capacities and limitations of scientific theories. Social scientists often make claims about the validity and generalizability of their findings, the adequacy of their research designs, and why one theory is superior to another. Such claims are grounded in epistemological, ontological, and metatheoretical positions that are generally implicit (Woodwell, 2014). The philosophy of social science allows us to bring these positions to light and helps us understand why different approaches to social science research can, or cannot, make use of each other’s findings. In this section we briefly review what we have found to be the most critical philosophical issues that arise in text mining research, and we discuss some of the practical implications of different philosophical positions.
Ontological and Epistemological Positions
When are we justified in reaching a conclusion about some person or group of people based on the texts they produce? Does text mining research produce findings that are merely interesting, or can it produce findings that are true and accurate reflections of reality?
Every approach to social science research addresses these kinds of questions based on one or another philosophical position. But the philosophical foundations of text mining research are uniquely unsettled because text mining methods are, for the most part, “mixed methods” (Creswell, 2014; Teddlie & Tashakkori, 2008) that are positioned at the intersection of the “two cultures” of the sciences and the humanities (Snow, 1959/2013). The phrase “two cultures” comes from a 1959 lecture and subsequent book by the British novelist and scientist C. P. Snow. Snow was referring to the loss in Western society of a common culture as a result of the division between the sciences and humanities, a division that he saw as an impediment to solving social problems.
Although the idea of two cultures may seem simplistic, within the social sciences there continues to be a divide between more scientific and more humanistic forms of knowledge. These are sometimes referred to as nomothetic and idiographic knowledge, respectively (see Chapter 5), although social scientists themselves more often refer to scientific positivism and postpositivism. Positivism is a paradigm of inquiry that prioritizes quantification, hypothesis testing, and statistical analysis; postpositivism is a more interpretive paradigm that values close reading and multiple interpretations of texts. In practice, text mining and text analysis research is usually performed as a pragmatic combination of these two paradigms. Because positivism and postpositivism are premised on different epistemological and ontological positions, they often produce research findings that are “incommensurable,” meaning that they cannot build upon one another. These epistemological and ontological orientations can be sorted into the following five philosophical positions (Howell, 2013).
Correspondence Theory
The first philosophical position that provides a foundation for social research, correspondence theory, is a traditional model of knowledge and truth associated with scientific positivism.