An Introduction to Text Mining. Gabe Ignatow
C and D), including information on each caller’s gender, age, fluency or nonfluency in Swedish as well as the outcome of the call (whether callers were referred to a general practitioner). The researchers found that men, and especially fathers, received more referrals to general practitioners than did women. The most common caller was a woman fluent in Swedish (64%), and the least likely caller was a man nonfluent in Swedish (3%). All in all, 70% of the callers were women. When the calls concerned children, 78% of the callers were female. Based on these results, the researchers concluded that it is important that telenursing not become a “feminine” activity, only suitable for young callers fluent in Swedish. Given the telenurses’ gatekeeping role, there is a risk that differences on this first level of health care could be reproduced throughout the whole health care system.
Analysis of Discourse Positions
Analyzing discourse positions is an approach to text analysis that allows researchers to reconstruct the communicative interactions through which texts are produced and in this way gain a better understanding of their meaning from their authors’ viewpoint. Discourse positions are understood as typical discursive roles that people adopt in their everyday communication practices, and the analysis of discourse positions is a way of linking texts to the social spaces in which they have emerged. An example of contemporary discourse position research is Bamberg’s (2004) study of the “small stories” told by adolescents and postadolescents about their identities. The study is informed by theories of human development and of narrative (see Chapter 10). His texts are excerpts of transcriptions from a group discussion among five 15-year-old boys telling a story about a female student they all know. The group discussion was conducted in the presence of an adult moderator, and the data were collected as part of a larger project in which Bamberg and his colleagues collected journal entries and transcribed oral accounts from 10-, 12-, and 15-year-old boys in one-on-one interviews and group discussions. Although the interviews and group discussions were open-ended, they all focused on the same list of topics, including friends and friendships, girls, the boys’ feelings and sense of self, and their ideas about adulthood and future orientation. Bamberg and his team analyzed the transcripts line by line, coding instances of the boys positioning themselves relative to each other and to characters in their stories.
Edley and Wetherell’s (1997, 2001; Wetherell & Edley, 1999) studies of masculine identity formation are similar to Bamberg’s study in that they also focus on stories people tell themselves and others in ordinary everyday conversations. Edley and Wetherell studied a corpus of men’s talk on feminism and feminists to identify patterns and regularities in their accounts of feminism and in the organization of their rhetoric. They drew on two samples: one of white, middle-class 17- to 18-year-old school students and one of 60 interviews with a more diverse group of older men aged 20 to 64. The researchers identified two “interpretative repertoires of feminism and feminists,” which set up a “Jekyll and Hyde” binary and “positioned feminism along with feminists very differently as reasonable versus extreme” (Edley & Wetherell, 2001, p. 439).
In the end, analysis of discourse positions is for the most part a qualitative approach to text analysis that relies almost entirely on human interpretation of texts (see Hewson, 2014). Appendix D includes a list of contemporary qualitative data analysis software (QDAS) packages that can be used to organize and code the kinds of text corpora analyzed by Bamberg, Edley, Wetherell, and other researchers working in this tradition.
Critical Discourse Analysis
Critical discourse analysis (CDA) involves looking for features of other discourses in the text or discourse being analyzed. CDA is based on Fairclough’s (1995) concept of “intertextuality,” which is the idea that people appropriate from discourses circulating in their social space whenever they speak or write. In CDA, ordinary everyday speaking and writing are understood to involve selecting and combining elements from dominant discourses.
While the term discourse generally refers to all practices of writing and talking, in CDA discourses are understood as ways of writing and talking that “rule out” and “rule in” ways of constructing knowledge about topics. In other words, discourses “do not just describe things; they do things” (Potter & Wetherell, 1987, p. 6) through the way they make sense of the world for its inhabitants (Fairclough, 1992; van Dijk, 1993).
Discourses cannot be studied directly but can be explored by examining the texts that constitute them (Fairclough, 1992; Parker, 1992). In this way, texts can be analyzed as fragments of discourses that reflect and project ideological domination by powerful groups in society. But texts can also serve as a potential mechanism of liberation when they are produced by a critical analyst who reveals the mechanisms of ideological domination at work in discourse in an attempt to overcome or eliminate those mechanisms.
Although CDA has generally employed strictly interpretive methods, use of quantitative and statistical techniques is not a novel practice (Krishnamurthy, 1996; Stubbs, 1994), and the use of software to create, manage, and analyze large collections of texts appears to be increasingly popular (Baker et al., 2008; Koller & Mautner, 2004; O’Halloran & Coffin, 2004).
A 2014 study by Bednarek and Caple exemplifies the use of statistical techniques in CDA. Bednarek and Caple introduced the concept of “news values” to CDA of news media and illustrated their approach with two case studies using the same collection of British news discourse. Their texts included 100 news stories (about 70,000 words total) from 2003 covering 10 topics from 10 different national newspapers, including five quality papers and five tabloids. The analysis examined the frequencies of the 100 most frequently used words and two-word clusters (bigrams), focusing on words that represent news values such as eliteness, superlativeness, proximity, negativity, timeliness, personalization, and novelty. The authors concluded that their case studies demonstrated that corpus linguistic techniques (see Appendix F) can identify discursive devices that are repeatedly used in news discourse to construct and perpetuate an ideology of newsworthiness.
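The word and bigram frequency counts described above can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the procedure Bednarek and Caple used: the tokenizer is a simple regular expression, the function names are hypothetical, and the two "stories" are invented toy sentences rather than data from their corpus.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed, crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def count_words_and_bigrams(texts):
    """Count single words and adjacent two-word clusters (bigrams) across a list of texts."""
    word_counts = Counter()
    bigram_counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        word_counts.update(tokens)
        # Pair each token with its successor; bigrams never cross text boundaries.
        bigram_counts.update(zip(tokens, tokens[1:]))
    return word_counts, bigram_counts

# Invented toy stories, for illustration only.
stories = [
    "The prime minister warned of the worst crisis in years.",
    "The prime minister said the crisis was the worst yet.",
]

word_counts, bigram_counts = count_words_and_bigrams(stories)
top_words = word_counts.most_common(100)      # the 100 most frequent words
top_bigrams = bigram_counts.most_common(100)  # the 100 most frequent bigrams
print(top_words[:5])
print(top_bigrams[:5])
```

In a study of news values, a researcher would then read through the ranked lists for items that signal a given value, such as superlatives like "worst" for superlativeness or titles like "prime minister" for eliteness. That interpretive step is not automated here.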
In another CDA study, Baker and his colleagues (2008) analyzed a 140-million-word corpus of British news articles about refugees, asylum seekers, immigrants, and migrants. They used collocation and concordance analysis (see Appendix F) to identify common categories of representation of refugees, asylum seekers, immigrants, and migrants. They also discussed how collocation and concordance analysis can be used to direct researchers to representative texts in order to carry out qualitative analysis.
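To make the two techniques concrete, the following Python sketch shows a bare-bones concordance (keyword-in-context lines) and a simple collocate count for a search word. It is an illustration under stated assumptions, not the method or software used by Baker and his colleagues: the function names are hypothetical, the sentence is an invented toy example, and collocates are ranked by raw co-occurrence counts within a fixed window, whereas corpus tools typically also offer statistical association measures.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed, crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, node, width=2):
    """Return (left context, node, right context) for every occurrence of the node word."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

def collocates(tokens, node, window=2):
    """Count words that occur within `window` tokens on either side of the node word."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts

# Invented toy sentence, for illustration only.
text = "A flood of refugees arrived as more refugees fled the flood of violence"
tokens = tokenize(text)

kwic_lines = concordance(tokens, "refugees", width=2)
collocate_counts = collocates(tokens, "refugees", window=2)

for left, node, right in kwic_lines:
    print(f"{left:>15} | {node} | {right}")
print(collocate_counts.most_common(3))
```

Concordance lines like these let a researcher see each occurrence in context, which is one way such output can point toward representative texts for closer qualitative reading.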
Research in the Spotlight
Combining Critical Discourse Analysis and Corpus Linguistics
Baker, P., Gabrielatos, C., KhosraviNik, M., Krzyżanowski, M., McEnery, T., & Wodak, R. (2008). A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press. Discourse & Society, 19(3), 273–306.
In this critical discourse analysis (CDA) study, the linguist Baker and his colleagues analyzed a 140-million-word corpus of British news articles about refugees, asylum seekers, immigrants, and migrants. The authors used collocation and concordance analysis (see Appendix F) to identify common categories of representation of the four groups. They also discussed how collocation and concordance analysis can be used to direct researchers to representative texts in order to carry out qualitative analysis.
Specialized software used:
WordSmith
Content Analysis