An Introduction to Text Mining. Gabe Ignatow

recognize the limitations of their chosen methods and not make imperious or inflated claims about these tools’ revolutionary potential. Like all social science methods, text mining methods have benefits and drawbacks that must be recognized from the start and considered in every phase of the research process. Text mining researchers should also be aware of historians’ concerns about the quality of data stored in digital archives and the possibility that digital archives encourage researcher passivity in the data-gathering phase of research.

      Conclusion

      This chapter has introduced text mining and text analysis methodologies, provided an overview of the major approaches to text analysis, and discussed some of the risks associated with analyzing data from online sources. Despite these risks, social and computer scientists are developing new text mining and text analysis tools to address a broad spectrum of applied and theoretical research questions, in academia as well as in the private and public sectors.

      In the chapters that follow, you will learn how to find data online (Chapters 2 and 6), and you will learn about some of the ethical (Chapter 3) and philosophical and logical (Chapter 4) dimensions of text mining research. In Chapter 5, you will learn how to design your own social science research project. Parts II, IV, and V review specific text mining techniques for collecting and analyzing data, and Chapter 17 in Part VI provides guidance for writing and reporting your own research.

      Key Terms (see Glossary)

       Concordance

       Content analysis

       Conversation analysis

       Critical discourse analysis (CDA)

       Digital archives

       Disambiguation

       Discourse positions

       Foucauldian analysis

       General Inquirer project

       Natural language processing (NLP)

       Netnography

       Sample bias

       Sentiment analysis

       Text analysis

       Text mining

       Virtual ethnography

       Web crawling

       Web scraping

      Highlights

       Text mining processes include methods for acquiring digital texts and analyzing them with NLP and advanced statistical methods.

       Text mining is used in many academic and applied fields to analyze and predict public opinion and collective behavior.

       Text analysis began with analysis of religious texts in the Middle Ages and was developed by social scientists starting in the early 20th century.

       Text analysis in the social sciences involves analyzing transcribed interviews, newspapers, historical and legal documents, and online data.

       Major approaches to text analysis include analysis of discourse positions, conversation analysis, CDA, content analysis, intertextual analysis, and analysis of texts as social information.

       Advantages of Internet-based data and social science research methods include their low cost, unobtrusiveness, and use of unprompted data from research participants.

       Risks and limitations of Internet-based data and research methods include limited researcher control, possible sample bias, and the risk of researcher passivity in data collection.
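
To make the "analyzing" step in the first highlight concrete, here is a minimal sketch of a concordance (one of the key terms above), also known as a keyword-in-context listing. It is an illustration only, not a method taken from this book: the function names `tokenize` and `concordance`, the `window` parameter, and the sample sentence are all assumptions chosen for the example, and the tokenizer is deliberately crude.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens.

    This regex-based tokenizer is an assumption for illustration;
    real projects usually rely on an NLP library's tokenizer.
    """
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each hit.

    `window` is the number of tokens kept on each side of the keyword.
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Hypothetical sample text, invented for this example.
sample = ("Text mining acquires digital texts. "
          "Text analysis interprets what those texts mean.")

tokens = tokenize(sample)
hits = concordance(tokens, "texts", window=2)

# Print each occurrence with its surrounding words aligned on the keyword.
for left, kw, right in hits:
    print(f"{left:>20} [{kw}] {right}")
```

Reading the keyword alongside its neighboring words is what lets a researcher see how a term is actually used, which simple word counts cannot show.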

      Review Questions

       What are the differences between text mining and text analysis methodologies?

       What are the main research processes involved in text mining?

       How is analysis of discourse positions different from conversation analysis?

       What kinds of software can be used for analysis of discourse positions and conversation analysis?

      Discussion Questions

       If you were interested in conducting a CDA of a contemporary discourse, what discourse would you study? Where would you find data for your analysis?

       How do researchers choose between collecting data from offline sources, such as in-person interviews, and online sources, such as social media platforms?

       What are the most critical problems with using data from online sources?

       If you already have an idea for a research project, what are likely to be the most critical advantages and disadvantages of using online data for your project?

       What are some ways text mining research can be used to benefit science and society?

      Developing a Research Proposal

      Select a social issue that interests you. How might you analyze how people talk about this issue? Are there differences between people from different communities and backgrounds in terms of how they think about this issue? Where (e.g., offline, online) do people talk about this issue, and how could you collect data from them?

      Further Reading

      Ayers, E. L. (1999). The pasts and futures of digital history. Retrieved June 17, 2015, from http://www.vcdh.virginia.edu/PastsFutures.html

      Bauer, M. W., Bicquelet, A., & Suerdem, A. K. (Eds.), Textual analysis. SAGE benchmarks in social research methods (Vol. 1). Thousand Oaks, CA: Sage.

      Krippendorff, K. (2013). Content analysis: An introduction to its methodology. Thousand Oaks, CA: Sage.

      Kuckartz, U. (2014). Qualitative text analysis: A guide to methods, practice, and using software. Thousand Oaks, CA: Sage.

      Roberts, C. W. (1997). Text analysis for the social sciences: Methods for drawing statistical inferences from texts and transcripts. Mahwah, NJ: Lawrence Erlbaum.

      2 Acquiring Data

      Learning Objectives

      The goals of Chapter 2 are to help you to do the following:

      1. Recognize the role data plays in text mining and the characteristics of ideal data sets for text mining applications.

      2. Identify a variety of different data sources used to compile text mining data sets.

      3. Assess the advantages and limitations of using social media to acquire data.

      4. Analyze examples of social science research using data sets drawn from different sources.

      Introduction

      While social scientists have for decades made use of data from attitude surveys, today researchers are attempting to leverage the growing volume of naturally occurring unstructured data generated by people, such as text or images. Some of these unstructured data are referred to as “big data,” although that

