An Introduction to Text Mining
Research Design, Data Collection, and Analysis
Gabe Ignatow
University of North Texas
Rada Mihalcea
University of Michigan
FOR INFORMATION:
SAGE Publications, Inc.
2455 Teller Road
Thousand Oaks, California 91320
SAGE Publications Ltd.
1 Oliver’s Yard
55 City Road
London, EC1Y 1SP
United Kingdom
SAGE Publications India Pvt. Ltd.
B 1/I 1 Mohan Cooperative Industrial Area
Mathura Road, New Delhi 110 044
India
SAGE Publications Asia-Pacific Pte. Ltd.
3 Church Street
#10-04 Samsung Hub
Singapore 049483
Copyright © 2018 by SAGE Publications, Inc.
All rights reserved. No part of this book may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without permission in writing from the publisher.
Printed in the United States of America
Library of Congress Cataloging-in-Publication Data
Names: Ignatow, Gabe, author. | Mihalcea, Rada, 1974- author.
Title: An introduction to text mining : research design, data collection, and analysis / Gabe Ignatow, University of North Texas, Rada Mihalcea, University of Michigan.
Description: Thousand Oaks : SAGE Publications, [2018] | Includes bibliographical references and index.
Identifiers: LCCN 2017038203 | ISBN 9781506337005 (pbk. : alk. paper)
Subjects: LCSH: Data mining. | Social sciences—Research.
Classification: LCC QA76.9.D343 I425 2017 | DDC 006.3/12—dc23
LC record available at https://lccn.loc.gov/2017038203
This book is printed on acid-free paper.
Acquisitions Editor: Helen Salmon
Editorial Assistant: Megan O’Heffernan
eLearning Editor: Chelsea Neve
Production Editor: Kelly DeRosa
Copy Editor: Megan Markanich
Typesetter: C&M Digitals (P) Ltd.
Proofreader: Wendy Jo Dymond
Indexer: Joan Shapiro
Cover Designer: Michael Dubowe
Marketing Manager: Shari Countryman
Acknowledgments
An Introduction to Text Mining has been a long time in the making, and there are too many people to count who deserve our thanks for helping to bring this book to publication. First and foremost, we must thank our undergraduate and graduate students who have shown so much enthusiasm for learning about online communities. It was their energy and questions that convinced us of the need for this book. Helen Salmon, Katie Ancheta, and the entire editorial and production staff at SAGE deserve our special thanks. In truth, it was Helen who got this project off the ground, and she and the entire SAGE staff, including SAGE’s team of expert reviewers, provided support and guidance throughout the writing and production process. SAGE’s reviewers played an especially critical role by providing invaluable feedback based on their research and teaching experiences in their home disciplines. A textbook as interdisciplinary as this one requires absolutely top-flight reviewers, and we were fortunate to have many of them. A special thank-you goes to Roger Clark, Kate de Medeiros, Carol Ann MacGregor, Kenneth C. C. Yang, A. Victor Ferreros, and Jennifer Bachner.
Last but by no means least, we thank our spouses and children, Neva, Alex, and Sara, and Mihai, Zara, and Caius, for their patience with us and their encouragement over the many years of research, writing, and editing that went into this textbook.
GI and RM
Preface
Students are accustomed to participating in all sorts of online communities. While interacting on platforms such as Facebook, Twitter, Snapchat, and Instagram as well as on blogs, forums, and many other apps and sites, some students taking courses in the social sciences and computer science want to take things a step further and perform their own research on the social interactions that occur in these communities. We have written this book for those students, especially undergraduate and graduate students in anthropology, communications, computer science, education, linguistics, marketing, political science, psychology, and sociology courses who want to do research using online tools and data sets. Whether they are writing a term paper or honors thesis, or working on an independent research project or a project with a faculty adviser, students who want to use text mining tools for social research need a place to start.
Online communities offer no end of interesting linguistic and social material to study, from emojis and abbreviations to forms of address, themes, metaphors, and all sorts of interpersonal conversational dynamics. The volume of data available for research and the number of research tools available to students can be overwhelming. An Introduction to Text Mining is here to help. The book is organized to guide students through major ethical, philosophical, and logical issues that should be considered in the earliest stages of a research project (see Part I) and then to survey the landscape of text mining and text analysis tools and methodologies that have been developed across the social sciences and computational linguistics. Appendices A through G on data and software resources are a key to the book, and readers should consider reviewing these early and returning to them often as they work their way through the early chapters and begin to design their own research projects (see Chapter 5).
If you think of your text mining research project as a house, then the chapters in Part I are instructions for building the foundation. Just as a house with a flaw in its foundation will not last long, a research project with a shaky logical foundation or questionable ethics may look good at the start, but its flaws will inevitably be exposed at some point. Chapter 5 on research design provides architectural instruction for building the framework of your house. Designing a research project that can address, and perhaps conclusively answer, a research question or questions is a challenging task, and it is useful to know the kinds of research designs that have a track record of success in research using text mining tools and methodologies. Parts III through V survey text mining and analysis methodologies, the equivalent of proven house-building methods.