Social Monitoring for Public Health. Michael J. Paul
section will describe the different types of social media, and will discuss the types of health applications for which they are appropriate.
3.3.1 GENERAL-PURPOSE SOCIAL MEDIA
Blogs and Microblogs
Blogs (short for weblogs) are websites where individuals post messages and articles. Popular blogging platforms include Tumblr, WordPress, and Blogger.
Microblogs, such as Twitter and its Chinese counterpart, Sina Weibo, are social media platforms where users share brief “status updates.” In contrast to standard blogs, the defining characteristic of microblogs is short message length. For example, Twitter limited messages to 140 characters from its inception until late 2017, when the limit was raised to 280 characters; even before that change, the restriction had been loosened in various ways, first through URL shortening and later by not counting usernames toward the limit. Other platforms like Facebook have higher length limits, but messages still tend to be short. Smaller specialty platforms often have features that change how they are used, such as the app YikYak, which offered users anonymity [Koratana et al., 2016].
Microblogs are popular avenues for sharing news as well as users' current status, beliefs, and activities, making them desirable for social monitoring. These platforms are intended for broadcasting information, often to a general public audience, so most content on them is public, although private accounts are possible.
Microblog users often share messages written by others, called “retweets” on Twitter. Because retweets are repostings of previously published messages rather than original content, and because retweet activity can differ from original tweet activity, systems that use social media data often handle retweets separately.
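As a minimal sketch of handling retweets separately, the Python below partitions a list of tweets into original posts and retweets. It assumes each tweet is a dict shaped loosely like a Twitter API v1.1 tweet object, in which a native retweet carries a `retweeted_status` field, and it also treats text beginning with the older manual "RT @" convention as a retweet. Both rules are illustrative assumptions, not a complete retweet detector, and the function names are hypothetical.

```python
from typing import Any

Tweet = dict[str, Any]


def is_retweet(tweet: Tweet) -> bool:
    """Return True if the tweet looks like a retweet rather than original content."""
    # Assumption: native retweets embed the original message under "retweeted_status",
    # as in Twitter API v1.1 tweet objects.
    if tweet.get("retweeted_status") is not None:
        return True
    # Assumption: older manual retweets were written as "RT @username: ...".
    return tweet.get("text", "").lstrip().startswith("RT @")


def split_retweets(tweets: list[Tweet]) -> tuple[list[Tweet], list[Tweet]]:
    """Partition tweets into (original posts, retweets), preserving input order."""
    originals: list[Tweet] = []
    retweets: list[Tweet] = []
    for tweet in tweets:
        (retweets if is_retweet(tweet) else originals).append(tweet)
    return originals, retweets


if __name__ == "__main__":
    sample = [
        {"id": 1, "text": "Home sick with the flu today"},
        {"id": 2, "text": "RT @cdc: Get your flu shot", "retweeted_status": {"id": 99}},
        {"id": 3, "text": "RT @friend: feeling awful"},
    ]
    originals, retweets = split_retweets(sample)
    print([t["id"] for t in originals])  # [1]
    print([t["id"] for t in retweets])   # [2, 3]
```

Keeping the two lists separate lets an analysis count original posts and sharing activity as distinct signals instead of mixing them.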
Social Networks
Social networking platforms, such as Facebook and LinkedIn, are websites where users connect with one another. In contrast to microblogs, where users typically broadcast information publicly, information published on social networking platforms is usually shared with a limited audience, such as friends and coworkers. Such websites are designed primarily for maintaining relationships, and accounts are often private, although plenty of public Facebook accounts share general news. For these reasons, social networks are less commonly used for public health surveillance. However, social network data can be valuable for research that investigates social factors [Cobb et al., 2011].
Media Sharing Platforms
Some social media websites primarily serve as platforms for sharing visual media, such as videos (e.g., YouTube) and photos (e.g., Instagram, Flickr) [Vance et al., 2009]. Media can reveal population attitudes and behaviors, such as dietary choices shown in photos [De Choudhury et al., 2016a] and drug use captured in videos [Morgan et al., 2010]. Additionally, comments on sites like YouTube can be useful for some health applications [Burton et al., 2012a, Freeman and Chapman, 2007].
General-purpose sharing websites include Reddit and Digg, where users submit links to other websites and articles, as well as media such as images and videos. These websites are typically organized into different categories of discussion, such as politics and science. For example, Reddit is organized into thousands of topic-specific “subreddits,” which are created and moderated by users.
For social monitoring, the text comments and discussions on these platforms, rather than the media itself, are often used as data.
3.3.2 DOMAIN-SPECIFIC SOCIAL MEDIA
In addition to general-purpose social media, some websites serve narrower purposes, including health.
Review Websites
Online reviews are a focused type of social media, where users write reviews (usually including numeric scores) of products and services. Some review websites are quite broad, like Yelp, which is most commonly used to review businesses and restaurants. However, many review websites are domain-specific, including some devoted to health. For example, RateMDs.com is a website where people can post reviews of their doctors, and Drugs.com allows users to write reviews of medications.
In the domain of public health, researchers have monitored review websites to detect food poisoning outbreaks (from restaurant reviews) [Harrison et al., 2014] and drug side effects (from medication reviews) [Yates and Goharian, 2013].
Patient Communities
There are many web-based communities designed for patients to share information and experiences with one another. These communities often communicate through discussion forums, websites where users can create and respond to threads of conversation. Forums can be used to exchange information as well as to provide social support, and some patient forums, such as DailyStrength and MedHelp, function as support groups.
A well-known patient community is PatientsLikeMe, where patients share information, especially about treatment options. In a notable case, hundreds of PatientsLikeMe members tried a novel treatment for amyotrophic lateral sclerosis (ALS) and shared their results, functioning as an informal, grassroots clinical trial [Wicks et al., 2011].
Some grassroots patient communities have also developed on general-purpose platforms. For example, people create “group chats” on Twitter, in which interested users agree on a particular hashtag and meeting time and hold regular conversations on a topic (e.g., a weekly cancer support chat). Approximately 10% of Twitter group chats are about health [Cook et al., 2013].
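Because a group chat is defined by a shared hashtag plus a recurring meeting time, one simple way to collect a chat's messages is to keep tweets that both contain the hashtag and fall inside the scheduled weekly window. The Python below is a minimal sketch under stated assumptions: tweets are dicts with hypothetical `text` and `created_at` fields, timestamps are naive datetimes in the chat's local time zone, and the hashtag and schedule in the usage note are made up. It is not the procedure used by Cook et al.

```python
from datetime import datetime, timedelta

WEEK = timedelta(days=7)


def in_weekly_session(ts: datetime, weekday: int, start_hour: int,
                      duration: timedelta) -> bool:
    """True if ts falls inside a session that recurs every week.

    weekday follows datetime.weekday() (Monday=0 ... Sunday=6). Timestamps are
    assumed to be naive datetimes in the chat's local time zone.
    """
    # Any occurrence of the session can serve as a reference point because the
    # schedule repeats weekly; 2024-01-01 was a Monday.
    reference = datetime(2024, 1, 1 + weekday, start_hour)
    # timedelta % timedelta uses floor modulo, so the offset lies in [0, 7 days)
    # even for timestamps before the reference date.
    offset = (ts - reference) % WEEK
    return offset < duration


def has_hashtag(text: str, hashtag: str) -> bool:
    """Case-insensitive whole-token hashtag match, ignoring trailing punctuation."""
    tag = hashtag.lower()
    return any(tok.lower().rstrip(".,!?:;") == tag for tok in text.split())


def chat_messages(tweets: list[dict], hashtag: str, weekday: int,
                  start_hour: int, duration: timedelta) -> list[dict]:
    """Keep tweets that use the chat hashtag during a scheduled session."""
    return [
        t for t in tweets
        if has_hashtag(t["text"], hashtag)
        and in_weekly_session(t["created_at"], weekday, start_hour, duration)
    ]
```

For a hypothetical chat tagged `#examplechat` that meets on Wednesdays at 20:00 for one hour, the call would be `chat_messages(tweets, "#examplechat", weekday=2, start_hour=20, duration=timedelta(hours=1))`. Computing the offset modulo one week also handles sessions that cross midnight or the end of the week.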
3.3.3 SEARCH AND BROWSING ACTIVITY
While most social media data consists of information that users broadcast, the activities users perform on the Web, such as searching and browsing, are another useful source of information.
One of the most common types of web activity is search. A search engine query suggests an interest in a topic, so by analyzing what people search for, researchers can infer what they are interested in. In public health, search data was most famously used by the Google Flu Trends system (Section 5.1.1), which estimated flu prevalence from the number of people searching for flu-related information, under the assumption that people who are interested in flu are likely to be experiencing it.
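The idea of estimating prevalence from search volume can be illustrated with a one-variable regression on the log-odds scale that relates the fraction of searches that are flu-related to the rate of influenza-like illness (ILI), such as the share of doctor visits for ILI. This is similar in spirit to the model originally published for Google Flu Trends, but the sketch below is only an illustration: the function names are hypothetical, the inputs are assumed to be proportions strictly between 0 and 1, and Section 5.1.1 describes the actual system.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: maps a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_query_model(query_fractions: list[float],
                    ili_rates: list[float]) -> tuple[float, float]:
    """Ordinary least squares for logit(ili) = b0 + b1 * logit(query)."""
    xs = [logit(q) for q in query_fractions]
    ys = [logit(p) for p in ili_rates]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def estimate_ili(b0: float, b1: float, query_fraction: float) -> float:
    """Estimate the ILI rate for a period where only search data is available."""
    return inv_logit(b0 + b1 * logit(query_fraction))
```

In use, the model would be fit on past weeks for which both query fractions and official surveillance rates are known, then applied to the current week, when search data is already available but surveillance reports may not be.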
Search engines, such as Google, Bing, and Yahoo, log the queries that are searched by users. Raw query logs are private data, but some engines make aggregate statistics about query volumes publicly available through services such as Google Trends, described in Section 3.5.
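Aggregate services report relative rather than raw volumes. Google describes its Trends values as a term's search volume divided by total searches in the same region and time period, then scaled so that the highest point in the selected range equals 100. The sketch below reproduces only that normalization on made-up counts; the function name is hypothetical, and the real service also samples queries and applies other processing that is not modeled here.

```python
def trends_style_index(term_counts: list[int], total_counts: list[int]) -> list[int]:
    """Scale a term's share of all searches so the peak period equals 100."""
    # Divide by total search volume for the same place and period, so growth in
    # overall search traffic does not look like growth in interest in the term.
    shares = [term / total for term, total in zip(term_counts, total_counts)]
    peak = max(shares)
    if peak == 0:
        return [0 for _ in shares]
    # Rescale relative to the peak share in the selected range.
    return [round(100 * share / peak) for share in shares]
```

One consequence of this design is that values are only comparable within a single query and time range: the same week can receive a different index value if the selected range, and therefore the peak, changes.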
Search data from domain-specific websites, such as PubMed [Yoo and Mosa, 2015], can also be analyzed, although unlike Google Trends, such data is often not publicly available and must be obtained privately. For example, researchers from the National Cancer Institute partnered with Ask Jeeves to understand the information needs of cancer patients [Bader and Theofanos, 2003], and Santillana et al. [2014a] obtained search data from UpToDate, a disease database used by clinicians, to infer disease prevalence from clinician activity.
Another useful type of activity is browsing: a trace of the web pages a user visits. Such data can come from detailed logs recorded by browsers such as Google Chrome and Microsoft Internet Explorer, but these logs are private and thus typically available only to researchers working at these companies [schraefel et al., 2009]. Outside researchers can collect browser activity logs directly from participants' machines, but doing so requires recruiting consenting volunteers, so such research is typically small-scale [Fourney et al., 2014].
A public source of browsing data comes from Wikipedia, which public health researchers have utilized. Wikipedia publicly publishes timestamped logs of visits to each article, and this data can be used to measure levels of interest in articles such as “Influenza”