Do No Harm. Matthew Webster
Years later, there is still relatively little oversight of big data despite these not-so-insignificant problems.
Still, advertisers love to know this information so they can send targeted advertising. That seems innocent enough if that is all that is being done with the data. Mobilewalla, a Singapore-based search portal for applications that target mobile devices, decided to publish age, sex, and ethnicity data related to the George Floyd protests. For many, this was a wake-up call for how much data is leaking out of our mobile phones.18 It should also be a wake-up call that information about American citizens can be bought and sold all over the world. Why does a Singapore company have that much data on United States citizens? Why bother hacking organizations if any country can buy any information on any citizen? Internet trolls can also use this kind of information to target individuals. They know who you are and what your preferences are and can use that information to manipulate public opinion. This was a big concern in the 2016 election, no matter what side of the political fence you straddled.
From a legal perspective, unless the data broker uses the data for credit, employment, insurance, or housing, there is no requirement to keep the data private. They can even sell information about health conditions, so long as the data is either anonymized or not from a covered entity.19 This means that there are vast amounts of data with very few protections.
What makes data brokers more interesting is that today we have more data than at any time in history. The digitization of information makes it far easier to stream data all over the planet in a comparatively short period of time. Sifting through that data is also far easier, which means corporations, governments, and people have that information. We truly are in the era of big data.
Big Data
Big data has been around since at least 1937, when Franklin Roosevelt's administration undertook a project related to the Social Security Act whose goal was to keep track of 26 million Americans. IBM developed the punch card to keep track of the process.20 It wouldn't be until 2005 that the term “big data” would be coined by Roger Mougalas.21 Big data is exactly what you might think it is: very large sets of data. From a hyperconnected perspective, it ties into many different data sets. The more data from more sources, as long as it is accurate, the better the discoveries that can be made from analyzing it. It is generally accepted that there are four Vs that go along with big data: volume, velocity, variety, and veracity. All are critically important to the accuracy of information and to helping advertisers more accurately target individuals.
Volume is really critical to big data because the more information you have, the better chance you have of having a particular piece of data. Think about it from a COVID-19 perspective. If you had only two people and those two people died as a result of COVID-19, you might come to the erroneous conclusion that COVID-19 was 100% fatal. While that example is absurd, having a large volume of data helps to weed out the statistical improbabilities that a small volume of data might indicate. The larger the data set, the more reliable that data tends to be.
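To make the point about volume concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the 2% fatality rate is a made-up number, not a real COVID-19 figure, and the function names are hypothetical. It simply compares the rate observed in a two-person sample with the rate observed in a much larger simulated sample.

```python
import random


def observed_fatality_rate(outcomes):
    """Fraction of cases in the sample that were fatal (True means the person died)."""
    return sum(outcomes) / len(outcomes)


def simulate_cases(n, true_rate, rng):
    """Draw n hypothetical case outcomes, each fatal with probability true_rate."""
    return [rng.random() < true_rate for _ in range(n)]


# Assumption: a made-up "true" fatality rate, chosen only for illustration.
TRUE_RATE = 0.02
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# The two-person sample from the text: both people died.
tiny_sample = [True, True]

# A much larger simulated sample drawn from the assumed true rate.
large_sample = simulate_cases(100_000, TRUE_RATE, rng)

print("Two-person sample:", observed_fatality_rate(tiny_sample))   # 1.0, i.e. "100% fatal"
print("Large sample:     ", observed_fatality_rate(large_sample))  # lands close to TRUE_RATE
```

The two-person sample reports a fatality rate of 100%, while the large sample lands very near the assumed rate. The simulation is a toy, but it shows why a larger data set weeds out the statistical improbabilities that a small one can suggest.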
Velocity, generally speaking, centers around the analysis of streaming data. The more sources of information there are, and the more sensors that are on a person (patient or not), the better the overall picture the data brokers or hospitals have of that person or patient. The more real time the data is, the more useful that data can be to an organization because near-real-time judgment calls can be made. The store-and-forward technique discussed in the previous chapter means that decision making has a lag and may not be as relevant depending on the circumstances. When people talk about the instantaneity of the world, this is what they mean.
Variety is also key from a big data perspective. Having a single type of data source is good, but having more data sources is even better. Let us use COVID-19 data as an example. If all we had was the data on young children, our view on the disease would be different. We know that it disproportionately affects the elderly in terms of severity. The larger the variety of sources, the better analysis we have overall.
Veracity pertains to the accuracy of data. If our data set were very diverse when analyzing COVID-19, but wildly inaccurate to the point where it looked like everyone was affected the way the elderly are, we would probably be taking very different actions. Having accurate data really matters. If any one of the four Vs fails, we are provided with less than optimal information.
Today we have data scientists who work with these large volumes of data to extract patterns and knowledge. The buzzword for what they do is “data mining.” While technically inaccurate, it is the most common and easiest way to explain to a general audience what data scientists do. In reality, data mining is an interdisciplinary field that combines both statistics and computer science. While there are a host of other processes that go into what they do, sifting through that data to create accurate data models and identify trends is crucial. Visualization of that data is the ultimate goal because they need to communicate to others what trends they are discovering.
Big data has a tremendous number of advantages for things other than healthcare data. Cost savings alone is a very strong motivator. It is used to identify better ways of doing business. Quick, actionable information is at the heart of many businesses. From a marketing perspective, it can help companies understand market conditions and the sentiment of people online. Toward this end, companies can better target marketing strategies to help boost customer acquisition and retention. All of these can be used to fuel better product innovations.22 These are just the beginning, however. Almost every industry is reaping the rewards of big data. In 2017, Forbes reported that 53% of companies were adopting big data analytics.23
Healthcare tends to be a little less mature in its data analysis techniques, but richer in its data sources, especially when considering IoMT devices.24 Now many of those data-rich healthcare companies are eager to utilize that data, not only to improve their own practices and knowledge, but also to sell it. In fact, all of the data sources that IoMT brings to the table have seen an explosive 878% growth since 2016.25 With 80% of healthcare executives investing in big data, big data is simply not going away without some outside influence. In fact, there is a hefty supply of big data, some of which has been in place for decades.
QuintilesIMS, a company dedicated to improving patient outcomes through the analysis of data, was created in the 1950s and now collects data on most prescription sales in the United States and many other countries.26 Health insurance companies are also involved in selling this data. Blue Health Intelligence, part of Blue Cross Blue Shield, has data on at least 165 million people dating back to 2005 and helps to supply QuintilesIMS. Big data also pulls data from IoMT, EHRs, providers, patient registries, private players, government health plan claims, and pharmacy claims.27
Today, anonymized health data is being bought, sold, and used by large corporations to get more information and improve their products and/or services. What is concerning about the data brokers is that they are able to add disparate pieces of information to the anonymized data collection that allow big data companies to determine an individual's identity.28 Anonymizing the data in the fashion that HIPAA requires is simply insufficient in today's world.
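A minimal sketch in Python shows how this kind of re-identification can work in principle. All of the records, names, and field choices below are fabricated for illustration, and the matching rule (ZIP code, birth year, and sex) is an assumption about what a broker might use, not a description of any particular company's method. The idea is that an "anonymized" record still carries a few ordinary attributes, and when that combination is unique it can be joined to a second data set that does carry names.

```python
from collections import defaultdict

# Hypothetical "anonymized" health records: names removed, ordinary attributes kept.
anonymized_health = [
    {"zip": "02138", "birth_year": 1965, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_year": 1980, "sex": "M", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1980, "sex": "M", "diagnosis": "hypertension"},
]

# Hypothetical auxiliary data a broker might already hold, such as a marketing list.
auxiliary = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1965, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1980, "sex": "M"},
    {"name": "Carol Example", "zip": "02140", "birth_year": 1972, "sex": "F"},
]

# Assumption: these three fields are the attributes shared by both data sets.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def quasi_key(record):
    """Combine the shared attributes into a single lookup key."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def group_by_key(rows):
    """Group records by their combination of shared attributes."""
    groups = defaultdict(list)
    for row in rows:
        groups[quasi_key(row)].append(row)
    return groups


def reidentify(health_rows, aux_rows):
    """Link a name to a diagnosis when the attribute combination is unique in both sets."""
    health_groups = group_by_key(health_rows)
    aux_groups = group_by_key(aux_rows)
    matches = []
    for key, health_matches in health_groups.items():
        aux_matches = aux_groups.get(key, [])
        if len(health_matches) == 1 and len(aux_matches) == 1:
            matches.append((aux_matches[0]["name"], health_matches[0]["diagnosis"]))
    return matches


print(reidentify(anonymized_health, auxiliary))
# [('Alice Example', 'diabetes')]
```

In this toy data, one record is re-identified because its combination of attributes is unique in both data sets. The two records that share a ZIP code, birth year, and sex stay ambiguous, but each additional piece of information a broker adds makes combinations like theirs more likely to become unique as well.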
What is concerning is that there are no federal laws against re-identification of information.