
The Hidden Trade in Our Medical Data: Why We Should Worry

For-profit companies use our anonymized medical data in a huge secondary market. Advances in computing make it increasingly possible for outsiders to identify individuals among the hundreds of millions of patients in these dossiers, putting intimate secrets about our bodies and minds at risk.

Excerpted and adapted from Our Bodies, Our Data: How Companies Make Billions Selling Our Medical Records.  Copyright © by Adam Tanner. With permission of the publisher, Beacon Press. All Rights Reserved.

Companies that have nothing to do with our medical treatment are allowed to buy and sell our health care data, provided they remove certain fields of information, including birth date, name and Social Security number. These guidelines, outlined in the U.S. HIPAA rules, have allowed a multi-billion-dollar trade in anonymized patient data to emerge in recent years, with data mining firms collecting dossiers on hundreds of millions of patients. A growing number of data scientists and health care experts say the same computing advances that allow the aggregation of millions of anonymized patient files into dossiers also make it increasingly possible to re-identify those files, that is, to match identities to patients.

“It's very difficult to protect data from re-identification through most processes that are used to anonymize it,” said Dr. Jonathan Wald, a Harvard Medical School instructor and expert on health data at the non-profit group RTI International. “That is easy when it is a rare condition and there are a few other tidbits. It is getting easier and easier because of the amount of electronic publicly available data and the amount of analytic engines to turn through it.”


Management Science Associates in Pittsburgh is one of the companies that helps data miners aggregate anonymized patient dossiers. Jani Syed, the company’s technical group director, surprised me with his candor about the risks of re-identification.

“In the area of big data there are always problems with the privacy,” he said. “No matter what you do, no matter how much data obfuscation you are going to do, if you have enough data it is always possible to identify a particular person. It's not that hard to do.”

Another way outsiders may be able to identify anonymized files is by cross-referencing them with other sensitive files that hackers and thieves have obtained in recent years. Unfortunately, identified details about you from medical files may already be in circulation on the Internet or in hacker circles. This possibility is something I know about personally, as I am one of many millions whom medical insurers and providers have notified as a victim of such attacks. Between 2009 and 2015, the U.S. Department of Health and Human Services recorded more than 1,300 data breaches that each involved more than 500 people, exposing data on more than 135 million people in all.

To date, there is no publicly recorded incident of hackers getting into the anonymized individual patient dossiers held by data miners, nor are there reported instances of re-identification of anonymized medical records in the United States other than academic experiments. Even if thieves did hack such anonymized records, they would face the additional complication of re-identifying them. The reward for all that effort would be a potentially richer array of insights into a patient than single-source files offer, as anonymized patient data may contain pharmacy, claims, doctor, and even lab information.

Experts identify a variety of possible motivations for an outsider to seek to re-identify medical files.

A rival at work who wants your job or simply does not like you may know when you took medical leave and other clues that could make it possible to find you in a batch of anonymous patient files. Suddenly, your re-identified files might appear in circulation. In a crime of passion, a romantic rival – or crossed former lover – might want to spread such information on the Internet, a variant of revenge porn in which former partners post intimate photos online.

“Health information, in particular, which can encompass a variety of things from sleep patterns to diagnoses to genetic markers, the data gathered about us can paint a very detailed and personal picture that is essentially impossible to de-identify, making it valuable for a variety of entities such as data brokers, marketers, law enforcement agencies, and criminals,” says Michelle De Mooy, director of the Privacy & Data Project at the Center for Democracy & Technology.

“Traditional methods of anonymization from commercial entities, such as the use of patient identifiers, have also become more of a problem with the amount of data available about individuals - there is of course an entire industry in vendors matching records retroactively.”

Medical data, both de-identified and re-identified, could also become national security weapons against members of the armed forces and their families, or high-ranking officers.

“It is not just that the information might embarrass a general or embarrass a senator—because we also see VIPs and so forth in our system—it is that the aggregation of certain health data in our context is potentially classified information,” said one military official who did not want to be named. “If I were to aggregate immunization data for a particular region of our country, like say Ft. Bragg, I might be able to learn where special operators are ready to deploy in the world given the timeline.”

The dramatic increase in online data theft in recent years shows that shadowy hackers routinely steal and release personal data, even though such activity is illegal. Thieves can use such information for extortion or medical identity theft. The actual re-identification of medical dossiers, however, is not a crime, although such action might constitute a breach of contract depending on the conditions set by the source of the information.

It is not hard to imagine a U.S. senator condemning a foreign country only to find his or her intimate medical data plastered on the Internet, or an unscrupulous political operative leaking information about a rival candidate (the bitterness of the 2016 U.S. campaign makes such sleazy tactics easy to imagine). Rogue investors might be keen to learn inside details about the health of key corporate leaders before stock prices react to future revelations. A fanatical sports fan may want to humiliate a rival team’s star player.

“That's the key challenge: Unlike financial fraud, it's not that broad-scale sort of identification that matters, it’s the VIP identification that matters,” said Sean Nolan, former general manager of Microsoft HealthVault. “Because that's where you actually have actionable, real data that you can use.”

“The dirty not-so-secret is that data HIPAA considers anonymized isn't.”