In 2012, Latanya Sweeney was talking to a reporter in her Harvard office when she Googled her own name. An ad for a background check popped up next to the links to her published work. “Forget the studies,” the reporter said. “Tell me about that arrest record.”
Sweeney, mystified, clicked on “Latanya Sweeney, Arrested?”, paid the subscription fee and showed the reporter she had no arrest record. “It must be that black-sounding first name you have,” he said. That’s not how ad algorithms work, she told him, and they resumed the interview. But Sweeney kept thinking about the incident.
As professor of government and technology in residence and founding director of Harvard’s Data Privacy Lab, Sweeney knows more than most about the labyrinthine paths of our personal data. Companies can legally tap into details about what we buy, whom we text, even the brand and location of our computers. If your bed contains microsensors that allow you to tweak temperature and firmness, you’re handing intimate information over to a third party. If you’re diabetic, insurers can discern how often you indulge in Krispy Kreme—and deny you health or life insurance because of it. Sweeney has read that a third of Fortune 500 companies admit to making hiring, firing, and promotion decisions based on health information. “The way privacy works,” she says, “is that there is no law, and nothing’s a problem until it’s a problem.”
“It’s not just privacy,” says Jinyan Zang, managing editor of Technology Science, an online Harvard-based journal that reports and discusses society-technology issues, and a PhD candidate in the Department of Government affiliated with Harvard’s Institute for Quantitative Social Science. “All of our democratic values are up for grabs: First Amendment rights, surveillance, fake news. The rationale for the Data Privacy Lab is to create a movement dedicated to studying the issues challenged or disrupted by new types of technology.”
Like many of his generation, Zang wasn’t overly concerned with online privacy. When the movie The Social Network was being filmed in Harvard Square in 2009, his freshman year at Harvard, he recalled aspiring to emulate Mark Zuckerberg’s experiences in Silicon Valley. By the time Zang graduated in 2013, he no longer considered technology a universal good or a simple moneymaking activity.
In the past few years, in classes he co-teaches on the politics of personal data and “Data Science to Save the World,” he’s seen many undergraduates shift their stance, like him, to this: “If I’m young and idealistic and I want to change the world in a positive manner, then I want to deploy technology in a way that’s actually beneficial to the world, or at least figure out a way to address the emerging harms or disruptions that could be unintended,” Zang says.
“Unintended consequences” is Sweeney’s mantra. With her close-cropped graying hair, chunky glasses, no-frills blouses and blazers, and big laugh, she exudes a vibe that’s at once no-nonsense and eminently approachable. Her website features a collection of misguided predictions about, among others, the impossibility of flying machines and the unmarketability of personal computers. The quotes are funny, yet flaunt a very real issue: the inability of smart people to gauge the outcomes of accelerating technology.
Following up on the reporter’s observation about the pop-up ad, Sweeney discovered data fallout that surprised even her: Google AdSense generated ads suggestive of an arrest in as many as 95 percent of searches for names such as DeShawn, Darnell, and Jermaine, while names associated mostly with whites, such as Geoffrey, Jill, and Emma, generated more neutral copy.
After publishing her findings in 2013, Sweeney called on programmers and other technologists to think about societal consequences like structural racism in the technology they design. Her latest research goes much further than gently chiding the keepers of highly personal data.
Unlocking the Data
Being raised by her great-grandparents in Tennessee set Sweeney apart from most of her peers. “Nothing was easy,” Sweeney said of her childhood. Except for math, where she found precision, predictability, and beauty. At Dana Hall in Wellesley, Massachusetts, Sweeney came across the new field of computer science and started writing programs “for everything,” she recalled. “I felt like I could get a computer to do anything I wanted.”
After earning a degree in computer science at Harvard and launching a Kendall Square startup, she pursued graduate degrees at the Massachusetts Institute of Technology, where in 2001 she would become the first African-American woman to earn a PhD in computer science. One day in 1997, she overheard Brandeis University medical ethicist Beverly Woodward expounding on technology’s social evils. Sweeney argued in favor of what she saw as computers’ life-enhancing capabilities. Woodward told her that the state agency providing health, disability, dental, and vision services to Commonwealth employees was giving away copies of their personal data to researchers and others. Was the data truly anonymous?
Sweeney knew that then-governor William F. Weld had recently been rushed to the hospital after collapsing while delivering a Commencement address at Bentley College. Sweeney found Weld’s date of birth in voter registration rolls, identified the only hospital record for a male with that birthday, and suddenly had access to details of his hospital stay, diagnosis, and prescriptions.
That “simple ad-hoc experiment,” as she describes it, had a profound effect. Within a month, Sweeney, who would be dubbed “the goddess of re-identification” by an Illinois judge, was testifying before Congress as the Health Insurance Portability and Accountability Act (HIPAA) was being drafted. It wasn’t just Massachusetts sharing data, Sweeney told legislators. “All of a sudden,” she says, “for the first time someone was saying, ‘Wait a second. That data is pretty vulnerable.’”
The Facebook Generation
Sweeney’s early attempts to raise the alarm about medical privacy fell on reluctant—if not completely deaf—ears. Pushback from academic journals and a dearth of funding sources kept most of her early studies, including the one on Weld, out of the public view. At the same time that her peers fretted about raising such provocative issues without simultaneously identifying potential solutions, younger generations, like Zang’s, were so cavalier about privacy that it hardly registered at all.
Zang had been warned by his parents and others to safeguard his Social Security number, so he dutifully covered it with his hand when he filled out forms in public to keep his information safe. But in Sweeney’s undergraduate course on technology and privacy, Zang saw that Sweeney could predict most or all of the digits of students’ Social Security numbers from only their birthdays and hometowns. “Wait a second,” Zang recalls thinking. “On Facebook, I shared my birthday because I like getting happy birthday messages and I also shared my hometown. And so that means that someone can pull my Facebook profile and predict my Social Security number?
“For me, that was one of the turning points,” he says. “Even if I haven’t been a victim of identity theft yet, the fact that it’s out there, a threat hanging over all of us, shouldn’t be a fact of life.”
Months after graduating in 2013, Zang took an unpaid leave from his management consulting job in Silicon Valley to join Sweeney as a research fellow when she became chief technologist for the US Federal Trade Commission. There, he and others documented data’s hidden flows, adding their findings to a graphical tool created at the Data Privacy Lab called theDataMap. When Sweeney returned to Harvard, he followed, beginning his PhD studies in fall 2016.
Most people don’t scroll through, let alone read, the terms of service or privacy policies when they download a new app. When Zang tracked where free iOS and Android health-related apps were sending data, “we learned amazing things,” he says. The results are depicted on theDataMap as a constellation of connected orange dots representing the unexpected places data from a doctor’s visit or health app could end up: the Centers for Disease Control, media outlets, marketing firms, research labs, and pharmaceutical companies, among others.
“If I use an Amazon app, I would expect it to talk to Amazon.com,” he says. “But if it is also sharing your personal data with ad platforms, analytics companies, all sorts of other companies, that’s not within most people’s expectation.
“We found that many free health apps, specifically ones that track women’s menstrual periods, were sharing a significant amount of data,” he says. “If I’m a free app developer, I need to make money somehow. If the service is free, your data is the product.”
The Illusion of Control
Big Data is not all bad. Scientists use artificial intelligence to detect patterns—useful in epidemiology, social science, even cancer research—in the millions of pieces of data darting across the Internet every microsecond. Sweeney wants to balance data’s utility with its potential for harm. In the big business of health, well-being, and by extension, healthcare data, the scale is precariously tipped against the consumer.
Take HIPAA. The ubiquitous forms we’re asked to sign actually give doctors the right to legally share our anonymized medical histories. Most Americans are unaware that information about their doctor and hospital visits goes to the state, which sells it to data analytic and pharmaceutical companies, among others, that have a financial incentive to exploit it. The HIPAA forms, Sweeney says, represent trees sacrificed for nothing more than a pacifier, a charade that someone cares about your privacy, and that it’s being protected.
Thirty-three of the states that amalgamate patient demographics, diagnoses, completed procedures, summaries of charges, and names of attending physicians and hospitals give away or sell a version of this information. Only five of those thirty-three have protections as restrictive as HIPAA’s. Certain details are stripped or redacted. Yet by cross-referencing data sets, Sweeney links names to records.
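Cross-referencing of this kind can be pictured as a simple join. The sketch below is only an illustration, not Sweeney’s actual method: it uses invented, synthetic records and assumes that a named public list and a “de-identified” medical file share two fields, birth date and sex (the field names and the `link` helper are hypothetical). A medical record counts as re-identified only when exactly one named person matches it.

```python
from collections import defaultdict

# Entirely synthetic toy data; field names are assumptions for illustration.
voter_roll = [
    {"name": "A. Example", "birth_date": "1950-01-15", "sex": "M"},
    {"name": "B. Sample", "birth_date": "1962-07-04", "sex": "F"},
    {"name": "C. Placeholder", "birth_date": "1962-07-04", "sex": "F"},
]

# "De-identified" records: names removed, but birth date and sex retained.
hospital_records = [
    {"birth_date": "1950-01-15", "sex": "M", "diagnosis": "condition-1"},
    {"birth_date": "1962-07-04", "sex": "F", "diagnosis": "condition-2"},
]

# Fields present in both data sets that can act as a join key.
QUASI_IDENTIFIERS = ("birth_date", "sex")


def key(record):
    """Build the join key from the shared fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def link(named, deidentified):
    """Return (name, record) pairs where exactly one named person matches."""
    names_by_key = defaultdict(list)
    for person in named:
        names_by_key[key(person)].append(person["name"])

    matches = []
    for record in deidentified:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:  # a unique match means the record is re-identified
            matches.append((candidates[0], record))
    return matches


linked = link(voter_roll, hospital_records)
for name, record in linked:
    print(name, "->", record["diagnosis"])
```

In this toy data the first hospital record links to a single name, while the second stays ambiguous because two people share the same birth date and sex. The more fields two data sets have in common, the fewer people share any one combination, and the more records become unique.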
“The fact that we can show that the data is vulnerable to re-identification—but companies aren’t required to tell us whether a new product they produced exploited that data—is a real problem,” Sweeney says.
State by state, Sweeney is re-identifying individuals in so-called anonymous data—and now her studies, more than a hundred to date, are being published. In response to a 2015 study, Washington state passed legislation that makes the public-use version of medical data sets more secure and requires users in search of more detailed data to fill out applications. Separately, California has instituted some of the country’s toughest privacy laws. “That was a total win,” Sweeney said. “You’d think the other 30 would change. Nope. Just Washington state and California.”
TheDataMap’s ever-widening circles indicate that technology is migrating past the law, Zang says. Health analytics companies, for instance, didn’t exist when HIPAA was drafted. Zang says much of the Data Privacy Lab’s work aims to illuminate this fact. “We want to help legislators and regulators create informed policy by giving them concrete information about how data-sharing actually happens in the real world,” he said. “Then, if you’re a legislator, advocate, journalist, or regulator, you can point to our actual studies instead of conjecture.”
Sweeney says companies should own the fact that the data they share is vulnerable. “We want them to say, ‘We’re going to put some data out. It’s going to be vulnerable, and when we learn about these vulnerabilities, we’re going to improve,’” she says. This means that buyers would agree to stop using compromised datasets—or face consequences.
To add transparency, online registries would document for the public who received what data. “Right now, if a breach does happen, you have no idea if company XYZ had your data. You don’t even know if you should be concerned about it,” Sweeney says. “But if we had logs of who received a certain hospital’s data, for instance, you can make predictions about whether you’re likely to be harmed.”
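As a rough sketch of what such a registry might support, the example below models a disclosure log and two lookups. Everything in it is hypothetical: the `Disclosure` record, the helper names, and the sample entries are invented for illustration and do not describe any existing registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disclosure:
    """One hypothetical registry entry: a source released data to a recipient."""
    source: str
    recipient: str
    year: int


# Invented sample entries, for illustration only.
registry = [
    Disclosure("Hospital A", "Company XYZ", 2015),
    Disclosure("Hospital A", "Research Lab Q", 2016),
    Disclosure("Hospital B", "Company XYZ", 2016),
]


def recipients_of(source, log):
    """List everyone recorded as having received data from a given source."""
    return sorted({entry.recipient for entry in log if entry.source == source})


def possibly_affected(sources_visited, breached_recipient, log):
    """List the sources a person used whose data went to the breached recipient."""
    return sorted(
        {
            entry.source
            for entry in log
            if entry.recipient == breached_recipient
            and entry.source in sources_visited
        }
    )


print(recipients_of("Hospital A", registry))
print(possibly_affected({"Hospital A"}, "Company XYZ", registry))
```

With a log like this, someone who was treated only at Hospital A could see that a breach at Company XYZ might involve their records, while someone treated elsewhere would get an empty list.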
Sweeney also is working on a secure platform through which individuals can collect, assemble, and distribute their personal data across disparate silos, giving people the option to participate in research that might improve their own or others’ quality of life.
Zang is excited about the growing energy and awareness among millennials and others who grew up with low privacy expectations. Being informed is the first line of defense to prevent, as standard-bearer and law professor Paul Ohm once put it, companies knowing more about us than we know about ourselves. “This is a harm that is known, and can be fixed,” Zang said. “How do we fix it, and how do we figure out these other vulnerabilities? That’s a really cool space to be in right now with the Data Privacy Lab.”