Everyone thinks about cybersecurity sometimes—but what if we told you that attacks on private data don’t affect all people the same way? It turns out that, depending on the dataset, some demographics are more vulnerable to privacy attacks, and the available defenses don’t account for these differences, leaving some people at greater risk.
It’s a cyber-situation first identified by a team from Stevens Institute of Technology, led by Hui (Wendy) Wang, associate professor in the Department of Computer Science (CS), and including Jun Xu, CS assistant professor, and Yu Tao, associate professor of social sciences in the College of Arts and Letters. The team’s recent award from the National Science Foundation’s Secure and Trustworthy Computing Program funds a comprehensive project that takes aim at fair privacy.
What is fair privacy?
Working at the nexus of privacy and fairness makes a lot of sense for Wang. “When I joined Stevens in 2008, I worked on data security, and then I eventually shifted to data privacy,” said Wang. “In recent years, I started to be interested in fairness in machine learning. This is a fast-growing research topic…both in industry and academia. And then I asked myself, ‘Why not work on the overlap of these two topics?’”
Data privacy is about ensuring that an individual’s data can’t be accessed by unauthorized users. Threats to it include specific types of privacy attacks, such as membership inference attacks (MIA) and attribute inference attacks (AIA). In an MIA, an attacker tries to determine whether an individual is included in a dataset used to train a machine learning model. In an AIA, the attacker tries to infer an individual’s identity traits (such as their gender or political affiliation) from public data.
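To make the idea of a membership inference attack concrete, here is a minimal sketch of one common textbook variant, a confidence-threshold attack. It is not the Stevens team's method. The function names, the 0.9 threshold, and the confidence values are all made up for illustration; the underlying assumption is that a model tends to be more confident on records it was trained on.

```python
from typing import List


def infer_membership(confidences: List[float], threshold: float = 0.9) -> List[bool]:
    """Guess 'member' whenever the model's confidence on a record's
    true label is at least `threshold` (a hypothetical cutoff)."""
    return [c >= threshold for c in confidences]


def attack_accuracy(guesses: List[bool], truth: List[bool]) -> float:
    """Fraction of records whose membership the attacker guessed correctly."""
    correct = sum(g == t for g, t in zip(guesses, truth))
    return correct / len(truth)


# Made-up model confidences for eight records, for illustration only.
confidences = [0.99, 0.97, 0.95, 0.60, 0.92, 0.55, 0.70, 0.40]
# Ground truth the attacker does not see: the first four were in the training set.
is_member = [True, True, True, True, False, False, False, False]

guesses = infer_membership(confidences)
accuracy = attack_accuracy(guesses, is_member)
print(f"attack accuracy: {accuracy:.2f}")
```

An accuracy near 0.5 on a balanced sample like this would mean the attacker is doing no better than a coin flip; the higher the number, the more the model leaks about who was in its training data.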
Fairness in machine learning looks at the way human bias influences both the data and the model itself. A classic example of this is gender bias. If a model is trained with data that contains gender bias—like associating women with cooking and men with sports—then the model will learn that bias and pass it on.
A new research frontier
Taken together, fair privacy explores how embedded bias makes some data—and people—more vulnerable to attacks. “When we think about privacy attacks, [we tend to think] those attacks will not distinguish different demographic groups,” explained Wang. “But when we measured the privacy attacks—and especially the outcomes—we found out that some demographic groups are particularly vulnerable against this attack compared with other groups.”
A group’s vulnerability to privacy attacks is related to how balanced the data distribution is, so groups that show up in smaller numbers in the dataset are more vulnerable. In some cases, the model can leak information about some groups if it’s overfit, meaning it fits the training data closely but doesn’t generalize to unseen data as well. For example, the team looked at a classification model built from the 1994 US Census database and found that female records were more vulnerable to privacy attacks on the dataset. That means a real-world attacker trying to figure out if someone is part of that dataset would have a better chance at succeeding if their target was a woman.
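One way to picture disparate vulnerability is to score the same attack separately for each demographic group. The sketch below does that for a simple confidence-threshold membership guess. The group labels, threshold, and records are invented for illustration and are not drawn from the Census data or from the team's experiments; the toy numbers simply assume the model has memorized its few minority-group training records.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (group, is_member, model confidence on the true label)
Record = Tuple[str, bool, float]


def per_group_attack_accuracy(records: List[Record], threshold: float = 0.9) -> Dict[str, float]:
    """Run a threshold-based membership guess on every record and
    report the attacker's accuracy separately for each group."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for group, is_member, confidence in records:
        guess = confidence >= threshold  # hypothetical attack rule
        correct[group] += int(guess == is_member)
        total[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Made-up records, for illustration only.
records: List[Record] = [
    # Well-represented group: members and non-members look alike to the attacker.
    ("majority", True, 0.93), ("majority", True, 0.85), ("majority", True, 0.80),
    ("majority", False, 0.91), ("majority", False, 0.84), ("majority", False, 0.70),
    # Under-represented group: memorized training records stand out clearly.
    ("minority", True, 0.99), ("minority", True, 0.98),
    ("minority", False, 0.60), ("minority", False, 0.55),
]

accuracy_by_group = per_group_attack_accuracy(records)
gap = max(accuracy_by_group.values()) - min(accuracy_by_group.values())
print(accuracy_by_group, f"gap: {gap:.2f}")
```

The gap between the best-protected and worst-protected group is only a crude indicator, but it captures the core concern: a defense tuned to the average attack success rate can leave one group far more exposed than another.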
Since the available defenses against privacy attacks don’t yet account for these disparities, vulnerable groups aren’t defended fairly. “If a privacy attack is more successful against Black or against non-Black people, but the defense mechanism doesn't make any distinction between these two groups, then of course, we don't expect a fair amount of protection by the same mechanism,” explained Wang. “So, that's why, in our project, we try to investigate from this perspective. We want to design a mechanism that can provide fair protection across different demographic groups.”
The project, which kicks off in January and will run for four years, approaches the issue of fair privacy from technical and social angles. The team’s mission is to formalize the concept of fair privacy, using MIAs and AIAs to show how and why disparate vulnerability occurs and to explore the shortcomings of existing defense mechanisms. They will also tackle the problem head-on, designing mitigation mechanisms that ensure equitable protection against those attacks and employing social science frameworks to shape this new area of research.
Building public trust in machine learning
This project is a big deal for data privacy research—but the implications are tangible, too. Wang hopes that resolving the existing unfairness will build public confidence in the trustworthiness of machine learning.
“People are worried that machine learning can be used as a weapon by attackers,” explained Wang. “If we can provide a fairness guarantee—and maybe other projects could provide a robustness guarantee or security guarantee—people can feel more confident to use machine learning in their everyday life.”
The project is a boon for Stevens students, who will have the opportunity to participate in fair privacy research as it emerges. The team is developing cutting-edge course materials as one outcome of the project. Eventually, they expect to release the curriculum for use by faculty at other universities, but Stevens students can expect immediate access.