Research & Innovation

Stevens Takes the Lead in Keeping Your Private Data Private

As federated learning systems gain popularity, Shusen Wang is working to protect everything from those cute family photos on your smartphone to your highly confidential medical data


In the digital jungle, data is king—and like crafty hyenas eager to steal a lion’s dinner, cyber criminals are always lurking to obtain data from unsuspecting—and even the most cautious—users. Joining the hunt to put a stop to this cyber threat, Stevens Institute of Technology assistant computer science professor Shusen Wang is devoting his research to securing data used in the emerging machine learning field known as federated learning.

A Wake-Up Call for Data Privacy

“Data is the key to machine learning,” Wang explains. “The more data we have, the better we can train a machine learning model to make highly accurate predictions. A major challenge, though, is the so-called ‘data silos’ distributed across isolated sources that cannot be shared due to privacy issues. Federated learning has been proposed as a solution to allow cross-device, cross-silo learning without breaking privacy constraints. The raw data does not leave one’s device, yet multiple parties’ data can be used on a central server for training the model.”
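The core idea Wang describes can be illustrated with a minimal federated averaging sketch. This is not Wang's actual protocol, which the article does not specify; it simply shows that each client trains on its own private data and sends only model parameters, never raw data, to a server that averages them.

```python
# Minimal federated averaging sketch (illustrative only; not the
# specific protocol discussed in the article). Three clients each hold
# private data; the server only ever sees model weights.
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training (simple linear regression via gradient descent)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

def federated_round(global_w, clients):
    """Server broadcasts weights; clients train locally; server averages the results."""
    updates = [local_update(global_w, X, y) for X, y in clients]
    return np.mean(updates, axis=0)  # raw data never leaves the clients

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):  # three isolated "data silos"
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(30):
    w = federated_round(w, clients)
# w now approximates true_w, learned jointly without pooling any raw data
```

The privacy catch, as the article goes on to explain, is that the transmitted weight updates themselves can leak information about the private data.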

On the surface, federated learning sounds like a dream come true for transforming data to preserve privacy while improving the execution of countless everyday activities and potentially life-saving innovations. Conceptually, nothing about the raw data is revealed. Tech leaders such as Google, one of the model’s first proponents, can aggregate app data to improve cellphone users’ experience in simple tasks such as typing words and organizing photos—without sharing those words or photos. Organizations can bridge internal data silos to work more effectively—without breaching security safeguards. Hospitals and pharmaceutical companies can merge private data such as name, gender, race, blood tests, X-ray images, and CT scans to predict disease incidence, track the evolution of illnesses such as COVID-19, or make medical decisions—without violating patient confidentiality or related laws, policies, and user agreements.

However, thanks to the groundbreaking efforts of Stevens computer science chair Giuseppe Ateniese and his team, there’s been a rude awakening that, in its present state, federated learning may be a privacy nightmare.

“Back in 2017, Giuseppe and his co-authors were the first to realize that federated learning still leaks users’ privacy,” Wang says. “Attacks are easy to launch; the attacker needs only to be a participant who can communicate with the server in the federated learning model. Defending data against these privacy leakages is imperative. Stevens is strong in research into security and privacy, and when Giuseppe told me about this work during my interview in 2018, I became interested in helping make federated learning safer.”

Double-Blind Machine Learning Defense May Offer a Clear Path to the ‘Holy Grail’ of Data Protection

Ateniese and Wang are each developing defenses intended to stop privacy leakage.

“Since my team and I designed the first attack on these protocols, we’re credible in the community, and we’re excited to be taking the lead in finding a solution that makes sense from a security perspective,” Ateniese says. “I’m using cryptography techniques, which can be expensive. Shusen, one of the industry’s top machine learning experts, is approaching it from a computational, machine learning point of view.”

Specifically, Wang is developing a double-blind, collaborative learning defense that shows strong potential to substantially improve data privacy, more quickly and at a much lower cost.

“Random matrices transform the information sent between the server and users, so an attacker cannot use the communicated information for privacy inference,” Wang explains. “Theories guarantee that the defense defeats gradient-based attacks, which are the most effective privacy leakages at present. Experiments demonstrate that the defense works and does not hurt accuracy or efficiency. So far, it’s not absolutely safe, but we’re getting there. It’s like a sword and a shield—we’re still developing the shield to protect data from even the strongest sword.”
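The principle behind the random-matrix transformation Wang mentions can be sketched as follows. This is an assumed, simplified illustration, not his published construction: a client masks its gradient with a secret invertible random matrix before transmission, so an eavesdropper observes only the transformed vector, while a legitimate party holding the matrix can undo the transform.

```python
# Illustrative sketch only: the article does not publish Wang's actual
# defense. A secret random invertible matrix masks the gradient before
# it is communicated (the key-agreement step between the legitimate
# parties is hypothetical and not shown here).
import numpy as np

rng = np.random.default_rng(42)
d = 4  # toy model dimension

# Secret random invertible matrix known only to the legitimate parties.
A = rng.normal(size=(d, d))
A_inv = np.linalg.inv(A)

gradient = np.array([0.5, -1.2, 0.3, 0.9])  # client's true gradient

masked = A @ gradient          # what actually travels over the network
recovered = A_inv @ masked     # what a holder of A can reconstruct

# An attacker observing only `masked` (without A) cannot recover the
# gradient, which blunts gradient-based privacy-inference attacks.
```

In this toy version, `recovered` matches `gradient` up to floating-point error, while `masked` bears no obvious resemblance to it.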

Although they are working independently, Ateniese and Wang are eager to see whether they can combine their approaches for the best of both worlds.

“Federated learning technology can be a game-changer, but we can’t patch fixes after an attack, because once private data has been leaked, it’s gone forever—you can’t reverse the damage,” Ateniese says. “Although we’ve seen hundreds of proposals for privacy-preserving federated learning since my team published our work in 2017, we’ve actually seen no effective solutions—they’ve been too easy to defeat or didn’t fully address privacy leakage. This has to be done properly from the start. Shusen’s method provides several privacy guarantees while being more efficient than a simple cryptographic approach. Ideally, we’ll combine his efforts from the machine learning field with strong security guarantees from cryptography to achieve the ‘holy grail’ of data privacy.”
