Teaching Technology to Detect Fake News
To improve knowledge, security, and public trust, Yue Ning is using machine learning to uncover the hidden connections that underlie fake news and social media misinformation
This month, Stevens Institute of Technology will host its fifth annual Introduce a Girl to Engineering! Day event at Brensinger Elementary School in Jersey City. Throughout February, we’re also sharing the stories of some of our female faculty members and students, and how they are critical to our mission to inspire, nurture, and educate leaders in tomorrow's technology-centric environment while contributing to the solution of the most challenging problems of our time.
These notable female researchers are fundamental contributors to the achievements that earned Stevens the Bronze Award in the American Society for Engineering Education’s inaugural Diversity Recognition Program for helping us make “significant, measurable progress in increasing the diversity, inclusion, and degree attainment outcomes.”
Misinformation. Disinformation. Rumors. Fake news.
Twentieth-century advances in digital technology have inexorably transformed the world in which we now live. But the near-instantaneous access to information that exemplifies the 21st century also brings with it near-instantaneous access to fabrications, distortions, and falsehoods.
Can the selfsame technology exploited to promote and disseminate misleading information online also be used to identify, detect, and ultimately combat the very problem it was used to create?
Unearthing hidden connections in fake news
An affiliated faculty member of the Stevens Institute for Artificial Intelligence, Ning is working to develop an algorithm that can successfully determine whether content posted online is true—or, more importantly, false.
"If we are given a news article or any text information like social media posts, can we use artificial intelligence [AI] or machine learning to automatically tell the trustworthiness of this text?" she said. This question lies at the core of the project.
Using small subsets of news data gleaned from larger open-source data sets—including FakeNewsNet, Twitter, and Celebrity—Ning began by focusing on natural language processing using new encoding models like Google's BERT (Bidirectional Encoder Representations from Transformers).
Each data set is labeled, with roughly half of its news items marked fake and half marked true. Using these labels, Ning and a Ph.D. student can compare the vocabulary, frequency, and distribution of words within each set of content and identify relationships between the entities that appear in it, such as a particular politician or celebrity.
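The kind of vocabulary comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's actual method: the function names, the log-ratio scoring, the smoothing constant, and the toy sentences are all hypothetical and are not drawn from FakeNewsNet or any other real data set.

```python
from collections import Counter
import math
import re


def word_distribution(texts):
    """Return relative word frequencies over a list of documents."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def distinctive_words(fake_texts, true_texts, top_n=3, smoothing=1e-6):
    """Rank words by the log-ratio of their frequency in fake vs. true text.

    The smoothing term (an assumed value) avoids division by zero for
    words that appear in only one of the two collections.
    """
    fake = word_distribution(fake_texts)
    true = word_distribution(true_texts)
    vocab = set(fake) | set(true)
    scores = {
        w: math.log((fake.get(w, 0) + smoothing) / (true.get(w, 0) + smoothing))
        for w in vocab
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy, made-up sentences used only to exercise the functions.
fake_texts = ["shocking secret cure doctors hate", "shocking claim goes viral"]
true_texts = ["officials confirm the report", "the report cites officials"]
print(distinctive_words(fake_texts, true_texts, top_n=1))  # ['shocking']
```

A ranking like this only surfaces surface-level wording differences between the two labeled sets; it says nothing about meaning or context.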
One unexpected discovery in the early stages of their content analysis, Ning said, was the ways in which the phrasings and patterns of fake news differ from true news.
"The distribution of the words, the relationships between words are quite different," she said. "We need to figure out how we can use that observation to improve the performance of [fake news] detection."
However, vocabulary statistics alone do not yield strong prediction performance. To determine whether a news article is fake, an algorithm needs an understanding of semantic meaning and context. An important factor in establishing context is identifying the relationships between entities in a story.
To that end, Ning says their goal is to build a knowledge base extracted from true news and factual information that can be used to guide fake news detection.
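As a rough picture of such a knowledge base, the sketch below stores facts as (subject, relation, object) triples and checks a new claim against them. It is a deliberate simplification under stated assumptions: the project encodes relationships into learned hidden features, whereas this sketch swaps in plain exact-match lookup. The names `Triple`, `build_knowledge_base`, and `check_claim`, the placeholder entities, and the assumption that each relation has a single correct value are all hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str


def build_knowledge_base(triples):
    """Index verified facts by (subject, relation) for quick lookup."""
    index = {}
    for t in triples:
        index.setdefault((t.subject, t.relation), set()).add(t.obj)
    return index


def check_claim(kb, claim):
    """Compare a claimed triple against the verified facts.

    Assumption: a relation known to the knowledge base has no other
    valid objects, so a mismatch counts as a contradiction.
    """
    known = kb.get((claim.subject, claim.relation))
    if known is None:
        return "unknown"
    return "supported" if claim.obj in known else "contradicted"


# Placeholder facts standing in for relations extracted from verified news.
kb = build_knowledge_base([
    Triple("Person A", "mayor_of", "City X"),
    Triple("Company B", "headquartered_in", "City Y"),
])

print(check_claim(kb, Triple("Person A", "mayor_of", "City X")))   # supported
print(check_claim(kb, Triple("Person A", "mayor_of", "City Z")))   # contradicted
print(check_claim(kb, Triple("Person C", "mayor_of", "City X")))   # unknown
```

Exact lookup cannot generalize to paraphrases or unseen entities, which is one reason a learned representation of entities and relations is attractive.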
"The true news is already verified to be true, so we can build factual relationships of entities from this set of true news, and then encode those relationships and entities into hidden features to help us detect if [other] relationships are fake," she said. "Based on the relationships and the roles of the entities in this [true] story, the AI model can tell if [a different story] is fake news or not."
Relationships between entities, however, are always evolving. The U.S. president, for example, changes every four or eight years, so his or her relationship information must be continually updated.
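One simple way to picture relationships that change over time is to attach a start date to each fact and ask what held on a given day. The sketch below is a minimal illustration, not the project's design: the class `TemporalKnowledgeGraph`, its methods, and the placeholder entities and dates are hypothetical, and it returns a plain yes or no rather than a confidence score.

```python
from datetime import date


class TemporalKnowledgeGraph:
    """Stores facts with start dates so the current value can change."""

    def __init__(self):
        # (subject, relation) -> list of (start_date, obj), kept sorted
        self._facts = {}

    def add_fact(self, subject, relation, obj, start):
        entries = self._facts.setdefault((subject, relation), [])
        entries.append((start, obj))
        entries.sort()

    def value_at(self, subject, relation, when):
        """Return the most recent value that started on or before `when`."""
        current = None
        for start, obj in self._facts.get((subject, relation), []):
            if start <= when:
                current = obj
            else:
                break
        return current

    def is_true_at(self, subject, relation, obj, when):
        return self.value_at(subject, relation, when) == obj


# Placeholder entities and dates, invented for illustration.
kg = TemporalKnowledgeGraph()
kg.add_fact("Country X", "head_of_state", "Leader A", date(2012, 1, 1))
kg.add_fact("Country X", "head_of_state", "Leader B", date(2016, 1, 1))

print(kg.is_true_at("Country X", "head_of_state", "Leader A", date(2014, 6, 1)))  # True
print(kg.is_true_at("Country X", "head_of_state", "Leader A", date(2018, 6, 1)))  # False
```

Adding a new dated fact is enough to update the answer for later dates while leaving earlier dates unchanged.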
"We want to build a dynamic knowledge graph to adapt to the change of fake news in the case that, when we have new content, we can easily detect if this relation between the two entities is true—how much percentage we can guarantee this is true or not true," she said.
The ultimate goal is to build a system that will help not only to analyze historical data to make predictions for the future, but also to identify which past indicators or signals can be mapped to future events so that users can understand why the model is making the predictions that it's making.
One clear, practical application for such a tool is to help social media platforms detect fake news spread on their platforms and identify suspicious user accounts in need of attention. The tool could flag not only fake news related to public figures but also more general content, such as health-related misinformation (vaccine suspicions or flu outbreak rumors, for example) that has the potential to create confusion, security issues, and panic.
"To build a healthy online community, it's important to make sure the information that is presented to the people is trustworthy, instead of fake or random rumors that can cause problems and security problems," Ning says. "I think it will help them to build a platform that everyone trusts."
Beyond the news: Forecasting social and patient health events
Uncovering hidden connections between past and future events is a common theme in much of Ning's work.
In addition to the ongoing fake news detection project, Ning and her students are developing models to identify evidence-based predictive patterns in social media and healthcare data.
One such study analyzes historical data to uncover hidden triggers that may serve as precursors to events of civil resistance, such as protests and strikes. The price of gas, corporate layoffs, or a fire at a nightclub, for example, may predict a rise in mass demonstrations.
"We do not make strong causal assumptions," she said. "Instead, we want to find clues and signals that have higher predictive and interpretable values in machine learning models."
The implications of such studies are not limited to civil disobedience. Ning says such a model can also be applied to local phenomena like crime and traffic and to critical security issues like airport threats.
A third, more individual-centered prong of Ning's research focuses on health informatics.
One such project involves predicting future flu outbreaks based on past flu incidence records collected from the Centers for Disease Control and Prevention, while another mines hospital digital health records, including inpatient visit history and doctors' clinical notes, to build a framework to predict when a patient is likely to return to the hospital and for what medical condition.
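To give a sense of what forecasting from past incidence records means at its simplest, the sketch below predicts the next period's case count as the average of the most recent periods. This is only a naive baseline chosen for illustration, not the model used in the research; the function name `moving_average_forecast`, the window size, and the weekly counts are all made up.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    if window <= 0 or len(history) < window:
        raise ValueError("need at least `window` past observations")
    recent = history[-window:]
    return sum(recent) / window


# Invented weekly case counts, not real surveillance data.
weekly_cases = [120, 150, 180, 210]
print(moving_average_forecast(weekly_cases, window=3))  # 180.0
```

A baseline like this lags behind a rising trend, which is part of why richer models that look for predictive signals in the data are of interest.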
Whether focused on fake news, social action, or patient health, one of the biggest challenges Ning faces is the sheer complexity of data.
Because the underlying mechanisms, hidden structures, and relationships behind events and fake news are unknown, developing a model to detect them requires choosing among a near-infinite number of possible assumptions, testing those assumptions, and trying again.
"Data is complex and noisy. It's hard to pick out the correct signals and patterns for the events we are predicting or for the fake news we are trying to detect," she said.
"Developing machine learning models is definitely interesting and requires lots of work and lots of exploration."
Transparency in information—and in the systems behind it
Fake news is not confined by national borders or language barriers. One possible next step for the project, therefore, is to approach the problem from a multilingual learning angle, searching for common structures hidden within content in languages such as Chinese, Spanish, or Portuguese.
"Without a concern for a specific language," Ning says, "we can [potentially] extract common patterns that are happening in all the languages."
Ning is quick to emphasize that her objective in all projects is not merely to make predictions or detect patterns, but to identify the interpretable factors that are relevant to explaining the results—to make clear just why and how her machine learning models work.
Her desire to shine a light on truth and transparency, therefore, applies not only to publicly disseminated content, but also to the systems that underlie it, including the ones she herself builds. Doing so, she said, will ultimately benefit society and improve public trust.
"Most machine learning models and deep learning models, they kind of look like a black box for many practitioners and general users. But I think we need to enhance transparency in this field to increase trust and fairness in the AI and machine learning communities," she said. "That's why most of my projects include interpretability—find evidence for your predictions, find the precursors of your predicted events. Overall my goal is to improve transparency and fairness and interpretability in machine learning and AI algorithms."