A Stevens team has developed a new explainable AI (XAI) framework that can detect fake news with high accuracy — and it depends partly on human expertise.
Working with doctoral students Mingxuan Chen and Ning Wang, Stevens professor and founding AI research center director K.P. "Suba" Subbalakshmi presented the new findings at the Association for Computing Machinery's prestigious KDD (Knowledge Discovery and Data Mining) TrueFact workshop in summer.
"The most important thing here is the modularity," notes Subbalakshmi, a professor of electrical and computer engineering. "For this work, we incorporated 'handcrafted' features that qualitative research in this area has pointed out as well as latent features. Different groups have studied a subset of these handcrafted features independently, but our method proposes a new XAI model that incorporates all these different features in a modular way so that their relative importance in detecting fake news from real news can be gauged.
"The key thing about our method is that it is modular, which means, if researchers would like to test new handcrafted features, they should be able to plug it into this model with minimum redesign."
Indeed, after testing several versions of the AI on a large dataset of social media posts, the team found their best hybrid prediction model performed better than 11 existing fake news-detection systems in separating truth from lies.
"Basically, we found that a best-of-both-worlds approach, allowing for the addition of hand-tuned features, did in fact work best," says Subbalakshmi.
Three steps to detecting fake news
The first component of the team's AI simply scans the written content and encodes it as a 512-dimensional vector, using very basic characteristics of the words as a guide. That's done using the open-source Universal Sentence Encoder (USE) created by Google, one of the leading tools for doing so.
This representation can be used alone to detect fake news, but the team found that this "latent feature only" approach did not work as well as the hybrid approach, which also includes "handcrafted" features.
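To make the hybrid idea concrete, here is a minimal sketch in Python. It assumes the latent and handcrafted features are combined by simple concatenation, which is a common choice but not something the team describes here. The function name `build_hybrid_vector` and the sample values are hypothetical.

```python
from typing import Dict, List, Sequence

LATENT_DIM = 512  # length of the sentence embedding described above


def build_hybrid_vector(
    latent: Sequence[float],
    handcrafted: Dict[str, float],
    feature_order: Sequence[str],
) -> List[float]:
    """Concatenate a latent embedding with named handcrafted features."""
    if len(latent) != LATENT_DIM:
        raise ValueError(f"expected {LATENT_DIM} latent values, got {len(latent)}")
    # A fixed order keeps each handcrafted feature in the same column for every post.
    extras = [float(handcrafted.get(name, 0.0)) for name in feature_order]
    return list(latent) + extras


# Placeholder embedding; a real one would come from the sentence encoder.
latent = [0.0] * LATENT_DIM
# Illustrative values only, not taken from the study.
handcrafted = {"VBG": 0.12, "ConcretenessScore": 2.7}
hybrid = build_hybrid_vector(latent, handcrafted, ["ConcretenessScore", "VBG"])
```

In this sketch the "latent feature only" variant would pass just `latent` to a classifier, while the hybrid variant would pass `hybrid`.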
The second portion of the team's AI adds handcrafted features, such as "ConcretenessScore" and "VBG," which measure frequency of verb use, descriptive richness and other aspects of writing. The AI correlates those frequencies with potential truthfulness.
Importantly, the number of these variables can be adjusted by a researcher as desired. Subbalakshmi's team found that this so-called "handcrafted" portion of the AI proved to be the most valuable and accurate of the three main components in helping separate truth from lies.
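The plug-in idea can be illustrated with a small feature registry. This is only a sketch under stated assumptions: the registry, the decorator and both feature functions are hypothetical, and the "ing" ratio is a crude stand-in for a VBG count (assuming VBG refers to the Penn Treebank tag for gerunds and present participles), where a real system would use a part-of-speech tagger.

```python
import re
from typing import Callable, Dict, List

FeatureFn = Callable[[str], float]
REGISTRY: Dict[str, FeatureFn] = {}


def feature(name: str) -> Callable[[FeatureFn], FeatureFn]:
    """Decorator that registers a handcrafted feature under a name."""
    def register(fn: FeatureFn) -> FeatureFn:
        REGISTRY[name] = fn
        return fn
    return register


def _words(text: str) -> List[str]:
    # Lowercase the text and keep runs of letters and apostrophes.
    return re.findall(r"[A-Za-z']+", text.lower())


@feature("ing_word_ratio")
def ing_word_ratio(text: str) -> float:
    # Crude proxy for a VBG frequency: share of words ending in "ing".
    words = _words(text)
    return sum(w.endswith("ing") for w in words) / len(words) if words else 0.0


@feature("type_token_ratio")
def type_token_ratio(text: str) -> float:
    # Simple lexical-richness proxy: distinct words divided by total words.
    words = _words(text)
    return len(set(words)) / len(words) if words else 0.0


def extract(text: str) -> Dict[str, float]:
    """Run every registered feature on one post."""
    return {name: fn(text) for name, fn in REGISTRY.items()}


features = extract("Breaking: shocking news is spreading and spreading")
```

Adding a new handcrafted feature in this sketch means writing one more decorated function; `extract` and any downstream code stay unchanged.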
"There's always a debate, in AI, about whether a system should be entirely automatic and artificial, or whether human experts should be allowed to influence the mixture of factors," notes Subbalakshmi. "This study made a pretty strong case that the human influence is both necessary and beneficial."
The third class of features involves extracting, classifying and analyzing details about the specific person or account sharing the information. (A Twitter account with very few or even no followers may be less likely to be truthful, for example.)
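Account-level signals of this kind (account age, verification status, and follower, friend and favorite counts) could be gathered into a small numeric record before being handed to a classifier. The sketch below is an illustration only: the `Account` fields and the `account_features` function are hypothetical and do not reflect any real social media API or the team's code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict


@dataclass
class Account:
    created: date      # when the account was opened
    verified: bool     # whether the user's identity is verified
    followers: int
    friends: int
    favorites: int


def account_features(acct: Account, today: date) -> Dict[str, float]:
    """Turn account details into numeric features."""
    return {
        "account_age_days": float((today - acct.created).days),
        "is_verified": 1.0 if acct.verified else 0.0,
        "followers": float(acct.followers),
        "friends": float(acct.friends),
        "favorites": float(acct.favorites),
    }


# Made-up example account, 30 days old on the reference date.
acct = Account(created=date(2020, 1, 1), verified=False,
               followers=3, friends=40, favorites=12)
feats = account_features(acct, today=date(2020, 1, 31))
```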
"Is it a brand new social media account, or has it been around for a long time? Is the user's identity verified? How many friends does it have, how many 'favorites' has it made?" asks Subbalakshmi.
"That's what this part of the AI looks at, because these are some data points researchers have previously identified as potentially useful in detecting fake news."
Better transparency, enhancing refinements
As the system produces its predictions, it also explains how it reached them, which most AI systems do not do. Subbalakshmi says this is critical because it helps researchers improve their methods.
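One simple way to picture a per-example explanation is a linear scorer, where each feature's contribution is its weight times its value. The sketch below uses that stand-in; it is not the team's explanation method, and the weights, feature names and threshold are invented for illustration.

```python
from typing import Dict, List, Tuple


def rank_contributions(
    weights: Dict[str, float], features: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Rank features by the size of weight * value for a single example."""
    scores = {name: weights.get(name, 0.0) * value for name, value in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


def weak_features(
    weights: Dict[str, float], examples: List[Dict[str, float]], threshold: float
) -> List[str]:
    """Names whose mean absolute contribution across examples is below threshold."""
    totals: Dict[str, float] = {}
    for features in examples:
        for name, score in rank_contributions(weights, features):
            totals[name] = totals.get(name, 0.0) + abs(score)
    return sorted(n for n, total in totals.items() if total / len(examples) < threshold)


# Invented weights and feature values, for illustration only.
weights = {"lexical_richness": -1.5, "sentiment": 2.0, "followers": 0.001}
examples = [
    {"lexical_richness": 0.4, "sentiment": 0.9, "followers": 12.0},
    {"lexical_richness": 0.7, "sentiment": 0.2, "followers": 30.0},
]

top_two = [name for name, _ in rank_contributions(weights, examples[0])[:2]]
candidates_to_drop = weak_features(weights, examples, threshold=0.05)
```

Here `top_two` names the factors that mattered most for the first example, and `candidates_to_drop` lists features that contributed little across all examples and could be left out of a later iteration.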
"What's nice about our AI is that it can show you which factors are giving you the best performance at detecting truth for any input example," she explains. "For example it might tell you that, from all these handcrafted features that were examined, the richness of the language and the tone or sentiment of the communication were the two most meaningful factors in making a decision that this tweet was false.
"So if you find that some features consistently underperform, they can be removed in your next iteration of the AI to improve its accuracy."
In the next phase of the work, the team will begin incorporating image analysis into its fake-news system — a move that will step up the complexity of the AI, and the computing resources needed to make its predictions, but will also likely increase power and accuracy.
"There is no magic bullet for detecting truth; you are never going to attain a final victory, because the deceivers are always working to learn how to game the detection systems you are building," concludes Subbalakshmi. "It is always a cat-and-mouse game.
"But the system we have developed and explained here is a good next advance in the field, we believe, and we're very excited to be adding images to the mix next."