"Linguistics is kind of my weakness."
Those are the humble words of Doris Shao, a Ph.D. student and I&E Fellow at Stevens Institute of Technology. A student in the Department of Electrical and Computer Engineering, Shao is a data and machine learning researcher—with a focus on linguistics. "I’m much more of a visual person," she says, admitting she’s "better with pictures. Linguistics is my kryptonite."
Shao is a student of professor Rajarathnam Chandramouli. Before coming to Stevens to earn her master’s degree, she worked as a software developer in her native China for three years. The work "was not challenging at all," she says. "I changed jobs and every time I switched I was more passionate because it was new. I spend more time learning more things… but when you know something very well you just consider it routine. That was not challenging."
She wanted to do more research and work on more interesting tasks like plotting data. She took her first big step toward accomplishing that goal by applying to a Canadian university for a Ph.D. Unfortunately, she was "rejected twice because my English TOEFL score wasn’t high enough," she says.
Rather than accept that setback, Shao set her sights on improving her TOEFL score. She studied hard and applied to Stevens. She was not only accepted, but found herself in Chandramouli’s class, with an unprecedented opportunity to work on her weakness.
"I consider it fate," she says.
Now, Shao works alongside Chandramouli and fellow professor Koduvayur Subbalakshmi, collecting and analyzing linguistic data to help machines spot neurological disorders like Alzheimer’s.
Data Mining to Detect Early Alzheimer's Patterns
Neurological disorders are difficult to diagnose during their early stages because they may resemble forgetfulness or the normal aging process. That’s why Chandramouli and Subbalakshmi are building an algorithm based on natural language processing and machine learning to accurately and inexpensively warn patients and physicians about these disorders in the earliest stages, when the most treatment options are available.
Shao is helping refine that algorithm.
"In America, we have a large population of Alzheimer’s patients," she says. "In developing countries, 75% of the population who have Alzheimer’s and dementia are undiagnosed. We want to help all people know when they have [one of the diseases] at the beginning, when they feel slightly different."
The Stevens researchers’ approach relies on data mining techniques, specifically mining communication patterns from social media. "We’re always talking or interacting with people through emails or daily social media," Shao explains. "It’s convenient to use these communications to detect triggers." They hope to focus on Facebook in particular.
Having an algorithm that can mine real-time social media updates to detect degenerative speech patterns is the end goal of the project. Right now, they’re using clinical data to train the algorithm via machine learning on those patterns for Alzheimer’s, dementia and aphasia patients.
The algorithm looks for linguistic cues and performs content analysis on them to determine patterns. "If we’re talking to healthy people, they might have a lot to talk about," Shao explains. "When they describe a picture, the pictures have a lot of detail. Alzheimer’s patients are more likely to repeat certain features rather than embellish." Other statistics include data usage, the type of vowels and pronouns used and how long a person spends trying to describe the language.
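As a rough illustration of the kind of cues described above, the following Python sketch counts pronoun use and word repetition in a short transcript. It is only a minimal sketch under stated assumptions: the function name `linguistic_cues`, the pronoun list and the two ratios are hypothetical choices made for this example, not the features or code the Stevens team uses.

```python
import re
from collections import Counter

# Illustrative pronoun list (an assumption for this sketch,
# not the researchers' actual feature set).
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
            "me", "him", "her", "us", "them"}


def linguistic_cues(text: str) -> dict:
    """Return simple repetition and pronoun statistics for a transcript."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {"word_count": 0, "pronoun_ratio": 0.0, "repetition_ratio": 0.0}

    counts = Counter(words)
    # Every occurrence of a word beyond its first counts as a repetition.
    repeated = sum(count - 1 for count in counts.values())

    return {
        "word_count": len(words),
        "pronoun_ratio": sum(word in PRONOUNS for word in words) / len(words),
        "repetition_ratio": repeated / len(words),
    }
```

A description that keeps returning to the same few words would score a higher `repetition_ratio` than one that adds new detail. In a real system, numbers like these would be only a few of many inputs to a trained classifier.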
Shao is well on her way to honing the algorithm: it already achieves over 80% accuracy on dementia and Alzheimer’s data, and over 90% on data from aphasia patients. The latter figure is higher "because it’s more linguistic," Shao explains. "There were more markers for the machine to learn from." The team is refining the algorithm and may apply for a patent on the technology soon.
"The Forefront Of Cutting Edge Analytics"
Shao was recently selected to speak at the AT&T Labs Graduate Student Symposium. It was "my first conference engaging with my peers," she says. "I am excited, but still nervous." She spoke about another project she’s working on with Chandramouli: comparing 880 different machine learning schemes for Alzheimer's, aphasia and dementia patients. "These are different combinations of several machine algorithms and three feature sets," she explains, noting that the datasets were collected by other organizations. The goal of analyzing this data was to see if machine learning and language processing techniques could accurately match early-detection patterns of each disease, which they did. "It’s the first time this data has been compared this way," she says.
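To make the idea of comparing many "schemes" concrete, the sketch below enumerates every pairing of classifier, feature set and patient dataset. The classifier and feature-set names are hypothetical placeholders chosen for this example; the article does not say which algorithms or features were used, and this small grid does not reproduce the 880 schemes in the study.

```python
from itertools import product

# Hypothetical names, chosen only to illustrate the combinatorics.
CLASSIFIERS = ["logistic_regression", "svm", "random_forest"]
FEATURE_SETS = ["lexical", "syntactic", "acoustic"]
# The three patient groups named in the article.
DATASETS = ["alzheimers", "aphasia", "dementia"]


def enumerate_schemes(classifiers, feature_sets, datasets):
    """List every (classifier, feature set, dataset) combination."""
    return [
        {"classifier": c, "features": f, "dataset": d}
        for c, f, d in product(classifiers, feature_sets, datasets)
    ]


schemes = enumerate_schemes(CLASSIFIERS, FEATURE_SETS, DATASETS)
print(len(schemes))  # 3 classifiers x 3 feature sets x 3 datasets
```

Each entry in `schemes` would then be trained and scored separately, so the results can be compared across the whole grid.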
Impressive as all of that is, colleagues are also noticing Shao’s efforts.
"Doris has been instrumental in pushing the innovation barrier in this dynamic yet complex domain," states Constantine Boyadjiev, Head of Accenture’s North American Fraud & Risk Analytics Practice. Boyadjiev, who also serves on Shao’s dissertation committee, adds: "Doris’s research has been on the forefront of cutting edge analytics, and by developing powerful algorithms at the intersection of audio signal processing, NLP, and deep machine learning, Doris has helped break new ground in what has been a largely underexplored space. Her academic work has real life relevance, for example in battling today’s fraud and misconduct pandemic, with applicability across a number of industries and clients, from Financial Services through the Government/Public Sector arena."
This semester, Shao is doing audio analytics. That means she’s studying "emotion in voice—spoofing fake voices, synthetic voices and voice conversion from male to female," she says. In order to do that, she’s segmenting chunks of speech examples for the algorithm to learn from.
Although she graduates in May 2018, she hopes to continue her research in this area through postdoctoral study.
"I hope to amend my weakness," she says.