
Stevens Professor Develops First Deep Learning Model to Predict High Concentration Antibody Viscosity

The convolutional neural network surrogate model can accelerate early-stage antibody drug development by screening for elevated viscosity in a fraction of a second

Monoclonal antibody therapeutics are complicated to develop, complicated to manufacture and complicated to administer.

Beneficial in the treatment of a variety of diseases and conditions — including cancer, heart disease, multiple sclerosis, arthritis and COVID-19 — most traditional antibody treatments are injected intravenously (through a vein), requiring specialized medical staff, training and equipment.

Generally taking several hours, such treatments are available only in hospitals or other high-volume medical settings. In addition to the healthcare resources required, the process costs patients in the form of time, expenses and the potential loss of wages with every infusion needed.

A less costly, more convenient method of delivery would be for patients to self-administer antibody treatments subcutaneously (beneath the skin) at home using a simple syringe.

But at concentrations high enough to reach therapeutic effect, antibodies — which are large proteins — tend to stick together. That stickiness, or high viscosity, can create a drug solution that is simply too thick to draw through a syringe needle.

This syringeability issue, however, is often only discoverable late in the drug development pipeline. Current methods for predicting its likelihood sooner are prohibitively slow and expensive, requiring the immense processing power of supercomputers running for days at a time.


To shortcut this process, Stevens Institute of Technology chemical engineering and materials science assistant professor Pin-Kuang Lai has developed DeepSCM, the first deep learning model specifically designed for predicting high concentration antibody viscosity.

Requiring only antibody sequence data as input, the convolutional neural network surrogate model can screen more than 1000 antibodies simultaneously in 10 seconds using only a consumer-grade laptop computer.

Lai's research was recently featured in the September 2022 issue of Staying Current: Formulation of Biopharmaceuticals, as one of only 21 studies selected from nearly 100 journals to highlight the most significant work in biotechnology-based formulation and process development research. Staying Current is read by top pharmaceutical scientists at the largest pharmaceutical and biopharmaceutical companies worldwide.

By facilitating such rapid identification of antibody viscosity early in the drug development process, DeepSCM holds the potential to accelerate new antibody drug research while reducing patient and healthcare system costs, increasing convenience and efficiency, and improving patient quality of life. 

A concentration problem

An ideal antibody drug candidate for subcutaneous injection contains a high enough concentration of antibodies to be medically beneficial at a low enough viscosity for easy syringeability.

But in the earliest stages of drug discovery, pharmaceutical companies aren't yet looking at the potential pitfalls of a drug candidate's manufacture: rather, they're concentrating on the most basic question of whether a candidate has the potential for therapeutic effect.

If a candidate shows no signs of medical benefit, after all, there's simply no point in developing it.

The amount of genetic material researchers work with in the discovery stage is very small. Mammalian cells, which are needed to produce antibodies, are expensive and time-consuming to produce, needing several days to grow in order to collect only a small amount of antibody material.

But most antibodies have low viscosity at low concentrations, such as those early researchers are working with. Protein stability issues (including high viscosity) generally only reveal themselves at high concentrations. And high concentrations are necessary if a drug is to actually work.

"The volume of a typical syringe is very small, only 1 to 1.5 milliliters. So the antibody concentration inside the syringe needs to be high in order to support the whole human body," said Lai.

Although not all antibody drugs will exhibit stability issues, Lai explained, with certain drug candidates "their viscosity increases exponentially at high protein concentrations."

By the time a drug candidate has reached a stage where it has been developed at concentrations high enough to exhibit the problem, however, years of research and billions of dollars may have already been spent. If that candidate then proves to be unsyringeable, even more research, time and funding would be required to make it viable for release.

Moreover, had researchers known earlier in the process that elevated viscosity was likely to be an issue, they could have chosen to focus their efforts on a different drug candidate entirely — one less likely to exhibit stability issues at all.

"This is why a computational tool that can predict this behavior, using protein sequence only, can help pharmaceutical companies do drug screening before they produce all this material at high concentrations," Lai said. 

A sticking point

To determine which antibody drug candidates have the best chance for syringeability at high concentrations, drug manufacturers need to know which antibodies are likely to exhibit high or low viscosity — and the sooner in the process, the better.

But if only low concentrations of materials are available in discovery, how can researchers predict the viscosity of a high concentration antibody drug solution they haven't even invented yet?

The answer lies in electrostatic charge.

Antibodies are protein molecules shaped like the letter Y. At the top ends of the diagonal arms of the Y are what are called the variable fragment (Fv) regions. Patches of these exposed Fv regions carry an electrostatic charge. The way antibody sequences interact with one another as a result of this charge is a major contributing factor to an antibody's viscosity at high concentrations.

Most antibody regions at formulation conditions, said Lai, carry net positive charges. When patches on these Fv regions are positively charged, antibody sequences will repel each other (i.e., not clump together), resulting in a low viscosity drug solution.

But because opposite charges attract, Fv region patches that are negatively charged will stick to the patches of antibody sequences that are positively charged, clumping together to create a drug solution with higher viscosity.

The more negatively charged patches there are, the higher viscosity the resulting drug solution is likely to be. Above a certain viscosity threshold, no amount of force administered by a patient will make that liquid syringeable.

Thus, by measuring the extent and magnitude of negatively charged patches on the surface of these antibody Fv regions — a measurement called the spatial charge map (SCM) score — researchers can predict the statistical likelihood that an antibody drug candidate will prove prohibitively viscous when manufactured at high concentrations.

As a proxy for viscosity, SCM score is "not perfect," Lai admitted, with an accuracy for predicting elevated antibody viscosity of approximately 70%. But, he said, it is a good indicator for distinguishing promising drug candidates from more problematic ones, based on electrostatic charge alone.
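As a rough illustration of how such a score might be used for screening, the sketch below flags candidates whose SCM score exceeds a cutoff. The default of 1000 is the threshold Lai cites; the function name, the antibody labels and the scores are made up for illustration, and treating a score of exactly 1000 as low viscosity is an assumption. Given the roughly 70% accuracy noted above, a flag here indicates likelihood, not a guarantee.

```python
SCM_THRESHOLD = 1000  # cutoff cited by Lai; scores above it suggest high viscosity


def flag_high_viscosity(scm_score, threshold=SCM_THRESHOLD):
    """Return True when the score is strictly above the threshold.

    Treating a score exactly at the threshold as low viscosity is an assumption.
    """
    return scm_score > threshold


# Hypothetical candidates with made-up SCM scores, for illustration only.
candidates = {"mAb-A": 650.0, "mAb-B": 1240.0, "mAb-C": 1000.0}
flags = {name: flag_high_viscosity(score) for name, score in candidates.items()}

for name, is_high in flags.items():
    label = "likely high viscosity" if is_high else "likely low viscosity"
    print(f"{name}: {label}")
```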

"If the SCM score is above 1000, this is a good threshold to separate high viscosity antibodies from low viscosity antibodies," Lai said.

Current methods for calculating SCM scores, however, are "computationally very expensive," he said. 

Based on molecular dynamics simulations that require antibody structure information, SCM calculations can take two to three days to run per antibody. Such simulations also require training and access to supercomputers, which can cost hundreds of millions of dollars to purchase, run and maintain.
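For a sense of scale, the figures quoted in this article can be combined in a back-of-the-envelope calculation. The sketch below assumes a hypothetical batch of 100 candidates, simulations run one after another, and that the "1000 antibodies in 10 seconds" figure for DeepSCM scales linearly to smaller batches. None of these assumptions comes from the published study.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def md_screening_days(n_candidates, days_per_antibody):
    """Total days if each simulation runs one after another (an assumption)."""
    return n_candidates * days_per_antibody


def surrogate_screening_seconds(n_candidates, batch_size=1000, seconds_per_batch=10):
    """Scale the quoted throughput linearly to n_candidates (an assumption)."""
    return n_candidates * seconds_per_batch / batch_size


low_days = md_screening_days(100, 2)            # two days per antibody
high_days = md_screening_days(100, 3)           # three days per antibody
fast_seconds = surrogate_screening_seconds(100)
speedup = low_days * SECONDS_PER_DAY / fast_seconds

print(f"Simulation-based screening: {low_days} to {high_days} days")
print(f"Surrogate screening: about {fast_seconds} second(s)")
print(f"Rough speedup: {speedup:,.0f}x")
```

Under these assumptions, sequential simulation of 100 candidates takes 200 to 300 days, while the surrogate needs about one second.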

"If we have 100 drug candidates, and we want to do screening for all of them, it takes several months to do the screening," Lai said. "It's not efficient, and pharmaceutical companies might not have supercomputers in their research group."

Additionally, the element of randomness inherent to the simulations means their results are not precisely reproducible, generating statistically similar, but slightly varied, SCM scores with every run.

The molecular dynamics simulations that underlie the original SCM model, Lai said, are a counterproductive bottleneck for developing an efficient and cost-effective way to predict antibody viscosity.

To push through this bottleneck, Lai turned to the neural networks of deep learning to develop a way to bypass the need for these resource-heavy calculations.

To generate sufficient data to develop and train the DeepSCM surrogate model, however, he first needed to conduct his own computationally expensive molecular dynamics simulations. 

Going deep

Using a data set of 6596 nonredundant antibody Fv sequences that he'd processed for length and suitability, Lai generated 3D protein structure models called homology models for each of the antibody sequences. These models extrapolate a target protein's structure based on how well its amino acid sequence resembles existing templates.

Molecular dynamics simulations were then run for each of these homology models, calculating an SCM score for each of the antibody sequences.

Lai's process up until this point parallels the one researchers must follow to calculate SCM scores via the original, simulation-based SCM model.

But rather than continue to repeat this process for every future antibody whose biophysical features need to be predicted, Lai set out to apply this gathered data to developing a deep learning shortcut to the process.

"In the beginning I needed to build these structure models and run molecular dynamics simulations because you need data to find correlations and features in order to develop a surrogate model," Lai explained. "But once you build the surrogate model, you no longer need to run the simulations."

Using a 1D convolutional neural network architecture, Lai trained the DeepSCM surrogate model with the thousands of preprocessed antibody sequences as input and their SCM scores obtained from the molecular dynamics simulations as output.
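To make the idea of a 1D convolution over a sequence more concrete, here is a minimal toy sketch in plain Python. It is not the DeepSCM architecture, whose layers and input encoding are not described in this article. One-hot encoding, the kernel width of 3, the four filters, global max pooling and the random (untrained) weights are all illustrative assumptions, and the toy sequence is an arbitrary string of residues.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence):
    """Encode a sequence as a list of 20-element 0/1 rows, one row per residue."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for residue in sequence:
        row = [0.0] * len(AMINO_ACIDS)
        row[index[residue]] = 1.0
        rows.append(row)
    return rows


def conv1d(encoded, kernel):
    """Slide one kernel (width x 20 weights) along the sequence, then apply ReLU."""
    width = len(kernel)
    outputs = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(len(AMINO_ACIDS)):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        outputs.append(max(0.0, total))
    return outputs


def predict_score(sequence, kernels, weights, bias):
    """Convolve, max-pool each feature map, then combine the pooled values linearly."""
    encoded = one_hot(sequence)
    pooled = [max(conv1d(encoded, kernel)) for kernel in kernels]
    return sum(w * p for w, p in zip(weights, pooled)) + bias


# Random, untrained weights: the output is meaningless as a viscosity prediction.
rng = random.Random(0)
kernels = [[[rng.uniform(-1, 1) for _ in AMINO_ACIDS] for _ in range(3)]
           for _ in range(4)]
weights = [rng.uniform(-1, 1) for _ in kernels]

toy_sequence = "EVQLVESGGGLVQPGG"  # arbitrary toy input, not a full Fv sequence
score = predict_score(toy_sequence, kernels, weights, bias=0.0)
print(score)
```

In a trained model the kernel and output weights would be fitted so that the output approximates the simulation-derived SCM score. Because the computation involves no random step once the weights are fixed, the same sequence always yields the same output.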

Whereas traditional machine learning would require researchers to define the features that connect the one to the other, the advantage of deep learning, Lai said, is that the neural network will uncover these hidden features in the training data on its own.

Once trained, the DeepSCM surrogate model can apply the patterns it discovered to predict an antibody's SCM score and likelihood of high viscosity in a fraction of a second using only antibody sequence data — no lengthy simulations, protein structure models, or expensive equipment required.

With no random variables to contend with, results generated by DeepSCM are also identical every time.

"Our deep learning model, it's like an algebraic equation: one input and one output," Lai explained by way of comparison. "So we always get the same result with the same input sequence. We can get exact reproducibility."

Applied to a test set of 1320 sequences not included in the training data, the DeepSCM surrogate model achieved a statistical correlation of 0.9 (on a scale from 0 to 1) compared with the original SCM model.
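The article does not say which correlation measure was used. Assuming it is the Pearson correlation coefficient, which in general ranges from -1 to 1, a comparison between simulation-derived scores and surrogate predictions could be computed as in the sketch below. The two score lists are made-up placeholders, not data from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Made-up placeholder scores, for illustration only.
simulated_scores = [820.0, 950.0, 1100.0, 1300.0, 700.0]
surrogate_scores = [800.0, 990.0, 1050.0, 1350.0, 760.0]

r = pearson(simulated_scores, surrogate_scores)
print(f"Pearson r = {r:.3f}")
```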

This strong correlation shows DeepSCM is an effective surrogate for the original SCM model, achieving equivalently accurate results for screening high concentration antibody viscosity.

"DeepSCM is a close to perfect model," Lai said. "Because of this high correlation, we can use DeepSCM to substitute for the original model, but the DeepSCM model is so much faster."

In addition to analyzing DeepSCM's performance as an appropriate surrogate, Lai tested its performance as an antibody viscosity screener overall.

Evaluated on 38 therapeutic monoclonal antibodies that the original SCM model had already predicted correctly, DeepSCM identified the right viscosity level for all but one (37 of 38).

Accelerating drug development

Lai has made the code and parameters for DeepSCM freely available on GitHub. His research was published in Computational and Structural Biotechnology Journal in March 2022. 

Despite DeepSCM's high success, Lai said, "we'll continue to work on this so that our model can be further improved."

He also plans to build off this research to explore applications beyond viscosity prediction.

"There are different stability properties of antibodies. Viscosity is one," he explained. "Aggregation, solubility, immunogenicity, bioavailability — these are all important biophysical properties to be considered for developing a new antibody drug. I think we can apply this protocol to develop other deep learning models to predict these properties."

Lai, who joined Stevens last August, noted how his work aligns well with his department's Application of Machine Learning to Pharmaceutical Development Graduate Certificate program. This program, which launched in 2021, focuses on helping students develop predictive computational techniques and skills to enhance the drug development landscape.

"Traditionally, before machine learning, the research and development timeline for one drug is usually over 10 years, and you'd need several billion dollars to develop it. And the success rate for developing a new drug is low," he said. 

But combining pharmaceutical research with artificial intelligence and data science, he said, is "a gamechanger" for advancing modern medicine and healthcare — and can capitalize on resources already available.

"Pharmaceutical companies have collected a lot of data over these past decades," Lai said. "We can take advantage of that data using machine learning techniques to develop models that facilitate the prediction of stability or other drug properties so researchers don't need to go through time-consuming and costly experiments."
