# Big Data Meets Quantum Physics

Rupak Chatterjee uses the advantages of quantum systems to address classical machine learning issues and explains why the future is quantum.

Traditional computer systems based on classical physics can process only so much data so quickly and with so much complexity. While classical computers are technically capable of analyzing massive data sets, doing so can be prohibitively slow, expensive, inefficient, and limited by existing physical storage capacity. All these issues hamper technological innovation.

Stevens Institute of Technology physics professor Rupak Chatterjee develops quantum-engineered systems to address such problems in big data analytics. His primary research focus is on using quantum systems for machine learning algorithms.

Chatterjee presented an overview of his research, as well as his philosophy of approach, last month as part of the School of Engineering and Science's Virtual Research Forum. This weekly presentation series was launched in response to the COVID-19 pandemic to foster engagement and cross-disciplinary collaboration.

Currently leader of the Quantum Big Data Analytics Cluster in the Stevens Center for Quantum Science and Engineering (CQSE), Chatterjee joined Stevens full-time in the fall of 2012, having taught financial engineering and risk management as an adjunct professor the previous year. He is the former director of the Financial Engineering Division and an inaugural member of the Hanlon Financial Systems Center at Stevens.

With an educational background in theoretical physics and applied math, Chatterjee spent 15 years working as an analyst in the financial services sector, most recently as director of the Multi-Asset Hybrid Derivative Research group at Citigroup. An unexpected opportunity to delve into quantum computing and machine learning led him back to academia.

"I've gone full circle, from a physics research post, then Wall Street, then back to financial engineering, and now I'm back to the physics department," he said.

### The quantum advantage

Fixing the problems of classical computing requires more than simply throwing more hardware at them.

"You could use as much hardware as you want, but the speed-up eventually is just not that fast," Chatterjee said. "Two PCs is not necessarily going to be twice as fast. Sometimes these problems don't scale in that fashion."

Quantum computer systems, however, which are based on quantum rather than classical physics, offer inherent advantages over classical computer systems: namely, speed, storage capacity, and the ability to handle a high degree of complexity.

There are some practical factors of scale at play. Because the particles underlying a quantum system—such as photons or electrons—are atomically or subatomically small, they offer the ability to pack more data into a smaller physical space. A quantum computer can simply hold more information than a classical computer of the same physical size.

But certain underlying principles of quantum physics also contribute to the advantages.

The principle of superposition, in which a thing—such as a basic unit of information—can exist in more than one state at a time, allows a quantum system to hold and represent even larger volumes of data simultaneously.

"Classical computers have to hold the state of a system one by one, such as binary bits of data represented by 1s and 0s. If you flip a coin, that state is going to be either heads or tails," Chatterjee explained. "But a quantum system can hold those states simultaneously. For instance, because an electron is not in any one state, it could exist simultaneously in many of those states with a certain probability. The coin can be heads *and* tails."

These larger volumes of data can then be manipulated simultaneously through a concept known as quantum parallelism. By being able to compute and analyze different states of data at the same time in parallel, rather than one at a time, quantum systems can produce results at vastly accelerated speeds.
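The coin analogy and the idea of holding many states at once can be made concrete with a small classical simulation. The sketch below is illustrative only and is not drawn from Chatterjee's work. It represents a quantum state as a list of amplitudes, and the function names are hypothetical.

```python
import math
import random

def measurement_probabilities(amplitudes):
    """Born rule: each outcome's probability is |amplitude| squared."""
    return [abs(a) ** 2 for a in amplitudes]

def uniform_superposition(n_qubits):
    """Equal-weight state over all 2**n_qubits basis states."""
    size = 2 ** n_qubits
    return [1 / math.sqrt(size)] * size

def measure(amplitudes, rng):
    """Sample one basis-state index according to the Born rule."""
    weights = measurement_probabilities(amplitudes)
    return rng.choices(range(len(amplitudes)), weights=weights, k=1)[0]

# One qubit as the "coin": heads (0) and tails (1) with equal amplitude.
coin = uniform_superposition(1)
coin_probs = measurement_probabilities(coin)

rng = random.Random(0)  # seeded only so the demo is repeatable
counts = [0, 0]
for _ in range(1000):
    counts[measure(coin, rng)] += 1

# Three qubits already describe 2**3 = 8 basis states at once.
register = uniform_superposition(3)
```

A classical simulation like this has to store every amplitude explicitly, and the number of amplitudes doubles with each added qubit, which is one way to see why simulating large quantum systems classically becomes costly.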

The third major piece of the puzzle lies in quantum entanglement, in which the state of a thing—such as a bit of data or a particle—is inextricably linked to all the other things it interacts with. This phenomenon increases computational complexity, allowing quantum systems to uncover correlations in highly complex structures that classical data analysis methodologies can't.
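As a rough illustration of that linkage, the sketch below samples measurements from a two-qubit Bell state, a standard textbook example of entanglement. It is a classical simulation with hypothetical names, not a description of the systems used in Chatterjee's research.

```python
import math
import random

# Amplitudes for the two-qubit basis states 00, 01, 10, 11.
# Only 00 and 11 have nonzero amplitude, so the two bits are linked.
bell_state = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]
labels = ["00", "01", "10", "11"]

def sample_outcomes(amplitudes, shots, rng):
    """Draw measurement results with probability |amplitude| squared."""
    weights = [abs(a) ** 2 for a in amplitudes]
    return rng.choices(labels, weights=weights, k=shots)

rng = random.Random(1)  # seeded only so the demo is repeatable
outcomes = sample_outcomes(bell_state, 500, rng)

# Each individual bit looks like a fair coin, yet the pair always matches.
always_agree = all(o[0] == o[1] for o in outcomes)
```

Neither bit has a definite value before measurement, but learning one immediately tells you the other, and that correlation is the feature entanglement adds.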

"If there's a problem that can be done easily with classical computers, we're not going to use quantum computers for that," Chatterjee said. "These (quantum machine learning) problems are chosen because they have a certain amount of complexity and time challenges such that the quantum process can assist in getting the answer."

### Developing better machine learning algorithms

At its most basic, Chatterjee says, machine learning is the search for patterns.

"You have data on all sorts of things in the world—visual data, numerical data, stock market prices—and if you take many examples of a certain data type, or data that looks very complicated, you may not be able to immediately find patterns in it," he said. "Statistical algorithms can identify and find if there is any hidden information or pattern in the data."

One way in which Chatterjee seeks to advance such pattern-seeking is by developing quantum algorithms that speed up or otherwise improve some aspect of classical machine learning.

In collaboration with physics professor and department chair Ting Yu, Chatterjee has developed a quantum approach to support vector machine algorithms using quantum optics (interactions between photons of light and matter).

A support vector machine algorithm is a type of supervised learning model that classifies input data into one of two preexisting choices. When given new information, the model then attempts to predict how to classify this new data based on its knowledge of classifying the previous data.

Such algorithms are used to separate data based on certain features into binary categories, such as whether certain students are likely to be admitted into a college (yes or no) or whether a particular aerosol that has appeared suddenly in the atmosphere is safe or dangerous.
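For readers who want to see the classical baseline, here is a minimal sketch of a linear support vector machine trained by sub-gradient descent on the hinge loss. It is a plain classical version, not the quantum optics approach described above, and the applicant data, function names, and parameter values are all made up for illustration.

```python
def train_linear_svm(points, labels, epochs=200, lr=0.1, reg=0.01):
    """Fit weights w and bias b by sub-gradient descent on the hinge loss."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization gently shrinks w on every step.
            w = [wi - lr * reg * wi for wi in w]
            if margin < 1:
                # Point is misclassified or inside the margin: move the boundary.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical applicants: (grade score, test score), both scaled to 0-1.
applicants = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.85),
              (0.3, 0.2), (0.2, 0.4), (0.35, 0.3)]
admitted = [1, 1, 1, -1, -1, -1]  # +1 = admitted, -1 = not admitted

w, b = train_linear_svm(applicants, admitted)
strong_applicant = predict(w, b, (0.85, 0.9))
weak_applicant = predict(w, b, (0.25, 0.3))
```

The model learns a dividing line from the labeled examples and then places each new applicant on one side of it, which is the "classify new data based on previous data" step described above.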

By embedding data into a quantum photonic system, rather than a classical one, Chatterjee argues this quantum approach results in faster calculations of increased complexity and improved scalability.

A related project involves using superconducting processors to develop and test improved quantum k-means clustering algorithms.

Rather than classifying data into one of two known options, a clustering algorithm groups data together according to similar features. It uses what's called an unsupervised learning model, in which you don't have preexisting knowledge of what the possible end results could be. This type of algorithm is useful for predicting answers to open-ended questions, such as those you might pose to a query-based virtual assistant like Siri.

Additional uses for such algorithms include customer segmentation for targeted marketing and identifying and predicting existing and probable clusters of regional phenomena, such as crime, natural disaster incidence, and the spread of infectious disease.

"Clustering can give you groups of features that cluster together that you weren't expecting," Chatterjee said. "They can deliver something that's not obvious that you wouldn't know to even look for. For instance, by clustering data, you might suddenly discover that, say, 12-year-olds and 72-year-olds like the same type of large iPhone."

Employing a quantum distance formula rather than traditional Euclidean distance (i.e., the straight-line distance between two points), Chatterjee developed an algorithm that not only delivered results faster than a classical algorithm but also produced more accurate clusters.
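To show where the distance measure enters a clustering algorithm, here is a minimal classical k-means sketch with the distance passed in as a function. The article does not specify the quantum distance formula, so only Euclidean distance is implemented here; the data points and function names are invented for illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def k_means(points, centroids, distance=euclidean, iterations=10):
    """Lloyd's algorithm with a swappable distance function."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda j: distance(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

# Made-up 2-D points forming two loose groups.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centroids, assignments = k_means(data, centroids=[data[0], data[3]])
```

Swapping in another measure only requires passing a different `distance` function, though the mean-based update step is tied to Euclidean geometry and may need to change along with it.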

The results of both projects have been published in two papers in *Quantum Information & Computation.*

### Generating truly random numbers

A vital application of mathematical models is the ability to conduct simulations—and to do so quickly and efficiently.

One example of immediate concern, Chatterjee notes, is trying to simulate how the SARS-CoV-2 virus will mutate and affect different areas of the population. Chatterjee is currently working with an undergraduate student who has undertaken developing such simulations.

Simulations rely on certain rules based on random processes in order to run. "You're using probabilities to simulate the future. That's why it's called a simulation," Chatterjee said. "To simulate these probabilities, you need random numbers."

Generating random numbers, however, is a slow and complex process. So slow and complex, in fact, that companies exist solely to generate and sell random numbers on a large scale.

But most of the random numbers you hear about, Chatterjee says, are not actually random at all.

While seemingly random numbers have been generated by classical computer systems for decades, in reality these numbers are based on mathematical formulas. Given enough time and capacity, a rival computer system could eventually unlock the pattern of these pseudo-random numbers—a potential problem for high-security military cryptography.
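The predictability of pseudo-random numbers comes from the seed that starts the formula. A short sketch using Python's standard `random` module shows the effect; the seed values and the helper name are arbitrary choices for illustration. The final lines use the standard `secrets` module, which draws on the operating system's entropy source. That source is harder to replay but is not a quantum device.

```python
import random
import secrets

def pseudo_random_sequence(seed, count):
    """Generate `count` byte values from a seeded pseudo-random generator."""
    rng = random.Random(seed)
    return [rng.randrange(256) for _ in range(count)]

# Two parties using the same seed get exactly the same "random" numbers.
yours = pseudo_random_sequence(seed=2024, count=8)
mine = pseudo_random_sequence(seed=2024, count=8)   # someone who learned the seed
other = pseudo_random_sequence(seed=2025, count=8)  # a different seed

same_sequence = yours == mine

# OS-provided randomness: there is no user-visible seed to replay.
os_bytes = list(secrets.token_bytes(8))
```

Anyone who learns the seed can regenerate the whole sequence, which is why seeded generators suit repeatable simulations but not secret keys.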

"Sometimes in different protocols of encryption, you need to generate a certain random number. If you use pseudo-random numbers from a pseudo-random number generator, and I know the initial seed (number) used to generate those random numbers, I can get exactly the same sequence of 'random' numbers as you did," Chatterjee said. "So true random numbers that cannot be replicated by your enemy is of importance."

Additional applications for truly random numbers include statistical sampling, financial transactions and e-commerce, and cybersecurity.

Quantum systems, however, have the ability to generate truly random numbers, thanks to the fundamental principles of randomness and uncertainty that underlie quantum physics.

In collaboration with CQSE director Yuping Huang, Chatterjee performed simulations using photonic systems developed by Huang's lab to demonstrate the advantage of quantum-generated true random numbers over classical pseudo-random numbers.

His research also showed that, in addition to being truly random, quantum-generated numbers can be produced far faster and more efficiently than those created by classical systems.

The results of this study were published in *Physica A: Statistical Mechanics and its Applications* in 2019.

### Quantum optimization for the risk-averse investor

A third major aspect of Chatterjee's research focuses on using a quantum approach to optimize financial asset portfolios to minimize risk.

In collaboration with assistant professor of business Ricardo Collado, Chatterjee's research combines financial engineering and quantum science to develop an algorithm that assembles a portfolio of stocks and bonds that delivers a desired rate of return based on a chosen risk profile. This optimization determines how much of each underlying class of asset will be included in the portfolio, as well as the portfolio's overall characteristics.
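A toy classical version of this allocation problem helps show what is being computed. The sketch below mixes one stock fund and one bond fund to hit a target return, then reports the resulting risk using the standard two-asset variance formula. All of the numbers and names are hypothetical, and nothing here reflects the actual algorithm from the research.

```python
import math

def two_asset_portfolio(target_return, stock, bond, correlation):
    """Weights and risk for a stock/bond mix that hits target_return.

    `stock` and `bond` are (expected_return, volatility) pairs.
    """
    (r_s, sd_s), (r_b, sd_b) = stock, bond
    # Solve w * r_s + (1 - w) * r_b = target_return for the stock weight w.
    w_stock = (target_return - r_b) / (r_s - r_b)
    w_bond = 1.0 - w_stock
    variance = (
        (w_stock * sd_s) ** 2
        + (w_bond * sd_b) ** 2
        + 2 * w_stock * w_bond * correlation * sd_s * sd_b
    )
    return w_stock, w_bond, math.sqrt(variance)

# Hypothetical inputs: 8% stock return at 20% volatility,
# 3% bond return at 5% volatility, correlation 0.1, target return 5%.
w_stock, w_bond, risk = two_asset_portfolio(
    target_return=0.05, stock=(0.08, 0.20), bond=(0.03, 0.05), correlation=0.1
)
```

With only two assets, the target return fixes the weights outright. With many assets, many mixes reach the same return, and finding the least risky one becomes the kind of large optimization problem the research targets.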

The main advantage of quantum over classical computing in this regard lies in the former's potential to perform the necessary complex calculations at an exponentially faster rate. Slow calculations were a challenge Chatterjee experienced firsthand while working on Wall Street.

"We used to run massive amounts of computations that we would kick off on a Friday night, and keep it going until Monday morning, hoping that it would be done by then," he said. "But people want to know information quickly. Even 10 years ago, they couldn't wait over the weekend to figure out their risk numbers. You want results in five minutes, not five days."

### The future is quantum

While Chatterjee plans to continue his research addressing classical machine learning problems, in the future he would also like to apply his knowledge to data generated by quantum systems themselves.

"Within the physics world, there's a lot of data that comes from quantum systems that has to be analyzed. It's a fairly new thing for a physicist to do quantum machine learning within their own quantum domain," he said.

General interest in quantum machine learning is high, including on the Stevens campus. Chatterjee noted that four separate groups of undergraduates he's working with over the summer have expressed interest in the discipline.

But he emphasizes that quantum computing is not some theoretical concept destined to be replaced by the next big technological fad. Rather, it has very real, practical applications—and, he predicts, represents the inevitable future of computing.

"This is going to sound biased, but the whole quantum world will eventually take over," he said. "There's practical reasons for that. Even with regular computers, they're already so small that they're almost at the quantum level. If they make the chips and components any smaller, they'll have to deal with quantum physics. So instead of avoiding it, you just go along with it."

That's not to say, however, that Chatterjee believes you'll be using quantum computing to read your email or post to Instagram within the next decade. Rather, he explained, computers will likely employ a hybrid system, combining classical physics for basic needs like reading email with quantum components for performing operations like computation.

He also sees a high likelihood for more quantum peripheral devices that will work in tandem with classical computers to perform specific quantum processes.

Chatterjee points to one of those quantum random number-generating companies as an example of this future already in motion. The company, he explained, provides its random number generation via a USB accessory. Although the device itself plugs into any average classical computer, he said, "inside you have quantum photons doing their magic."
