
Stevens Launches Laboratory for Artificial Intelligence in Mathematics Education to Drive Breakthroughs in AI and Math Education

The lab has already developed the largest university-level mathematics benchmark of its kind

Artificial intelligence (AI) is transforming learning, including in higher education, but university-level mathematics reveals the limits of large language models (LLMs). Stevens Institute of Technology aims to advance AI in mathematics through its newly established Laboratory for Artificial Intelligence in Mathematics Education, setting new standards for higher education.

Alexei Miasnikov, director of the Laboratory for Artificial Intelligence in Mathematics Education, professor in the Department of Mathematical Sciences, and co-founder and scientific advisor at Gradarius, explains that LLMs must advance beyond computation to support logical deduction, theoretical understanding and advanced problem-solving: skills essential for decision-making and scientific progress.

"While LLMs have shown impressive reasoning abilities, their capacity for mathematical reasoning remains a critical area of focus," he said.

Located on Stevens’ Hoboken, New Jersey, campus, the lab aims to enhance LLMs’ mathematical reasoning and their applications in education, bridging the gap between human expertise and AI’s potential in math education.

"The lab's work will help shape the future of AI applications in higher education and beyond by refining tools, exploring AI integration in classrooms, and investigating how AI can solve mathematical problems," said Jan Cannizzo, associate chair for Undergraduate Studies.

"The work of the lab seeks to fundamentally transform mathematics education by developing AI-driven tools that adapt to individual learning styles, provide personalized instruction and offer new insights into mathematical reasoning," said Michael Zabarankin, professor and chair of the Department of Mathematical Sciences.

As the lab continues its work, it aims to secure large-scale research grants and leverage industry partnerships for breakthroughs in thoughtful, responsible AI and math education.

Elevating the standards of AI in mathematics education

One recent achievement of the lab is the development of U-MATH, the largest university-level mathematics benchmark of its kind. The results were published in the paper U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs. Miasnikov added: "This benchmark underscores nuances in academic mathematics education. Its multimodal approach and inclusion of a meta-benchmark highlight that even LLMs evaluating LLMs remain an unresolved challenge. We've all lived through the initial wave of AI excitement and are now beginning to understand its true nature: not as a magic pill, but as a powerful tool with specific strengths and limitations."

Alexei Miasnikov will serve as director of the new Laboratory for Artificial Intelligence in Mathematics Education.

U-MATH is a crucial step in using quantifiable metrics to assess how LLMs can be effectively applied in education. It includes more than 1,000 problems validated by Department of Mathematical Sciences faculty, including Cannizzo; Andrey Nikolaev, teaching associate professor; and Paul Schwartz, lecturer; with contributions from Chloe Weiers, a Ph.D. candidate in algebraic cryptography.

The benchmark reveals that even advanced models such as GPT-4o struggle with more than half of the university-level problems, underscoring the complexity of mathematics education. Other models, such as Gemini 1.5 Pro and Qwen2.5, have shown promise in specific areas such as visual reasoning.

Addressing the limitations of LLMs goes beyond data, explains Vlad Stepanov, CEO of Gradarius. "We can no longer live with a feeling that LLMs can't do this and that, but give them several more petabytes of data and everything will be solved. We're approaching the limits of what's achievable with LLMs, and to use them correctly, we need to understand the limitations pretty well."

"We are confident this comprehensive benchmark will elevate the standards of LLM performance evaluation," said Olga Megorskaya, CEO at Toloka AI, a company that provides data for LLMs. "This benchmarking approach can be applied to any niche topic, ensuring high-quality performance and accountability among LLM developers and fostering trust and integrity in AI technologies."

Launching the lab

The lab, which will leverage advanced computing resources in collaboration with industry partners Gradarius and Nebius AI, will be officially launched on Feb. 19, 2025, at an event that will include a public lecture and demonstrations.

"Nebius AI and Gradarius are thrilled to launch this ambitious endeavor with our long-time partner, Stevens Institute of Technology," said Stepanov. "Partnering with Stevens ensures educational integrity, which remains a priority."

