Can Generative Chatbots That Err Be Trusted Teachers?
Hoboken, N.J., November 20, 2025 — Personalized learning is a highly effective teaching method, but resource constraints limit its reach. In a small, in-person class, instructors can walk around, engage with students individually, and adapt their lessons to each learner’s needs. However, as class sizes grow, or in online courses with hundreds or even thousands of learners, this level of personalization becomes difficult or impossible.
That’s where pedagogical chatbots come in, says Assistant Professor Tiffany Li, whose research focuses on the intersection of human-computer interaction and education technology. Pedagogical chatbots, which have become more capable and widely available with the rise of generative AI, promise to deliver personalized learning at scale by providing individualized, on-demand support to students. When a learner is struggling with a concept, for instance, a chatbot can answer questions and address confusion in real time. And, unlike a single instructor, a chatbot can do this for countless students simultaneously. Chatbots can also be valuable for learners who prefer to study at their own pace or outside of traditional classroom settings.
However, these chatbots aren’t without fault, Li notes. Sometimes they make mistakes, provide incomplete information, invent facts, or misconstrue logic, any of which may harm their users’ learning. To ensure that deploying these chatbots does not harm learners, Li and several collaborators conducted a systematic study of the extent to which learners can detect factually incorrect information provided by a chatbot, and where and why they struggle to do so.
First, Li and her team built a pedagogical chatbot for learning introductory statistics and situated it in a learning environment similar to those offered by online learning platforms like Coursera. They preprogrammed the chatbot to make certain errors in its answers to see whether learners would be able to detect them. “The chatbot would make a specific factual error that we designed relevant to that topic the learners were learning,” says Li.
Then, the team recruited 177 participants, including college students and adult learners. As participants worked through practice problems during the learning session, they interacted with the chatbot as they saw fit. They also had simultaneous access to additional resources, such as an online textbook and search engines, to allow them to verify chatbot responses and to accommodate diverse learning needs. Participants were instructed to report any issue they noticed using a button underneath every chatbot response. The team also provided an incentive for verifying chatbot responses. “We encouraged participants to report problems by providing a small monetary bonus,” Li says.
The team found that, on average, learners had only about a 15 percent chance of successfully reporting chatbot errors, despite the incentive and the tools for double-checking the answers. The incorrect information also affected their learning. “When participants encountered chatbot errors, their average accuracy on the practice problems ranged from 25 to 30 percent,” says Li. That differed significantly from the control group, which didn’t encounter chatbot errors. “Participants in the no-error group did much better. Their average accuracy on the same practice problems ranged from 60 to 66 percent. So it's a huge difference.”
The team outlined their findings in the paper “Can Learners Navigate Imperfect Generative Pedagogical Chatbots? An Analysis of Chatbot Errors on Learning,” which they presented at the Conference on Learning @ Scale.
The team also investigated the reasons for this phenomenon and found that there was more than one. Sometimes participants relied on verification strategies that work poorly for novices, such as checking a chatbot response only when it conflicted with their prior knowledge of the subject. “If they thought there was no conflict between the chatbot and their knowledge, they felt there was no need to check. But this may not work for those with little or incorrect prior knowledge,” Li says.
Sometimes participants felt that the chatbot knew better, especially when they were new to the topic, so they accepted answers without verifying. “Chatbots can sound very eloquent and confident, which makes them appear very competent — and that induces trust,” Li explains. “Participants who had a lower level of trust in the chatbot reported the chatbot’s error correctly more often.”
Li’s team also found that certain types of learners were more affected than others. People with less prior experience using chatbots, as well as non-native English speakers, were less likely to report chatbot errors. And participants with less prior knowledge of the subject, as well as female participants, saw greater drops in their practice-problem performance when they encountered chatbot errors. “These groups were more vulnerable to chatbot errors than others, and that means that we should potentially develop personalized interventions or support for certain groups based on why they struggle more,” Li says.
In the meantime, Li suggests instructors should be a little more cautious about whether and when to deploy these chatbots in their specific classes. “Maybe the chatbot shouldn't be used when students are trying to understand the concepts for the first time,” she notes. “But perhaps it could be used later in the course when students have learned the basics and have certain prior knowledge and are preparing for the test. So building up students’ prior knowledge before exposing them to the chatbots may be one of the mitigating strategies for now.”
About Stevens Institute of Technology
Stevens Institute of Technology is a premier, private research university situated in Hoboken, New Jersey. Since our founding in 1870, technological innovation has been the hallmark of Stevens’ education and research. Within the university’s three schools and one college, more than 8,000 undergraduate and graduate students collaborate closely with faculty in an interdisciplinary, student-centric, entrepreneurial environment. Academic and research programs spanning business, computing, engineering, the arts and other disciplines actively advance the frontiers of science and leverage technology to confront our most pressing global challenges. The university continues to be consistently ranked among the nation’s leaders in career services, post-graduation salaries of alumni and return on tuition investment.
Stevens Media Contact
Lina Zeldovich
Manager of Media Relations
Division of University Advancement
[email protected]
201-216-5123
