
AI Models Flounder on Basic Graph Reading | A Wake-Up Call for Education

By

Emily Lopez

Nov 26, 2025, 04:10 AM

Edited By

Carlos Mendez

Updated

Nov 27, 2025, 04:18 AM

2 minute read

Image: A student puzzling over a chemistry graph on a test paper, illustrating the difficulty of interpreting graphs in exams.

A recent evaluation of leading language models reveals a concerning inability to accurately interpret graphs in STEM exams. Models including GPT-5.1, Gemini 3, Qwen, and Opus 4.5 tripped up on fundamental graph-reading tasks, raising serious concerns about their role in educational settings.

The Test Findings: A Clear Gap

Researchers tested these top LLMs on high school-level physics and chemistry exams, focusing on their graph-reading abilities. The results were alarming: the models consistently misread key features of the plots, including equivalence points and other critical data.

  • One task involved analyzing a reaction-progress graph with a tangent drawn at t = 60 h.

  • Another challenged them to interpret a pH vs. titrant volume graph to find an equivalence point.

In the titration task, for example, the models read the equivalence volume as 10 mL instead of the correct 25 mL. That one misreading cascaded into wrong half-life and reaction-rate calculations, turning a straightforward 20-point exam into a score of around 14.
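To see why a single misread point matters so much, consider how the equivalence volume is actually extracted from a titration curve: it sits where the pH rises most steeply. The sketch below uses made-up pH readings (not the exam's actual data) and picks the volume with the largest central-difference slope, the basic reading the tested models reportedly got wrong.

```python
# Hypothetical titration data: pH readings at 5 mL increments of titrant.
# (Illustrative values only; the exam's real curve is not published here.)
volumes = [0, 5, 10, 15, 20, 25, 30, 35, 40]            # mL of titrant
ph      = [2.9, 3.1, 3.4, 3.8, 4.5, 7.0, 9.5, 10.2, 10.5]

# Approximate dpH/dV at each interior point with a central difference.
slopes = [(ph[i + 1] - ph[i - 1]) / (volumes[i + 1] - volumes[i - 1])
          for i in range(1, len(ph) - 1)]

# The equivalence point is the volume where the curve is steepest.
steepest = max(range(len(slopes)), key=lambda i: slopes[i])
equivalence_volume = volumes[steepest + 1]

print(equivalence_volume)  # → 25
```

A human solver does this by eye in seconds; the models' answer of 10 mL corresponds to a flat region of the curve, which is why every downstream half-life and rate calculation then came out wrong.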

"This sets a dangerous precedent," one concerned forum commenter noted, reflecting a growing unease among users.

Mixed Reactions from the Community

Responses on various forums highlighted users' frustrations. Many pointed out that even advanced multimodal LLMs still struggle with visual data interpretation.

  • "Those clustered lines make it tough for them," one user shared.

  • Another added, "These models are blind, failing to interpret images accurately."

These insights highlight a stark difference between human and AI graph-reading skills. One individual recalled trying to get insights from AI on a trading app: "I asked where the VWAP was, and it couldn't even find where the line ended! I was like, what?" Such comments underline the urgent need for improvements in AI capabilities in educational applications.

Challenges and Opportunities

As the demand for AI tools in education grows, the shortcomings in interpreting graphs raise questions about their effectiveness. Advocates argue the models' limitations could hinder advancements in STEM learning.

Looking ahead, developers seem poised to refine AI's visual data interpretation skills. Sources suggest roughly a 60% chance of meaningful updates targeting these gaps within a few years. Leveraging specialized training datasets could also help transform these systems into more reliable educational tools.

A Step Back: Lessons from Tech History

The situation takes a nostalgic twist, much like when smartphones first struggled with touch accuracy. Early tech failures often prompt quick adaptations and refinements, and AI developers may need to take a page from that playbook. Misinterpretations are reminiscent of the unreliable touchscreens from the past, but history shows that dedicated effort can turn shortcomings into strengths.

Key Insights

  • △ LLMs demonstrate significant struggles with basic graph tasks.

  • ▽ Misreadings leading to critical calculation errors are prevalent.

  • ※ "They struggle to follow lines," commented an active forum member.

As we revise our understanding of how AI can aid educational systems, the need for clearer and more accurate visual data processing becomes urgent.