
AI hallucinations: Asking AI to perform math is the worst offending task

AIs are at their best when they help you edit drafted text, brainstorm ideas, or take part in a game or role play.

This photo shows pupils in a primary school class using AI for maths lessons - Copyright AFP Matthieu RONDEL

Over a billion people are using AI in 2026, and many of them do not limit themselves to the ubiquitous ChatGPT, trying other tools as well. However, many tools still ‘hallucinate’, producing incorrect or made-up information.

AI hallucinations occur when artificial intelligence systems generate outputs that are plausible but factually incorrect, fabricated, or not based on their training data.

Analyzing how people use LLMs for everyday tasks, a March 2026 report from Open Resource Applications identified the assignments users most often give to AI and measured which of them are most vulnerable to hallucinations.

The report found that mathematical calculation is the task AI is most likely to get wrong, with an average accuracy of only 0.386 out of 1.

The study compiled the most common tasks assigned to AI from public records of generative AI usage. To assess how well LLMs perform them, the researchers matched each task category to the most relevant benchmark, drawing on datasets from MMLU-Pro, GPQA, IFEval, WildBench and Omni-MATH. Accuracy scores were calculated for each model and then averaged across models for each task. The study also identifies the best-performing model for each task.
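For readers curious about the arithmetic, the averaging step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the report's actual code: the task names, model names and scores are hypothetical placeholders, and it assumes a simple unweighted mean across models for each task, since the report does not say whether models were weighted.

```python
from statistics import mean

# Hypothetical input: for each task category, the accuracy (0 to 1) that each
# model scored on the benchmark matched to that task. The task names, model
# names and numbers are placeholders, not figures from the report.
scores_by_task = {
    "Task A": {"model_x": 0.30, "model_y": 0.50},
    "Task B": {"model_x": 0.70, "model_y": 0.60},
}


def summarize(scores: dict[str, dict[str, float]]) -> list[tuple[str, float, str]]:
    """Return (task, average accuracy, best model) rows, hardest task first."""
    rows = []
    for task, model_scores in scores.items():
        average = mean(model_scores.values())           # unweighted mean across models
        best = max(model_scores, key=model_scores.get)  # highest-scoring model
        rows.append((task, average, best))
    # A lower average accuracy means a harder task, so sort ascending.
    rows.sort(key=lambda row: row[1])
    return rows


for task, average, best in summarize(scores_by_task):
    print(f"{task}: average {average:.3f}, best model {best}")
```

Sorting by ascending average puts the hardest task first, which is how the ranking that follows is ordered.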

The five tasks AI finds most difficult, ranked from lowest to highest average accuracy, are:

| Everyday Task | Benchmark | Average Accuracy (out of 1) | Best Model |
|---|---|---|---|
| Mathematical Calculation | Omni-MATH | 0.3861 | GPT-5 mini (2025-08-07) |
| Data Analysis | GPQA | 0.522 | Gemini 3 Pro (Preview) |
| Tutoring or Teaching | MMLU-Pro | 0.67 | Gemini 3 Pro (Preview) |
| Health, Fitness, Beauty or Self-Care | MMLU-Pro | 0.67 | Gemini 3 Pro (Preview) |
| Specific Information | MMLU-Pro | 0.67 | Gemini 3 Pro (Preview) |

AI Is Bad At Math

Large Language Models (LLMs) are designed to process and generate text; calculation is not their primary function. This is one reason AI often gets math wrong. On Omni-MATH, the Olympiad-level math benchmark the report matched to this task, the models averaged only 0.386 out of 1 for accuracy, meaning the final answer was wrong, or ‘hallucinated’, about 61% of the time, close to two times out of three.

AI Cannot Perform Data Analysis With Incomplete Datasets

Data analysis involves inspecting, cleaning and transforming data. Although AI might seem well suited to this, the models tested averaged only 0.522 out of 1 for accuracy, meaning they gave a correct result only about half the time. This happens because LLMs are built to predict the most likely next token (a word or a number) in a sequence, rather than to check that the data they output is correct.
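The next-token behavior described above can be illustrated with a toy greedy-decoding step in Python. This is a simplified sketch, not how any particular model works: the candidate tokens and probabilities are invented for the example, and the function name `greedy_next_token` is hypothetical. It shows that the model picks whichever continuation scores as most likely, with no separate check that the value is correct.

```python
# Toy illustration of one greedy decoding step. The candidate tokens and
# probabilities are invented for this example; they are not output from
# any real model or from the report.
next_token_probs = {
    "1,250": 0.46,    # plausible-looking but wrong value
    "1,205": 0.31,    # the correct value in this made-up scenario
    "unknown": 0.23,  # an honest "cannot determine" continuation
}


def greedy_next_token(probs: dict[str, float]) -> str:
    """Return the single most probable token, as greedy decoding does."""
    return max(probs, key=probs.get)


correct_value = "1,205"
chosen = greedy_next_token(next_token_probs)
# Nothing in this step checks the choice against the correct value.
print(chosen, chosen == correct_value)  # prints: 1,250 False
```

Real models work over much larger vocabularies and often sample rather than always taking the top token, but the point is the same: the choice is driven by likelihood, not by a correctness check.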

AIs Cannot Be Your Teacher

Many users turn to AI for teaching, but the models tested averaged only 0.67 out of 1 for accuracy on learning tasks. The best performer at providing accurate information or creating useful learning exercises was Gemini 3 Pro (Preview).

“Teaching is 100% about giving students correct information, and right now, most AIs cannot achieve that,” comments a spokesperson from Open Resource Applications. “LLMs’ output is often wrong when the data given to it is incomplete, or when the larger context is required.”

Health, Fitness, Beauty, and Self-Care Are Better Left For Professionals

As with teaching materials, the models averaged 0.67 out of 1 for accuracy on health, fitness, beauty and self-care topics. LLMs can usually search for and summarize information from the internet, but a single wrong source or a lack of data can lead to hallucinations that may be dangerous to users’ health.

AI Will Come Up With Information Instead Of Admitting It Cannot Find It

AI scores 0.67 out of 1 on average for accuracy on specific information queries. When LLMs are asked about a niche topic with few sources or incomplete data, they tend to ‘predict’ an answer instead of admitting they cannot help. Gemini 3 Pro (Preview) performed best on most of these tasks, but no model avoided making up information entirely.

Dangers revealed

Although LLMs are a very useful tool, users need to understand their primary function and limitations. AIs are at their best when they help you edit drafted text, brainstorm ideas, or take part in a game or role play.

In mathematics and medicine, AI should be used only with professionals on hand to check its work. Otherwise, users may end up with completely wrong information.

Written By

Dr. Tim Sandle is Digital Journal's Editor-at-Large for science news. Tim specializes in science, technology, environmental, business, and health journalism. He is also a practising microbiologist and an author, with further interests in history, politics and current affairs.
