Over a billion people are using AI in 2026, and many do not limit themselves to the ubiquitous ChatGPT, trying other tools instead. However, many of these tools still experience ‘hallucinations’, generating incorrect information.
AI hallucinations occur when artificial intelligence systems generate outputs that are plausible but factually incorrect, fabricated, or not based on their training data.
To analyze how people use LLMs for daily tasks, a March 2026 report from Open Resource Applications compared the tasks users most often assign to AI and identified which of them are most prone to AI ‘hallucinations’.
The report found that mathematical calculations are the task AI is most likely to get wrong, with an average accuracy of only about 0.39 out of 1.
The study compiled the tasks most commonly assigned to AI from publicly available records of generative AI usage. To assess the performance of LLMs, the researchers matched each task category to the most relevant benchmark, using datasets from MMLU-Pro, GPQA, IFEval, WildBench and Omni-MATH. Accuracy scores were calculated for each model and then averaged for each task. The study also identifies the best-performing model for each task.
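The averaging step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the report’s actual code: the task-to-benchmark pairings follow the report, but the model names (`model_a`, `model_b`, `model_c`), the `summarize` helper and every score are hypothetical placeholders, and it assumes each model has a single accuracy score between 0 and 1 per benchmark.

```python
from statistics import mean

# Task categories paired with benchmarks, following the report's pairings.
TASK_TO_BENCHMARK = {
    "Mathematical Calculation": "Omni-MATH",
    "Data Analysis": "GPQA",
    "Specific Information": "MMLU-Pro",
}

# HYPOTHETICAL per-model accuracy scores on a 0-1 scale.
# These numbers are invented for illustration and are NOT the report's data.
SCORES = {
    "model_a": {"Omni-MATH": 0.2, "GPQA": 0.5, "MMLU-Pro": 0.7},
    "model_b": {"Omni-MATH": 0.4, "GPQA": 0.6, "MMLU-Pro": 0.6},
    "model_c": {"Omni-MATH": 0.6, "GPQA": 0.4, "MMLU-Pro": 0.8},
}


def summarize(task_to_benchmark, scores):
    """Return {task: (average accuracy across models, best model)}."""
    summary = {}
    for task, benchmark in task_to_benchmark.items():
        # Collect each model's score on the benchmark mapped to this task.
        per_model = {m: s[benchmark] for m, s in scores.items() if benchmark in s}
        if not per_model:
            continue  # no model was evaluated on this benchmark
        avg = mean(per_model.values())
        best = max(per_model, key=per_model.get)
        summary[task] = (round(avg, 4), best)
    return summary


if __name__ == "__main__":
    results = summarize(TASK_TO_BENCHMARK, SCORES)
    # List tasks from hardest (lowest average accuracy) to easiest.
    for task, (avg, best) in sorted(results.items(), key=lambda kv: kv[1][0]):
        print(f"{task}: average {avg}, best model {best}")
```

With these placeholder numbers, the sketch ranks Mathematical Calculation as the hardest task, which mirrors how the table below orders tasks from lowest to highest average accuracy.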
The five most difficult tasks for AI to complete are:
| Everyday Task | Benchmark | Average Accuracy (0 to 1) | Best Model |
|---|---|---|---|
| Mathematical Calculation | Omni-MATH | 0.3861 | GPT-5 mini (2025-08-07) |
| Data Analysis | GPQA | 0.522 | Gemini 3 Pro (Preview) |
| Tutoring or Teaching | MMLU-Pro | 0.67 | Gemini 3 Pro (Preview) |
| Health, Fitness, Beauty or Self-Care | MMLU-Pro | 0.67 | Gemini 3 Pro (Preview) |
| Specific Information | MMLU-Pro | 0.67 | Gemini 3 Pro (Preview) |
AI Is Bad At Math
Large Language Models (LLMs) are built to analyze and generate text, and calculation is not part of their primary function. This is one of the reasons why AI often makes mistakes on math tasks. On average, models scored only about 0.39 out of 1 for accuracy, meaning that roughly three out of every five answers may be ‘hallucinated’.
AI Cannot Perform Data Analysis With Incomplete Datasets
Data analysis involves inspecting, cleaning, and transforming data. While it might seem that AI should handle this easily, it gave the correct answer in only about 52% of cases. This happens because LLMs are built to predict the next likely token (a word or a number) in a sequence, rather than to retrieve or verify correct data.
AIs Cannot Be Your Teacher
While many people turn to AI for learning, language models scored only 0.67 out of 1 on average for accuracy on tutoring and teaching tasks. The best-performing model at providing accurate information or creating useful learning exercises was Gemini 3 Pro (Preview).
“Teaching is 100% about giving students correct information, and right now, most AIs cannot achieve that,” comments a spokesperson from Open Resource Applications. “LLMs’ output is often wrong when the data given to it is incomplete, or when the larger context is required.”
Health, Fitness, Beauty, and Self-Care Are Better Left to Professionals
As with teaching, AI models scored an average of 0.67 out of 1 for accuracy on health, fitness, beauty, and self-care topics. LLMs can usually search for and summarize information from the internet, but even one unreliable source or a lack of data can lead to hallucinations that may endanger users’ health.
AI Will Make Up Information Instead of Admitting It Cannot Find It
AI scored an average of 0.67 out of 1 for accuracy on specific information queries. When LLMs are asked about a niche topic with few sources or incomplete data, they ‘predict’ an answer instead of admitting they cannot help. For most of these queries, Gemini 3 Pro (Preview) performed better than other language models, but no model avoided making up information entirely.
Dangers Revealed
Although LLMs are a very useful tool, users need to understand their primary function and limitations. AI is at its best when it helps you edit a draft, brainstorm ideas, or take part in a game or role-play.
In fields such as mathematics or medicine, AI should be used only with professionals on hand to check its work. Otherwise, users may end up with completely wrong information.
