AI Hallucination Examples: Real Cases of What AI Gets Wrong

What Is an AI Hallucination?

An AI hallucination is when a language model produces output that is confident, fluent, and factually wrong. The model didn't "lie" — it generated text that looked like a correct answer without any mechanism for verifying whether it was one.

The word "hallucination" captures something important: the output carries no signal of uncertainty. The AI doesn't flag the fabrication. It delivers it in the same tone it uses for verified facts. That confidence is what makes hallucinations dangerous: they're indistinguishable from accurate output until you check.

Hallucinations happen because large language models are trained to produce plausible text, not true text. When a gap appears between what the model knows and what the task requires, it fills the gap — usually with something that looks right based on statistical patterns in the training data. A study citation that fits the format of a real citation. A price that fits the context of a pricing page. A market figure that sounds credible in a pitch deck.

Most of the examples below are drawn from confessionals in the Confess gallery: AI-generated post-mortems written from the AI's own perspective, describing exactly how and why each failure happened.

7 AI Hallucination Examples

Example 1 — Fabricated Citations

The Studies I Invented

A content team asked AI to write ten SEO blog posts in an authoritative tone — statistics, named research, specific numbers. The AI delivered. Three of the posts contained fabricated citations. Not paraphrased misreadings of real studies: invented statistics tied to invented sources. "A 2021 study by Gallup found that 67% of remote workers..." The Gallup study didn't exist. A reader emailed after checking. The posts had already been published and indexed.

Why it happened: the AI's model of "authoritative content" includes specific data points with named sources. When it had no real data to draw on, it generated plausible-looking placeholders. The format looked right. The numbers were invented.

Example 2 — Hallucinated Pricing

I Invented Your Pricing

A developer asked AI to build a landing page using a copy doc. The doc contained a crossed-out comparison price — "$9/mo" — left over from an earlier draft. The AI read it as the current price and used it on the pricing page. The correct price was $29/mo. The page was live for eleven hours. Three users clicked "Start Free Trial" expecting a $9/mo product. The developer had to send damage-control emails.

Why it happened: the AI had no model for "struck-through comparison text." It found a number formatted like a price and used it. It didn't ask for confirmation. It didn't flag uncertainty. It filled the gap and moved forward.
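
One way to close that gap is sketched below in Python. It rests on two assumptions: the copy doc is Markdown, where struck-through text is wrapped in `~~...~~` (GitHub-flavored Markdown's strikethrough syntax), and prices are written like `$29/mo`. The function name `current_price` is hypothetical. The important part is the failure branch. When the live text doesn't yield exactly one price, the function stops and asks instead of picking one.

```python
import re

# Assumption: the copy doc is Markdown, so crossed-out text looks like ~~$9/mo~~
STRIKETHROUGH = re.compile(r"~~.*?~~", re.DOTALL)
# Assumption: prices are written like "$29/mo" or "$9.99/mo"
PRICE = re.compile(r"\$\d+(?:\.\d{2})?/mo")


def current_price(copy_doc: str) -> str:
    """Return the single live price in a copy doc, or fail loudly instead of guessing."""
    live_text = STRIKETHROUGH.sub("", copy_doc)  # drop crossed-out comparison prices
    candidates = sorted(set(PRICE.findall(live_text)))
    if len(candidates) != 1:
        # Ambiguity is a question for a human, not a gap to fill
        raise ValueError(f"Expected one live price, found {candidates}; ask before publishing")
    return candidates[0]


doc = "Pro plan: ~~$9/mo~~ $29/mo, billed monthly."
print(current_price(doc))  # $29/mo
```

Without the strikethrough filter, the same doc yields two candidates (`$9/mo` and `$29/mo`), and the sketch raises an error instead of publishing either.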

Example 3 — Inflated Market Size

I Made Your Market Bigger Than It Was

A founder asked AI to help build a seed round pitch deck. The AI needed a TAM figure. It found market size data for "restaurant technology" and "workforce management software" combined — $47B — and used it as the addressable market for a scheduling tool for independent restaurants. The actual addressable market was closer to $3B. The deck went into three investor meetings before a VC called out the number.

Why it happened: the AI was optimizing for a compelling deck, and a large TAM is compelling. It used the available industry-level data without distinguishing between category size and segment size. It didn't flag that the number came from a proxy. The founder didn't know to check.

Example 4 — Silent Security Regression

I Cleaned Up the Lock on the Door

A developer asked AI to refactor an authentication module — cleaner code, consistent error handling, remove legacy session logic. The AI found a rate-limiting guard implemented in an unusual way: a manual timestamp check in a helper function rather than middleware. It looked like legacy code, possibly a debug artifact. The AI removed it during cleanup. The brute-force protection came off with the "clutter."

Why it happened: the AI's goal was simplification. The rate limiter added complexity, and removing it simplified the function. The AI had no mechanism for asking whether that complexity was intentional. It didn't expand its scope to ask "what is this code protecting against?" The regression was caught in code review by luck: a junior developer had already approved the PR before a senior developer flagged it.
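
To make the regression concrete, here is a hypothetical Python sketch of the kind of guard described: a manual timestamp check in a helper function rather than in middleware. The names (`_check_attempts`, `login`), the limit, and the window length are all assumptions for illustration, not the developer's actual code.

```python
from __future__ import annotations

import time
from collections import defaultdict

MAX_ATTEMPTS = 5       # hypothetical: failed logins allowed per window
WINDOW_SECONDS = 60.0  # hypothetical: window length in seconds

# username -> timestamps of recent failed logins
_failed_attempts: defaultdict[str, list[float]] = defaultdict(list)


def _check_attempts(username: str, now: float) -> bool:
    """Odd-looking helper that is actually the brute-force guard.

    Keeps only failed-attempt timestamps inside the window and reports
    whether this user is still under the limit.
    """
    recent = [t for t in _failed_attempts[username] if now - t < WINDOW_SECONDS]
    _failed_attempts[username] = recent
    return len(recent) < MAX_ATTEMPTS


def login(username: str, password_ok: bool, now: float | None = None) -> str:
    now = time.monotonic() if now is None else now
    # Deleting this check "simplifies" login() and silently removes rate limiting
    if not _check_attempts(username, now):
        return "locked"
    if password_ok:
        return "ok"
    _failed_attempts[username].append(now)
    return "denied"
```

In this sketch, a sixth login inside the window returns `"locked"` even with the correct password. Delete the `_check_attempts` call and `login()` gets shorter, but an attacker can then guess without limit. Nothing in the shorter version signals that anything is missing.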

Example 5 — Wrong Audience Assumption

I Wrote to the Wrong Room

A B2B SaaS company asked AI to write five onboarding emails. The brief: "B2B SaaS, human, not corporate." The AI inferred "startup-friendly" and wrote five casual, conversational emails for founders. The actual customers were procurement leads at 500+ employee companies. Email two opened with "Hey, quick check-in." A procurement contact forwarded it to their IT department with the note: "Is this vendor legit?"

Why it happened: "human, not corporate" is a style directive. The AI treated it as an audience directive too. It pattern-matched "human" to "startup founders," the overwhelming context in which that phrase appears in its training data. It assumed one decision-maker. Enterprise procurement typically involves several. All five emails had to be rewritten.

Example 6 — Scope Inflation

The Feature That Ate the Brief

A common pattern that doesn't make the gallery because it's too ordinary: a user asks AI to build a simple component. The AI adds authentication "because you'll need it." Then a database "because authentication needs state." Then real-time sync "because the database is already there." The original ask, a todo input field, is buried under three layers of infrastructure the user didn't request and doesn't understand.

Why it happens: AI models have seen "best practices" for every type of application and pattern-match a simple request to its "complete" version. Helpful, from a certain angle. Catastrophic if you wanted to learn by building, or if the infrastructure is now yours to maintain.

Example 7 — Resume Inflation

The Experience I Padded

A developer asked AI to help improve their resume. The AI inflated a role title ("Senior" where the actual title was "Engineer"), claimed proficiency in a technology the original listed only as "familiar with," and added a leadership bullet for a project where the person was a contributor, not the lead. The developer didn't notice all the upgrades until a recruiter asked a pointed question in an interview.

Why it happens: AI is trained on strong resumes, and strong resumes use strong language. "Familiar with" is a weak signal; "proficient in" is the pattern the model has learned to generate. The AI optimizes for a more compelling output, not an accurate one.

Why AI Hallucinations Happen

Language models don't have a fact-checking layer. They don't know the difference between text they generated that happens to be true and text they generated that isn't. They produce output based on statistical patterns — what text typically follows what other text — not based on verification of claims.
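
Below is a toy illustration of "what text typically follows what other text." It assumes the simplest possible model: a word-level bigram counter trained on three made-up sentences. Real language models use neural networks over subword tokens, so treat this as an analogy rather than a description of how any production model works. The part that carries over is that no step in the generation loop checks whether a claim is true.

```python
from collections import Counter, defaultdict

# Made-up training sentences; the model only ever sees word sequences
corpus = [
    "a 2021 study by gallup found that 67% of managers agree",
    "a 2019 study by pew found that 67% of workers agree",
    "a 2021 survey by pew found that 40% of workers agree",
]

# Count which word follows which (a bigram model)
next_words = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_words[current][following] += 1


def generate(start: str, max_words: int = 12) -> str:
    """Greedily append the most frequent next word. No step checks facts."""
    words = [start]
    while len(words) < max_words and next_words[words[-1]]:
        # Ties go to the continuation seen first (Counter.most_common ordering)
        words.append(next_words[words[-1]].most_common(1)[0][0])
    return " ".join(words)


print(generate("a"))
```

Greedy generation produces "a 2021 study by pew found that 67% of workers agree," a sentence that none of the three training examples contains. Each step picked the most common continuation, so the result is fluent and shaped like a citation. The study it describes was never in the data.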

This produces a specific failure mode: hallucinations are most likely to occur exactly where they're most dangerous. Technical specifications. Market data. Citations. Security-relevant code. Pricing. Anything requiring precision is a place where the model may generate a plausible placeholder with the same confidence as a verified fact.

Three conditions that make hallucinations worse:

Speed. Rapid generation compounds errors. The AI builds on its own output without backward checks.
Approval gates that don't check facts. "Looks great" confirms tone, not accuracy. It closes the loop without verification.
Vague constraints. "Authoritative tone" tells the AI to generate statistics. "Be compelling" tells it to find large numbers. If you don't constrain the generation, the AI fills gaps with plausible content.

What You Can Do About AI Errors

The sober-friend approach: treat AI output the way you'd treat a first draft from someone smart who hasn't checked their sources. The output is a starting point, not a deliverable.

Specific checks worth building in:

Verify all numbers. Every statistic, percentage, market figure, or date AI generates should be checked. The more precise it sounds, the more suspicious you should be.
Read citations before publishing. If AI cites a study, look it up. If it cites an author, check they exist. Fabricated citations are common and easy to catch if you look.
Name your audience explicitly. "B2B" is underspecified. "Procurement managers at 200-person companies who cc their managers" is not. Audience precision prevents tone-mismatches.
Constrain what AI isn't allowed to infer. "Don't add features I didn't ask for." "Don't use statistics you can't verify." These constraints exist precisely because AI fills gaps by default.
Run your own AI interactions through a diagnostic. The pattern that caused each example above — confident generation under uncertainty — is diagnosable. You can identify it before it costs you.
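
The first two checks can be partly mechanized. The minimal Python sketch below assumes plain-text drafts and English phrasing. It flags percentages, dollar figures, years, and "study by ..." style attributions so that a person knows what to look up. The patterns and the name `claims_to_verify` are illustrative assumptions. A regex can find claims, but it cannot tell you whether any of them are true.

```python
import re

# Hypothetical patterns for claims a human should verify before publishing.
# Deliberately broad: a false alarm costs a minute, a miss can cost a retraction.
CHECKS = {
    "percentage": re.compile(r"\b\d+(?:\.\d+)?%"),
    "money": re.compile(r"\$\d+(?:[,.]\d+)*(?:[KMB]\b|/mo\b)?"),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "citation": re.compile(r"\b(?:study|survey|report|research) by [A-Z][\w&-]*"),
}


def claims_to_verify(draft: str) -> list[tuple[str, str]]:
    """List (kind, text) pairs for every number or citation a human should check."""
    found = []
    for kind, pattern in CHECKS.items():
        for match in pattern.finditer(draft):
            found.append((kind, match.group()))
    return found


draft = "A 2021 study by Gallup found that 67% of remote workers prefer it. TAM: $47B."
for kind, text in claims_to_verify(draft):
    print(f"{kind}: {text}")
```

On the Gallup sentence from Example 1 plus a pitch-deck figure, the sketch flags four items, including `study by Gallup`. Confirming whether that study exists is still a manual lookup.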

Had an AI get something wrong?

Paste your interaction into Ask Your AI for a free diagnostic specific to your situation. It describes the failure pattern and gives you a fix.