GPT-5's Big Flaw: Why AI Still Can't Cite Its Sources
Zack Saadioui
8/12/2025
Why GPT-5's Source Linking Problems Force Users to Beg for References
You’ve probably seen the headlines & felt the hype. GPT-5 is here, or at least, its rollout has begun, promising a "significant leap in intelligence." We were told to expect something akin to a "helpful friend with PhD-level intelligence." So, you’d think that with all this brainpower, the one thing it would get right, the absolute bare minimum for any serious use, is telling you where it got its information from. Right?
WRONG.
Turns out, one of the biggest frustrations with the latest & greatest from OpenAI is the same old, infuriating problem that has plagued its predecessors: its complete inability to reliably cite its sources. It's a problem so bad that it has users on forums & social media practically begging for a simple, working reference. People are spending more time trying to verify the AI's claims than it would have taken to just find the information themselves. It's not just an annoyance; it's a fundamental flaw that undermines the very trust we're trying to build in these powerful new tools.
Honestly, it feels like we’ve been handed a super-intelligent, know-it-all assistant who, when asked "how do you know that?", just confidently makes something up. This isn't just a glitch in the system; it's a deep-seated issue that goes to the very core of how these models work. & it’s making a lot of us wonder if we're moving forward at all.
The User's Plight: A Sea of Broken Links & Make-Believe Sources
If you’ve spent any time trying to use ChatGPT for research, you know the pain. You ask it to find some data, draft a report, or explain a complex topic, & you make one simple request: "Please include your sources."
What you get back looks convincing at first. The text is well-written, the claims seem plausible, & there they are – a neat little list of citations or hyperlinks. You breathe a sigh of relief. This is going to save SO much time. But then you start clicking.
Link #1: 404 error.
Link #2: Leads to a completely unrelated page.
Link #3: Looks like a real academic paper, but when you search for it on Google Scholar, it doesn't exist. The authors are real, but they never wrote that paper.
This is the reality for countless users. One user on Reddit described using ChatGPT to draft reports for clients, explicitly telling it to source all claims. The result? Most of the links were broken, leading to 404 pages. They were left scrambling, unable to send the work to a client because the foundation of the research was pure fiction. This isn’t a rare occurrence; it's a systemic problem. Studies have shown that AI models can fabricate anywhere from 18% to a staggering 69% of their citations. Some analyses have found that AI search engines provide incorrect citation information over 60% of the time.
It’s not just broken links either. Sometimes, the AI will confidently state a "fact" & attribute it to a reputable source, but if you go & check, that source says nothing of the sort. Or, in a more bizarre twist, it will invent entire studies, complete with plausible-sounding titles & author names. This phenomenon, often called "hallucination," is one of the most significant hurdles for the practical application of LLMs. It’s the digital equivalent of a pathological liar who is so convincing they almost make you doubt your own sanity.
The frustration is palpable. Users are tired of having to play detective, of spending hours cross-referencing every single claim. The promise of AI was to augment our intelligence, not to send us on a wild goose chase for non-existent evidence. This constant need for verification has led to a breakdown of trust. How can you rely on a tool that so confidently & consistently gets it wrong?
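If you do end up playing detective, at least the dead-link part of the job can be automated. Below is a rough sketch (assuming Python with the third-party requests library installed) that flags 404s & unreachable URLs in a list of AI-provided links. It won't tell you whether a live page actually supports the claim – that part is still on you.

```python
import requests

def check_links(urls: list[str]) -> None:
    """Print a quick status report for each URL an AI handed you."""
    for url in urls:
        try:
            # HEAD is usually enough to see whether the page exists at all
            resp = requests.head(url, timeout=10, allow_redirects=True)
            if resp.status_code == 405:
                # Some servers reject HEAD requests; fall back to GET
                resp = requests.get(url, timeout=10, allow_redirects=True)
            status = "OK" if resp.status_code < 400 else f"BROKEN ({resp.status_code})"
        except requests.RequestException as exc:
            status = f"UNREACHABLE ({type(exc).__name__})"
        print(f"{status:<22} {url}")

if __name__ == "__main__":
    # Replace these with whatever links the chatbot gave you
    check_links([
        "https://example.com/real-page",
        "https://example.com/this-probably-404s",
    ])
```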
Why This Is a HUGE Problem: More Than Just an Annoyance
This isn't just a quirky bug. The inability of large language models to provide accurate sources has serious, real-world consequences.
First & foremost, it’s a direct threat to academic & professional integrity. Students who use these tools for research risk committing plagiarism or submitting papers based on fabricated evidence. Researchers who might be tempted to use AI to speed up their literature reviews could find themselves building arguments on a foundation of sand. In fields like law & medicine, where accuracy is paramount, the consequences of relying on a hallucinated citation could be catastrophic.
Then there’s the issue of misinformation. We’re already living in an age where it’s difficult to distinguish fact from fiction. AI models that generate plausible but unsourced or incorrectly sourced information are a fake-news super-spreader event waiting to happen. As one report notes, this blurs the lines between genuine & generated information, creating a massive opportunity for the intentional dissemination of false narratives.
For businesses, the stakes are just as high. Imagine a company using an AI to generate marketing content that includes fake statistics or misattributes quotes. The damage to their reputation could be immense. Or what about a business that wants to use an AI chatbot on their website to help customers? If that chatbot starts inventing product features or making up policy details, the result is customer confusion, frustration, & a loss of trust that can be very difficult to win back.
This is a critical point that many businesses overlook when they're excited about the prospect of AI automation. A generic AI model, even one as powerful as GPT-5, is not a reliable source of information about your specific business. It will try to fill in the blanks, & in doing so, it will likely get things wrong.
This is where a solution like Arsturn becomes so important. Instead of relying on a model trained on the wild, often inaccurate, expanse of the internet, Arsturn allows businesses to create custom AI chatbots trained on their own data. This means you can feed the chatbot your company's actual product manuals, your real FAQ pages, your genuine policy documents, & your verified marketing materials. The result is a chatbot that provides instant, ACCURATE customer support. It won’t hallucinate answers because it’s only drawing from the information you provided. It’s a closed loop of verified knowledge. This is not just about answering questions; it's about building meaningful connections with your audience through personalized & trustworthy interactions. For any business considering AI for customer engagement or lead generation, this distinction is not just a feature – it’s a necessity.
The "Why" Behind the Lie: How LLMs Actually Work
So, why is this so hard for AI to get right? The answer lies in the fundamental nature of how these models are built. A large language model is not a database. It's not a search engine. It’s a word prediction machine.
Think of it this way: an LLM has been trained on a truly mind-boggling amount of text from the internet. It has read more books, articles, & websites than any human ever could. But it hasn't understood any of it in the way a human does. Instead, it has learned the statistical patterns of language. It knows that in a sentence about a cat, the word "meow" is more likely to appear than the word "photosynthesis." It knows that a list of academic references usually follows a certain format.
When you ask it a question, it doesn't "look up" the answer. It starts generating a response, one word at a time, based on the patterns it has learned. It's constantly asking itself, "Given the words I've already written, what is the most probable next word?" This is how it can create such fluent & human-sounding text.
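To make that concrete, here’s a deliberately tiny sketch of the idea. The probabilities are made up for illustration – a real model scores tens of thousands of possible tokens with a neural network, not a lookup table – but the shape of the decision is the same.

```python
import random

# Toy "language model": for a given context, a made-up probability
# distribution over possible next words. A real LLM computes this with
# billions of parameters, but it is answering the same question:
# "which word is likely to come next?"
NEXT_WORD_PROBS = {
    "The study was published in": {
        "Nature": 0.4,
        "2019": 0.3,
        "the": 0.2,
        "Science": 0.1,
    },
}

def next_word(context: str) -> str:
    """Pick the next word by sampling from the model's distribution."""
    probs = NEXT_WORD_PROBS[context]
    words = list(probs.keys())
    weights = list(probs.values())
    # Note what's missing here: no database lookup, no check that the
    # resulting sentence is true. Just statistics over word sequences.
    return random.choices(words, weights=weights, k=1)[0]

print("The study was published in", next_word("The study was published in"))
```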
But here’s the problem: this process has no built-in fact-checker. If the most statistically likely sequence of words forms a sentence that is factually incorrect, the model has no way of knowing. If the most plausible-looking citation is one that doesn't actually exist, the model will generate it with the same level of confidence as a real one. This is what we call "hallucination," & it's not a bug, but a natural byproduct of the way LLMs work.
The model is essentially an incredibly sophisticated mimic. It has seen millions of examples of how people cite sources, so it can create new citations that look just like the real thing. It can generate URLs that follow the correct format, but that doesn't mean it has ever checked if a website actually exists at that address. It’s all pattern-matching, no verification.
This is why we see the phenomenon of "confident incorrectness." The AI presents its fabricated information with the same authoritative tone as it does its factual information. There's no hesitation, no "I'm not sure about this," which makes it incredibly difficult for a casual user to spot the errors.
GPT-5: The "Next-Gen" Letdown
With the rollout of GPT-5, many of us hoped these fundamental problems would finally be addressed. OpenAI promised a model that was a "significant leap in intelligence." Users were expecting a more reliable, more accurate, & more trustworthy AI.
The reality, however, has been a bit of a letdown. Early users have reported that GPT-5 suffers from many of the same issues as its predecessors, including the frustrating inability to handle sources properly. One ZDNet writer who tested GPT-5's coding skills found it to be so bad that he was sticking with the older GPT-4o. He described the new model as delivering "broken plugins and flawed scripts," & noted that it "confidently presents an answer that is completely wrong."
The user backlash has been significant enough that even OpenAI CEO Sam Altman had to admit that the rollout was a "little more bumpy than we hoped for." He acknowledged that in some cases, the system was routing prompts to the wrong model, making GPT-5 seem "way dumber" than it should. This has led to a wave of disappointment, with thousands of users signing a petition to keep the older, more predictable GPT-4o available as an option.
This isn't to say that GPT-5 isn't powerful. It has shown improvements in certain areas, like creative writing & complex reasoning. But for many users, these improvements are overshadowed by the persistence of these fundamental flaws. What good is a more "creative" AI if you can't trust a word it says? The core issue remains: these models are being scaled up in size & capability, but the underlying problem of grounding them in reality has not been solved.
Are There Any Fixes on the Horizon?
The good news is that AI developers are acutely aware of this problem. It's a major focus of research & development. One of the most promising approaches being explored is something called Retrieval-Augmented Generation, or RAG.
In a nutshell, RAG attempts to connect the LLM to an external, verifiable source of information. So, instead of just generating text based on its internal patterns, the model is supposed to first "retrieve" relevant information from a trusted database or a set of documents, & then use that information to "augment" its response. In theory, this should make the AI's answers more accurate & its citations more reliable.
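Here’s a bare-bones sketch of the RAG pattern, just to show the moving parts. The retrieval step below is naive keyword overlap & the documents are invented examples; production systems use vector embeddings & a real vector database, and the commented-out call_llm line is a stand-in for whatever model API you actually use.

```python
# A bare-bones illustration of the RAG pattern: retrieve trusted text first,
# then hand it to the model and tell it to answer only from that text.

DOCUMENTS = {
    "refund-policy.md": "Refunds are available within 30 days of purchase with a receipt.",
    "shipping.md": "Standard shipping takes 5-7 business days within the US.",
}

def retrieve(question: str, k: int = 1) -> list[tuple[str, str]]:
    """Rank documents by how many question words they share, return top k."""
    q_words = set(question.lower().split())
    scored = sorted(
        DOCUMENTS.items(),
        key=lambda item: len(q_words & set(item[1].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question: str) -> str:
    """Augment the prompt with retrieved text so the model can cite it."""
    sources = retrieve(question)
    context = "\n".join(f"[{name}] {text}" for name, text in sources)
    return (
        "Answer using ONLY the sources below, and cite the source name.\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )

# answer = call_llm(build_prompt("How long do refunds take?"))  # <- your model API here
print(build_prompt("How long do refunds take?"))
```

Even in this toy version you can see why the output quality depends entirely on what’s in the document store – which is exactly the caveat below.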
However, as of today, RAG is not a perfect solution. It still has its own set of challenges. The model can misinterpret the retrieved information, it can still hallucinate even with access to a knowledge base, & the quality of the output is heavily dependent on the quality of the information it has access to.
This brings us back to the importance of a curated, controlled data source. For a business that needs a reliable AI assistant, the most effective approach right now is not to hope that a general-purpose model like GPT-5 will get it right, but to build a solution on a platform like Arsturn. By training a no-code AI chatbot on your own specific, verified data, you are essentially creating your own small-scale, highly effective RAG system. The "retrieval" part is from your own documents, & the "generation" part is focused on accurately representing that information. This is how you can harness the power of conversational AI to boost conversions & provide personalized customer experiences without the risk of your chatbot going rogue & making things up.
Hope this was helpful!
Look, here's the thing. AI is an incredible technology with the potential to change the world. But as with any powerful tool, we need to be clear-eyed about its limitations. The hype around models like GPT-5 is immense, but the reality on the ground is that we are still in the very early days. The problem of source linking & citation is not just a minor inconvenience; it's a reflection of the deep-seated challenge of building AI that is not just intelligent, but also trustworthy.
For now, the onus is on us, the users, to be critical thinkers. We have to treat every claim from an AI as suspect until proven otherwise. We have to click the links, search for the studies, & do our own due diligence. The dream of a completely reliable AI research assistant is still just that – a dream.
But for businesses looking to use this technology today, there are practical solutions. By focusing on creating custom chatbots trained on their own data with platforms like Arsturn, they can bypass the hallucination problem entirely & build AI tools that are genuinely helpful, accurate, & trustworthy. It's about using the right tool for the right job, & right now, that means choosing a controlled environment over the wild, unpredictable frontier of general-purpose AI.
Let me know what you think. Have you had your own frustrating experiences with AI citations? What are your strategies for verifying the information you get? The conversation is just getting started, & we all have a role to play in shaping how this technology evolves.