8/13/2025

A Developer's First Look: Is GPT-5 Really Better for Programming?

Alright, let's get into it. The tech world has been buzzing, & if you're a developer, you've probably been hearing the whispers, the hype, & maybe even some of the grumbling about OpenAI's latest creation: GPT-5. The big question on everyone's mind is pretty simple: is it actually better for coding?

As someone who spends a good chunk of their day wrestling with code, I was SUPER curious. Is this the upgrade we've all been waiting for, the one that finally makes AI a true pair programmer? Or is it just another incremental update with a fancy new version number?

Honestly, after digging through a bunch of reviews, benchmarks, & developer chatter, the answer seems to be… well, it's complicated. The truth sits somewhere between the breathless hype & the frustrated rants. So, let's break down what's new, what's actually improved for us developers, & where GPT-5 still kinda stumbles.

The Big Picture: What's Different This Time Around?

First off, GPT-5 isn't just one single model. OpenAI is marketing it as a "family of models" sitting behind a smart router that automatically picks the best tool for the job. You've got your "general" variants for everyday stuff & then the more powerful "thinking" variants for when you need some serious cognitive horsepower. This is a pretty big deal because, in theory, it means we don't have to constantly switch between models to get the best results for different tasks. The router is supposed to learn over time which variant works best for which kind of prompt, which is a nice touch.

Compared to its predecessor, GPT-4, the new version is supposed to be a major leap in a few key areas: writing, health-related advice, & of course, the one that matters most to us, coding. OpenAI is tooting its own horn quite a bit, claiming it's the strongest coding model they've ever made. They're talking about generating entire websites, apps, & even games from a single prompt. That's a bold claim, & in a bit we'll get into whether it holds up.

Another thing that a lot of people will be happy to hear is that GPT-5 is apparently less of a suck-up. You know how GPT-4 would sometimes blindly agree with you or shower you with praise? Yeah, they've toned that down. It's supposed to be more confident in its own responses, which is a good thing when you're looking for a straight answer.

And of course, there are the usual claims of being faster & more efficient. OpenAI says it can produce better results with less effort, which is good not only for our patience but also for the environment & our wallets.

So, How Does It Actually Code? The Good, the Bad, & the… Buggy

Now for the million-dollar question: can it actually code better? The short answer is yes… but with some pretty big caveats.

The Good Stuff: Where GPT-5 Really Shines

Let's start with the wins. According to the benchmarks, GPT-5 is a beast. OpenAI is reporting some impressive scores, like 74.9% on SWE-bench Verified & 88% on Aider Polyglot for coding tasks. For those of you who don't follow benchmarks religiously, those are some pretty solid numbers that suggest a real improvement in coding capabilities.

So where will you actually feel the difference? Here are a few areas where GPT-5 seems to have a clear edge over GPT-4:

Debugging & Cross-File Reasoning: This is probably the biggest win for developers. GPT-5's "thinking" mode & improved context handling make it MUCH better at tracing bugs across multiple files. You can throw a failing test, a screenshot of a stack trace, & the relevant files at it, & it's more likely to come back with a focused patch & a unit test to prove it works. With GPT-4, this kind of thing often required a lot of back-and-forth to get to the root of the problem.
Multimodality is a Game-Changer: This is a pretty cool one. GPT-5 is way better at understanding multimodal inputs – that is, a mix of text, images, & diagrams. For frontend developers, this is HUGE. You can feed it a screenshot of a broken component along with the CSS, & it's more likely to give you an accurate diagnosis. This is a massive time-saver for debugging UI regressions & for getting up to speed on legacy code.
Better Code Quality (Usually): The code generated by GPT-5 tends to be more idiomatic & less prone to the kind of "plausible but brittle" code that GPT-4 sometimes spat out. It's better at avoiding hallucinated APIs & type mismatches, which means less time spent fixing the AI's mistakes.
More Effective Pair Programming: Because it's faster & has a better grasp of the codebase, GPT-5 feels more like a real-time pair programmer. It's better at making suggestions that actually fit the context of what you're working on & at catching subtle mistakes.
Smarter Refactoring: Got a big refactoring job on your plate? GPT-5's ability to hold more context & reason across multiple files makes it a much more capable assistant for larger-scope refactors.
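
To make the debugging workflow above a bit more concrete, here's a minimal sketch of how you might bundle a failing test, a stack trace, & the relevant files into a single prompt. Everything here is hypothetical: the function name, the section headings, & the example paths are my own illustration, not anything OpenAI prescribes, & the sketch only builds the text. Sending it to a model is left to whatever client you already use.

```python
def build_debug_prompt(test_output: str, stack_trace: str, files: dict[str, str]) -> str:
    """Bundle a failing test, its stack trace, and the relevant files into one prompt."""
    sections = [
        "A test is failing. Propose a focused patch and a unit test that proves the fix.",
        "## Failing test output\n" + test_output.strip(),
        "## Stack trace\n" + stack_trace.strip(),
    ]
    # Label each file with its path so the model can reason across files.
    for path, source in sorted(files.items()):
        sections.append(f"## File: {path}\n{source.rstrip()}")
    return "\n\n".join(sections)


# Hypothetical inputs, just to show the shape of the result.
prompt = build_debug_prompt(
    test_output="FAILED tests/test_cart.py::test_total - assert 0 == 30",
    stack_trace="File 'app/cart.py', line 12, in total",
    files={
        "tests/test_cart.py": "def test_total(): ...",
        "app/cart.py": "def total(items): ...",
    },
)
print(prompt)
```

Keeping each file under its own labeled heading is a small thing, but it gives the model (& you) an unambiguous way to refer back to "the bug in app/cart.py" during the conversation.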

Here's a concrete example of where GPT-5 pulls ahead. Imagine you have a 500-line diff that touches multiple services & frontend components. GPT-5 can generate a prioritized list of functional changes, potential regressions, & even suggest test cases. That's incredibly useful for busy reviewers & can significantly speed up the code review process.

This is where having a smart AI assistant really pays off. It's not just about generating code; it's about understanding the entire development lifecycle. For businesses looking to streamline these processes, this is where tools like Arsturn come into the picture. Imagine having a custom AI chatbot trained on your company's entire codebase & documentation. It could act as a super-powered linter, a junior developer that never sleeps, or even a customer support agent that can answer technical questions about your API. By building a no-code AI chatbot with Arsturn, you could provide your developers with instant, context-aware support, helping them navigate complex codebases & resolve issues faster.

The Bad & The Buggy: Where GPT-5 Falls Short

Now, before you rush off to replace your entire dev team with GPT-5, let's talk about the not-so-great stuff. Because, honestly, it's not all sunshine & roses.

David Gewirtz over at ZDNet did a series of tests on GPT-5's coding skills & came away pretty unimpressed. In fact, he found that it failed half of his programming tests – the worst result he's ever seen from an OpenAI flagship model.

In one test, he asked it to write a WordPress plugin, a task that previous versions of ChatGPT had handled with ease. GPT-5's first attempt was a complete failure, producing a non-working plugin that redirected to the wrong page. It eventually fixed the issue after being prompted, but the fact that it failed so badly on the first try is a pretty big red flag.

In another test, GPT-5 was asked to write a script that involved AppleScript. It completely fumbled the task, inventing a property that doesn't exist & demonstrating a fundamental misunderstanding of how case sensitivity works in the language. It confidently presented an answer that was completely wrong – a classic example of AI hallucination.

This is a SUPER important point to remember: GPT-5, for all its improvements, still hallucinates. It hallucinates less than GPT-4 in certain areas, like API usage & type errors, but it's by no means perfect. For security-sensitive code, you should treat its output as a draft proposal at best. NEVER deploy it without a rigorous review from a human expert.
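
One cheap guard against invented APIs is to check that the names a model hands you actually exist before you build on them. The sketch below does that for Python modules only (it wouldn't have caught the AppleScript case above), & the helper name api_exists is my own invention, not part of any library.

```python
import importlib


def api_exists(module_name: str, dotted_attr: str) -> bool:
    """Return True if `module_name` imports and exposes `dotted_attr`."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        # The module itself does not exist (or cannot be imported).
        return False
    # Walk the dotted path one attribute at a time, e.g. "path.join".
    for part in dotted_attr.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True


print(api_exists("json", "loads"))  # a real standard-library function
print(api_exists("json", "parse"))  # plausible-sounding, but not in Python's json module
```

This only proves a name exists, not that the model is calling it correctly, so it complements a human review rather than replacing one.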

It seems that for some tasks, especially those that require a deep understanding of a specific framework or a less common language, GPT-5 can still get things spectacularly wrong. Add a rocky rollout, & you get a bit of a backlash from the developer community: when GPT-5 first launched, OpenAI made it the default & made it difficult to go back to GPT-4o. The internet, as you can imagine, was not happy about this. OpenAI eventually relented & added an option to use legacy models, but it's a good reminder that newer isn't always better for every single task.

So, Should You Upgrade? A Look at the Cost-Benefit Analysis

This is the real question for a lot of us, isn't it? Is the upgrade worth it?

The answer, as with most things in tech, is: it depends.

If you're a hobbyist or someone with a very tight budget, you might want to hold off for now. GPT-4 still does a pretty good job for most day-to-day tasks, & the improvements in GPT-5 might not be worth the extra cost for you. The same goes if your primary workload is pixel-perfect UI design, as GPT-5's outputs in this area still often require a lot of human polishing.

However, if you're on a team that works with large codebases, spends a lot of time on PR triage, or frequently deals with complex, cross-file bugs, then GPT-5 could be a game-changer for you. The time saved on these tasks can add up VERY quickly. One analysis suggested that if GPT-5 saves a 10-engineer team just one hour per week per engineer, that could translate to over $30,000 a year in productivity gains. When you look at it that way, the cost of the upgrade starts to make a lot more sense.
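
If you want to sanity-check that kind of number for your own team, the arithmetic is simple enough to script. This is a back-of-the-envelope sketch, not the analysis quoted above: the 52 working weeks & the $60 fully loaded hourly cost are my assumptions, so plug in your own figures.

```python
def annual_savings(engineers: int, hours_saved_per_week: float,
                   hourly_cost: float, weeks_per_year: int = 52) -> float:
    """Yearly value of time saved: engineers x hours/week x weeks x cost/hour."""
    return engineers * hours_saved_per_week * weeks_per_year * hourly_cost


# 10 engineers saving 1 hour a week each is 520 engineer-hours a year.
hours_per_year = 10 * 1 * 52

# Assumed loaded cost of $60/hour (hypothetical; adjust for your team).
print(annual_savings(10, 1, 60))

# Hourly cost at which the savings reach $30,000 a year.
print(round(30_000 / hours_per_year, 2))
```

In other words, the "$30,000 a year" figure holds whenever an engineer-hour costs your company roughly $58 or more, & the total scales linearly with team size & hours saved.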

And this is where the conversation extends beyond just individual developer productivity. For businesses, the ability to automate & streamline workflows is a massive competitive advantage. Think about all the time your team spends answering the same questions over & over again, whether it's from customers, new hires, or even other developers. This is where a solution like Arsturn can have a huge impact. By creating a custom AI chatbot trained on your own data – your documentation, your knowledge base, your codebase – you can provide instant, accurate answers 24/7. This frees up your team to focus on the high-value work that really drives your business forward, whether that's building new features, fixing complex bugs, or engaging with your community. It's about building a more efficient & responsive organization, & that's something that EVERY business can benefit from.

The Human in the Loop: The New Developer Workflow

So, what does all of this mean for the future of our jobs? Is AI going to take over the world of programming?

Honestly, I don't think so. At least, not yet.

What we're seeing is a shift in the developer workflow. The AI is becoming an increasingly capable assistant, but it's not ready to take the driver's seat. Here's what the new human + AI workflow might look like:

Humans retain strategic authority: We're still the ones making the big decisions about architecture, product direction, & security tradeoffs. The AI can suggest options, but the final call is still ours.
AI handles the cognitive grunt work: Let the AI write the boilerplate, generate the unit tests, & draft the code comments. This frees up our mental energy for the more creative & challenging aspects of our work.
Humans enforce correctness: We're the final line of defense. We're the ones who need to run the tests, do the manual reviews, & validate that the AI's output actually meets the needs of the business.

The key is to play to each other's strengths. Let the AI do what it does best – process vast amounts of information & generate code at lightning speed. And let us do what we do best – think critically, solve complex problems, & understand the nuances of what our users actually want & need.

The Final Verdict: Is GPT-5 a Must-Have for Developers?

So, after all that, what's the bottom line? Is GPT-5 a must-have for developers?

I'd say it's a "strong maybe."

There's no doubt that it's a powerful tool with some impressive new capabilities. The improvements in debugging, multimodality, & cross-file reasoning are genuinely exciting & have the potential to significantly boost our productivity. For teams working on large, complex projects, the upgrade is likely a no-brainer.

But it's not a magic bullet. It still makes mistakes, it still hallucinates, & it still requires a human expert to guide it & validate its work. The ZDNet review is a stark reminder that we can't just blindly trust its output, especially on complex or mission-critical tasks.

My advice? Don't just jump on the bandwagon because it's the new shiny thing. Take a look at your own workflow & your own pain points. If you're constantly struggling with the kinds of tasks where GPT-5 excels, like tracking down cross-file bugs or refactoring large codebases, then it's definitely worth giving it a try. But if your needs are more modest, you might be perfectly happy sticking with GPT-4 for a while longer.

Ultimately, the best way to know for sure is to try it out for yourself. Put it through its paces on a real-world project & see how it performs. And who knows, you might just find that it's the coding partner you've been dreaming of.

I hope this was helpful! Let me know what you think in the comments below. Have you tried GPT-5 for coding yet? What's been your experience? I'm super curious to hear what other developers are seeing out in the wild.