8/10/2025

GPT-5 for Math & Science: Is It Smarter Than a Ph.D.?

Alright, let's talk about the big one. GPT-5 is here, & the chatter is deafening. The announcement from OpenAI claims it's "smarter across the board," especially in heavy-hitting fields like math & science. They're not just whispering it, they're shouting it from the rooftops. We're seeing benchmarks that are, frankly, mind-boggling. But it raises the real question, the one everyone's thinking: is this thing actually smarter than a human with a Ph.D. who's dedicated their life to a specific, esoteric corner of the universe?
It’s a loaded question. The answer isn't a simple yes or no. It's more of a "well, it depends on what you mean by 'smarter'," which I know is an annoying answer, but stick with me. We're at a fascinating inflection point in history, & the implications are HUGE. So, let's dig in, look at the data, the expert opinions, & the stuff that doesn't show up in a benchmark score.

The Raw Power: GPT-5's Jaw-Dropping Performance

First off, let's give credit where it's due. The performance numbers for GPT-5 are nothing short of spectacular. OpenAI didn't just release a slightly better model; they seem to have made a significant leap.
Here's a quick rundown of what's making everyone's head spin:
  • Competition-Level Math: On benchmarks like the AIME (American Invitational Mathematics Examination), GPT-5 is scoring a staggering 94.6% without using external tools. For context, these are problems designed to challenge the brightest high school minds aiming for the International Mathematical Olympiad. It's also hitting 93.3% on the Harvard-MIT Mathematics Tournament (HMMT) without tools. That's expert-level problem-solving, plain & simple.
  • PhD-Level Science Questions: This is where it gets REALLY interesting. There's a benchmark called GPQA Diamond, which is composed of PhD-level science questions. GPT-5 is clocking in at 87.3% with the help of a Python tool & 85.7% without it. The Pro version pushes this even higher to 89.4%. This suggests an incredible ability to recall & reason with highly specialized scientific knowledge.
  • Humanity's Last Exam (HLE): If that wasn't enough, there's a new benchmark with the delightfully ominous name "Humanity's Last Exam." It's a collection of 2,500 hand-picked, PhD-level questions across math, physics, chemistry, & more. The GPT-5 Pro variant scored 42% on this beast, which is a testament to its broad & deep knowledge base.
  • Coding & Multimodal Reasoning: It’s not just about abstract questions. GPT-5 shows massive gains in coding, scoring 74.9% on SWE-bench Verified, a benchmark that involves fixing real-world Python code issues from GitHub. Plus, its ability to interpret scientific charts & figures (multimodal reasoning) has set a new state of the art.
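For quick reference, here's a throwaway Python sketch that collects the scores quoted above into a sorted scoreboard (the numbers are just the figures reported in this post, nothing more):

```python
# Reported GPT-5 benchmark scores from this post (percent).
# "Pro" marks the GPT-5 Pro variant; tool use is noted per entry.
SCORES = {
    "AIME (no tools)": 94.6,
    "HMMT (no tools)": 93.3,
    "GPQA Diamond (Python tool)": 87.3,
    "GPQA Diamond (no tools)": 85.7,
    "GPQA Diamond (Pro)": 89.4,
    "Humanity's Last Exam (Pro)": 42.0,
    "SWE-bench Verified coding": 74.9,
}

def scoreboard(scores: dict) -> list:
    """Return the scores as aligned text rows, highest first."""
    width = max(len(name) for name in scores)
    return [
        f"{name:<{width}}  {pct:5.1f}%"
        for name, pct in sorted(scores.items(), key=lambda kv: -kv[1])
    ]

for row in scoreboard(SCORES):
    print(row)
```

Lined up like that, the pattern is easy to see: near-ceiling scores on competition math, high-80s on Ph.D.-level science questions, & a much lower 42% on the deliberately brutal Humanity's Last Exam.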
Sam Altman, OpenAI's CEO, said that GPT-3 felt like a high school student, GPT-4 a college student, & GPT-5 is the first time it feels like "talking to an expert in any topic, like a PhD-level expert." Honestly, based on these numbers, you can see why he'd say that. The model can process & answer questions on a level that, in many specific instances, rivals or even exceeds human experts.
For businesses & researchers, this is a game-changer. Imagine having an assistant that can instantly analyze data, write complex code, or provide a detailed summary of existing literature on any topic. This is where tools built on this tech become invaluable. For instance, a company could use a platform like Arsturn to build a custom AI chatbot trained on their own extensive research data. This bot could then act as an internal expert, providing instant answers & insights to scientists & engineers 24/7, dramatically speeding up the research & development process.
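Under the hood, that "internal expert" idea usually boils down to retrieval-augmented prompting: fetch the passages from your own documents that are most relevant to a question, then hand them to the model alongside the question. Here's a minimal sketch of that assembly step; the keyword-overlap scoring, the document snippets, & the function names are purely illustrative (a production system would use embeddings & an actual model call, whether through Arsturn or an API):

```python
# Minimal retrieval-augmented prompt assembly (illustrative only).
def retrieve(question: str, docs: dict, k: int = 2) -> list:
    """Return the k doc titles sharing the most words with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        docs, key=lambda title: -len(q_words & set(docs[title].lower().split()))
    )
    return ranked[:k]

def build_prompt(question: str, docs: dict) -> str:
    """Assemble a prompt that grounds the model in retrieved passages."""
    context = "\n\n".join(f"[{t}]\n{docs[t]}" for t in retrieve(question, docs))
    return (
        "Answer using ONLY the context below. Cite the source in brackets.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

# Hypothetical in-house research notes.
docs = {
    "assay-v2": "The v2 assay protocol uses a 10 mM buffer at pH 7.4.",
    "ml-notes": "Model training converged after 40 epochs on the 2024 dataset.",
}
print(build_prompt("What buffer concentration does the v2 assay use?", docs))
```

The point of the grounding instruction ("use ONLY the context below") is to keep the bot anchored to your documents instead of its general training data, which is what turns a generic chatbot into something resembling an internal expert.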

But Hold On... What Does "Smarter" Even Mean?

Okay, the benchmarks are impressive. If "smarter" just means having faster access to more information & being able to process it to answer specific questions accurately, then yeah, GPT-5 is probably "smarter" than a Ph.D. A human brain can't compete with that kind of recall & speed.
But is that all a Ph.D. is? Of course not. This is where the limitations of AI, even one as powerful as GPT-5, start to become pretty clear.

The Creativity & Intuition Gap

The real magic of a brilliant scientist or mathematician isn't just knowing all the answers; it's asking the right questions. It's about having a flash of insight in the shower, connecting two seemingly unrelated concepts, or designing a completely novel experiment to test a wild hypothesis.
This is where AI currently struggles. As some researchers have pointed out, AI can't replace the creativity, intuition, & critical thinking skills that are the lifeblood of scientific research. An AI is trained on the vast corpus of existing human knowledge. It's incredibly good at finding patterns, interpolating, & even extrapolating from that data. But can it create a truly new paradigm? Can it have a "eureka!" moment that isn't just a recombination of what it's already seen?
Think about it this way: GPT-5 could probably write you a flawless summary of Einstein's theory of relativity. But could it have come up with the theory of relativity in the first place? That's a much taller order. That required a unique blend of deep knowledge, creative leaps, & a healthy dose of rebellion against established physics.

The 'Garbage In, Garbage Out' Problem

Another major hurdle is data dependency. AI models are only as good as the data they're trained on. The scientific literature itself is not perfect; it's filled with biases, paywalled content, non-reproducible studies (the so-called "replication crisis"), & a strong bias towards English-language papers.
An AI trained on this imperfect data will inevitably inherit its flaws. It might present information from a single, non-replicated study with the same level of confidence as a widely accepted theory. This can lead to misleading conclusions or perpetuate existing biases in research. A human Ph.D., on the other hand, develops a critical filter over years of study. They learn to be skeptical, to check sources, to understand the context & reputation of different labs & journals. They know that not all published papers are created equal.

The Black Box & The Physical World

Then there's the problem of interpretability. Sometimes, an AI will give you an answer, & it might even be the right one, but we have no idea how it got there. This "black box" nature is a BIG problem in science, where the process & the reasoning are just as important as the final result.
Furthermore, science isn't just about thinking. It's about doing. It's about designing & running experiments in the messy, unpredictable physical world. While AI can help design these experiments, it still relies on human researchers (or robots) to actually perform them, collect the data, & deal with the million little things that can go wrong in a lab. Physical experimentation remains a huge bottleneck.

The Evolving Role of the Human PhD: From Sage to Shepherd

So, if GPT-5 isn't going to replace Ph.D.s wholesale, what is going to happen? The consensus seems to be that the role of the human expert is evolving. AI is becoming an unbelievably powerful tool, a force multiplier that will augment, not replace, human intelligence.
Think of it like this: the scientist of the future might spend less time on tedious literature reviews, routine data analysis, or writing boilerplate code, because the AI can handle that. This frees them up to focus on the higher-level tasks:
  • Asking Big Questions: Defining the grand challenges & formulating the truly novel hypotheses.
  • Critical Oversight: Acting as the ultimate arbiter of the AI's output, validating its results, & spotting its inherent biases.
  • Experimental Design & Execution: Interacting with the physical world to generate new, high-quality data that can then feed back into the AI.
In this new world, the most successful researchers will be the ones who are best at collaborating with AI. They'll be the "shepherds" who guide these powerful systems toward interesting problems & interpret their findings with a critical human eye.
This is true outside of the lab as well. Businesses are quickly realizing that AI can handle a massive amount of customer interaction & lead generation, freeing up human teams for more complex issues. For example, a business can use Arsturn to build a no-code AI chatbot trained on their product documentation & sales materials. This bot can engage with website visitors, answer their questions instantly, & qualify leads 24/7, boosting conversions while allowing the sales & support teams to focus on high-value conversations. It's not about replacing the team; it's about giving them a powerful assistant.

So, What's the Verdict?

Is GPT-5 smarter than a Ph.D.?
If you define "smarter" as the ability to win a trivia contest, to answer complex questions based on existing knowledge with superhuman speed & accuracy, then the answer is leaning towards yes. Its performance on standardized tests is undeniable, & it can serve as an expert on call for a vast array of topics.
But if you define "smarter" in the way we value human experts—for their creativity, their critical thinking, their intuition, their ability to chart a course into the unknown & generate truly novel ideas—then the answer is a firm no. Not yet, anyway.
GPT-5 is an incredible achievement. It represents a monumental step forward in our ability to codify & access knowledge. It will undoubtedly accelerate scientific discovery at a rate we've never seen before. But it's a tool, a brilliant collaborator, not a replacement for the human mind at the helm of the scientific enterprise. Ph.D.s aren't obsolete; their jobs are just getting a whole lot more interesting.
Hope this was helpful & gives you a clearer picture of where we're at. It's a pretty wild time to be alive, that's for sure. Let me know what you think.

Copyright © Arsturn 2025