Opus 4.6 vs GPT-5.3 - My Honest Verdict


Hey Reader,

Two new AI models dropped this week.

Everyone on X lost their minds. The hot takes were flying.

"Claude Opus 4.6 is dead!"
"GPT-5.3-Codex is finished!"

I ignored all of it.

Instead, I spent 5 straight days building real apps with both Opus 4.6 and GPT-5.3-Codex.

And I need to be honest with you about what I found.

Because the truth is messier than anyone wants to admit.

The 4 Biggest Complaints About Claude Opus

Before this update, people had four main gripes with Claude's top model.

Let's go through each one.

Complaint #1: "It's too expensive"

Verdict: Worse, actually.

Opus 4.6 costs the same as 4.5 on paper. But here's the sneaky part.

This model is hungry. Like "eats-the-whole-buffet" hungry.

It chews through way more tokens than before. So if you're on a usage-based plan like Cursor, you'll feel it in your wallet fast.

Think of it like a rental car that technically costs the same per day, but now gets half the gas mileage. Same sticker price. Way more expensive to actually drive.
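If you want to see the rental-car math spelled out, here's a minimal sketch. Every number in it is made up for illustration: the price, the token counts, and the "twice as hungry" ratio are assumptions, not measurements from my tests.

```python
# Hypothetical numbers for illustration only; none of these are measured values.
PRICE_PER_MILLION_TOKENS = 10.0  # assumed flat dollar price, identical for both versions


def task_cost(tokens_used: int, price_per_million: float = PRICE_PER_MILLION_TOKENS) -> float:
    """Dollar cost of one task: tokens consumed times the per-token price."""
    return tokens_used / 1_000_000 * price_per_million


old_cost = task_cost(200_000)  # assumed token usage for a task on the older model
new_cost = task_cost(400_000)  # assumed: the newer model burns twice the tokens

print(f"old: ${old_cost:.2f}, new: ${new_cost:.2f}, ratio: {new_cost / old_cost:.1f}x")
```

Same price per token, double the tokens, double the bill. Swap in your own numbers from your usage dashboard to see where you land.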

Complaint #2: "It doesn't ask enough questions before building"

Verdict: Better. But not great.

Opus 4.5 was like that eager intern who hears "build me a dashboard" and immediately starts coding before you finish explaining what you actually need.

Opus 4.6 pauses a bit more. It thinks more before it acts. Noticeable improvement.

But compared to GPT-5.3-Codex? Still not close.

The GPT-5.3 models will interview you like a consultant before writing a single line.

Complaint #3: "It stops too often and asks for permission"

Verdict: Better. Same caveat.

This was the one that drove people crazy. You'd give it a task, it'd do 30% of the work, then ask "should I continue?"

Yes, obviously. That's why I asked you to do it.

Opus 4.6 runs longer now. You can feel the difference.

But again, GPT-5.3-Codex still operates way more independently. It'll disappear for 3 hours and come back with the whole thing done.

Opus still needs a bit more hand-holding.

Now Let's Talk GPT-5.3-Codex

GPT-5 had two big complaints of its own.

Complaint #1: "It sucks at design"

Verdict: Still kinda true.

Look, some people on X claim it's gotten better. And maybe slightly.

But if you care about how your app looks out of the box, GPT-5.3-Codex still isn't winning any beauty contests.

It's like a brilliant architect who builds structurally perfect houses... that all look like government buildings from the 1970s.

Complaint #2: "It's painfully slow"

Verdict: Mostly fixed. Seriously.

This was THE complaint. The number one thing everyone hated.

OpenAI cooked on this and made Codex 60-70% faster in the High/Xhigh modes.

I'm not quoting their marketing. I tested it myself. It's legit.

Tasks that used to take 15-20 minutes? Now 5-8 minutes.

That's a massive deal when you're building all day.
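Quick sanity check on those numbers, as a rough sketch. It assumes "faster" means less wall-clock time per task, and it pairs the low ends and high ends of the two ranges above, which is my own simplification.

```python
def time_saved_percent(before_minutes: float, after_minutes: float) -> float:
    """Percentage of wall-clock time saved going from `before` to `after`."""
    return (before_minutes - after_minutes) / before_minutes * 100


# Pairing the range endpoints from the text: 15 -> 5 minutes and 20 -> 8 minutes.
short_task = time_saved_percent(15, 5)
long_task = time_saved_percent(20, 8)

print(f"15 -> 5 min saves about {short_task:.0f}% of the time")
print(f"20 -> 8 min saves about {long_task:.0f}% of the time")
```

Both land in the 60-70% range, so the timings I saw line up with the speedup claim.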

So... Which One Am I Using?

Here's where I might surprise you.

I've been the biggest GPT-5 fan on this newsletter. You know that.

And I still think GPT-5 is the most intelligent AI model in the world. By a good margin.

But here's my confession:

Over the last 2-3 weeks, even before Opus 4.6 dropped, once I dialed in the right setup and prompts... Opus quietly became my main model.

About 60-70% of my daily work now goes through Opus.

Because with the right configuration, it's roughly head-to-head with GPT-5 in intelligence, but still noticeably faster for everyday tasks.

The tradeoff, of course, is cost. It's not even close.

You will easily spend 2-3x or more using Opus 4.6 compared to GPT-5.3-Codex when you factor in the actual price and how many more tokens Opus will eat up.

For me, speed wins. I'd rather spend more and ship faster.

But that's a personal call, and I know not everyone feels that way.

But... 1 Million Token Context Window?

(Read This Before You Get Excited)

You've probably seen people losing their minds about Opus 4.6 having a 1 million token window for the first time.

Sounds amazing, right?

Here's what they're not telling you.

It's not available in Claude Code. Anthropic would lose way too much money if they offered it there.

But it is available through the API and tools like Cursor.

You should know, though, that once you're 20% into that massive window, the price doubles.

We're talking several dollars per prompt when you're deep in a conversation with thinking turned on.

It's like an all-you-can-eat restaurant that charges you double after your third plate.

Technically unlimited. Practically... ouch.

I wouldn't dare.
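Here's a rough sketch of how that pricing cliff plays out. The base price is a placeholder, and I'm assuming the doubled rate applies to the whole prompt once you cross the 20% mark, so check the current pricing page before you trust any of these dollar figures.

```python
CONTEXT_WINDOW = 1_000_000                  # the 1 million token window
THRESHOLD = CONTEXT_WINDOW * 20 // 100      # 20% of the window = 200,000 tokens
BASE_PRICE_PER_MILLION = 10.0               # hypothetical dollar price, not a real quote


def prompt_cost(input_tokens: int, base_price: float = BASE_PRICE_PER_MILLION) -> float:
    """Dollar cost of one prompt's input under the assumed two-tier pricing."""
    # Assumption: past the threshold, the doubled rate applies to every token in the prompt.
    rate = base_price * 2 if input_tokens > THRESHOLD else base_price
    return input_tokens / 1_000_000 * rate


for tokens in (150_000, 300_000, 800_000):
    print(f"{tokens:>7,} tokens -> ${prompt_cost(tokens):.2f} per prompt")
```

Notice that doubling the tokens from 150K to 300K doesn't double the cost under these assumptions, it quadruples it. And that's per prompt, before you count the output and thinking tokens.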

My Setup Going Forward

Opus 4.6 for ~60-70% of tasks. It's fast, it's smart (with the right setup), and it handles my daily building flow.

GPT-5.3-Codex for the remaining 30-40%: when I want to double-check Opus' work, tackle complex problems that need deep thinking, or run long overnight tasks.

No single model is the best at everything anymore.

The real advantage is knowing which one to use, and when.

That's what I teach inside my AI Builder's Blueprint. Not just "which tool to pick," but how to actually use these models like someone who's been testing them full-time for over a year.

Talk next week,
Rob

Robin Ebers

Coder of 20+ years teaching non-technical people how to build their own software business in 30 days with AI. No devs or code required.
