Our first real test of Fable 5: brilliant, autonomous, and maybe too pricey to repeat

Anthropic released Fable 5 yesterday, and we wanted to see how it copes with real work rather than a clean-room demo. So we handed it one of the gnarliest jobs on our plate: a messy, already-deployed prototype that needed tearing apart and rebuilding on our production stack, data and all, without breaking what was running. This is our first proper real-world test of the model, and honestly it's the most impressed I've been by an AI coding session, and also the most conflicted.

Here's the twist up front, because it's the interesting bit: it did the job almost flawlessly, and I'm still not sure we'd run it again once the pricing changes. Stick with me.

The starting point

We won't go into the product itself yet (it's one of our own, still unreleased), but the relevant part is how it was built. A teammate had been vibecoding it in the truest sense of the word: starting from an empty folder and letting the model loose to build it however it wanted.

That's a brilliant way to get to a working prototype fast, and it worked. But it leaves you with the kind of codebase that needs real work before it's production-ready. A few enormous files (the worst being a single 22,095-line index.ts) that need breaking into sensible submodules, a lot of repetition, patterns that drifted as it went. A great prototype; a long way from production-shaped.

It's no toy, either. It's a medium-complexity web app: job queues, LLM calls, billing, encrypted third-party tokens, even headless browsers rendering imagery. It didn't have real users yet, but it was already deployed to staging and production on Cloudflare, running on D1 (Cloudflare's SQLite-based database), R2 and Cloudflare Queues.

The job we threw at it

Most "we used AI to code" stories are demos. Greenfield, forgiving, nothing real to break. This wasn't that.

The brief was the kind of thing you'd normally block out a month for: tear that 22k-line worker apart, rebuild it on our standard Pixelhop template (a Nitro API, background workers, Postgres and Redis on Railway; a Nuxt SPA on Cloudflare), swap the database from Cloudflare D1 to Postgres, migrate all the data across, and deploy the whole thing live.

What came back

Four hours later, it had done the lot. One continuous session:

A clean 5-package monorepo out of that single 22k-line file.
The full database ported to Prisma + Postgres, all 24 tables, column names kept identical so the data move could be a clean copy.
~80 API endpoints rebuilt on Nitro, with proper auth and an early-access gate.
The entire frontend rebuilt from scratch as a Nuxt SPA: 17 pages, 34 components, including a genuinely fiddly editor.
Background workers moved onto a real job queue, with a headless-browser image so the image rendering still works.
198 automated tests (137 API + 61 library), all green in CI.
A bespoke data-migration tool that moved every row across with cryptographic checksums proving nothing changed in transit.
The new stack live on real infrastructure: Railway project created, both environments provisioned, domains wired, secrets pushed, Docker images deployed, with a migrated account able to log in with its existing password and see its existing content.
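
The checksum step in that list is worth a rough illustration. This is not the tool the session produced, only a minimal sketch of the idea, assuming rows can be read from both databases as plain objects with identical column names. The names `Row`, `rowChecksum` and `tableChecksum`, and the sample rows, are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape: column name -> value, as read from either database.
type Row = Record<string, string | number | boolean | null>;

// Serialize a row with its columns in sorted order, so the checksum does not
// depend on the order in which a driver happens to return columns.
function canonicalize(row: Row): string {
  const keys = Object.keys(row).sort();
  return JSON.stringify(keys.map((k) => [k, row[k]]));
}

// SHA-256 checksum of a single row.
function rowChecksum(row: Row): string {
  return createHash("sha256").update(canonicalize(row)).digest("hex");
}

// Checksum of a whole table: hash the sorted row checksums, so row order
// does not matter either.
function tableChecksum(rows: Row[]): string {
  const hash = createHash("sha256");
  for (const sum of rows.map(rowChecksum).sort()) hash.update(sum);
  return hash.digest("hex");
}

// Illustrative data standing in for rows read from the old and new databases.
const sourceRows: Row[] = [
  { id: 1, email: "a@example.com", plan: "free" },
  { id: 2, email: "b@example.com", plan: null },
];
const targetRows: Row[] = [
  { plan: null, id: 2, email: "b@example.com" },
  { email: "a@example.com", id: 1, plan: "free" },
];

const migratedIntact = tableChecksum(sourceRows) === tableChecksum(targetRows);
console.log(migratedIntact ? "checksums match" : "checksum mismatch");
```

A real tool would also have to normalize types before hashing, since SQLite stores booleans as integers, for example. The sketch skips that and compares like with like.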

It worked through nine milestones, each shipped as its own reviewable pull request.

Figure: the migration milestone breakdown: the shared core library, the API, the background workers, the frontend rebuild, the test suite and CI, the D1-to-Postgres migration tooling, and the cutover runbook.

My contribution to those four hours was about fifteen minutes: answering a handful of decisions and adding two DNS records. The rest, it just… did.

The one-shot is the headline

What genuinely floored me wasn't any single feature. It was that it one-shotted the whole thing. I didn't sit there breaking the work into bite-sized chunks, feeding them in one at a time, checking each before the next. I gave it the big brief and it ran the entire migration end to end, in order, verifying as it went.

And it did verify. It opened multiple browser sessions in parallel to test different features at once, checked results against the database, and fixed what it found. When our old Cloudflare test runner couldn't run the full suite on my Mac, it diagnosed why, proved it was pre-existing and not its fault, and ran the tests in batches instead. It found and fixed bugs in the original along the way: a request-handling hang, an illegal job-id format, a latent race condition in the scheduler. The refactor came out more correct than the thing it replaced.

The tell that really landed: our automated code reviewers (Gemini Code Review and the Cursor bot) found almost nothing worth mentioning. That basically never happens with AI-written PRs; normally they shred them over several rounds. When your adversarial reviewers go quiet, that's a stronger signal than any amount of self-congratulation. Hand this model a clean template with established patterns and it follows them exactly, everywhere, without drift.

Now the awkward bit: the cost

We ran this on the Max plan, so there was no per-token bill. But Claude Code reports what the equivalent API usage would have cost, and for this session it was roughly $476 all in (about $446 of that on Fable 5, ~$30 on Opus, and 54 cents on Haiku).

Figure: Claude Code's session usage report, showing ~$476 total cost and 39,371 lines added, broken down by model, with the session consuming 30% of the weekly usage allowance.

Set against what it replaced, comfortably three to four weeks of senior dev work, that's extraordinary value for an afternoon. On paper it's the best return I've ever seen from a dev tool.

There's a snag in the Max-plan version of "free," though. This single job ate 30% of our entire weekly usage allowance in one day. So even without a per-token bill, it's not bottomless: three runs like this in a week would use 90% of the allowance, and a fourth wouldn't fit. The notional dollar figure and the usage meter are telling the same story from different angles, and that story is "powerful, but not cheap to lean on."
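
The arithmetic behind those two meters is simple enough to sketch. The figures are the ones quoted in this post (treating the Haiku amount as $0.54, and using the $200/month Max plan price). The variable names are only illustrative, and the assumption that every run would use the same share of the allowance is ours, not something Claude Code reports.

```typescript
// Figures quoted in this post. The Haiku amount is assumed to be $0.54.
const costByModelUsd = { fable5: 446, opus: 30, haiku: 0.54 };
const allowancePercentPerRun = 30; // share of the weekly allowance one run used
const maxPlanUsdPerMonth = 200;

// Notional API cost of the whole session.
const totalUsd = costByModelUsd.fable5 + costByModelUsd.opus + costByModelUsd.haiku;

// Whole runs that fit in one week, assuming each run uses the same share.
const runsPerWeek = Math.floor(100 / allowancePercentPerRun);

// Allowance consumed by that many runs, and what is left over.
const usedPercent = runsPerWeek * allowancePercentPerRun;
const leftPercent = 100 - usedPercent;

// How many months of the Max plan the notional session cost would cover.
const monthsOfMaxPlan = totalUsd / maxPlanUsdPerMonth;

console.log(`session: $${totalUsd.toFixed(2)}`);
console.log(`runs per week: ${runsPerWeek} (${usedPercent}% used, ${leftPercent}% left)`);
console.log(`equivalent to ${monthsOfMaxPlan.toFixed(1)} months of the Max plan`);
```

On those numbers, three runs leave 10% of the weekly allowance, which isn't enough for a fourth, and one session's notional cost is a little over two months of the plan.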

But here's the question we'll actually have to ask. Fable 5 moves to API pricing on 22 June. So: would we pay ~$476 to do this same task again?

Probably not. And that's not a knock on the model, it's a comment on how good the field already is. Opus 4.8 and GPT-5.5 are so capable that they'd make this same migration extremely quick compared with doing it by hand. The difference is they'd need a bit more of me: I'd manually chunk the job into smaller pieces, verify each chunk, then move on. More hands-on, yes, but still fast, and a fraction of the cost. Comfortably inside a $200/month Max plan rather than a ~$476 single session.

So the honest verdict is a split decision. Fable 5's superpower is autonomy: the ability to swallow an enormous, messy brief and hand back something finished, tested and deployed with almost none of your attention. That's the "brief it before bed, wake up to something huge" model. Whether that autonomy is worth the premium depends entirely on how much your attention is worth on the day. For a job this size, where I genuinely walked away and did other things, it was. At sustained API rates, for routine work, the cheaper-and-slightly-more-hands-on path probably wins.

It's slow, too. Fable 5 is not a model you sit and watch. But that stops being a cost the moment you stop babysitting it. You trade "fast and supervised" for "slow and autonomous." For this, it was the right trade.

The model's own notes (Fable 5)

We need to remember to treat our new AI overlords with respect, so it only seemed fair to let Fable comment on its own experience working on this project:

The plan did the heavy lifting. Before writing a line, I read both codebases end to end and pinned the architectural decisions down once: keep the database column names identical so the data move is trivial; keep encrypted tokens byte-for-byte; treat the database as the source of truth for scheduled jobs. Most of the later speed came from not re-litigating those per file.
Parallelism was the multiplier. The big middle milestones (extracting the shared library, building the API, building the frontend) were each broken into independent chunks, worked on concurrently, then integrated and type-checked as a whole. That's how ~31k lines of reviewed code fit into an afternoon.
I tried to earn trust rather than ask for it. Every milestone shipped as its own pull request with the verification I'd actually run: green tests, real HTTP calls, browser screenshots, database checks. The work was reviewable in pieces, not one unreviewable blob.
I left the irreversible calls for a human. The production cutover, whether to hunt down old encryption keys or accept a one-time reconnect, the exact moment to flip prod over. Those got flagged and handed back, not assumed.
On the reviewers staying quiet: read that as the template doing its job, not me being clever. Take the established patterns away and I'd expect a noisier review. The lesson for anyone trying to reproduce this: invest in the patterns first; the model is only as disciplined as the example you give it.

A harder thought, for the developers

There's a quieter takeaway in here, and it's worth saying plainly because a lot of us are feeling it. If you've been quietly hoping that hand-writing code is something you'll still be doing in a couple of years, that hope has been slipping for a while now. This isn't the moment it starts to fade. It's closer to the final nail in the coffin.

Let's be precise about what's actually at risk, though, because it's easy to muddle. This is not about caring for your craft. You absolutely still need to care about your craft, and nothing here changes that: good judgement, good architecture, knowing what "done well" looks like, caring whether the thing is actually right. All of that matters more than ever. What's going away is the coding itself, the hands-on-keyboard act of typing it all out. After watching a model take a messy, real-world brief and return a clean, tested, deployed product in an afternoon, with the bugs fixed and the reviewers silent, it's hard to tell yourself that part is safe.

So the craft survives, it just moves. The value shifts from writing the code to setting up the conditions for it to be written well: the template, the patterns, the architecture, the judgement about what to verify and what to never hand off. That's still craft, and it's still skilled. It's just not typing.

We'd rather be honest about that than pretend otherwise. The work isn't disappearing. It's changing shape, and faster than a lot of people are ready for. The developers who thrive will be the ones who get good at the new shape early.

Where it ended

By the end of the session the new stack was live: the Nuxt frontend on Cloudflare talking to the Nitro API on Railway, staging data fully migrated, a migrated account logging in with its existing password. The only things left were the deliberately human steps: a final timing rehearsal, two external-service config changes, and the production cutover window itself.

So: a working-but-messy AI prototype became a properly architected, tested, deployed product, on our standard template, with its data migrated, in a single afternoon.

We're genuinely amazed by Fable 5. We're also going to think hard before we spend $476 a pop on it. Both of those things are true, and that tension is the most interesting thing about the model. 🐰