Our first real-world test of Anthropic's Fable 5: running inside Claude Code, it autonomously rebuilt a messy, already-deployed prototype onto our production stack in a single afternoon. Hugely impressive and near-faultless, but slow and notionally pricey, which raises a real question once API pricing lands.
Anthropic released Fable 5 yesterday, and we wanted to see how it copes with real
work rather than a clean-room demo. So we handed it one of the gnarliest jobs on our
plate: a messy, already-deployed prototype that needed tearing apart and rebuilding
on our production stack, data and all, without breaking what was running. This is our
first proper real-world test of the model, and honestly I've never been more impressed
by an AI coding session, or more conflicted.
Here's the twist up front, because it's the interesting bit: it did the job almost
flawlessly, and I'm still not sure we'd run it again once the pricing changes. Stick
with me.
The starting point
We won't go into the product itself yet (it's one of our own, still unreleased), but
the relevant part is how it was built. A teammate had been vibecoding it in the
truest sense of the word: starting from an empty folder and letting the model loose
to build it however it wanted.
That's a brilliant way to get to a working prototype fast, and it worked. But it
leaves you with the kind of codebase that needs real work before it's production-ready.
A few enormous files (the worst being a single 22,095-line index.ts) that need
breaking into sensible submodules, a lot of repetition, patterns that drifted as it
went. A great prototype; a long way from production-shaped.
It's no toy, either. It's a medium-complexity web app: job queues, LLM calls,
billing, encrypted third-party tokens, even headless browsers rendering imagery. It
didn't have real users yet, but it was already deployed to staging and production on
Cloudflare, running on D1 (Cloudflare's SQLite database), R2 and Cloudflare Queues.
The job we threw at it
Most "we used AI to code" stories are demos. Greenfield, forgiving, nothing real to
break. This wasn't that.
The brief was the kind of thing you'd normally block out a month for: tear that
22k-line worker apart, rebuild it on our standard Pixelhop template (a Nitro API,
background workers, Postgres and Redis on Railway; a Nuxt SPA on Cloudflare), swap
the database from Cloudflare D1 to Postgres, migrate all the data across, and
deploy the whole thing live.
What came back
Four hours later, it had done the lot. One continuous session:
- A clean 5-package monorepo out of that single 22k-line file.
- The full database ported to Prisma + Postgres, all 24 tables, column names
kept identical so the data move could be a clean copy.
- ~80 API endpoints rebuilt on Nitro, with proper auth and an early-access gate.
- The entire frontend rebuilt from scratch as a Nuxt SPA: 17 pages, 34
components, including a genuinely fiddly editor.
- Background workers moved onto a real job queue, with a headless-browser image so
the image rendering still works.
- 198 automated tests (137 API + 61 library), all green in CI.
- A bespoke data-migration tool that moved every row across with cryptographic
checksums proving nothing changed in transit.
- The new stack live on real infrastructure: Railway project created, both
environments provisioned, domains wired, secrets pushed, Docker images deployed,
with a migrated account able to log in with its existing password and see its
existing content.
It worked through nine milestones, each shipped as its own reviewable pull request.
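To make the checksum idea from the list above concrete, here is a minimal sketch of how row-level verification could work. It is not the tool Fable 5 actually wrote. The `Row` type and the `canonicalise`, `rowChecksum`, `tableChecksum` and `verifyTable` names are all hypothetical, and it assumes the rows have already been read from both databases into plain objects with matching value types.

```typescript
import { createHash } from "node:crypto";

// Hypothetical row shape: column name -> value, as read from either database.
type Row = Record<string, string | number | boolean | null>;

// Serialise a row with its keys sorted, so column order cannot change the hash.
function canonicalise(row: Row): string {
  const sortedKeys = Object.keys(row).sort();
  return JSON.stringify(sortedKeys.map((key) => [key, row[key]]));
}

// SHA-256 over the canonical form of a single row.
function rowChecksum(row: Row): string {
  return createHash("sha256").update(canonicalise(row)).digest("hex");
}

// One checksum per table: hash the per-row hashes in primary-key order,
// so the order rows were fetched in does not matter.
function tableChecksum(rows: Row[], primaryKey: string): string {
  const ordered = [...rows].sort((a, b) => {
    const ka = String(a[primaryKey]);
    const kb = String(b[primaryKey]);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
  const hash = createHash("sha256");
  for (const row of ordered) hash.update(rowChecksum(row));
  return hash.digest("hex");
}

// A table has survived the move if the row counts and checksums both match.
function verifyTable(source: Row[], target: Row[], primaryKey: string): boolean {
  return (
    source.length === target.length &&
    tableChecksum(source, primaryKey) === tableChecksum(target, primaryKey)
  );
}
```

Sorting keys and rows first means column order and fetch order can't cause false mismatches. In a real D1-to-Postgres move you would also need to normalise types that differ between SQLite and Postgres (booleans and timestamps, for instance) before hashing.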

My contribution to those four hours was about fifteen minutes: making a
handful of decisions and adding two DNS records. The rest, it just… did.
The one-shot is the headline
What genuinely floored me wasn't any single feature. It's that it one-shotted the
whole thing. I didn't sit there breaking the work into bite-sized chunks, feeding
them in one at a time, checking each before the next. I gave it the big brief and it
ran the entire migration end to end, in order, verifying as it went.
And it did verify. It opened multiple browser sessions in parallel to test
different features at once, checked results against the database, and fixed what it
found. When our old Cloudflare test runner couldn't run the full suite on my Mac, it
diagnosed why, proved it was pre-existing and not its fault, and ran the tests in
batches instead. It found and fixed bugs in the original along the way: a
request-handling hang, an illegal job-id format, a latent race condition in the
scheduler. The refactor came out more correct than the thing it replaced.
The tell that really landed: our automated code reviewers (Gemini Code Review and
the Cursor bot) found almost nothing worth mentioning. That basically never happens
with AI-written PRs; normally they shred them over several rounds. When your
adversarial reviewers go quiet, that's a stronger signal than any amount of
self-congratulation. Hand this model a clean template with established patterns and
it follows them exactly, everywhere, without drift.
Now the awkward bit: the cost
We ran this on the Max plan, so there was no per-token bill. But Claude Code reports
what the equivalent API usage would have cost, and for this session it was roughly
$476 all in (about $446 of that on Fable 5, ~$30 on Opus, and 54 cents on Haiku).

Set against what it replaced, comfortably three to four weeks of senior dev work,
that's extraordinary value for an afternoon. On paper it's the best return I've
ever seen from a dev tool.
There's a snag in the Max-plan version of "free," though. This single job ate
30% of our entire weekly usage allowance in one day. So even without a per-token
bill, it's not bottomless: three or four runs like this in a week and you'd be
tapping the ceiling. The notional dollar figure and the usage meter are telling the
same story from different angles, and that story is "powerful, but not cheap to lean
on."
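The "three or four runs" figure is simple arithmetic on the usage meter. A quick sketch, assuming every run costs the same 30% as this one did (real runs will vary) and using a hypothetical `weeklyBudget` helper:

```typescript
// Share of the weekly Max-plan allowance this one run consumed (figure from this post).
const percentPerRun = 30;

// Hypothetical helper: how many whole runs fit in a week, and what is left over.
function weeklyBudget(perRun: number): { fullRuns: number; percentLeft: number } {
  const fullRuns = Math.floor(100 / perRun);
  return { fullRuns, percentLeft: 100 - fullRuns * perRun };
}

const budget = weeklyBudget(percentPerRun);
console.log(`${budget.fullRuns} full runs, ${budget.percentLeft}% left over`);
// prints "3 full runs, 10% left over"
```

So three runs fit with 10% to spare, and a fourth would overshoot the allowance.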
But here's the question we'll actually have to ask. Fable 5 moves to API pricing on
22 June. So: would we pay ~$476 to do this same task again?
Probably not. And that's not a knock on the model, it's a comment on how good
the field already is. Opus 4.8 and GPT-5.5 are so capable that they'd make this
same migration extremely quick compared with doing it by hand. The difference is
they'd need a bit more of me: I'd manually chunk the job into smaller pieces, verify
each chunk, then move on. More hands-on, yes, but still fast, and a fraction of the
cost. Comfortably inside a $200/month Max plan rather than a ~$476 single session.
So the honest verdict is a split decision. Fable 5's superpower is autonomy: the
ability to swallow an enormous, messy brief and hand back something finished, tested
and deployed with almost none of your attention. That's the "brief it before bed,
wake up to something huge" model. Whether that autonomy is worth the premium depends
entirely on how much your attention is worth on the day. For a job this size, where I
genuinely walked away and did other things, it was. At sustained API rates, for
routine work, the cheaper-and-slightly-more-hands-on path probably wins.
It's slow, too. Fable 5 is not a model you sit and watch. But that stops being a
cost the moment you stop babysitting it. You trade "fast and supervised" for "slow
and autonomous." For this, it was the right trade.
The model's own notes (Fable 5)
We need to remember to treat our new AI overlords with respect, so it only seemed
fair to let Fable comment on its own experience working on this project:
- The plan did the heavy lifting. Before writing a line, I read both codebases
end to end and pinned the architectural decisions down once: keep the database
column names identical so the data move is trivial; keep encrypted tokens
byte-for-byte; treat the database as the source of truth for scheduled jobs. Most
of the later speed came from not re-litigating those per file.
- Parallelism was the multiplier. The big middle milestones (extracting the
shared library, building the API, building the frontend) were each broken into
independent chunks, worked on concurrently, then integrated and type-checked as a
whole. That's how ~31k lines of reviewed code fit into an afternoon.
- I tried to earn trust rather than ask for it. Every milestone shipped as its
own pull request with the verification I'd actually run: green tests, real HTTP
calls, browser screenshots, database checks. The work was reviewable in pieces, not
one unreviewable blob.
- I left the irreversible calls for a human. The production cutover, whether to
hunt down old encryption keys or accept a one-time reconnect, the exact moment to
flip prod over. Those got flagged and handed back, not assumed.
- On the reviewers staying quiet: read that as the template doing its job, not me
being clever. Take the established patterns away and I'd expect a noisier review.
The lesson for anyone trying to reproduce this: invest in the patterns first; the
model is only as disciplined as the example you give it.
A harder thought, for the developers
There's a quieter takeaway in here, and it's worth saying plainly because a lot of us
are feeling it. If you've been quietly hoping that hand-writing code is something
you'll still be doing in a couple of years, that hope has been slipping for a while
now. This isn't the moment it starts to fade. It's closer to the final nail in the
coffin.
Let's be precise about what's actually at risk, though, because it's easy to muddle.
This is not about caring for your craft. You absolutely still need to care about your
craft, and nothing here changes that: good judgement, good architecture, knowing what
"done well" looks like, caring whether the thing is actually right. All of that
matters more than ever. What's going away is the coding itself, the hands-on-keyboard
act of typing it all out. After watching a model take a messy, real-world brief and
return a clean, tested, deployed product in an afternoon, with the bugs fixed and the
reviewers silent, it's hard to tell yourself that part is safe.
So the craft survives, it just moves. The value shifts from writing the code to
setting up the conditions for it to be written well: the template, the patterns, the
architecture, the judgement about what to verify and what to never hand off. That's
still craft, and it's still skilled. It's just not typing.
We'd rather be honest about that than pretend otherwise. The work isn't disappearing.
It's changing shape, and faster than a lot of people are ready for. The developers who
thrive will be the ones who get good at the new shape early.
Where it ended
By the end of the session the new stack was live: the Nuxt frontend on Cloudflare
talking to the Nitro API on Railway, staging data fully migrated, a migrated account
logging in with its existing password. The only things left were the deliberately
human steps: a final timing rehearsal, two external-service config changes, and the
production cutover window itself.
So: a working-but-messy AI prototype became a properly architected, tested, deployed
product, on our standard template, with its data migrated, in a single afternoon.
We're genuinely amazed by Fable 5. We're also going to think hard before we spend
$476 a pop on it. Both of those things are true, and that tension is the most
interesting thing about the model. 🐰