
Orbit: AI grant scraping for Grants Online

An AI grant scraping pipeline that turned a year of manual review into minutes per grant.

What did we do?


How Pixelhop built Orbit for Grants Online: a continuous AI grant scraping pipeline that turned a year of manual review into minutes per grant.

  • ~4 mins per grant scrape, run in parallel
  • 93% of updates auto-applied without human touch
  • 4 weeks from kickoff to production MVP

Grants Online runs a UK grants directory used by businesses, charities, schools, and councils to find funding they can actually apply for. Around 3,000 grants. A small team. A single source of truth that has to stay accurate while the funding landscape changes underneath it.

Before Orbit, keeping it accurate was a year-long job. One pass across the catalogue. Funding amounts changing, deadlines moving, eligibility tightening, whole grants closing. By the time the team finished the loop, the start of the loop was already out of date. New grants piled up faster than old ones could be checked.

Andreas Lichters, the founder, came to us with the idea. He didn't want an AI that replaced his team. He wanted one that did the grunt work so his team could spend their time on judgement calls. Our job, as an AI product studio, was to make that real. And to make it cheap enough to actually run.

How do you trust an AI to update a live grants database?

Not by having a human approve every change. That defeats the point.

This was the design decision everything else hangs off. If every AI update needed a human eye, we'd just be shifting the bottleneck from "review the whole catalogue once a year" to "review every AI suggestion, forever". No time saved. The brief was different. The system needed to publish routine changes on its own, and only pull a human in when something looked unusual.

So Orbit has two layers. The first is Orbit Admin, a working surface where staff can see every proposed change as a field-by-field diff: current value on the left, proposed value on the right, with accept, reject, or edit toggles on each field. The second is the Gatekeeper, a separate AI agent whose only job is to look at each proposed change, compare it to the existing record, and decide whether a human needs to see it.
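Concretely, a proposed change is just a per-field diff with a decision attached. A minimal sketch of the shape in TypeScript (field and type names are ours for illustration, not Orbit's actual schema):

```ts
// Hypothetical shape of a proposed change in Orbit Admin's review queue.
// Names are illustrative, not the production schema.
type FieldDecision = "accept" | "reject" | "edit";

interface FieldDiff {
  field: string;             // e.g. "maxAward" or "deadline"
  current: string | null;    // value on the live record
  proposed: string | null;   // value the scraper extracted
  decision?: FieldDecision;  // set by staff if the change is reviewed
  editedValue?: string;      // used when decision === "edit"
}

interface ProposedChange {
  grantId: string;
  scrapedAt: string;         // ISO timestamp of the scrape run
  diffs: FieldDiff[];
  status: "auto-published" | "pending-review" | "published" | "rejected";
}
```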

[Image: the Gatekeeper risk panel for a single scrape run, showing a 60/100 risk score, a manual-review decision, and reasons including 'title identity change' and 'primary link host changed'.] The Gatekeeper risk-scores every scrape and decides whether a human needs to look. The reasons it flags are the ones a human would have flagged.

The Gatekeeper scores every run from 0 to 100 against signals like "the title identity has shifted", "the primary link host has changed", "the location scope has changed dramatically". Low-risk changes (a tightened deadline, a refreshed paragraph) auto-publish without anyone in the loop. High-risk changes (a fundamentally different grant under the same URL) get blocked and routed to the review queue. The team sees only the cases that genuinely need their judgement.
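In production the Gatekeeper is an agent making a judgement call, but the routing around it is easy to picture as a deterministic toy. A minimal sketch, reusing the ProposedChange shape above; the signal names, weights, and threshold are invented for illustration:

```ts
// Toy Gatekeeper routing: each risk signal carries a weight, the capped
// total becomes the 0-100 score, and the score picks the route.
// Weights and the threshold are illustrative, not production values.
const SIGNALS = [
  { name: "title identity change",     weight: 40, field: "title" },
  { name: "primary link host changed", weight: 30, field: "primaryLinkHost" },
  { name: "location scope changed",    weight: 20, field: "locationScope" },
  { name: "deadline changed",          weight: 5,  field: "deadline" },
];

const AUTO_PUBLISH_BELOW = 40; // relaxed over time as the agents earned trust

function fieldChanged(change: ProposedChange, field: string): boolean {
  const diff = change.diffs.find((d) => d.field === field);
  return !!diff && diff.current !== diff.proposed;
}

function gatekeep(change: ProposedChange) {
  const hits = SIGNALS.filter((s) => fieldChanged(change, s.field));
  const score = Math.min(100, hits.reduce((sum, s) => sum + s.weight, 0));
  return {
    score,
    reasons: hits.map((s) => s.name),
    route: score < AUTO_PUBLISH_BELOW
      ? ("auto-publish" as const)
      : ("manual-review" as const),
  };
}
```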

[Image: Orbit Admin's Proposed Changes view, with the current value of a grant on the left, the AI's proposed new value on the right, and an accept-or-reject toggle on each field.] When the Gatekeeper does flag a change for review, this is what staff see. Field-by-field, accept or reject.

We started cautious. Early in the project the Gatekeeper was strict and most changes flowed through the review queue. We watched what got flagged, watched what the team accepted, watched the false positives pile up. As confidence in the agents grew, we relaxed the thresholds. Now 93% of updates publish without a human touch, and the queue holds only the genuinely ambiguous cases.

The principle, named: earned autonomy. Not every AI update needs a human signing it off. But the AI has to earn the trust to publish on its own, and the system needs a way to recognise when it should slow down and ask. That's what the Gatekeeper is for. We'll apply this pattern again.
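One way to make "earned" concrete, as a sketch of the idea rather than Orbit's actual tuning code: watch what reviewers do with flagged changes, and only loosen the auto-publish threshold once most flags turn out to be rubber stamps.

```ts
// Hypothetical threshold tuning. If reviewers accept nearly every flagged
// change unmodified, the flags were false positives and the threshold can
// loosen; if they reject or edit often, tighten again. Numbers are invented.
interface ReviewOutcome {
  flagged: boolean;           // did the Gatekeeper route it to a human?
  acceptedUnchanged: boolean; // did the human accept it as-is?
}

function nextAutoPublishThreshold(current: number, recent: ReviewOutcome[]): number {
  const flagged = recent.filter((r) => r.flagged);
  if (flagged.length < 50) return current; // not enough evidence to move
  const falsePositiveRate =
    flagged.filter((r) => r.acceptedUnchanged).length / flagged.length;
  if (falsePositiveRate > 0.9) return Math.min(current + 5, 80);  // loosen slowly
  if (falsePositiveRate < 0.5) return Math.max(current - 10, 20); // tighten fast
  return current;
}
```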

How do you scrape thousands of different grant sites without breaking the bank?

Every grant funder builds their site differently. WordPress, custom CMSes, PDFs, hand-rolled HTML from 2007. There is no shared schema, no shared layout, no shared anything. A traditional scraper would need a custom rule per source and a developer on call every time one of them changed.

So we did the opposite. Three specialised AI agents, each doing one thing well: one scrapes the page, one categorises the grant, one judges whether the result needs human review. They're orchestrated through Mastra, a TypeScript agent framework, and they share state through a dedicated Payload CMS instance that acts as the AI's working scratchpad. Scraping itself runs on Firecrawl, which handles the awkward parts (JavaScript-rendered pages, anti-bot protection) so the agents don't have to.
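Stripped of the Mastra specifics, the orchestration is a straight pipeline: each agent takes the previous agent's output, and the shared record lands in the CMS at the end. A minimal sketch; the interfaces are ours for illustration, not Mastra's API:

```ts
// Illustrative pipeline shape: three agents, each doing one job.
// In the real system these are Mastra workflow steps sharing state
// through Payload CMS; here they are plain async functions.
interface Agent<In, Out> { run(input: In): Promise<Out>; }

interface GrantDraft { url: string; fields: Record<string, string>; }
interface CategorisedDraft extends GrantDraft { categories: string[]; }
interface GateDecision { score: number; route: "auto-publish" | "manual-review"; }

async function runGrantWorkflow(
  url: string,
  scraper: Agent<string, GrantDraft>,              // Firecrawl under the hood
  categoriser: Agent<GrantDraft, CategorisedDraft>,
  gatekeeper: Agent<CategorisedDraft, GateDecision>,
  save: (draft: CategorisedDraft, gate: GateDecision) => Promise<void>,
): Promise<void> {
  const draft = await scraper.run(url);
  const categorised = await categoriser.run(draft);
  const gate = await gatekeeper.run(categorised);
  await save(categorised, gate);                   // write to the CMS scratchpad
}
```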

[Image: the Mastra Studio workflow editor, showing the eight-step grant workflow with scraping, categorisation, region selection, and gatekeeper branches visualised as a node graph.] The full grant workflow in Mastra Studio. Scrape, categorise, review, save: each box a specialised agent doing one job.

The cost discipline was the harder problem. LLM calls aren't free, and a system that scrapes 3,000 grants on a rolling schedule could very easily run up a bill that makes the whole thing pointless. The rule we held to: use AI only where it actually earns its keep. Page fetching, link-following, validation, deduplication, database writes: all conventional code. The expensive bits (language understanding, classification, judgement) only run where the page genuinely needs them.
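The cheapest LLM call is the one that never happens. One gating trick of the kind that keeps a pipeline like this affordable, sketched here with an invented cache interface: hash the fetched page and skip extraction entirely when nothing has changed.

```ts
import { createHash } from "node:crypto";

// Hypothetical cost gate: dedupe on content before anything expensive runs.
// The cache and extractor interfaces are invented for illustration.
interface HashCache {
  get(url: string): Promise<string | null>;
  set(url: string, hash: string): Promise<void>;
}

async function extractIfChanged(
  url: string,
  html: string,
  cache: HashCache,
  llmExtract: (html: string) => Promise<Record<string, string>>,
): Promise<Record<string, string> | null> {
  const hash = createHash("sha256").update(html).digest("hex");
  if ((await cache.get(url)) === hash) return null; // unchanged: no LLM call
  const fields = await llmExtract(html);            // pay only for real changes
  await cache.set(url, hash);
  return fields;
}
```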

The result is a scrape that completes in around four minutes per grant, runs many in parallel, and produces a structured draft ready for review. The whole pipeline surfaces in a Nuxt admin dashboard hosted on Railway, with search powered by Meilisearch (text, facets, and vector search in one). Not a workflow you wire up on a Friday afternoon. A pipeline that has to keep running on its own.

How do users actually find the grants?

Grants Online were already running an AI chatbot on their site, built on Chat Thing, Pixelhop's own AI agent product. So we didn't need to invent a new chat surface. We connected the new pipeline straight into the one they already had.

Meet Max, "Your Virtual Funding Advisor". Max sits embedded in the main Grants Online site. A user types in who they are and what they need ("we're a primary school in Cardiff, what grants are available to set up a breakfast club?"). Max asks clarifying questions, runs targeted searches against the Orbit search API, and returns a ranked, sharable shortlist of grants the user can save and revisit.
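Under the hood, a targeted search like Max's can be a single filtered Meilisearch query. A minimal sketch with the official JavaScript client; the index name, filter fields, and facets are assumptions, not Grants Online's actual schema:

```ts
import { MeiliSearch } from "meilisearch";

// Hypothetical query against a "grants" index. Field and facet names
// are invented; the real Orbit search API sits in front of this.
const client = new MeiliSearch({
  host: "http://localhost:7700",
  apiKey: process.env.MEILI_API_KEY,
});

async function findGrants(query: string) {
  return client.index("grants").search(query, {
    filter: ['status = "active"', 'regions = "Wales"'],
    facets: ["categories", "applicantTypes"],
    limit: 10,
  });
}
```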

[Image: the Max chatbot embedded on grantsonline.org.uk, asking a user clarifying questions about their biodiversity project before searching for grants.] Max asks the clarifying questions a researcher would ask, then searches Orbit's database and returns a ranked shortlist.

Feedback from real users since Max went live has been strong. It's the visible end of the pipeline, but it's only as good as the data underneath. The point of Orbit is that the data underneath is now actually fresh.

Results

A year of manual review compressed into a continuous, automated pipeline. The team still owns the judgement calls. The grunt work runs in the background.

  • ~4 minutes per grant scrape, with many running in parallel. The catalogue refresh that used to take a year now runs on a rolling cadence. New grants get picked up days after they appear, not months later.
  • 93% of AI updates auto-publish without a human touch. We started with the Gatekeeper strict, then relaxed thresholds as the agents earned trust. The review queue is now bounded to the genuinely ambiguous cases, which is the actual measure of success.
  • 4 weeks from kickoff to production MVP, scoped to the Grants for Schools subset. The system now manages the full Grants Online catalogue, around 3,000 grants and growing.

[Image: the Orbit Admin dashboard, showing 2,877 total grants, 1,862 active, a 93% workflow success rate, and a chart of recent workflow run activity.] The dashboard the Grants Online team checks every morning. Total grants, review queue, weekly activity, all in one view.

What's next

Max is live on grantsonline.org.uk and earning strong feedback from real users. We're tuning the agent's prompting and search behaviour as usage data comes in.

If you're thinking about how to weave AI into a content pipeline you actually trust, take a look at our AI agent development practice or the rest of our work.