For James · Equals Five · ahead of Friday · 3 Jun 2026

Should you build it?

You asked me to have a think about building an AI time-and-billing tool. Here's a straight read: what already exists, what the numbers say at your scale, where I'd push back, what Josh has probably built — and the smallest sensible first step.

The honest version. Your 5% instinct is right — the research backs that range from three directions. But the idea is really three different problems wearing one coat, and most of what you'd build already exists off the shelf for £10–50 a head a month. The clever-looking bit — auto-validating contractor invoices against the budget — is a 40-year-old accounting control called three-way matching, already sold by a dozen vendors. There is a real gap worth owning, but it's narrow. My take: measure the leak first, buy the commodity, and only build the small bit that's genuinely yours.

⚠ Scale check first — the biggest caveat in here

Nearly every figure below comes from mid-sized to large firms. The SPI benchmark is 403 firms averaging dozens-to-hundreds of staff; McKinsey's overruns are "large IT projects"; the time-theft numbers come from hourly, shift-based workforces. You're a 1–10 person shop — and at that size the picture changes hard, mostly against building:

  • Your leak is already visible. At ten people you can see every project and every contractor. A 5% leak is a handful of instances you could name, not a systemic gap that needs software to surface.
  • Time theft barely applies. Buddy punching is a shift-clock problem. Your leak is professional under-recording and unbilled scope — discount that whole bucket.
  • Your invoice volume is tiny. Auto-matching is built for firms processing hundreds a month. You've got a handful — a spreadsheet and ten minutes does it.
  • You have no IT function to carry a build. Maintenance — 78% of the cost, forever — lands on the same people who should be billing.
  • Even a full PSA (professional services automation) tool may be overkill, but at £30 a head it's cancellable next month, which a build never is.

Net: at your scale this is almost certainly a discipline-and-spreadsheet problem, not a software-product one. The bar for building anything custom is higher for you, not lower.

1 · What you're actually describing

Three different leaks under one "5%" headline. Different causes, different fixes — worth separating before deciding what to build.

What you said | What it really is | Where the money goes | The real fix
"Time theft" | Cost leak: paying for hours not worked | Cost line | Behaviour + easy capture
"Poor time tracking" | Revenue leak: billable work never captured | Top line (unbilled WIP) | Low-friction logging
"Project creep" | Margin leak: work past the LoE, unbilled | Margin | Contract + budget discipline
"Validate invoices vs budget" | An AP control: three-way matching | Cost / overpayment | Off-the-shelf rules, not custom AI

First thing worth nailing down: of your 5%, how much is each? I'd wager you can't split it yet — and the split changes everything about what's worth building.

2 · The picture: buy the commodity, build the edge

Here's the flow you described, mapped against what already exists. Two-thirds of it you can buy. The join in the middle is the only part worth building yourself.

[Diagram: three lanes, left to right]
  • Buy · PSA tool: LoE agreed (allocation, e.g. 3 days) → Time logged against the budget (live burn)
  • Build · the 10% that's yours: Reconciliation join (match invoice ↔ allocation ↔ time; auto-approve within tolerance, else → exception to finance)
  • Buy · AP rules: Contractor invoice in → Three-way match (PO / recv / invoice) → Auto-approve within tolerance, or Exception (human checks · never blanket auto)
The PSA tools own the left (time→budget). The AP tools own the right (invoice→match). Nobody sells the join cleanly for a firm your size — that's the only custom-worthy 10%, and even it starts as a rules sheet, not an app.

3 · Does it exist? Yes — most of it, cheaply

You said there are plenty of tools but none do what you need. Fair, but it's worth being honest about whether anyone's trialled the category properly.

Budget-first PSA — the 90%

Scoro

"Financial visibility first": every hour ties to a budget, every budget to a margin. Closest to what you described.

  • Best for: Agencies wanting one financial source of truth
  • AI layer: ELI (plain-language reporting)
  • Price: ~£25–50 / user / mo

Rocketlane

Projects + resourcing + financials in one, strong for delivery teams.

  • Best for: Client delivery / implementation
  • AI layer: Nitro (agentic; flags risk & rebalances)
  • Price: ~£20–40 / user / mo

Productive

"Agency OS": budgets, time, billing in a tidy package. Cheapest serious option.

  • Best for: Small agencies
  • AI layer: Assistive
  • Price: ~£10–20 / user / mo

Kantata

Enterprise-grade resourcing + financials. Almost certainly more than you need.

  • Best for: 50+ staff, complex portfolios
  • AI layer: Yes
  • Price: Custom (sales call)

BigTime

Time + billing + project accounting; popular with services firms.

  • Best for: Billing-heavy services
  • AI layer: Yes
  • Price: ~£20–40 / user / mo

Accelo

Client work end-to-end: quotes, time, retainers, billing.

  • Best for: Retainer + project mix
  • AI layer: Assistive
  • Price: ~£20–40 / user / mo

AI-native time capture — the friction killer (genuinely new since ~2024)

Timely

"AutoSheet" builds a finished timesheet draft from passively-captured activity, ready to review in one click.

  • Does: Auto-drafts the timesheet for you
  • Why it matters: Kills the 15-minute logging friction

Rize

Fully passive capture; categorises activity by client/project automatically.

  • Claim: Agencies recover ~20% more billable time
  • Caveat: Vendor figure; treat as directional

TimeSentry

Purpose-built for professional services; auto-fills timesheets as you work.

  • Best for: Consultancies that bill by the hour

The invoice-validation bit is already a thing — and it's old

The part you think is the clever idea (auto-approve the invoice if it's within budget, flag it if not) is one of the oldest controls in accounts payable: three-way matching. It compares what was agreed (the PO, here your Letter of Engagement allocation), what was delivered (the receiving report, here logged time), and what's billed (the invoice). Match within a set tolerance → auto-approve; otherwise → a human. Ramp, Tipalti, Precoro and HighRadius already do this, with vendor-reported figures of 95–99% accuracy, around £3 an invoice and 70–85% fewer errors. Workday VNDLY and SAP already track committed-vs-invoiced spend against a Statement of Work. You'd be rebuilding two mature categories from scratch.
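
To make the mechanics concrete, here is a minimal Python sketch of that tolerance match. It is illustrative only: the `Engagement` record, its field names and the 5% default tolerance are assumptions for the example, not how Ramp, Tipalti or any other vendor implements it.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Engagement:
    """Hypothetical record joining the three documents for one contractor."""
    allocated_days: Decimal  # agreed in the Letter of Engagement (the "PO")
    logged_days: Decimal     # time recorded against the budget (the "receipt")
    invoiced_days: Decimal   # what the contractor's invoice bills


def three_way_match(e: Engagement, tolerance: Decimal = Decimal("0.05")) -> str:
    """Return 'auto-approve' or 'exception' for one invoice.

    The invoice passes only if it agrees with BOTH the allocation and the
    logged time, each within a relative tolerance. Anything else goes to a
    human; a mismatch is never paid or rejected automatically.
    """
    # Check 1: invoice does not exceed the agreed allocation (plus tolerance).
    within_allocation = e.invoiced_days <= e.allocated_days * (1 + tolerance)
    # Check 2: invoice agrees with the time actually logged (within tolerance).
    matches_time = abs(e.invoiced_days - e.logged_days) <= e.logged_days * tolerance
    return "auto-approve" if within_allocation and matches_time else "exception"


clean = Engagement(Decimal("3"), Decimal("3"), Decimal("3"))
under_delivered = Engagement(Decimal("3"), Decimal("2"), Decimal("3"))
over_budget = Engagement(Decimal("3"), Decimal("4"), Decimal("4"))

for case in (clean, under_delivered, over_budget):
    print(case, "->", three_way_match(case))
```

The second case is the one to notice: a 3-day invoice against a 3-day allocation still lands in the exception queue when only 2 days were logged. Being within budget is one of the checks, not the whole test.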

4 · The bit that's genuinely yours to build

To be fair to the idea — there is a real seam. No single off-the-shelf product cleanly chains the whole thing: LoE allocation → time logged against it → contractor invoice → auto-approve within tolerance → finance reconciliation, in one flow, for a firm your size, without enterprise pricing or a six-month rollout. The PSA tools own the time→budget half; the AP/procurement tools own the invoice→PO half. The join is the gap (see the diagram). It's worth a thin custom layer — if the measurement proves the leak is really that shape. It is not worth a "much bigger people, project and resource platform."

5 · What the numbers say

Claim | Number | Source
Revenue leakage in services firms | typically 1–5% of annual revenue | Beancount / ERP Blog / BigTime
Time theft (payroll) | 1.5–5% of gross payroll; 43% admit padding hours | American Payroll Assoc.
Scope creep | 59% call it their #1 challenge; 78% rarely/never bill it; 5–20% margin hit | Beancount / agency studies
Billable utilisation (2025) | fell to 68.9% (below 75% optimal); margins at decade-low 9.8% | SPI 2025 Benchmark (403 firms)
Cost of building software | 78% of lifetime cost lands after launch | Forrester 2024
How big custom builds usually go | 45% over budget; 56% less value than predicted | McKinsey

The 5% is credible — that's the part to trust. But read it with the scale check in mind: these are bigger-firm figures. For you the utilisation and time-theft numbers barely apply; what carries across is the unbilled-scope leak and — pointedly — the build-cost and overrun maths, which only get worse the fewer people you have to absorb them.
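
As rough arithmetic on the build-cost line: if 78% of lifetime cost lands after launch, the build itself is only the other 22%, so lifetime cost is roughly the build estimate divided by 0.22. The sketch below works that through. The £20,000 build figure is purely hypothetical, and stacking the McKinsey overrun on top mixes two separate studies, so treat the output as an order-of-magnitude check rather than a forecast.

```python
def lifetime_cost(build_cost: float,
                  post_launch_share: float = 0.78,
                  overrun: float = 0.0) -> float:
    """Lifetime cost implied by a build estimate.

    post_launch_share: fraction of lifetime cost arriving after launch
                       (0.78 is the Forrester figure from the table).
    overrun:           optional build overrun, e.g. 0.45 for 45% over budget.
    """
    if not 0 <= post_launch_share < 1:
        raise ValueError("post_launch_share must be in [0, 1)")
    actual_build = build_cost * (1 + overrun)
    # The build is the pre-launch slice, so scale it up to the whole.
    return actual_build / (1 - post_launch_share)


# Hypothetical £20,000 build estimate (an assumed number for illustration).
on_budget = lifetime_cost(20_000)
with_overrun = lifetime_cost(20_000, overrun=0.45)
print(f"On budget:   £{on_budget:,.0f}")
print(f"45% overrun: £{with_overrun:,.0f}")
```

Whatever build number you plug in, the multiplier is about 4.5 times before any overrun. That lifetime figure, not the build quote, is what to set against the value of the leak once it has been measured.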

The voices behind those numbers

SPI Research · PS Maturity Benchmark
"Billable utilisation fell to 68.9% — below the 75% optimal threshold; margins at a decade low."
2025 PS Maturity Benchmark · 403 firms

Forrester · Total Economic Impact
"~78% of a software's lifetime cost accrues after launch, not during the build."
TEI model, 2024

McKinsey · Large IT projects
"Large IT projects run ~45% over budget and deliver 56% less value than predicted."
IT-project research

American Payroll Assoc. · Time & attendance
"Time theft costs 1.5–5% of gross payroll; 43% of hourly staff admit padding hours."
Industry estimates (large hourly workforces)

Adoption research · Harvest, Toggl, PlanetBids
"Tools that need 15+ minutes of manual logging are abandoned within 90 days. Adoption is a behaviour problem, not a software one."

The point that matters most for you
The principle that matters most: getting people to track time is a behaviour problem, not a software one. Adoption typically slides 80% → 60% → dead by month four, and the single biggest predictor of success is how little time one entry takes. If the real issue is people not logging time, or logging it late and wrong, a shinier in-house tool fails for exactly the reason the bought ones "didn't work." A build won't fix that.

6 · Where I'd push back before you build

If I were you, these are the questions I'd want answered before anyone writes a line of code.

The 5% — to what, exactly?
A blended 5% is a feeling, not a baseline. Until it's split into theft vs unbilled vs scope creep, you're prescribing before you've diagnosed — and you can't build a business case against a number you can't break down. No baseline, no build.
Has anyone actually trialled the tools?
"None of them do what we need" is the line every firm says before they find they never properly ran the category. Scoro and Rocketlane already tie every hour to a live budget and auto-flag over-burn. Did someone run a real 30-day trial on a live project, or bounce off a demo? Building your own is the most expensive way to discover you skipped the trial.
"AI at the core" — doing which job?
AI helps the capture half (auto-drafted timesheets). But the invoice-approval half is deterministic rules — tolerance matching — and that's exactly where you don't want a probabilistic model deciding to release money. Name the one job only AI can do here, or the framing is a solution looking for a problem.
Auto-approving "if within budget" removes a control rather than adding one.
Within budget ≠ the work was done ≠ it was the right work. Auto-release the moment spend fits the envelope and you've automated away the one human checkpoint — and handed a contractor a clean way to burn the whole LoE on the wrong thing, invisibly. The standard answer is tolerance plus a human exception queue, never blanket auto-approve. As written, you'd be deleting the control you're trying to build.
The build maths is rough — and rougher at your size.
78% of the cost turns up after launch. Build this and the firm quietly becomes a software company maintaining a billing product forever — with dev hours that could have been billable. McKinsey: 45% over budget, 56% less value. With 1–10 people there's no spare capacity to absorb any of it; the maintenance lands on whoever's meant to be earning fees. The opportunity cost alone could swallow the 5% several times over.
Josh's build — read it right.
Definitely get it in front of you — it's the cheapest learning you'll find. But what he built is a personal hack (see the next section). Going from a one-person Claude tool to company-wide, finance-grade, audited, multi-user invoice reconciliation is a category jump, not a copy job. Look at it; don't let a weekend build set the estimate for the real thing.
"Ultimately a much bigger platform."
That line is how this dies. The bigger platform you're picturing is exactly the roadmap Kantata, Scoro and Workday run with hundreds of engineers. Naming the grand vision now is how scope creep gets into the very project meant to fix scope creep.

7 · What has Josh actually built in Claude?

Worth being precise here, because the answer sets the realistic ceiling. People build a few recognisable things with Claude for time and billing — none of them are a finance system. The most likely candidates, in order:

Most likely · A personal timesheet drafter

He feeds Claude his calendar, notes or chat logs; Claude returns a categorised, billable timesheet draft he tidies and submits. Possibly wired with an MCP connector (Temponia, Timesheet.io, TallyHo all ship these now) so Claude reads/writes entries directly.

  • What it really is: A one-click draft for one person
  • Ceiling: Saves him 20 min/week. Not a system of record.

Plausible · A Claude-Code mini app

A small local app (a simple table + SQLite) that logs his hours per project and shows budget burn, maybe spits out an invoice. Looks impressively "real."

  • What it really is: A single-user prototype, no auth, no audit trail
  • Ceiling: Polished ≠ production. This is the shadow-IT trap.

Possible · An invoice/budget checker

A Claude prompt or sheet that takes an invoice + the agreed budget and flags whether it's within allocation: a manual, one-at-a-time version of three-way matching.

  • What it really is: Helpful triage, still needs his eyes on every one
  • Ceiling: Doesn't move money, doesn't reconcile at scale

The five questions to ask Josh

So you can place it among the three candidates above in two minutes:

  1. What does it take as input — your calendar, manual notes, or Claude's own logs?
  2. Where does the data live — a spreadsheet, a local database, or just inside a Claude chat?
  3. Does it just draft your timesheet, or does it touch money / invoices?
  4. Is anyone else using it, or just you?
  5. When it breaks, who fixes it — and how long does that take?

If the honest answers are "my calendar / a spreadsheet / drafts only / just me / I patch it myself" — then it's a brilliant personal hack and a terrible template for a company finance system. Which is exactly the right thing to learn before Friday.

8 · The smallest sensible first step

Not a platform. Three moves, smallest first.

  1. Measure before you build (a fortnight). Instrument two or three live projects and find out where the 5% actually comes from, by type. The number is the case for everything that follows — and at your size you can probably get most of it from what you already have.
  2. Trial the off-the-shelf 90% in parallel. Run one budget-first PSA — Scoro or Rocketlane — properly, for 30 days, on a real project. Every hour to a budget, live burn, auto-flag on over-allocation. If it covers 90% at ~£30 a head, most of the build question goes away.
  3. Build only the join — and only if step 1 justifies it. A thin reconciliation layer: LoE allocation × logged time × contractor invoice → auto-approve within tolerance → exception queue to finance. Start as a rules sheet on top of the bought tools, not a ground-up app. That — and only that — is the bit genuinely worth owning.
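
Step 1 doesn't need tooling. As a sketch of the tally, assuming each leak you find is jotted down as a (type, £ amount) pair, a few lines of Python (or the same sum in a spreadsheet) give the split by type. The type labels, the amounts and the £100,000 revenue figure are made-up placeholders, not Equals Five data.

```python
from collections import defaultdict


def leak_split(entries: list[tuple[str, float]], revenue: float) -> dict[str, float]:
    """Sum leak amounts by type and express each as a % of revenue."""
    if revenue <= 0:
        raise ValueError("revenue must be positive")
    totals: defaultdict[str, float] = defaultdict(float)
    for leak_type, amount in entries:
        totals[leak_type] += amount
    return {t: round(100 * total / revenue, 2) for t, total in totals.items()}


# Placeholder observations from a fortnight of looking (all values invented).
observed = [
    ("unbilled_time", 1200.0),
    ("scope_creep", 2500.0),
    ("unbilled_time", 300.0),
    ("paid_not_worked", 0.0),
]
split = leak_split(observed, revenue=100_000.0)
for leak_type, pct in split.items():
    print(f"{leak_type}: {pct}% of revenue")
```

If one type dominates, that tells you which fix to try first; if the total comes out well under 5%, the build question largely answers itself.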

Happy to dig into any of this on Friday, and to take a proper look at whatever Josh sends through.

Sources: SPI Research 2025 PS Maturity Benchmark · Forrester 2024 · American Payroll Association · McKinsey · product material from Scoro, Rocketlane, Kantata, Productive, BigTime, Accelo, Timely, Rize, TimeSentry, Workday VNDLY, SAP · AP automation (Ramp, Tipalti, Precoro, HighRadius) · MCP timesheet connectors (Temponia, Timesheet.io, TallyHo) · adoption research (Harvest, Toggl). Figures are vendor- and analyst-reported; single-vendor stats (e.g. Rize's 20%) are directional.

Prepared by Articulate AI for Equals Five · Confidential · 3 June 2026