Should you build it?
You asked me to have a think about building an AI time-and-billing tool. Here's a straight read: what already exists, what the numbers say at your scale, where I'd push back, what Josh has probably built — and the smallest sensible first step.
⚠ Scale check first — the biggest caveat in here
Nearly every figure below comes from mid-sized to large firms. The SPI benchmark is 403 firms averaging dozens-to-hundreds of staff; McKinsey's overruns are "large IT projects"; the time-theft numbers come from hourly, shift-based workforces. You're a 1–10 person shop, and at that size the picture changes sharply, mostly against building:
- Your leak is already visible. At ten people you can see every project and every contractor. A 5% leak is a handful of instances you could name, not a systemic gap that needs software to surface.
- Time theft barely applies. Buddy punching is a shift-clock problem. Your leak is professional under-recording and unbilled scope — discount that whole bucket.
- Your invoice volume is tiny. Auto-matching is built for firms processing hundreds a month. You've got a handful — a spreadsheet and ten minutes does it.
- You have no IT function to carry a build. Maintenance (78% of lifetime cost, on the Forrester figure in section 5) lands on the same people who should be billing.
- Even a heavy PSA (professional services automation) tool may be overkill, but at £30 a head it's cancellable next month, which a build never is.
Net: at your scale this is almost certainly a discipline-and-spreadsheet problem, not a software-product one. The bar for building anything custom is higher for you, not lower.
1 · What you're actually describing
Three different leaks under one "5%" headline. Different causes, different fixes — worth separating before deciding what to build.
| What you said | What it really is | Where the money goes | The real fix |
|---|---|---|---|
| "Time theft" | Cost leak — paying for hours not worked | Cost line | Behaviour + easy capture |
| "Poor time tracking" | Revenue leak — billable work never captured | Top line (unbilled WIP) | Low-friction logging |
| "Project creep" | Margin leak — work past the LoE, unbilled | Margin | Contract + budget discipline |
| "Validate invoices vs budget" | An AP control — three-way matching | Cost / overpayment | Off-the-shelf rules, not custom AI |
First thing worth nailing down: of your 5%, how much is each? I'd wager you can't split it yet — and the split changes everything about what's worth building.
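If it helps, the split is simple arithmetic once the instances are named. A minimal sketch, assuming you tag each known leak with one of the three types from the table and a rough pound value; the function name, the tags and the sample figures are all illustrative, not taken from your books:

```python
from collections import defaultdict

# Hypothetical leak instances: (leak type, estimated value in GBP).
# "cost" = paid hours not worked, "revenue" = billable work never logged,
# "margin" = work past the LoE that went unbilled.
instances = [
    ("margin", 1500.0),
    ("revenue", 1200.0),
    ("margin", 900.0),
    ("cost", 400.0),
]

def split_leak(instances):
    """Return each leak type's share of the total leak, rounded to 3 places."""
    totals = defaultdict(float)
    for leak_type, amount in instances:
        totals[leak_type] += amount
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # nothing recorded yet, so no split to report
    return {t: round(a / grand_total, 3) for t, a in totals.items()}

print(split_leak(instances))
```

In this made-up sample, unbilled scope is 60% of the leak, which would point at contract discipline rather than time-capture software. Your real split may look nothing like it, which is the point of measuring.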
2 · The picture: buy the commodity, build the edge
Here's the flow you described, mapped against what already exists. Two-thirds of it you can buy. The join in the middle is the only part worth building yourself.
3 · Does it exist? Yes — most of it, cheaply
You said there are plenty of tools but none do what you need. Fair — but it's worth being honest about whether anyone's trialled the category properly.
Budget-first PSA — the 90%
- Scoro: "Financial visibility first". Every hour ties to a budget, every budget to a margin. Closest to what you described.
- Rocketlane: Projects + resourcing + financials in one, strong for delivery teams.
- Productive: An "Agency OS" with budgets, time and billing in a tidy package. Cheapest serious option.
- Kantata: Enterprise-grade resourcing + financials. Almost certainly more than you need.
- BigTime: Time + billing + project accounting; popular with services firms.
- Accelo: Client work end-to-end, from quotes and time to retainers and billing.
AI-native time capture — the friction killer (genuinely new since ~2024)
- Timely: "AutoSheet" builds a finished timesheet draft from passively-captured activity, ready to review in one click.
- Rize: Fully passive capture; categorises activity by client/project automatically.
- TimeSentry: Purpose-built for professional services; auto-fills timesheets as you work.
The invoice-validation bit is already a thing — and it's old
The part you think is the clever idea — auto-approve the invoice if it's within budget, flag it if not — is the oldest control in accounts payable: three-way matching. It compares what was agreed (the PO — your Letter of Engagement allocation), what was delivered (the receiving report — logged time), and what's billed (the invoice). Match within a set tolerance → auto-approve; otherwise → a human. Ramp, Tipalti, Precoro and HighRadius already do this at 95–99% accuracy, ~£3 an invoice, 70–85% fewer errors. Workday VNDLY and SAP already track committed-vs-invoiced spend against a Statement of Work. You'd be rebuilding two mature categories from scratch.
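To show how little logic is involved, here is a minimal sketch of the matching rule for a single invoice. It assumes a one-sided tolerance (under-billing always passes) and a 5% default; both are illustrative choices, and the class and function names are hypothetical rather than any vendor's API:

```python
from dataclasses import dataclass

@dataclass
class InvoiceCheck:
    agreed: float     # amount allocated in the Letter of Engagement (the "PO")
    delivered: float  # value of time logged against it (the "receiving report")
    billed: float     # amount on the contractor's invoice

def three_way_match(check, tolerance=0.05):
    """Auto-approve only if the invoice is within tolerance of both references."""
    within_agreed = check.billed <= check.agreed * (1 + tolerance)
    within_delivered = check.billed <= check.delivered * (1 + tolerance)
    if within_agreed and within_delivered:
        return "auto-approve"
    return "human review"

# Billed amount is close to both the allocation and the logged time.
print(three_way_match(InvoiceCheck(agreed=5000, delivered=4800, billed=4900)))

# Billed amount fits the allocation but far exceeds the time actually logged.
print(three_way_match(InvoiceCheck(agreed=5000, delivered=4000, billed=4900)))
```

The first check passes and the second goes to a person. The rule is a dozen lines; what the commercial tools add is the plumbing around it (invoice capture, audit trail, accounting sync), which is the expensive part to rebuild.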
4 · The bit that's genuinely yours to build
To be fair to the idea — there is a real seam. No single off-the-shelf product cleanly chains the whole thing: LoE allocation → time logged against it → contractor invoice → auto-approve within tolerance → finance reconciliation, in one flow, for a firm your size, without enterprise pricing or a six-month rollout. The PSA tools own the time→budget half; the AP/procurement tools own the invoice→PO half. The join is the gap (see the diagram). It's worth a thin custom layer — if the measurement proves the leak is really that shape. It is not worth a "much bigger people, project and resource platform."
5 · What the numbers say
| Claim | Number | Source |
|---|---|---|
| Revenue leakage in services firms | typically 1–5% of annual revenue | Beancount / ERP Blog / BigTime |
| Time theft (payroll) | 1.5–5% of gross payroll; 43% admit padding hours | American Payroll Assoc. |
| Scope creep | 59% call it their #1 challenge; 78% rarely/never bill it; 5–20% margin hit | Beancount / agency studies |
| Billable utilisation (2025) | fell to 68.9% — below 75% optimal; margins at decade-low 9.8% | SPI 2025 Benchmark (403 firms) |
| Cost of building software | 78% of lifetime cost lands after launch | Forrester 2024 |
| How big custom builds usually go | 45% over budget; 56% less value than predicted | McKinsey |
The 5% is credible — that's the part to trust. But read it with the scale check in mind: these are bigger-firm figures. For you the utilisation and time-theft numbers barely apply; what carries across is the unbilled-scope leak and — pointedly — the build-cost and overrun maths, which only get worse the fewer people you have to absorb them.
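To make the build maths concrete, here is a rough sketch. It takes the Forrester 78% figure from the table at face value, assumes the £30 a head is per user per month, and uses a purely hypothetical £20,000 launch cost; swap in your own numbers:

```python
def lifetime_build_cost(launch_cost, post_launch_share=0.78):
    """If 78% of lifetime cost lands after launch, launch is the other 22%."""
    return launch_cost / (1 - post_launch_share)

def annual_subscription_cost(seats, price_per_seat_per_month=30.0):
    """Yearly cost of a per-seat tool, assuming monthly per-user pricing."""
    return seats * price_per_seat_per_month * 12

# Hypothetical launch cost of GBP 20,000 for a custom build.
print(round(lifetime_build_cost(20_000)))

# Ten seats at an assumed GBP 30 per user per month.
print(annual_subscription_cost(10))
```

On those assumptions a £20,000 build implies roughly £91,000 over its life, against £3,600 a year for ten seats. The source figure doesn't say how many years "lifetime" covers, so treat this as an order-of-magnitude comparison only.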
6 · Where I'd push back before you build
① The 5% — to what, exactly?
② Has anyone actually trialled the tools?
③ "AI at the core" — doing which job?
④ Auto-approving "if within budget" removes a control rather than adding one.
⑤ The build maths is rough — and rougher at your size.
⑥ Josh's build — read it right.
⑦ "Ultimately a much bigger platform."
7 · What has Josh actually built in Claude?
Worth being precise here, because the answer sets the realistic ceiling. People build a few recognisable things with Claude for time and billing — none of them are a finance system. The most likely candidates, in order:
Most likely: a personal timesheet drafter
He feeds Claude his calendar, notes or chat logs; Claude returns a categorised, billable timesheet draft he tidies and submits. Possibly wired with an MCP connector (Temponia, Timesheet.io, TallyHo all ship these now) so Claude reads/writes entries directly.
Plausible: a Claude-Code mini app
A small local app (a simple table + SQLite) that logs his hours per project and shows budget burn, maybe spits out an invoice. Looks impressively "real."
Possible: an invoice/budget checker
A Claude prompt or sheet that takes an invoice + the agreed budget and flags whether it's within allocation — a manual, one-at-a-time version of three-way matching.
The five questions to ask Josh
So you can place it on the chart above in two minutes:
- What does it take as input — your calendar, manual notes, or Claude's own logs?
- Where does the data live — a spreadsheet, a local database, or just inside a Claude chat?
- Does it just draft your timesheet, or does it touch money / invoices?
- Is anyone else using it, or just you?
- When it breaks, who fixes it — and how long does that take?
If the honest answers are "my calendar / a spreadsheet / drafts only / just me / I patch it myself" — then it's a brilliant personal hack and a terrible template for a company finance system. Which is exactly the right thing to learn before Friday.
8 · The smallest sensible first step
Not a platform. Three moves, smallest first.
1. Measure before you build (a fortnight). Instrument two or three live projects and find out where the 5% actually comes from, by type. The number is the case for everything that follows, and at your size you can probably get most of it from what you already have.
2. Trial the off-the-shelf 90% in parallel. Run one budget-first PSA (Scoro or Rocketlane) properly, for 30 days, on a real project. Every hour to a budget, live burn, auto-flag on over-allocation. If it covers 90% at ~£30 a head, most of the build question goes away.
3. Build only the join, and only if step 1 justifies it. A thin reconciliation layer: LoE allocation × logged time × contractor invoice → auto-approve within tolerance → exception queue to finance. Start as a rules sheet on top of the bought tools, not a ground-up app. That, and only that, is the bit genuinely worth owning.
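For a sense of scale, the join itself could be sketched roughly as below: a running total of invoices per LoE, checked against whichever is lower of the allocation and the logged time, with anything else routed to an exception queue. The field names, the 5% tolerance and the sample data are assumptions for illustration, and the same rule would sit just as happily in a spreadsheet:

```python
def reconcile(allocations, logged_value, invoices, tolerance=0.05):
    """Split invoices into auto-approved IDs and an exception queue for finance."""
    approved, exceptions = [], []
    invoiced_so_far = {}  # running approved total per LoE
    for inv in invoices:
        loe = inv["loe"]
        if loe not in allocations:
            exceptions.append(inv["id"])  # no agreed budget to match against
            continue
        running = invoiced_so_far.get(loe, 0.0) + inv["amount"]
        # Ceiling is the lower of what was agreed and what was actually logged.
        ceiling = min(allocations[loe], logged_value.get(loe, 0.0)) * (1 + tolerance)
        if running <= ceiling:
            approved.append(inv["id"])
            invoiced_so_far[loe] = running
        else:
            exceptions.append(inv["id"])
    return approved, exceptions

# Hypothetical data: one LoE with GBP 10,000 allocated and GBP 6,000 of time logged.
allocations = {"LOE-1": 10_000.0}
logged_value = {"LOE-1": 6_000.0}
invoices = [
    {"id": "INV-1", "loe": "LOE-1", "amount": 3_000.0},
    {"id": "INV-2", "loe": "LOE-1", "amount": 3_000.0},
    {"id": "INV-3", "loe": "LOE-1", "amount": 1_000.0},
    {"id": "INV-4", "loe": "LOE-9", "amount": 500.0},
]

approved, exceptions = reconcile(allocations, logged_value, invoices)
print(approved, exceptions)
```

Here the first two invoices clear, the third tips the running total past the logged time, and the fourth has no LoE to match, so both land with finance. Checking the cumulative total rather than each invoice alone is a deliberate choice: it stops several small invoices adding up to an overrun unnoticed.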
Happy to dig into any of this on Friday, and to take a proper look at whatever Josh sends through.
Prepared by Articulate AI for Equals Five · Confidential · 3 June 2026