AI that finds the revenue your clinicians are leaving behind.
Behavioral health organizations systematically undercode, not because clinicians are negligent, but because CPT and E&M coding is genuinely complex, inconsistently taught, and completely invisible until a claim is denied. I'm designing the AI tool that surfaces this gap, with audit-ready justification, before money is lost.
There's money missing from every behavioral health claim.
Behavioral health is one of the most undercoded specialties in medicine. Clinicians spend significant time with patients, time they document meticulously for clinical reasons, and then bill for shorter, lower-reimbursed service codes than the documentation supports.
This isn't negligence. It's a training problem, a workflow problem, and an information problem. CPT coding for behavioral health involves judgment calls, about session duration, service type, add-ons, and payor-specific rules, that most clinicians were never formally taught. The feedback loop is also broken: claims are coded at the point of service, but denials come weeks later, with no connection back to the original coding decision.
For a mid-sized behavioral health organization seeing a few hundred patients a week, the revenue gap can be substantial, often six figures annually, sometimes more. And unlike clinical errors, this one is fixable without changing how care is delivered.
Why now? The data is already there. Behavioral health providers using Eleos generate rich session notes tied to specific clinicians, dates, and durations. With SFTP access to historical billing records, we can cross-reference what was documented against what was billed, without requiring any clinician onboarding or workflow change for the initial analysis.
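To make that initial analysis concrete, here's a minimal sketch of the cross-reference, assuming two flat-file exports pulled over SFTP and joined on clinician, patient, and date of service. The file layout and field names are illustrative, not the actual export schema.

```python
# Minimal sketch of the initial analysis: pair documented sessions with the
# claims that were actually billed, so downstream logic can compare the two.
# Field names (clinician_id, date_of_service, etc.) are illustrative.
import csv

def load_claims(path):
    """Index billed claims by (clinician, patient, date of service)."""
    with open(path, newline="") as f:
        return {
            (r["clinician_id"], r["patient_id"], r["date_of_service"]): r
            for r in csv.DictReader(f)
        }

def paired_records(sessions_path, claims_path):
    """Yield (session note, matching claim) pairs for comparison."""
    claims = load_claims(claims_path)
    with open(sessions_path, newline="") as f:
        for note in csv.DictReader(f):
            key = (note["clinician_id"], note["patient_id"], note["date_of_service"])
            if key in claims:
                yield note, claims[key]
```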
I started by learning the domain, not the design.
Before I opened Figma, I spent time with billing analysts, RCM directors, and clinicians to understand how coding actually works at behavioral health organizations. This is a domain where the terminology alone, CPT codes, E&M levels, HCPCS modifiers, add-on codes, prior auth, can obscure more than it reveals.
I wanted to understand the jobs-to-be-done at different levels. A billing director wants a portfolio view: where are the biggest opportunities across my organization? A coder working the back office wants a workflow: show me the claims with the highest yield, give me the evidence, let me act. A clinical supervisor wants confidence: is this defensible if we get audited?
"We know we're undercoding. We just have no way to know where or by how much. It shows up in denials occasionally, but by then it's too late."
RCM Director · Discovery interview
What emerged from discovery was a clear primary persona: not the clinician, but the billing back office. The person who already understands coding, is already working claims, and just needs better data. That framing changed everything about the product.
This isn't a minor pivot. Point-of-care suggestions require clinician onboarding, EHR integration, and behavior change at scale. Back-office analysis requires SFTP access and a browser. The right initial product was much smaller, and much more deployable.
Four types of uncaptured revenue, each with different evidence requirements.
Not all undercoding looks the same. During discovery, we identified four distinct opportunity types, each with different causes, different evidence quality, and different implications for how confident the system should be in its recommendations.
The ordering matters for the product. Time-based undercoding becomes the default starting point, highest confidence, clearest evidence, easiest for billing staff to verify and act on. Other opportunity types are surfaced with explicit confidence signals and different levels of review friction.
The hardest design question wasn't the layout. It was how certain to sound.
Coding recommendations aren't just suggestions, they have downstream consequences. If a billing coder acts on a bad recommendation, they file an incorrect claim. If they file enough incorrect claims, they're audited. In behavioral health, an audit can mean clawbacks, compliance investigations, and reputational damage.
So the question I kept returning to was: what's the right interaction model for this? I mapped out the spectrum.
I landed somewhere deliberate: not a calculator (too narrow, misses complexity), not an unqualified advisor (too much trust too fast). The system surfaces a recommended code with an explicit evidence chain, the documentation that supports it, the CPT threshold that applies, the specific language from the note. The coder reviews and approves. The AI doesn't act; it informs.
The audit requirement shaped everything. Billing staff told us that any recommendation they act on needs to be defensible if they're ever asked why the code changed. That's not a nice-to-have, it's a compliance requirement. Every recommendation in the interface is built around the evidence that supports it.
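One way I think about that requirement is to treat the evidence chain as a first-class object: a recommendation can't be shown unless the structure behind it carries the documentation excerpt and the rule that excerpt satisfies. A rough sketch, with illustrative names rather than the product's actual schema:

```python
# Illustrative shape of a recommendation and its evidence chain; every field
# maps to something the coder can see before acting. Names are assumptions,
# not the shipped data model.
from dataclasses import dataclass, field

@dataclass
class Evidence:
    source_note_id: str   # which session note the evidence came from
    excerpt: str          # the specific documentation language shown to the coder
    rule: str             # the CPT threshold or payor rule the excerpt satisfies

@dataclass
class Recommendation:
    claim_id: str
    billed_code: str        # what was actually submitted
    recommended_code: str   # what the documentation supports
    opportunity_type: str   # e.g. "time_based", "add_on", "level_of_service"
    confidence: str         # high for objective time evidence, lower for interpretive types
    estimated_yield: float  # labeled as an estimate in the interface
    evidence: list[Evidence] = field(default_factory=list)
```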
Confidence isn't uniform, and the design reflects that.
The four opportunity types don't have equal evidence quality, and pretending they do would be a design failure. Time-based undercoding has objective evidence: duration is a number in the note, the CPT threshold is a fixed rule, the delta is calculable. The model's confidence here is high, and the interface presents it directly: here's what was documented, here's the code it supports, here's the threshold. It's close to arithmetic.
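A small sketch of that arithmetic, using the commonly cited psychotherapy time ranges; these should be verified against current CPT guidance before anyone relies on them:

```python
# The "close to arithmetic" case: map a documented duration to the psychotherapy
# code it supports. The 16-37 / 38-52 / 53+ minute ranges are the commonly cited
# CPT thresholds; verify against current guidance.
TIME_RANGES = [
    ("90832", 16, 37),    # psychotherapy, 30 minutes
    ("90834", 38, 52),    # psychotherapy, 45 minutes
    ("90837", 53, None),  # psychotherapy, 60 minutes
]

def supported_time_code(documented_minutes: int) -> str | None:
    """Return the time-based CPT code the documented duration supports, if any."""
    for code, low, high in TIME_RANGES:
        if documented_minutes >= low and (high is None or documented_minutes <= high):
            return code
    return None

# A 58-minute documented session billed as 90834 surfaces as a 90837 opportunity.
assert supported_time_code(58) == "90837"
```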
Add-on codes and level-of-service upgrades are different. They require reading the note for clinical content, not just extracting a number. The model is interpreting documentation, not matching it against a threshold. For these opportunity types, I'm designing more friction into the review flow, not less. The evidence panel shows more supporting context. The expected yield is labeled as estimated. The call-to-action language shifts from "apply this" to "review this opportunity."
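In sketch form, that difference in friction reads like a mapping from opportunity type to presentation. The tiers and copy below are illustrative, not final UI strings:

```python
# Confidence drives presentation rather than one universal treatment.
# Tier names, CTA copy, and labels are illustrative.
REVIEW_TREATMENT = {
    "time_based": {
        "cta": "Apply this code",
        "yield_label": "calculated",
        "evidence_depth": "duration + threshold",
    },
    "add_on": {
        "cta": "Review this opportunity",
        "yield_label": "estimated",
        "evidence_depth": "full note context",
    },
    "level_of_service": {
        "cta": "Review this opportunity",
        "yield_label": "estimated",
        "evidence_depth": "full note context",
    },
}
```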
Designing for the wrong recommendation.
If a coder acts on a bad recommendation and files an incorrect claim, two things need to be true: the coder should have had enough evidence to evaluate it before acting, and there needs to be a traceable record of what supported the decision. This is both an audit requirement and a trust architecture.
Every recommendation has a named approver, the coder who reviewed and acted, and a timestamp. The supporting documentation that triggered the recommendation is preserved alongside it. If the claim gets audited later, the organization can reconstruct exactly what the note said, what threshold applied, and what the coder saw when they made the call. The system doesn't act autonomously. It recommends. The human approves. The record reflects that chain.
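A sketch of what that decision record could hold, with field names as assumptions about shape rather than the shipped data model:

```python
# Illustrative audit record: enough to reconstruct the decision later.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class CodingDecision:
    recommendation_id: str
    approved_by: str       # the named coder who reviewed and acted
    decided_at: datetime   # when the call was made
    action: str            # "applied", "rejected", or "confirmed original code"
    evidence_snapshot: str # the note excerpt and threshold preserved as the coder saw them

decision = CodingDecision(
    recommendation_id="rec-0142",
    approved_by="coder@example.org",
    decided_at=datetime.now(timezone.utc),
    action="applied",
    evidence_snapshot="Documented 58-minute session; 90837 threshold (53+ min).",
)
```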
This also shapes how I think about false positives. A billing coder who reviews a recommendation, disagrees, and doesn't act has done exactly what the interface was designed for. The friction is intentional. The goal isn't to maximize the number of codes that get changed, it's to maximize the number of correct coding decisions, whether that's an upgrade or a confirmation that the original code was right.
Decisions I've made, and what I gave up to make them.
Active design work always involves tradeoffs. Here's where I've taken a clear position, and what's sitting on the other side of each decision.
What I'm designing against.
This isn't a greenfield product. It's shipping on a specific timeline, into a specific regulatory environment, for a specific buyer persona with specific compliance anxieties.
The constraint I find most interesting is the no-write-back limitation. The product surfaces opportunities, but billing staff have to act on them in their existing EHR or billing system. That means the interface is purely informational, which actually clarifies the design challenge. I'm designing a decision-support tool, not an action tool. Every screen asks: does this person have what they need to go act elsewhere?
This is the work in front of me.
Alpha goes live in April 2026 with our first customer. Here's where I'm focused between now and then, and the questions I'm still working through.
This case study will continue growing as the product does. If you want to talk through any of this, the domain, the trust design, the AI recommendation problem, I'm always up for it.