Eleos Health · 2024–2026

Hero mockup · app.clearchart.io · Mary Thompson
EHR note editor (Presenting Problem, Interventions, empty Diagnosis field) with the LQA Note Quality sidebar open, showing Open Items (4): Actionable Plan, Golden Thread, Short / Empty Fields, and Copy / Paste.
Catching documentation errors before they become billing problems.

Behavioral health organizations operate under strict documentation requirements, and the stakes are real. Their clinical notes determine reimbursement, drive compliance audits, and in many cases protect state funding. The problem was timing: compliance review happened after submission, often months later. By then, there was nothing useful left to fix. I designed Live Quality Assist to move that check into the moment of writing.

Role

Lead Product Designer

Timeline

2024–2026 (sprint to beta)

Platform

Chrome extension / EHR sidebar

Collaborators

Product, Engineering, Clinical

NDA note

This work is covered by an NDA, so I don't share screenshots or internal artifacts. This case study focuses on the thinking behind it: the research, directions explored, and the key decisions and tradeoffs that shaped what shipped. I'm happy to walk through the product and design work in more detail in conversation.

Behavioral health organizations live and die by their documentation.

Medicaid reimbursement in behavioral health is tightly tied to clinical documentation. Every session note a provider writes needs to meet specific compliance criteria, or the claim risks being rejected, delayed, or clawed back in an audit. Behavioral health claims are denied at nearly twice the rate of other medical specialties, around 30% vs. 19% industry-wide. According to OIG audits, 61% of mental health Medicare claims contain some type of regulatory error. For many organizations, this documentation also determines their standing with state funders. A pattern of non-compliant notes isn't just a billing headache; it's an existential risk.

The people responsible for catching these issues are Clinical Quality Improvement (CQI) teams. But their process was entirely manual. With hundreds of providers writing notes daily, they could realistically review only 5 to 10 percent of charts. The rest went out the door unchecked. For even a small practice, documentation-related denials can represent $85,000–$120,000 in lost annual revenue, and 65% of those denied claims are never resubmitted. The money is simply written off.

The core tension: Claims are submitted within days of documentation. But compliance review happened weeks or months later. By the time a problem was flagged, the clinician had no memory of the session, the note couldn't meaningfully be corrected, and the organization was already exposed. In 2024 alone, dollars at risk from payer audits increased fivefold and coding-related denials surged by over 125%. The window for catching these errors early was getting more valuable, not less.

This was the problem Eleos set out to solve, not by building better audit tools after the fact, but by moving compliance guidance into the moment of documentation itself.

30%

behavioral health claim denial rate, nearly double the 19% average across other medical specialties

61%

of mental health Medicare claims contain a regulatory error, per OIG audits; most of these errors trace back to documentation

65%

of denied claims are never resubmitted; the revenue is simply written off, making prevention far more valuable than correction

We started with a design sprint. The problem was bigger than we expected.

This project started as a design sprint internally named "Verify." Over four sessions with product, engineering, and clinical stakeholders, we mapped two distinct problems: CQI teams were overwhelmed trying to review documentation at scale, and clinicians were writing notes without any real-time feedback on compliance quality.

Early on, we explored tools for both audiences. The CQI side was compelling, but a key insight changed the direction: claims are submitted within days of documentation. Even if a CQI team caught a problem immediately, there was a very short window for a clinician to reopen, fix, and resubmit a note. The further the review happened from the moment of writing, the less actionable the feedback.

What CQI teams needed

Visibility into compliance patterns across hundreds of providers. The ability to prioritize which notes to review and act on the highest-risk ones. A way to scale their oversight beyond the 5–10% they could manually review.

What clinicians needed

Real-time feedback during documentation, while the session was still fresh. Specific, actionable guidance rather than vague post-submission corrections. Something that felt like a coach, not an auditor.

We decided to build clinician-facing first. Preventing a bad note from being submitted was worth more than catching it afterward. If the check happened at the right moment, while someone was still writing, the same feedback that would trigger a correction weeks later could instead guide a better note in real time.

That decision also defined the product boundaries clearly. LQA is strictly a clinician-facing tool. It analyzes a single note while it's being written, before submission. It has no visibility into historical notes, no org-level analytics, and no CQI dashboard functionality. That's a separate product entirely.

This case study

Live Quality Assist (LQA)

Clinician-facing. Analyzes the note currently being written, upstream before submission. Shows real-time pass/fail for 7 quality checks with expandable AI reasoning. Lives in the EHR sidebar as a dedicated Quality tab.

Separate product

Compliance Dashboard

CQI and admin-facing. Analyzes all submitted notes at the organization level. Shows quality trends, provider comparisons, and compliance patterns across hundreds of clinicians. Post-submission. Built for leadership and quality teams.

Verify Sprint · Storyboard · Understand / Diverge / Decide
Problems & Insights
Pain point: CQI teams can only review 5–10% of notes. The rest go out unchecked.
Insight: claims are submitted within days of writing; compliance review happens weeks later.
User need: clinicians want feedback while the session is still fresh, not after.
Flows Mapped
Clinician flow: write note → save & submit → denial (weeks later).
CQI review flow: pull charts → flag issues (too late).
Key gap: timing; review after submission has almost no impact on fixing the note.
Sprint Decision
Direction: build clinician-facing first. Prevention beats correction every time.
HMW: how might we move compliance review into the moment of writing?
Out of scope: no CQI dashboard, no historical notes. Single note, in context, before submit.

Early storyboards from the Verify design sprint, mapping clinician and CQI team workflows. The sprint surfaced the timing gap that became the central design problem.

We went through a lot of directions before landing on the one that actually made sense.

Before any wireframes, I did a competitive analysis of writing quality tools: Grammarly, ProWritingAid, Quillbot, and IBM Watson NLU. Not because any of them solved a clinical compliance problem, but because they'd already made hard decisions about the exact interaction questions we were wrestling with: when to surface feedback, how to communicate severity, and how to separate what's required from what's just preferable. Grammarly's tiered feedback categories gave us a vocabulary for "must change vs. nice to change." ProWritingAid's toggle for real-time vs. summary mode shaped how we thought about trigger timing. IBM Watson's text and sentiment analysis showed both the promise and the ceiling of AI-driven feedback in specialized domains. None of these tools were built for a 15-minute clinical session with legal documentation stakes, but mapping where they succeeded, and where they failed, gave us sharper language for the tradeoffs we were about to make.

That research sharpened three foundational questions we brought into the sprint, none of which were really about UI:

1. Live help during writing, or analysis after the note is complete? Intervening in real time risked disrupting flow mid-session. Post-writing was less intrusive but arrived after the clinical context had already faded.

2. Auto-correct the note, or show the clinician what's wrong and let them decide? Auto-fix was faster, but a therapy note is a legal record; software rewriting it wholesale isn't a UX question, it's a documentation integrity question.

3. How do we define "quality" in a clinical note, and is it the same as "compliance"? Compliance is regulatory. Quality is clinical. They overlap, but they're not the same, and conflating them would send clinicians the wrong signal about what actually matters.

From there the sprint generated seven distinct directions, each built on a different hypothesis about where and how to intervene. Some failed under scrutiny. Some were right in instinct but wrong in execution. A few informed the final design without making it in themselves.

Not chosen
Notes Quality Check
Review before submitting
Fix All Issues

Post-writing checklist with a "Fix All" button

After finishing a note, the clinician sees a compliance checklist with all required elements flagged. A single "Fix All Issues" button handles everything at once and shows a diff of what changed. The interaction is clean and fast, but the premise is wrong. Auto-correcting clinical documentation removes the clinician from the judgment call entirely. A therapy note is a legal record of what happened in a session. We can't have software rewriting it wholesale, and any design that made that feel frictionless was hiding the real problem.

Evolved into LQA
72
Quality score
2 issues to fix
Must Change
Nice to Change

Overall quality score with tiered "Must Change / Nice to Change" feedback

Real-time analysis gives a score (0–100) divided into two tiers: compliance requirements ("Must Change") and grammar/style suggestions ("Nice to Change"). The tiering was a genuinely useful insight: separating what's clinically required from what's stylistically preferred clarified what actually mattered. But the score itself invited gaming rather than reflection. And mixing a regulatory failure with a passive-voice flag in the same list blurred urgency in ways that could send the wrong message. The tiering concept carried forward. The numerical score and the style suggestions did not.

Deprioritized
Your documentation quality
78
↑ 5 pts this week

Personal progress dashboard: track your quality improvement over time

A clinician-facing view showing their own improvement metrics over time: fewer errors, higher scores, notes written, trends by check category. The intent was to create investment in the tool by making progress visible and adding something that felt rewarding rather than corrective. The problem surfaced immediately in research: every clinician had the same first reaction. "Who else can see this?" The surveillance concern was real and consistent. A progress dashboard for an individual clinician mid-session felt less like coaching and more like being watched. This concept moved to the org-level Compliance Dashboard, where it makes sense for leadership and CQI teams, not for the clinician writing a note.

Explored
What best describes your role?
We'll tailor feedback to your documentation requirements
Therapist
Counselor
Psychiatrist
Nurse Practitioner
Peer Coordinator

Role-based onboarding quiz to personalize the experience

A short onboarding quiz would let clinicians self-select their credential (Therapist, Counselor, Psychiatrist, Nurse Practitioner, Peer Coordinator), and the product would tailor its feedback accordingly. Two things were right about this idea: first, a product that corrects people's work needs delight built into the experience, not just utility. Second, documentation requirements genuinely differ by credential, so personalization wasn't just a nice-to-have. What was too complex was the full branching logic it required. A simplified version of that thinking, the understanding that "quality" means different things for different providers, shaped how we defined the 7 checks and their weighting rather than surfacing as a literal quiz.

Explored
What are you documenting today?

Set documentation intent before writing; receive tailored feedback

Inspired by Grammarly's goal-setting model: before writing, you declare your intent and audience, and the feedback engine calibrates itself. For clinical documentation this translated to options like "Documenting Progress," "Risk Management," or "Crisis Intervention Documentation," each emphasizing different quality criteria. The concept also raised a question that shaped later roadmap thinking: should goals be set per clinician, or per organization? A supervisor defining documentation standards for their team is a fundamentally different use case than a provider setting their own preferences per note. The org-level configuration question eventually fed into the Custom Rules feature. Per-note goal setting was too much friction for a 15-minute session.

Informed the approach
How much feedback do you want?
Minimal
Detailed
You control the experience

Let clinicians control how assertive or detailed the feedback is

A slider or preference setting that would let clinicians dial in how aggressively the tool surfaced issues: more detailed for providers who wanted to learn, lighter-touch for experienced writers who just needed a compliance check. The appeal was reducing friction with resistant users by giving them agency over the experience rather than having it imposed on them. In practice, the added complexity wasn't worth the benefit, and it risked creating a "low" setting that let things slip through. The core insight, that different providers have wildly different tolerances for interruption, did carry forward directly into the "minimal by default, expandable on demand" principle that defines the final design.

What shipped
Note Quality
Mary Thompson
Medical Necessity
Presenting Problem
Golden Thread
Link between interventions and treatment goals not explicit.
Treatment Plan Alignment
Actionable Plan
Check Note Quality

Dedicated Quality tab: pass/fail per check, expandable AI reasoning

A permanent Quality tab in the Eleos sidebar showing each of the 7 checks as a simple pass or flagged indicator. No score. No style suggestions. Tap any result to expand the AI's specific reasoning, tied directly to the note content. Fast users glance and move on. Clinicians who want to understand a flag, or contest one, can dig in. The feature is always in the same place regardless of EHR state. It draws on what worked from six rejected directions: the tiering clarity from Wants vs Needs, the "no auto-fix" principle from the Checklist failure, the delight-through-personalization instinct from Gamification, and the restraint learned from Tone Adjuster.

The final direction wasn't a single breakthrough. It came from the accumulation of everything that didn't work: six rejected directions whose failures each removed one wrong assumption.

Every major call involved a tradeoff between what would be ideal and what was actually buildable.

I was the only designer on LQA across its full lifecycle, from the initial sprint through the Beta redesign. That meant owning the end-to-end experience: when analysis fires, what it shows, how clinicians interact with feedback mid-session, and how the feature surfaces across different EHR states. Here are the decisions that mattered most.

From overlay to dedicated tab

Early concepts surfaced compliance feedback in an overlay that appeared on demand. In testing, it read as a pop-up rather than a tool, something to dismiss rather than engage with. Moving to a dedicated Quality tab gave LQA a permanent, consistent home in the Eleos sidebar. Clinicians always knew where to find it. It also made the feature feel integrated into the product rather than bolted on, which mattered for adoption in an environment where clinicians are already resistant to new workflows.

Design constraint: needed to work within existing sidebar architecture without rebuilding tabs

Triggering on the last text field, not every keystroke

When to fire the analysis was genuinely harder than it looked. Firing on every field change would have been expensive, noisy, and disruptive mid-session. Firing only on demand meant users had to remember to initiate it. The insight came from watching how clinicians actually worked: clicking into the last text field is a natural signal that you're nearly done. It's not the last field in sequential order, it's the last empty one. That distinction matters because providers don't fill out notes linearly. Auto-triggering on that moment runs the check at exactly the right time, without interrupting the flow of documentation earlier.

Engineering constraint: LLM analysis cost → capped auto-triggers per session, manual re-check available
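To make that trigger decision concrete, here is a minimal sketch of the logic, assuming a hypothetical field model, an illustrative per-session cap, and names invented for this write-up rather than taken from the shipped extension:

```typescript
// Hypothetical sketch: auto-run LQA when the clinician focuses the last *empty*
// text field, capped per session, with the manual re-check button as the fallback.

interface NoteField {
  id: string;
  isTextField: boolean;
  value: string;
}

// Illustrative cap only; the real per-session limit is an engineering decision.
const MAX_AUTO_TRIGGERS_PER_SESSION = 3;

function isLastEmptyTextField(fields: NoteField[], focusedFieldId: string): boolean {
  const emptyTextFields = fields.filter(f => f.isTextField && f.value.trim() === "");
  // Providers don't fill notes linearly, so "last" means the only remaining empty
  // text field, not the last field in document order.
  return emptyTextFields.length === 1 && emptyTextFields[0].id === focusedFieldId;
}

function shouldAutoTrigger(
  fields: NoteField[],
  focusedFieldId: string,
  autoTriggersUsedThisSession: number
): boolean {
  if (autoTriggersUsedThisSession >= MAX_AUTO_TRIGGERS_PER_SESSION) {
    return false; // over the cap: analysis stays available through the manual re-check button
  }
  return isLastEmptyTextField(fields, focusedFieldId);
}
```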

Minimal by default, expandable on demand

The primary view shows each of the 7 checks as pass or flagged, nothing else. Tap to expand and you get the AI's specific reasoning, tied directly to the note content. This wasn't just a design preference; it was a response to the clinical environment. A 15-minute session doesn't leave room for a detailed report card. Fast users need to glance and move on. The clinicians who want to understand a flag and contest it can dig in. The same information is available to both, just surfaced at the right level of detail for each.

UX constraint: avoid making LQA feel like an audit tool, which would drive avoidance

Solving the "go green" problem in the redesign

Alpha user research revealed a friction point that wasn't visible in the original design: if a clinician submitted a note without clicking out of the last text field, LQA never triggered, the note never went green, and the user was left confused about why. One alpha user described the issue clearly. A supervisor at the same organization didn't know she could click the sidebar component to see full results at all. These weren't edge cases; they were gaps in the experience that I addressed directly in the LQA redesign before Beta, along with a revised onboarding flow and clearer entry points.

Research finding: surfaced in alpha user interviews conducted by the PM; assigned to me to resolve in the redesign

Designing multiple entry points for different sidebar states

The Eleos sidebar isn't always open. Clinicians configure their EHR workspace differently, some keep the sidebar expanded, others minimize it, and some use an enhanced button view. LQA needed to be accessible in all three states without feeling like a different product in each. I designed entry points for each configuration so the behavior was consistent regardless of how a provider had set up their workspace. This was critical for ensuring adoption across a diverse user base that we couldn't control or train uniformly.

Technical constraint: Chrome extension architecture across EHRs required careful state management
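A rough sketch of how a single entry-point decision could key off sidebar state, with the state names and shapes invented for illustration (the production extension's state handling is more involved):

```typescript
// Hypothetical sketch: one LQA entry point per sidebar configuration, so behavior
// stays consistent however the provider has set up their workspace.

type SidebarState = "open" | "minimized" | "closed";

type EntryPoint =
  | { kind: "quality-tab" }                    // sidebar open: dedicated Quality tab
  | { kind: "icon-badge"; openItems: number }  // sidebar minimized: icon with open-item badge
  | { kind: "floating-launcher" };             // sidebar closed: "Check Note Quality" launcher

function entryPointFor(state: SidebarState, openItems: number): EntryPoint {
  switch (state) {
    case "open":
      return { kind: "quality-tab" };
    case "minimized":
      return { kind: "icon-badge", openItems };
    case "closed":
      return { kind: "floating-launcher" };
  }
}
```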

Binary states, not confidence scores

The model produces a result that could be expressed as a confidence continuum. We chose not to show it. A 73% confidence score on a clinical compliance check shifts the question from "does my note have a problem?" to "do I trust this number?" Clinicians in the middle of a session shouldn't be doing either. It creates cognitive overhead without improving decisions, and in a clinical environment where clinicians are already skeptical of AI tooling, numerical uncertainty reads as unreliability rather than honesty. The binary pass/flag isn't hiding complexity, it's resolving it at the right layer. If the model isn't confident enough to flag, it doesn't flag. Threshold calibration happens on the engineering side; the UX presents a clean decision. This only works if the threshold is set correctly, which is an ongoing collaboration with the ML team as the model trains on more data.

Ongoing: threshold calibration is a live collaboration between design and ML as the model improves
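As a sketch of what resolving the decision at the engineering layer could look like, assuming the model exposes a per-check confidence; the threshold value and field names here are hypothetical:

```typescript
// Hypothetical sketch: the model's confidence never reaches the UI. Each check is
// resolved to a binary pass/flag at a calibrated threshold, and the AI reasoning is
// only surfaced when the clinician expands a flagged item.

interface CheckResult {
  checkId: string;        // e.g. "golden-thread", "actionable-plan" (illustrative ids)
  flagConfidence: number; // 0–1, model's confidence that the check fails
  reasoning: string;      // explanation tied to the note content
}

type CheckStatus = "pass" | "flagged";

// Calibrated with the ML team as the model trains on more data; illustrative value.
const FLAG_THRESHOLD = 0.8;

function toStatus(result: CheckResult): CheckStatus {
  // If the model isn't confident enough to flag, it doesn't flag.
  return result.flagConfidence >= FLAG_THRESHOLD ? "flagged" : "pass";
}
```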

Designing for the wrong flag

A pass/fail binary looks decisive, but the model will occasionally flag a note that a clinician knows is correct. If they see a false positive and can't contest it, they learn to distrust the tool. Enough false positives and they stop reading the flags entirely: the clinical version of alert fatigue, a well-documented failure mode in clinical decision support. I made two decisions in response. First: the expand-to-see-reasoning interaction isn't just for transparency, it's the mechanism by which a clinician can evaluate whether a flag is right or wrong. The reasoning gives them the information to contest it in their own judgment, even without a formal dispute flow. Second: I pushed for the dismiss mechanism that eventually got cut due to engineering constraints. Not having it meant clinicians who encountered a false positive had no recourse except submitting with the flag unresolved. That's a trust problem the design acknowledged rather than hid, and it sits at the top of the roadmap for the next sprint.

Design debt: dismiss/dispute mechanism cut in Beta; now roadmapped; unresolved false positives are the main driver of churn risk

app.eleos.com · Mary Thompson
Note editor · Apr 3, 2025 · Individual Therapy · 50 min
Presenting Problem: Client presents with ongoing anxiety related to work stress. Reports disrupted sleep and difficulty concentrating. Engaged and motivated for therapy.
Interventions: Applied CBT techniques including thought challenging and behavioral activation. Reviewed homework from last session. Discussed breathing exercises.
Goals: Client will practice 10-min mindfulness daily. Follow up on sleep hygiene strategies from last session.
Diagnosis *: No diagnosis entered

Note Quality sidebar, three states:
Before analysis · "Analyze your Note!" Run LQA to see quality feedback for this session note.
Analyzing · "Analyzing your note… Running 7 compliance checks…"
Results · Mary Thompson · Apr 3, 2025 · Open Items (4)
Actionable Plan: Consider adding specific, measurable goals to the client's treatment plan.
Golden Thread: Link between interventions and treatment goals is not explicitly described in this note.
Copy / Paste: Intervention section appears similar to previous session notes. Consider adding session-specific detail.
Short / Empty Fields: Diagnosis field is empty. Complete this field before submitting the note.
Completed (2)
Interactive · click Check Note Quality to run LQA · click any open item to expand or collapse

The hardest design problems weren't the UI. They were the constraints I had to design around.

Business · Engineering · Behavioral

The save-before-analysis problem. LQA required notes to be saved before analysis could run. But that's not how clinicians work. Their natural sequence is: write, submit, move on. We were asking them to add an extra step mid-session (save, check, fix, re-save, submit) without ever making that ask explicit in the UI. It showed in alpha engagement data. Not all active users were engaging with LQA consistently, and those who did often weren't expanding results to read the reasoning. The feature was visible, but it wasn't landing. The redesign before Beta addressed the entry points and onboarding directly, but the underlying workflow mismatch was a real constraint on adoption.

The cost problem I couldn't explain. LLM analysis isn't free, and unlimited auto-triggering wasn't viable. We landed on a capped number of auto-triggers per session with a manual Re-check button for additional runs. I had to design around a limit I couldn't explain to users without making the product feel broken or arbitrary. The solution was to be explicit about what triggers the check while being quiet about the limit itself, surfacing the manual re-check prominently enough that it felt like a feature rather than a workaround.

The dismiss functionality that got cut. Users wanted to contest LQA flags they disagreed with. A dismiss or dispute mechanism was on the original spec. It was cut due to engineering resource constraints. That meant a clinician who knew a flag was wrong had no recourse, they just couldn't go green. This created real frustration with the highest-engaged users, exactly the people we needed as advocates. It's on the roadmap, but shipping without it was a conscious tradeoff I had to acknowledge in the design rather than obscure.

Performance. Target latency for analysis was under a few seconds. Early builds ran significantly longer, long enough to lose someone mid-session. I designed loading states that communicated progress rather than just spinning, something to hold trust while the backend caught up. By the time we shipped Beta, the team had brought latency down to a point where the experience felt responsive. But getting there required designing for a broken state first.

Sidebar open · Quality tab visible: Note Quality, Open Items (3), Re-Run Analysis
Sidebar minimized · icon + badge
Sidebar closed · floating launcher: Check Note Quality button

Three entry point states designed for different EHR configurations. Consistent behavior regardless of how a provider had set up their workspace.

Quality scores improved. The more interesting finding was behavioral.

In early alpha, LQA analysis had a success rate of roughly 10–20% on saved notes. By the final pre-release build, that figure reached 100% across the test cohort. Beta is in pre-release as of early 2026.

In the initial alpha cohort, quality scores improved measurably across most users. The biggest gains came in Golden Thread and Intervention documentation, the checks that reflect whether a session's clinical goals and actions are actually tied together.

But the more significant finding was behavioral. The most engaged users told us LQA wasn't just improving their notes. It was changing how they ran their sessions. One clinician said it "helps connect interventions explicitly to treatment goals" and "reminds her to document next session planning." The feedback loop was getting upstream of documentation entirely, further than the design had aimed for.

"With LQA, staff would have to choose to write a bad note. This is going to be a game changer."

CEO, regional behavioral health organization

"Huge thanks to Carly Raizon for seeing this project through so many iterations."

My manager, Eleos Health

As Beta moved into pre-release, I joined the compliance team on a multi-site field tour across customer organizations, running user testing sessions in person. No engagement dashboard would have surfaced what those conversations did. Organizations described their current manual processes with real frustration, and their reactions to LQA with genuine relief.

"We have a lot of people coming to us out of college. LQA will allow us to be more confident in our documentation."

Clinical supervisor, behavioral health organization

"It's the best thing you can use." She started recommending LQA to colleagues who had been resistant to adopting it.

Clinician, alpha cohort

In alpha, every check was happening at the moment of writing, not weeks later in a compliance backlog. That's the shift the product was designed to make.

LQA is in active alpha use and Beta is in pre-release, expanding with a Custom Rules feature that lets organizations define their own compliance criteria in natural language, built directly from what we heard on the field tour.
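As an illustration only, a custom rule could be stored as a natural-language criterion evaluated alongside the built-in checks; every name and value in this sketch is hypothetical:

```typescript
// Hypothetical sketch of an org-defined custom rule, expressed in natural language
// and run through the same analysis pipeline as the built-in quality checks.

interface CustomRule {
  id: string;
  orgId: string;
  title: string;     // shown in the Quality tab next to the built-in checks
  criteria: string;  // natural-language compliance criterion, written by the organization
  enabled: boolean;
}

const exampleRule: CustomRule = {
  id: "rule-001",
  orgId: "org-123",
  title: "Telehealth consent",
  criteria:
    "If the session was delivered via telehealth, the note must document that the client consented to the telehealth format.",
  enabled: true,
};
```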

Next case study

Coding Back Office →