Why Most SOAP Notes Fail Their Downstream Readers
The SOAP framework (Subjective, Objective, Assessment, Plan) was developed by Lawrence Weed in the late 1960s as part of his Problem-Oriented Medical Record system. It was designed to impose a specific logic on clinical documentation: what the patient reports, what the clinician observes and measures, what the clinician concludes, and what the clinician intends to do. That logic is sound. The problem is that the label "SOAP note" now gets applied to documents that violate the framework's internal logic in nearly every section.
When reviewing notes in an EHR, whether during resident training or documentation quality consultation, the most common failures are not omissions. They are structural confusions: diagnostic reasoning buried in the Subjective section, physical exam findings scattered across the Plan, an Assessment that restates the chief complaint without actually asserting a diagnosis. A note like this may well pass a compliance audit. What it fails is its primary clinical purpose: enabling any other clinician who reads it to understand, quickly and accurately, what this physician believed was happening with this patient and what they decided to do about it.
The Subjective Section: Precision Over Narrative
The Subjective section should capture the patient's experience: chief complaint, history of present illness (HPI), relevant review of systems, and pertinent historical context. What it should not capture is the physician's interpretation of that experience. That belongs in the Assessment.
The HPI is where the most variation in quality appears. A high-quality HPI uses the standard mnemonic elements (onset, location, duration, character, aggravating and alleviating factors, radiation, timing, severity) but applies them selectively based on what is clinically relevant to the visit. "Patient reports left knee pain, onset 3 weeks ago, worse with stairs, mildly improved with ibuprofen 400mg, denies trauma, no prior episodes" is more useful than a prose paragraph that buries the same elements in narrative.
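One way to picture "selective but structured" is a small helper that renders only the elements that were actually captured, always in the same order, and reports which elements a given visit type expected but did not get. This is a minimal sketch: the element keys, the `required` list, and both function names are assumptions made for illustration, not a clinical standard or any EHR's schema.

```python
# Hypothetical element keys, ordered so every rendered HPI reads the same way.
HPI_ORDER = [
    "onset", "location", "duration", "character",
    "aggravating", "alleviating", "radiation", "timing", "severity",
]


def render_hpi(hpi):
    """Render only the populated elements, in a fixed order."""
    parts = [f"{key}: {hpi[key]}" for key in HPI_ORDER if hpi.get(key)]
    return "; ".join(parts)


def missing_hpi_elements(hpi, required):
    """Return the elements this visit type expects that were not captured."""
    return [key for key in required if not hpi.get(key)]


knee_hpi = {
    "location": "left knee",
    "onset": "3 weeks ago",
    "aggravating": "stairs",
    "alleviating": "ibuprofen 400mg (mild)",
}
summary = render_hpi(knee_hpi)
gaps = missing_hpi_elements(knee_hpi, required=["onset", "location", "severity"])
```

Which elements are "required" is a clinical judgment that depends on the complaint. The code only makes that judgment explicit and checkable.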
The challenge with AI-generated Subjective sections is that the system must distinguish between what the patient reported and what the physician inferred or stated. Encounter audio contains both. A physician saying "so it sounds like this started about three weeks ago" is not the patient reporting onset — it is the physician summarizing. A well-designed clinical NLU layer recognizes this distinction and attributes statements to the correct source. One that does not will produce a Subjective section that blurs physician and patient voice in ways that create downstream confusion about the evidentiary basis of the history.
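The attribution problem can be sketched in a few lines. The example assumes an upstream diarization step has already labeled each utterance with a speaker. The `Utterance` type, the affirmation list, and the rule that a physician summary counts as confirmed only when the patient's next utterance affirms it are all illustrative simplifications, not how any particular product works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    speaker: str  # "patient" or "physician"; assumed to come from diarization
    text: str


# Hypothetical, deliberately tiny list of affirmations.
AFFIRMATIONS = {"yes", "yeah", "right", "correct", "that's right"}


def _is_affirmation(text):
    return text.strip().lower().rstrip(".!") in AFFIRMATIONS


def attribute_history(utterances):
    """Separate patient-reported history from physician summaries.

    Returns (patient_reported, physician_summaries). Each physician summary
    is paired with a flag saying whether the patient affirmed it next.
    """
    patient_reported = []
    physician_summaries = []
    for i, utt in enumerate(utterances):
        if utt.speaker == "patient":
            if not _is_affirmation(utt.text):
                patient_reported.append(utt.text)
        elif utt.speaker == "physician" and not utt.text.strip().endswith("?"):
            nxt = utterances[i + 1] if i + 1 < len(utterances) else None
            confirmed = (
                nxt is not None
                and nxt.speaker == "patient"
                and _is_affirmation(nxt.text)
            )
            physician_summaries.append((utt.text, confirmed))
    return patient_reported, physician_summaries


encounter = [
    Utterance("patient", "My left knee has been hurting, mostly on stairs."),
    Utterance("physician", "When did it start?"),
    Utterance("patient", "A few weeks back."),
    Utterance("physician", "So it sounds like this started about three weeks ago."),
    Utterance("patient", "Yeah."),
]
reported, summaries = attribute_history(encounter)
```

Here "about three weeks ago" enters the record as a physician summary that the patient affirmed, not as the patient's own words. A real system would need far more than string matching, but the output should preserve the same separation.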
The Objective Section: What Not to Include
The Objective section is frequently misused. Common structural errors include auto-populating vital signs from a prior encounter, placing a full review of systems in the Objective when it belongs in the Subjective, listing normal findings without clinical context in ways that create note bloat, and — particularly for telehealth visits — leaving the physical exam template intact for elements that could not be assessed remotely.
For telehealth encounters, which now represent a meaningful fraction of outpatient visit volume in many specialties, the Objective section requires deliberate attention. Documenting the modality, noting limitations on physical assessment, and capturing what is observable on video — gait during a standing request, visible skin changes, respiratory pattern — produces a more defensible record than an Objective section that simply lists physical exam elements as "deferred" without clinical context. The former demonstrates judgment. The latter can raise questions about whether the encounter meets medical necessity documentation standards for the billed level of service.
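As a rough illustration, a telehealth Objective section can be modeled so that every exam element is either an observed finding or an explicit limitation with a stated reason, with nothing left as bare template text. The `ExamElement` fields and the `render_objective` function are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExamElement:
    name: str
    assessable: bool
    finding: Optional[str] = None      # required when assessable
    limitation: Optional[str] = None   # required when not assessable


def render_objective(modality, elements):
    """Build Objective text; refuse elements that carry no clinical context."""
    lines = [f"Modality: {modality}"]
    for element in elements:
        if element.assessable:
            if not element.finding:
                raise ValueError(f"{element.name}: assessable element needs a finding")
            lines.append(f"{element.name}: {element.finding}")
        else:
            if not element.limitation:
                raise ValueError(f"{element.name}: state why it could not be assessed")
            lines.append(f"{element.name}: not assessed ({element.limitation})")
    return "\n".join(lines)


objective = render_objective(
    "synchronous video visit",
    [
        ExamElement("Gait", True, finding="antalgic gait on standing request"),
        ExamElement("Knee palpation", False, limitation="requires in-person exam"),
    ],
)
```

The design choice is that "deferred" with no reason raises an error instead of rendering, which pushes the note toward documenting judgment.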
The Assessment: Where Most Notes Actually Fail
The Assessment section should contain the physician's clinical conclusions: working diagnoses, their supporting basis, and diagnostic uncertainty that is itself clinically meaningful. This is the hardest section to generate well, for human physicians and AI systems alike, because it requires capturing reasoning, not just facts.
The link between the Assessment and ICD-10 coding is where structural quality has direct revenue cycle implications. An Assessment that documents "hypertension, Type 2 diabetes" without specifying complication status, chronic kidney disease stage if present, or whether the hypertension is essential or secondary maps to less specific codes. In value-based care contracting, where Hierarchical Condition Category (HCC) risk adjustment depends on coded diagnosis specificity, this gap matters both for revenue and for accurate population health risk stratification.
A Documentation Pattern Worth Examining
A pattern common in growing internal medicine groups: physicians document chronic conditions at insufficiently specific levels. A substantial share of Type 2 diabetes encounters gets coded to E11.9 (Type 2 diabetes mellitus without complications) even when stage 3 chronic kidney disease is an active problem being addressed in the same visit. With MEAT criteria applied correctly (Monitoring, Evaluation, Assessment, Treatment), those encounters would support more specific HCC-relevant coding, such as E11.22 (Type 2 diabetes mellitus with diabetic chronic kidney disease) alongside the CKD stage code, when the documentation supports it. The issue is rarely clinical knowledge. It is that note structure does not force the connection between the Assessment problem list and the conditions being actively managed. This is a structural documentation problem, and it is addressable.
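A structural check for this pattern might look like the sketch below. It flags an encounter coded E11.9 when a CKD code (N18.x) has at least one MEAT element documented in the same visit. The function name, the input shapes, and the "at least one MEAT element" threshold are assumptions for illustration. The output is a prompt for physician or coder review, not a coding decision.

```python
MEAT_ELEMENTS = {"monitoring", "evaluation", "assessment", "treatment"}


def flag_unspecified_diabetes(coded, meat_by_code):
    """Flag E11.9 when CKD is actively managed in the same encounter.

    coded: ICD-10-CM codes assigned to the encounter.
    meat_by_code: hypothetical mapping of code -> set of MEAT elements
                  documented for that condition in this visit.
    """
    ckd_addressed = sorted(
        code
        for code, elements in meat_by_code.items()
        if code.startswith("N18.") and elements & MEAT_ELEMENTS
    )
    if "E11.9" in coded and ckd_addressed:
        return [
            f"E11.9 coded while CKD ({', '.join(ckd_addressed)}) has MEAT support; "
            "review whether E11.22 is supported by the documentation"
        ]
    return []


flags = flag_unspecified_diabetes(
    coded=["E11.9", "I10"],
    meat_by_code={"N18.30": {"monitoring", "treatment"}},
)
```

The point of the check is the one made above: it forces the connection between what was coded and what was actually managed.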
The Plan: Specificity That Enables Action
A clinically useful Plan section is specific enough that another clinician could execute it without calling the original physician. "Follow up in 3 months" is not a plan — it is a placeholder. "Follow up in 3 months; repeat HbA1c and comprehensive metabolic panel at that visit; continue metformin 1000mg BID; patient counseled on carbohydrate intake and referred to registered dietitian for medical nutrition therapy" is a plan that can be read, actioned, and audited.
The Plan should close loops on diagnostic workup initiated during the visit: orders placed, labs collected, imaging scheduled, referrals submitted. When this information is embedded in the Plan section and structured for EHR extraction, it enables downstream tracking and supports quality metric documentation for chronic disease management and preventive care gap closure.
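To make "closing loops" concrete, the following sketch treats each Plan entry as a structured item with a status and lists the labs, imaging, and referrals that were mentioned but never actioned. The `PlanItem` fields, the status vocabulary, and the set of tracked kinds are hypothetical. A real EHR would have its own order states.

```python
from dataclasses import dataclass

# Hypothetical statuses that count as a closed loop for a tracked item.
CLOSED_STATUSES = {"placed", "collected", "scheduled", "submitted"}
TRACKED_KINDS = {"lab", "imaging", "referral"}


@dataclass
class PlanItem:
    kind: str          # e.g. "lab", "imaging", "referral", "medication", "follow_up"
    description: str
    status: str = "intended"


def open_loops(items):
    """Return tracked items that were planned but not yet actioned."""
    return [
        item
        for item in items
        if item.kind in TRACKED_KINDS and item.status not in CLOSED_STATUSES
    ]


plan = [
    PlanItem("follow_up", "Return in 3 months", status="scheduled"),
    PlanItem("lab", "HbA1c and comprehensive metabolic panel at follow-up", status="placed"),
    PlanItem("medication", "Continue metformin 1000mg BID"),
    PlanItem("referral", "Registered dietitian for medical nutrition therapy"),
]
pending = open_loops(plan)
```

In this example the dietitian referral is the only open loop: it appears in the Plan, but nothing records that it was submitted.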
We are not saying every Plan section needs to be exhaustive to the point of unreadability — notes are clinical tools, not legal briefs. The goal is the minimum specificity required to support continuity of care and downstream action. That threshold is higher than many physicians currently meet in their day-to-day documentation, particularly under time pressure at the end of a long clinic day.
What AI-Assisted SOAP Notes Require from the Reviewing Physician
The promise of ambient AI for SOAP note generation is real, but it shifts rather than eliminates physician responsibility. When a physician dictated or typed a note themselves, every word passed through their cognition. When the note is generated by an AI system from encounter audio, the physician's role shifts to verification. The system's output is a draft that must be read with the same clinical attention one would bring to a colleague's handoff note.
Errors in AI-generated notes tend to have a different character than errors in physician-authored notes: they are often confident-sounding misattributions or specificity gaps rather than omissions, which makes them easier to miss in a hurried review. A physician who signs a draft without genuine engagement has not gained efficiency. They have handed documentation judgment to a system while keeping the liability that comes with the signature, and they have not fulfilled their role in the documentation chain.
Tools designed for responsible ambient documentation build in review affordances: highlighting AI-generated ICD-10 codes for explicit physician confirmation, flagging lower-confidence passages in the Assessment, surfacing medication changes for explicit med-rec review before sign-off. These features are not added friction. They are the architecture of a workflow where physician judgment remains the final clinical authority in the documentation chain.
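These affordances amount to a sign-off gate. The sketch below builds a review queue from a draft note and blocks signing until every queued item has been explicitly confirmed. The draft layout, the `min_confidence` default of 0.8, and the function names are assumptions for illustration only, and no specific product's behavior is implied.

```python
from dataclasses import dataclass


@dataclass
class ReviewItem:
    kind: str      # "icd10_code", "low_confidence_passage", or "medication_change"
    detail: str
    confirmed: bool = False


def build_review_queue(draft, min_confidence=0.8):
    """Collect everything the physician must explicitly confirm before sign-off."""
    queue = [ReviewItem("icd10_code", code) for code in draft.get("icd10_codes", [])]
    queue += [
        ReviewItem("low_confidence_passage", text)
        for text, confidence in draft.get("assessment_passages", [])
        if confidence < min_confidence
    ]
    queue += [
        ReviewItem("medication_change", change)
        for change in draft.get("medication_changes", [])
    ]
    return queue


def ready_to_sign(queue):
    """Signing is allowed only when every queued item has been confirmed."""
    return all(item.confirmed for item in queue)


draft = {
    "icd10_codes": ["E11.22"],
    "assessment_passages": [
        ("Type 2 diabetes with stage 3 CKD, stable.", 0.93),
        ("Knee pain likely patellofemoral.", 0.55),
    ],
    "medication_changes": ["Increase lisinopril to 20mg daily"],
}
queue = build_review_queue(draft)
blocked = not ready_to_sign(queue)
```

The gate does not judge the content. It only ensures that the physician saw and confirmed the items most likely to hide a confident-sounding error.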
Structure as a Downstream Quality Signal
Note quality is increasingly measurable at the group level. EHR analytics can identify notes with missing HPI elements, Assessment sections that lack ICD-10 specificity, or Plan sections that leave diagnostic loops open. When these patterns are visible across a practice, they become quality improvement data rather than individual physician performance issues — a more productive framing for engaging physicians in documentation improvement.
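A group-level view can be as simple as counting, across all notes in a practice, how often each structural defect appears. The sketch below deliberately aggregates by defect type and ignores who wrote the note, in keeping with the framing above. The defect labels and the note layout are hypothetical.

```python
from collections import Counter


def defect_rates(notes):
    """Share of notes showing each structural defect, across the whole group."""
    if not notes:
        return {}
    counts = Counter()
    for note in notes:
        # set() so a defect is counted at most once per note
        counts.update(set(note["defects"]))
    return {defect: n / len(notes) for defect, n in counts.items()}


notes = [
    {"id": "n1", "defects": ["missing_hpi_element"]},
    {"id": "n2", "defects": ["unspecified_icd10", "open_diagnostic_loop"]},
    {"id": "n3", "defects": []},
    {"id": "n4", "defects": ["unspecified_icd10"]},
]
rates = defect_rates(notes)
```

Tracking these rates over time, per defect and not per physician, gives a practice a way to see whether a template or workflow change actually improved note structure.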
The structural discipline of a well-formed SOAP note is not primarily about compliance or coding accuracy. It is about what happens to the patient when the next clinician reads that note in an urgent care visit six months from now, or when a specialist receives a referral based on that Assessment and Plan. A note that forces the next reader to reconstruct the original physician's clinical reasoning from fragmentary documentation creates risk at every subsequent handoff. Getting the structure right is a clinical quality investment with returns that compound across every future encounter that touches the same patient record.