FHIR Clinical Insight Engine

Published: June 2026

Health Informatics, Healthcare & Medicine, Data Science

Architecture Blueprint: Local Edge AI for Clinical Decision Support

SMART on FHIR App Launch

Authenticate via EHR OAuth2 Authorization Code Flow


1. Project Purpose & Clinical Context

The integration of Large Language Models (LLMs) into clinical workflows presents a fundamental conflict between advanced computational reasoning and strict patient data privacy regulations (e.g., HIPAA, PIPEDA). Traditional cloud-based AI architectures require transmitting sensitive health records to external servers, introducing latency, recurring API costs, and substantial compliance risks.

This project engineers a completely decentralized, zero-egress clinical dashboard utilizing SMART (Substitutable Medical Applications, Reusable Technologies) on FHIR (Fast Healthcare Interoperability Resources). By leveraging local Edge AI, the application allows major EHR (Electronic Health Record) vendors—such as Epic and Cerner—to augment their existing interfaces with AI-driven summaries that execute entirely within the physician’s local browser environment.

The primary objective is to ingest complex, highly nested FHIR R4 JSON bundles, parse them deterministically for visual charting, and run an offline LLM to synthesize an “Enterprise Assessment” (Risk Flags, Care Gaps, Evidence-based Findings) without ever connecting to an external AI API.
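
The deterministic parsing step can be illustrated with a minimal sketch that indexes a bundle's resources by type before any charting or prompting happens. The helper name `groupBundleByType` and the sample bundle are hypothetical and not taken from the project; only the `entry[].resource.resourceType` layout comes from the FHIR R4 Bundle structure.

```javascript
// Hypothetical helper: index the resources of a FHIR R4 Bundle by resourceType.
function groupBundleByType(bundle) {
  const groups = {};
  const entries = (bundle && bundle.entry) || [];
  for (const entry of entries) {
    const resource = entry && entry.resource;
    // Skip entries that carry no resource (for example, response-only entries).
    if (!resource || !resource.resourceType) continue;
    if (!groups[resource.resourceType]) groups[resource.resourceType] = [];
    groups[resource.resourceType].push(resource);
  }
  return groups;
}

// Invented sample bundle, trimmed to the fields the helper reads.
const sampleBundle = {
  resourceType: 'Bundle',
  type: 'searchset',
  entry: [
    { resource: { resourceType: 'Patient', id: 'p1' } },
    { resource: { resourceType: 'Observation', id: 'o1' } },
    { resource: { resourceType: 'Observation', id: 'o2' } },
    { resource: { resourceType: 'Condition', id: 'c1' } },
    { fullUrl: 'urn:uuid:entry-without-resource' },
  ],
};

const grouped = groupBundleByType(sampleBundle);
console.log(Object.keys(grouped));
console.log(grouped.Observation.map(o => o.id));
```

Grouping once up front keeps the charting code and the prompt builder working from the same deterministic view of the record.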

Key Details


Figure: Healthcare Digital Interoperability Ecosystem. The infographic places the Healthcare Digital Ecosystem at its center, with organized rings radiating outward to represent distinct layers of the modern healthcare technology stack. These layers progress from foundational healthcare providers and EHR systems like Epic and Oracle Health, through critical interoperability standards including HL7 v2, FHIR R4/R5, SMART on FHIR, and DICOM, out to cloud platforms and AI analytics capabilities like Medical LLMs and Precision Medicine. The outermost ring captures the broader connected periphery (wearables, IoT sensors, clinical trials, remote monitoring), while scattered micro-labels such as SNOMED CT, LOINC, OAuth2, USCDI, and REST API reinforce technical depth throughout.

2. Operational Workflow

3. Technical Architecture & Engineering Decisions

WebAssembly (WASM) & WebGPU Acceleration

Running transformer inference in plain, single-threaded JavaScript is impractically slow. This architecture relies on Transformers.js, which executes models through ONNX Runtime Web: a WASM backend provides the CPU path, and the WebGPU API gives access to the client device’s native graphics hardware. This shifts the compute burden from centralized servers to edge devices, enabling real-time inference on sensitive data that never leaves the browser. The AI pipeline is deliberately isolated within a Web Worker (ai-worker.js) so that the main thread can maintain a 60fps UI render cycle while the model processes tokens.
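
The worker isolation described above might be structured along the following lines. This is a sketch under stated assumptions, not the project's actual ai-worker.js: `pickDevice`, `createWorkerHandler`, and the message shape (`id`, `prompt`, `options`, `status`) are hypothetical names, and the model loader is injected so the logic can be exercised without downloading a model.

```javascript
// Hypothetical helper: choose a backend from a navigator-like object.
// Falls back to WASM when the WebGPU entry point (navigator.gpu) is absent.
function pickDevice(nav) {
  return nav && nav.gpu ? 'webgpu' : 'wasm';
}

// Hypothetical factory for the worker's message handler.
// `loadGenerator` returns a promise for a text-generation function;
// `post` sends a result message back to the UI thread.
function createWorkerHandler(loadGenerator, post) {
  let generatorPromise = null; // the model is loaded once and reused

  return async function onMessage(event) {
    const { id, prompt, options } = event.data;
    try {
      if (!generatorPromise) generatorPromise = loadGenerator();
      const generator = await generatorPromise;
      const output = await generator(prompt, options);
      post({ id, status: 'done', text: output[0].generated_text });
    } catch (err) {
      // Report failures to the UI instead of letting the worker die silently.
      post({ id, status: 'error', message: String(err) });
    }
  };
}

// Assumed wiring inside the worker (Transformers.js v3-style options, MODEL_ID is a placeholder):
//   self.onmessage = createWorkerHandler(
//     () => pipeline('text-generation', MODEL_ID, { device: pickDevice(navigator), dtype: 'q8' }),
//     (msg) => self.postMessage(msg)
//   );
```

Tagging each request with an `id` lets the UI match responses to prompts when several summaries are requested in quick succession.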

Model Selection Strategy: The 80MB vs. 600MB Paradigm

Initial architectural prototypes utilized a ~500 million parameter model (quantized to approximately 600MB). While this model demonstrated superior clinical reasoning and natural language formatting capabilities, it introduced fatal bottlenecks:

  1. Bandwidth Limitations: Downloading a 600MB payload on constrained hospital networks resulted in unacceptable Time-To-First-Token (TTFT).
  2. VRAM Exhaustion: Older clinical workstations lacked the integrated graphics memory to compile the WebGPU shaders for a model of that scale, resulting in silent browser tab crashes.

The Solution: The architecture was downgraded to SmolLM2-135M-Instruct, an aggressively quantized (q8) model weighing approximately 80MB. This guaranteed universal hardware compatibility and sub-3-second load times. The inherent tradeoff—reduced logical depth and poor formatting adherence—was mitigated by shifting diagnostic responsibility to the deterministic Svelte parsing layer and aggressively constraining the LLM’s generation parameters.
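
The bandwidth side of this tradeoff is simple arithmetic: transfer time is payload size in megabits divided by link speed. The sketch below compares the two payload sizes named above across a few link speeds. The speeds are illustrative assumptions rather than measurements from any hospital network, and the estimate ignores latency, caching, and protocol overhead.

```javascript
// Hypothetical helper: idealized transfer time for a payload over a link.
// sizeMB is in megabytes, bandwidthMbps in megabits per second.
function estimateDownloadSeconds(sizeMB, bandwidthMbps) {
  if (!(sizeMB >= 0) || !(bandwidthMbps > 0)) {
    throw new RangeError('sizeMB must be >= 0 and bandwidthMbps must be > 0');
  }
  return (sizeMB * 8) / bandwidthMbps; // 8 bits per byte
}

// Payload sizes from the text; link speeds are assumed for illustration only.
const MODEL_SIZES_MB = { 'edge model (~80MB)': 80, 'earlier prototype (~600MB)': 600 };
const ASSUMED_LINKS_MBPS = [25, 100, 300];

for (const [label, sizeMB] of Object.entries(MODEL_SIZES_MB)) {
  for (const mbps of ASSUMED_LINKS_MBPS) {
    const seconds = estimateDownloadSeconds(sizeMB, mbps);
    console.log(`${label} at ${mbps} Mbps: ~${seconds.toFixed(1)} s`);
  }
}
```

Whatever the link speed, the smaller payload downloads 7.5 times faster (600 / 80), and that first download is the dominant cost before the browser has cached the weights.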

4. Critical Troubleshooting & Code Implementations

Addressing Model Degeneration Loops

Ultra-lightweight models (sub-200M parameters) are highly susceptible to contextual degeneration. When fed repetitive tabular data (e.g., lists of FHIR Observations), the attention mechanism often collapses into infinite loops (e.g., outputting “history of having a history of”).

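To break this repetition loop, the generation parameters were tightly constrained: a repetition_penalty down-weights tokens that are already present in the sequence, and greedy decoding (do_sample: false) keeps the output deterministic. Note that temperature only takes effect when sampling is enabled, so the 0.1 value in the call below is inert unless do_sample is switched to true.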

javascript
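// Executed within ai-worker.js
const output = await generator(prompt, {
    max_new_tokens: 150,      // Hard cap on output length; bounds any residual looping
    temperature: 0.1,         // Ignored while do_sample is false; only applies when sampling
    repetition_penalty: 1.15, // Down-weights the logits of tokens already in the sequence
    do_sample: false,         // Deterministic greedy decoding
    return_full_text: false   // Return only the completion, not the echoed prompt
});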
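
To make the effect of repetition_penalty concrete, the sketch below applies the commonly used formulation (the one introduced with the CTRL model and adopted by Hugging Face-style logits processors) to a toy logit vector: positive logits of already-seen tokens are divided by the penalty and negative ones are multiplied by it. The function names and the three-token vocabulary are invented for illustration; this is not the library's internal code.

```javascript
// Illustrative re-implementation of the standard repetition penalty.
// logits: raw next-token scores indexed by token id.
// seenTokenIds: ids already present in the sequence so far.
function applyRepetitionPenalty(logits, seenTokenIds, penalty) {
  const adjusted = Array.from(logits);
  for (const id of new Set(seenTokenIds)) {
    const score = adjusted[id];
    // Either branch makes the seen token less likely when penalty > 1.
    adjusted[id] = score < 0 ? score * penalty : score / penalty;
  }
  return adjusted;
}

// Greedy decoding picks the index of the largest score.
function argmax(values) {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[best]) best = i;
  }
  return best;
}

// Toy example: token 1 was just generated and would be picked again.
const toyLogits = [2.0, 2.2, -1.0];
const penalized = applyRepetitionPenalty(toyLogits, [1], 1.15);

console.log(argmax(toyLogits)); // index chosen without the penalty
console.log(argmax(penalized)); // index chosen with the penalty
```

With a penalty of 1.15, the repeated token's score falls from 2.2 to about 1.91, so greedy decoding moves on to token 0 instead of repeating itself.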

Resolving Circular JSON Serialization in FHIR Bundles

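To provide clinical explainability, the architecture includes a Raw JSON Inspector. However, passing the FHIR payload through the fhirpath evaluation library mutated the original objects, injecting dataRoot pointers that referenced parent nodes. These cyclic references caused native JSON.stringify() calls to throw a TypeError ("Converting circular structure to JSON").

This was resolved by passing JSON.stringify() a replacer that closes over a WeakSet: each object is recorded the first time it is visited, and any later encounter is replaced with a placeholder string, so serialization never reaches the point of throwing. One side effect is that an object referenced from two places is also replaced on its second appearance, even when no true cycle exists.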

svelte
<div class="font-mono text-xs text-green-400 bg-slate-900 p-4">
  {JSON.stringify(rawPatientData, (() => { 
    const seen = new WeakSet(); 
    return (key, value) => { 
      if (typeof value === 'object' && value !== null) { 
        if (seen.has(value)) return '[Circular Reference]'; 
        seen.add(value); 
      } 
      return value; 
    }; 
  })(), 2)}
</div>
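
The same idea can be moved out of the template into a reusable helper. The sketch below is a variant rather than the project's code: instead of a WeakSet it tracks the chain of ancestor objects (the pattern shown in MDN's JSON.stringify documentation), so only true cycles are replaced and objects that are merely shared between two branches are serialized in full. The name `safeStringify` is hypothetical.

```javascript
// Hypothetical helper: stringify with true-cycle detection via an ancestor stack.
function safeStringify(value, indent = 2) {
  const ancestors = [];
  return JSON.stringify(value, function (key, val) {
    if (typeof val !== 'object' || val === null) return val;
    // `this` is the object that holds `key`. Unwind the stack until that
    // parent is on top, discarding siblings that were already serialized.
    while (ancestors.length > 0 && ancestors[ancestors.length - 1] !== this) {
      ancestors.pop();
    }
    if (ancestors.includes(val)) return '[Circular Reference]';
    ancestors.push(val);
    return val;
  }, indent);
}

// Invented example: one shared (non-cyclic) object and one real cycle.
const sharedCoding = { system: 'http://loinc.org', code: '85354-9' };
const resource = { id: 'obs-1', primary: sharedCoding, secondary: sharedCoding };
resource.dataRoot = resource; // mimics an injected back-pointer to the parent

console.log(safeStringify(resource));
```

In a Svelte component the helper would be called as `{safeStringify(rawPatientData)}`, which keeps the markup readable and makes the serializer unit-testable.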

Deterministic Rule Engine

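Because the 80MB LLM lacks the parametric memory to reliably interpret complex vitals, a deterministic logic layer was written to evaluate the Observation arrays directly. The rule shown here uses plain array methods on Observation resources that have already been extracted from the bundle, and it assumes that observations and alerts are defined by the surrounding component.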

javascript
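// Extract blood pressure panels (LOINC 85354-9) and flag the most recent reading.
// `observations` (FHIR R4 Observation[]) and `alerts` (string[]) come from the surrounding component.
// Match on any coding in the CodeableConcept, not only the first one.
const hasCode = (concept, code) => concept?.coding?.some(c => c.code === code) ?? false;

const bps = observations.filter(obs => hasCode(obs.code, '85354-9'));
if (bps.length > 0) {
    // Newest first; `bps` is already a filtered copy, so sorting in place is safe.
    const latestBP = bps.sort((a, b) => new Date(b.effectiveDateTime) - new Date(a.effectiveDateTime))[0];
    const sys = latestBP.component?.find(c => hasCode(c.code, '8480-6'))?.valueQuantity?.value; // systolic
    const dia = latestBP.component?.find(c => hasCode(c.code, '8462-4'))?.valueQuantity?.value; // diastolic

    // A missing component yields undefined, and `undefined > n` is false, so it cannot raise an alert by itself.
    if (sys > 140 || dia > 90) {
        alerts.push(`Abnormal Vitals: Hypertensive BP reading detected (${sys ?? '?'}/${dia ?? '?'}).`);
    }
}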
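
To see the rule fire on concrete data, the sketch below wraps the same logic in a function and runs it against two invented blood pressure panels. The wrapper `evaluateBloodPressure`, the fixture builder `bpPanel`, and the sample values are hypothetical; the LOINC codes and thresholds are the ones used in the rule above.

```javascript
// True when any coding in the CodeableConcept carries the given code.
const matchesCode = (concept, code) =>
  ((concept && concept.coding) || []).some(c => c.code === code);

// Hypothetical wrapper: returns alerts for the most recent BP panel only.
function evaluateBloodPressure(observations) {
  const alerts = [];
  const panels = observations.filter(obs => matchesCode(obs.code, '85354-9'));
  if (panels.length === 0) return alerts;

  // Newest first; `panels` is a filtered copy, so the input array is untouched.
  const latest = panels.sort(
    (a, b) => new Date(b.effectiveDateTime) - new Date(a.effectiveDateTime)
  )[0];
  const read = code =>
    (latest.component || []).find(c => matchesCode(c.code, code))?.valueQuantity?.value;
  const sys = read('8480-6'); // systolic
  const dia = read('8462-4'); // diastolic

  if (sys > 140 || dia > 90) {
    alerts.push(`Abnormal Vitals: Hypertensive BP reading detected (${sys}/${dia}).`);
  }
  return alerts;
}

// Invented fixture builder, trimmed to the fields the rule reads.
const loinc = code => ({ coding: [{ system: 'http://loinc.org', code }] });
const bpPanel = (when, sys, dia) => ({
  resourceType: 'Observation',
  code: loinc('85354-9'),
  effectiveDateTime: when,
  component: [
    { code: loinc('8480-6'), valueQuantity: { value: sys, unit: 'mmHg' } },
    { code: loinc('8462-4'), valueQuantity: { value: dia, unit: 'mmHg' } },
  ],
});

const sampleObservations = [
  bpPanel('2026-01-10T09:00:00Z', 128, 82),
  bpPanel('2026-03-02T09:00:00Z', 152, 96),
];
console.log(evaluateBloodPressure(sampleObservations));
```

Only the March panel is evaluated because it is the most recent, so the earlier normal reading does not suppress the alert.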

5. Future Development Roadmap

  1. Cloud API Fallback Architecture: Implement an adaptive fallback mechanism. If explicit patient consent is digitally verified, the system will bypass the local 135M edge model and route the de-identified FHIR bundle to an Anthropic/OpenAI endpoint for deep clinical reasoning.
  2. Multi-Modal Data Integration: Expand the parsing engine to map high-density longitudinal data, specifically writing D3.js or Canvas integrations to render raw ECG waveforms stored within FHIR DiagnosticReport arrays.
  3. Traceable Evidence Mapping: Upgrade the UI so that LLM-generated bullet points act as anchor links. Clicking an AI insight (e.g., “Patient exhibits chronic sinusitis”) will automatically expand the JSON inspector and highlight the exact FHIR Condition resource ID that triggered the conclusion.


© Dr. Balaji Ramanathan
