When a healthtech team wants to add an oncology feature, the variant work is almost never the feature. It's the tax. Before anyone can match a patient to a trial or check a prior-auth policy, someone has to parse a Foundation Medicine PDF, reconcile a Tempus JSON, decide whether L858R and p.Leu858Arg are the same thing, and keep all of it from drifting as vendors change their formats.
We built that substrate once: a Neo4j knowledge graph over public oncology sources, fed by a cross-vendor parser, rendered through deterministic templates. Four product surfaces now sit on top of it. Two are for the teams shipping oncology software. Two are decision APIs that answer a specific question with a citation. This post walks through all four and what each one returns.
Synthetic data — demonstration only. All example variants and payloads below are illustrative and do not correspond to real patient data.
Tier 1A: mCODE-compatible output from the NGS Interpretation API
The NGS Interpretation API parses reports from Foundation Medicine, Tempus, Caris, Guardant, Natera, and other major lab vendors into normalized FHIR R4 Genomics: HGVS variants, biomarkers (TMB, MSI, HRD, PD-L1), and companion-diagnostic eligibility flags.
The new piece is mCODE-compatible output. mCODE (minimal Common Oncology Data Elements) is the HL7 FHIR profile set that EHR vendors and oncology platforms standardize on for cancer data. If your downstream systems or a cancer-registry pipeline expect mCODE, the API now renders that shape directly from the parsed report. The genomics that come out of a scanned PDF land as structured oncology data, not free text.
The output is template-rendered, not model-generated. The same (gene, variant, tumor type) input produces the same FHIR every time. That matters when the consumer is a clinical system and a diff between two runs has to mean a real change in the data, not a sampling artifact.
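To make the determinism claim concrete, here is a minimal sketch of template rendering with a reproducibility check. It is not UNMIRI's renderer: the function names, the use of `note` for tumor type, and the reduced Observation shape are assumptions for illustration. The LOINC codes follow the HL7 Genomics Reporting conventions as we understand them and should be checked against the implementation guide before use.

```python
import hashlib
import json


def render_variant_observation(gene: str, p_hgvs: str, tumor_type: str) -> dict:
    """Render a FHIR-shaped Observation from a fixed template (no model calls)."""
    return {
        "resourceType": "Observation",
        "status": "final",
        # LOINC 69548-6: Genetic variant assessment
        "code": {"coding": [{"system": "http://loinc.org", "code": "69548-6"}]},
        "component": [
            {
                # LOINC 48018-6: Gene studied [ID]
                "code": {"coding": [{"system": "http://loinc.org", "code": "48018-6"}]},
                "valueCodeableConcept": {"text": gene},
            },
            {
                # LOINC 48005-3: Amino acid change (pHGVS)
                "code": {"coding": [{"system": "http://loinc.org", "code": "48005-3"}]},
                "valueCodeableConcept": {"text": p_hgvs},
            },
        ],
        # Illustrative placement only; a real mCODE profile models tumor type elsewhere.
        "note": [{"text": f"tumor type: {tumor_type}"}],
    }


def canonical_digest(resource: dict) -> str:
    """Hash a canonical JSON serialization so two runs can be compared byte for byte."""
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Synthetic inputs: the same tuple rendered twice, and a different variant once.
first = canonical_digest(render_variant_observation("EGFR", "p.Leu858Arg", "NSCLC"))
second = canonical_digest(render_variant_observation("EGFR", "p.Leu858Arg", "NSCLC"))
other = canonical_digest(render_variant_observation("EGFR", "p.Thr790Met", "NSCLC"))
```

With a pure template, `first` and `second` are identical and `other` differs, so a changed digest between runs can only come from changed input data.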
See the NGS Interpretation API
Tier 1B: trial matching that won't conflate two EGFR variants
The Trial Matching API answers one question: given a variant, which open trials is this patient eligible for? It's built to sit behind an existing matcher as the variant-grounding layer, with per-call pricing.
The reason it's a separate product is the failure mode it avoids. EGFR L858R and EGFR T790M are the same gene, often the same paragraph in a review, and they point to different drugs. A retrieval layer built on cosine similarity treats them as near-identical and returns the wrong trial set with full confidence. We measured this and published it: at a similarity threshold of 0.95, vector retrieval conflated every clinically distinct variant pair we tested. The write-up is in our preprint on why vector RAG fails for oncology.
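A toy version of the failure mode, for intuition only. This sketch uses a bag-of-words vector rather than a learned embedding, so it does not reproduce the preprint's method or its 0.95 threshold. The two sentences and the tuple keys are synthetic.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Very rough stand-in for an embedding: token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


doc_l858r = "EGFR L858R mutation in NSCLC"
doc_t790m = "EGFR T790M mutation in NSCLC"

# Four of five tokens are shared, so the score is high (about 0.8)
# even though the only differing token is the one that matters clinically.
similarity = cosine(bag_of_words(doc_l858r), bag_of_words(doc_t790m))

# A typed key compares identity exactly instead of by proximity.
key_l858r = ("EGFR", "p.Leu858Arg")
key_t790m = ("EGFR", "p.Thr790Met")
same_variant = key_l858r == key_t790m
```

The similarity score rewards shared context, and two variants of the same gene share almost all of it. The exact key comparison has no notion of "close enough", which is the property a variant-grounding layer needs.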
UNMIRI resolves the variant through the typed graph first, so identity is exact before any matching happens. Each returned trial carries a citation back to the eligibility criterion that matched. You can run a sample match in the sandbox without an account.
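One way to picture "identity first, matching second" is a normalizer that maps both spellings of a missense variant to a single key, followed by an exact lookup. This is a sketch under stated assumptions, not the typed graph itself: it handles simple missense notation only, `normalize_missense` and `match_trials` are hypothetical names, and the trial IDs and criteria are synthetic placeholders.

```python
import re
from typing import Optional

ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
THREE_LETTER = set(ONE_TO_THREE.values())

SHORT = re.compile(r"^(?:p\.)?([A-Z])(\d+)([A-Z])$")
LONG = re.compile(r"^(?:p\.)?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def normalize_missense(text: str) -> Optional[str]:
    """Map 'L858R' or 'p.Leu858Arg' to 'p.Leu858Arg'; return None if unrecognized."""
    text = text.strip()
    m = SHORT.match(text)
    if m and m.group(1) in ONE_TO_THREE and m.group(3) in ONE_TO_THREE:
        return f"p.{ONE_TO_THREE[m.group(1)]}{m.group(2)}{ONE_TO_THREE[m.group(3)]}"
    m = LONG.match(text)
    if m and m.group(1) in THREE_LETTER and m.group(3) in THREE_LETTER:
        return f"p.{m.group(1)}{m.group(2)}{m.group(3)}"
    return None


# Synthetic index: exact (gene, variant) key -> trials, each with the matched criterion.
TRIALS = {
    ("EGFR", "p.Leu858Arg"): [
        {"trial_id": "SYNTH-0001", "criterion": "synthetic inclusion criterion 3"},
    ],
    ("EGFR", "p.Thr790Met"): [
        {"trial_id": "SYNTH-0002", "criterion": "synthetic inclusion criterion 1"},
    ],
}


def match_trials(gene: str, variant_text: str) -> list:
    """Resolve identity exactly, then look up; unknown notation matches nothing."""
    variant = normalize_missense(variant_text)
    if variant is None:
        return []
    return TRIALS.get((gene.upper(), variant), [])
```

Here `match_trials("EGFR", "L858R")` and `match_trials("EGFR", "p.Leu858Arg")` hit the same entry, while `T790M` resolves to a different one. Input the normalizer cannot parse returns an empty list rather than a nearest neighbor.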
Run a sample match and read the trial-matching page.
Tier 2: genomics-aware clinical decision support
The CDS API takes the parsed genomics and reasons over them across five categories: drug-variant interactions, hereditary-cancer triggers, trial eligibility, companion-diagnostic flags, and biomarker thresholds. It runs on the same knowledge graph as the NGS API, so the evidence behind a recommendation traces to the same CIViC, ClinVar, openFDA, and ClinicalTrials.gov nodes.
It's designed to meet the FDA's non-device CDS criteria. The output shows its basis so a clinician can independently review the reasoning rather than rely on the software as a black box. Recommendations are rendered deterministically and tiered with the AMP/ASCO/CAP 2017 framework (I-A, I-B, II-C, and so on), never a proprietary level system. Pharmacogenomic alerts draw on CPIC Level A guidance, the gene-drug pairs that actually change oncology dosing: DPYD with fluoropyrimidines, UGT1A1 with irinotecan, TPMT with thiopurines.
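A minimal sketch of what deterministic tiering and pharmacogenomic alerting can look like in code. The tier labels are the AMP/ASCO/CAP 2017 categories; the three gene-drug pairs are the ones named above. Everything else is an assumption for illustration: the function names, the output shapes, and the specific drugs listed under each class, which are examples rather than a complete list.

```python
from enum import Enum


class AmpTier(Enum):
    """AMP/ASCO/CAP 2017 tier and evidence-level labels."""
    I_A = "I-A"
    I_B = "I-B"
    II_C = "II-C"
    II_D = "II-D"
    III = "III"
    IV = "IV"


# Gene -> example drugs in the class named in the text (illustrative, not exhaustive).
PGX_PAIRS = {
    "DPYD": {"fluorouracil", "capecitabine"},                 # fluoropyrimidines
    "UGT1A1": {"irinotecan"},
    "TPMT": {"mercaptopurine", "thioguanine", "azathioprine"},  # thiopurines
}


def pgx_alerts(genes_with_findings, planned_drugs) -> list:
    """Return one alert per (gene, drug) pair present in both inputs, in stable order."""
    drugs = {d.lower() for d in planned_drugs}
    alerts = []
    for gene in sorted(genes_with_findings):
        for drug in sorted(PGX_PAIRS.get(gene, set()) & drugs):
            alerts.append({"gene": gene, "drug": drug, "basis": "CPIC Level A"})
    return alerts


def render_finding(gene: str, variant: str, tier: AmpTier, source: str) -> str:
    """Fixed-template sentence that always states the tier and where it came from."""
    return f"{gene} {variant}: AMP/ASCO/CAP tier {tier.value} (source: {source})"
```

Sorting both loops keeps the alert order stable across runs, and the template puts the tier and source in every rendered finding so the basis is visible to the reviewer.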
Tier 2C: prior-authorization decisions with the receipt attached
The PA Decision Engine is the newest surface and the most opinionated. You send an anonymized tuple: variant, drug, tumor type, prior therapies, payer. You get back one of four decisions (likely covered, requires PA, not on policy, or insufficient data) with the source excerpt and a citation URL behind it.
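The request and response shapes can be sketched as plain types. The field and class names below are hypothetical, chosen to mirror the tuple and the four decisions described above, and the policy table is synthetic; the real API contract may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Decision(Enum):
    LIKELY_COVERED = "likely covered"
    REQUIRES_PA = "requires PA"
    NOT_ON_POLICY = "not on policy"
    INSUFFICIENT_DATA = "insufficient data"


@dataclass(frozen=True)
class PARequest:
    """Anonymized tuple: no patient identifiers among the fields."""
    variant: str
    drug: str
    tumor_type: str
    prior_therapies: Tuple[str, ...]
    payer: str


@dataclass(frozen=True)
class PAResponse:
    decision: Decision
    source_excerpt: Optional[str]
    citation_url: Optional[str]


# Synthetic policy table keyed on the parts of the tuple this toy lookup uses.
POLICIES = {
    ("EGFR p.Leu858Arg", "drug-x", "NSCLC", "payer-a"): PAResponse(
        Decision.REQUIRES_PA,
        "(synthetic excerpt)",
        "https://example.org/synthetic-policy",
    ),
}


def decide(request: PARequest) -> PAResponse:
    """Exact lookup; anything without a grounded policy is 'insufficient data'."""
    key = (request.variant, request.drug, request.tumor_type, request.payer)
    return POLICIES.get(key, PAResponse(Decision.INSUFFICIENT_DATA, None, None))
```

The point of the shape is that a decision never travels without its excerpt and citation, and that the absence of a grounded policy is its own explicit answer rather than a guess.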
The grounding is FDA labels, CMS LCDs, and openFDA. No patient identifiers go in the request, and no PHI touches the LLM path. CMS LCD coverage is live today; payer-specific policies are being added. If you have a license for guideline content such as NCCN or a proprietary actionability set, you plug it in through the external_levels hook and the engine layers your criteria on top of the FDA baseline. UNMIRI does not ship that licensed content itself. The default answer is the defensible one: what the label says, with the paragraph that says it.
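The post names the `external_levels` hook but not its signature, so the following is a guess at the layering idea rather than the actual interface. It assumes a hook is a callable that receives the anonymized request and returns extra criteria or nothing, and that layered content is attached alongside the baseline decision without overwriting it.

```python
from typing import Callable, Dict, List, Optional

# Assumed hook shape: request in, optional extra criteria out.
ExternalLevels = Callable[[Dict[str, str]], Optional[Dict[str, str]]]


def layer_decision(
    baseline: Dict[str, str],
    request: Dict[str, str],
    external_levels: Optional[List[ExternalLevels]] = None,
) -> Dict[str, object]:
    """Copy the baseline decision and attach whatever the caller's hooks return."""
    result: Dict[str, object] = dict(baseline)
    extras = []
    for hook in external_levels or []:
        extra = hook(request)
        if extra is not None:
            extras.append(extra)
    result["external"] = extras
    return result


def licensed_guideline_hook(request: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Stand-in for content the caller licenses; returns a synthetic level."""
    if request.get("drug") == "drug-x":
        return {"source": "caller-licensed guideline (synthetic)", "level": "synthetic-level-1"}
    return None
```

Keeping the baseline fields untouched means the label-grounded answer and its citation stay intact whether or not any licensed content is plugged in.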
Run a sample decision and read the PA engine page.
Why they share a foundation
Four APIs, one graph. That's the whole design.
The parser and the knowledge graph are the expensive part to build and the easy part to get wrong, so they're built once and shared. The clinical output is rendered by deterministic templates rather than generated prose, because a CDS recommendation or a PA decision has to be reproducible and auditable. LLMs are scoped narrowly, to extraction edge cases, and flagged when present. The knowledge sources are public and attributable: CIViC, ClinVar, ClinicalTrials.gov, openFDA, and CPIC.
UNMIRI's architecture runs the PHI path on AWS under a signed BAA, with narrow LLM inference on Microsoft Azure OpenAI under the Microsoft BAA, US-only. The posture is HIPAA-ready by design.
The public developer sandboxes are live now. Full production access, with arbitrary inputs, BAA coverage, and SLAs, is design-partner tier while we are pre-revenue. If you're building oncology software and the variant substrate is your tax, that's the part we want to carry. Start with the API that maps to your problem. The second one is already on the same data plane when you need it.
Related references
- Product: Platform overview. All four APIs and how they share one data plane.
- Product: Trial Matching API. Tier 1B: variant-grounded matching that won't conflate L858R with T790M.
- Product: PA Decision Engine. Tier 2C: FDA-label-aligned prior-auth decisions with citations.
- Live demo: Run a sample trial match. Synthetic sample, no account required.
- Live demo: Run a sample PA decision. Anonymized tuple in, citation-backed decision out.
- Reference: Why vector RAG fails for oncology. The result the trial-matching identity check is built on.
Frequently asked questions
- What are the four new capabilities?
- Two Tier 1 surfaces and two Tier 2 surfaces, all on the same knowledge graph and parser. Tier 1A: the NGS Interpretation API now renders mCODE-compatible output alongside its FHIR R4 Genomics Bundle, so parsed genomics can feed cancer-registry pipelines. Tier 1B: a Trial Matching API that resolves variant identity through a typed graph before matching, so two distinct mutations in the same gene never collapse to the same trial set. Tier 2: a genomics-aware CDS API across five reasoning categories (therapy-match, drug-interaction, hereditary-flag, evidence-gap, CDx-eligibility). Tier 2C: a prior-authorization Decision Engine that returns a citation-backed decision from FDA labels and CMS LCDs.
- Why does trial matching need a typed graph instead of vector search?
- EGFR L858R and EGFR T790M are the same gene, often the same paragraph in a review, and point to different drugs. Cosine-similarity retrieval treats them as near-identical and returns the wrong trial set. UNMIRI's published bioRxiv result measured this: at a similarity threshold of 0.95, vector retrieval conflated every clinically distinct variant pair tested. The Trial Matching API resolves the variant through the typed graph first, so identity is exact before any matching happens, and each returned trial carries a citation back to the eligibility criterion that matched.
- Does the prior-authorization engine handle PHI?
- No. The PA Decision Engine takes an anonymized tuple of variant, drug, tumor type, prior therapies, and payer. No patient identifiers go in the request, and no PHI touches the LLM path. The grounding is FDA labels, CMS LCDs, and openFDA. Each response carries the policy section, a source excerpt, and a citation URL. If you license guideline content such as NCCN or a proprietary actionability set, you plug it in through the external_levels hook and the engine layers your criteria on top of the FDA baseline; UNMIRI does not ship that licensed content itself.
- What evidence tiers and sources does the CDS API use?
- The CDS API tiers findings with the AMP/ASCO/CAP 2017 framework (I-A, I-B, II-C, II-D, III, IV), never a proprietary level system. Evidence traces to public sources: CIViC, ClinVar, ClinicalTrials.gov, openFDA, and CPIC pharmacogenomics guidelines. The output is rendered deterministically and shows its basis so a clinician can independently review the reasoning, which is the design intent behind the FDA non-device CDS criteria.
- Are these generally available?
- The public developer sandboxes are live and need no account: run a sample variant lookup, a sample trial match, and a sample PA decision against synthetic inputs. Full production access (arbitrary inputs, higher rate limits, BAA coverage, and SLAs) is partner and design-partner tier. UNMIRI is pre-revenue and in the design-partner phase, so treat the four capabilities as developer-sandbox and design-partner ready, not as a general-availability launch.
- How is PHI handled across the platform?
- UNMIRI's architecture runs the PHI path on AWS under a signed Business Associate Agreement, with narrow LLM inference on Microsoft Azure OpenAI under the Microsoft Online Services BAA, US-only. Final clinical output is rendered by deterministic templates rather than generated prose, so a CDS recommendation or a PA decision is reproducible and auditable. LLMs are scoped to extraction edge cases and flagged when present. The posture is HIPAA-ready by design.
Umair Khan
Founder and CTO, UNMIRI
Building UNMIRI, a precision oncology infrastructure company with four product surfaces: cross-vendor NGS interpretation, genomics-aware decision support, oncology literature intelligence, and a free cross-vendor unification tool for clinicians. Writing here on architecture, clinical data, and HIPAA-ready AI.