UNMIRI is open-sourcing the cross-vendor NGS interpretation API contract today: the JSON Schemas, TypeScript types, and Python pydantic v2 models that describe what an NGS interpretation pipeline returns after parsing a vendor report. The repo is at github.com/unmirihealth/unmiri-ngs-fhir-schema, Apache 2.0, tagged v0.1.0, with eight synthetic worked example payloads covering the most common cancer types and finding patterns in cross-vendor NGS interpretation.
This post explains what's in the release, why we're putting the contract out before the implementation, what the schema does and does not cover, and how it aligns with the HL7 FHIR Genomics Implementation Guide.
Synthetic data — demonstration only. All example payloads in the repo are illustrative and do not correspond to real patient data.
Why open-source the contract first
The hardest thing about cross-vendor NGS interpretation is not the parsing. It is agreeing on the shape of the output. We have written about why Foundation, Tempus, Caris, and Guardant report formats don't compose and what defensible parsing actually requires. The downstream half of that problem is what the parser hands off to the rest of the stack.
Most companies that ingest NGS reports end up with one of two outcomes. Either they invent a private schema that no one else can read, and the schema decays as new vendors and biomarkers arrive. Or they emit a literal FHIR Genomics Bundle, which is the right alignment target for an EHR but is verbose and indirect for product code that just wants to ask "what CDx flags fired."
We think the right answer is a public, narrow, FHIR-aligned contract that the parser writes into and that everything downstream reads from. So we wrote one and put it on GitHub. The contract is independent of any specific implementation, including ours; the package and class names are deliberately neutral so the schema can be adopted by anyone building cross-vendor NGS interpretation, not just UNMIRI customers.
What's in the release
schemas/                  JSON Schemas (Draft 2020-12)
  audit-envelope.schema.json
  specimen.schema.json
  variant.schema.json
  biomarker.schema.json
  cdx-flag.schema.json
  trial-match.schema.json
  contraindication.schema.json
  ngs-interpretation-response.schema.json   (top-level)
types/typescript/         hand-written TypeScript types
                          (publishable as @unmiri/ngs-interpretation-types)
types/python/             pydantic v2 models
                          (publishable as unmiri-ngs-interpretation)
examples/                 8 example payloads (synthetic, watermarked)
scripts/validate.py       validates examples against schemas + pydantic
Eight resource shapes:
- audit envelope carries engine version, schema version, the vendor source report metadata, the knowledge-base versions consulted, mandatory data-source attributions, the synthetic-versus-production watermark, and an optional reasoning trace. Required on every payload because clinical reproducibility is non-negotiable.
- specimen captures tumor type, histology, specimen substrate (FFPE, plasma cfDNA, fresh frozen), cellularity, and ctDNA tumor fraction for liquid biopsies. SNOMED, ICD-O-3, and a free-text display are accepted; engineers can pin to whichever the source report carries.
- variant is the closest analogue to a FHIR Genomics genomic-variant Observation. Gene with HGNC ID, transcript, HGVS coding and protein and genomic notation, assembly, coordinates, reference and alternate alleles, variant type, Sequence Ontology consequence, functional effect, VAF, copy number, fusion partner, ClinVar and COSMIC IDs, clinical significance, germline-versus-somatic origin, and AMP/ASCO/CAP 2017 evidence tier as the public taxonomy. An externalLevels map is provided as an extension point for implementations that license proprietary level systems separately.
- biomarker covers TMB, MSI, MMR, HRD, LOH, PD-L1, ER, PR, HER2 (IHC and ISH), AR, Ki-67, tumor fraction, and tumor purity. LOINC codes where standardized. Antibody clone, scoring system, score string, threshold, and method are first-class fields because they materially change the clinical meaning.
- cdxFlag records a companion-diagnostic match: triggered by a variant or biomarker reference, drug with INN and brand name and RxNorm CUI and ATC code, indication with tumor type and line of therapy, approval regime (FDA, EMA, MHRA, others) and approval date, and the FDA-approved assay product if the originating test is itself the regulator-approved CDx. Citations point to FDA labels and primary trial publications.
- trialMatch records a ClinicalTrials.gov candidate: NCT ID, title, phase, status, sponsor, match strength (likely-eligible, possibly-eligible, ineligible-but-relevant), per-criterion eligibility hints, and coarse geographic facets. The pipeline returns matches; final eligibility is a clinician decision.
- contraindication records a negative therapy implication: resistance mutation, lack of benefit, missing required biomarker, drug-gene interaction, immunotherapy in low-TMB, regulator warning. Triggered by a variant or biomarker reference, with evidence tier and citation.
- NgsInterpretationResponse is the top-level shape that composes all of the above.
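The split between the public AMP/ASCO/CAP tier and the externalLevels extension map, described in the variant bullet above, can be sketched with a stdlib dataclass. This is an illustration only: the class, the snake_case attribute names, the `public_view` helper, and the "example-kb" key are assumptions, not the shipped pydantic models.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Evidence:
    """Illustrative stand-in for the contract's evidence block (not the shipped model)."""

    # Public taxonomy: AMP/ASCO/CAP 2017 tier, e.g. "I-A".
    amp_asco_cap_tier: Optional[str] = None
    # Extension point: proprietary level systems, keyed by a source name the
    # implementation chooses. Defaults to an empty map.
    external_levels: Dict[str, str] = field(default_factory=dict)


def public_view(evidence: Evidence) -> Evidence:
    """Return a copy that keeps the public tier and drops licensed levels."""
    return Evidence(amp_asco_cap_tier=evidence.amp_asco_cap_tier)


# Hypothetical values: "example-kb" and "Level 1" are placeholders.
licensed = Evidence(amp_asco_cap_tier="I-A", external_levels={"example-kb": "Level 1"})
shareable = public_view(licensed)
```

Keeping licensed levels in a separate map means a payload can be shared with the public tier intact while the proprietary values are stripped in one place.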
Eight worked examples ship in examples/. They are synthetic constructs, watermarked, and shaped to exercise the schema across the cancer types and finding patterns most commonly encountered in cross-vendor NGS interpretation. Distinctive variant names and combinations are invented; resemblance to a real case or vendor template is coincidental.
| File | Vendor / product | Tumor | Headline finding |
|---|---|---|---|
| 01-fmi-f1cdx-nsclc-egfr-l858r.json | FoundationOne CDx | NSCLC | EGFR L858R + osimertinib CDx |
| 02-fmi-f1cdx-nsclc-met-exon14.json | FoundationOne CDx | NSCLC | MET exon 14 skipping + capmatinib CDx |
| 03-fmi-f1cdx-nsclc-ntrk1-fusion.json | FoundationOne CDx | NSCLC | NTRK1-TPM3 fusion + tumor-agnostic CDx |
| 04-fmi-f1lcdx-lung-erbb2-tp53.json | FoundationOne Liquid CDx | NSCLC | ERBB2 G776delinsVC + ctDNA tumor fraction high |
| 05-tempus-xt-nsclc-kras-g12c.json | Tempus xT | NSCLC | KRAS G12C + sotorasib + adagrasib CDx |
| 06-caris-endometrium-dmmr-msih.json | Caris MI Profile | Endometrium | dMMR + MSI-H + pembrolizumab |
| 07-caris-breast-tnbc-pdl1.json | Caris MI Profile | Breast (TNBC) | PD-L1 SP142 IC 1% + sacituzumab govitecan |
| 08-caris-breast-er-pr-tmb-pik3ca.json | Caris MI Profile | Breast (HR+) | TMB-H + PIK3CA M1043I + INAVO120 candidate |
Every example is validated against the JSON Schemas and the pydantic models in CI. Round-trip parity between the two is part of the contract.
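One way to picture the round-trip check is a helper that parses a payload, serializes it again, and compares the two JSON documents. The sketch below is stdlib-only and uses `json.loads` and `json.dumps` as stand-ins for a real model; the `round_trips` and `lossy_parse` names are hypothetical, and the repo's actual CI script may work differently.

```python
import json
from typing import Any, Callable


def round_trips(
    raw_json: str,
    parse: Callable[[str], Any],
    dump: Callable[[Any], str],
) -> bool:
    """True when parse -> dump reproduces the same JSON document (key order ignored)."""
    original = json.loads(raw_json)
    reserialized = json.loads(dump(parse(raw_json)))
    return original == reserialized


def lossy_parse(raw_json: str) -> dict:
    """Stand-in parser that silently drops a field, to show a parity failure."""
    data = json.loads(raw_json)
    data.pop("audit", None)
    return data


sample = '{"audit": {"schemaVersion": "0.1.0"}, "variants": []}'
faithful = round_trips(sample, json.loads, json.dumps)
lossy = round_trips(sample, lossy_parse, json.dumps)
```

With the pydantic v2 models, `parse` could be `NgsInterpretationResponse.model_validate_json` and `dump` a lambda calling `model_dump_json(by_alias=True, exclude_none=True)`. Whether those flags match the repo's serialization settings is an assumption.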
How it aligns with FHIR Genomics
This contract is alignment-shaped, not a literal FHIR profile. The mapping is one-to-one for the cases where it matters.
| This contract | FHIR Genomics R5 |
|---|---|
| NgsInterpretationResponse | Bundle rooted on DiagnosticReport (Genomics Reporting profile) |
| audit | Provenance |
| specimen | Specimen (Genomics Specimen profile) |
| variant | Observation with the genomic-variant profile (LOINC 69548-6) |
| biomarker (TMB) | Observation with LOINC 94076-7 |
| biomarker (MSI) | Observation with LOINC 81695-9 |
| cdxFlag | Observation with the therapeutic-implication profile |
| trialMatch | ResearchStudy reference plus structured match metadata |
| contraindication | Observation with therapeutic-implication (negative inference) |
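The table above can be read as a lookup from contract shape to FHIR resource type and, where one is named, a LOINC code. A minimal sketch under that reading follows; the dictionary keys, `FhirTarget`, and `observation_stub` are hypothetical names, and the stub covers only the resourceType and code elements, not a conformant Genomics Reporting resource.

```python
from typing import Dict, NamedTuple, Optional


class FhirTarget(NamedTuple):
    resource_type: str
    loinc_code: Optional[str] = None  # set only where the table names a code


# Keys are hypothetical labels for the rows of the mapping table.
FHIR_TARGETS: Dict[str, FhirTarget] = {
    "audit": FhirTarget("Provenance"),
    "specimen": FhirTarget("Specimen"),
    "variant": FhirTarget("Observation", "69548-6"),
    "biomarker:TMB": FhirTarget("Observation", "94076-7"),
    "biomarker:MSI": FhirTarget("Observation", "81695-9"),
    "cdxFlag": FhirTarget("Observation"),
    "trialMatch": FhirTarget("ResearchStudy"),
    "contraindication": FhirTarget("Observation"),
}


def observation_stub(shape: str) -> dict:
    """Build the resourceType and code elements of an Observation for a mapped shape."""
    target = FHIR_TARGETS[shape]
    if target.resource_type != "Observation":
        raise ValueError(f"{shape} maps to {target.resource_type}, not Observation")
    stub: dict = {"resourceType": "Observation"}
    if target.loinc_code:
        stub["code"] = {
            "coding": [{"system": "http://loinc.org", "code": target.loinc_code}]
        }
    return stub


tmb_stub = observation_stub("biomarker:TMB")
```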
Implementations that need a literal FHIR Bundle can render one deterministically from this contract, and the inverse direction is straightforward when the source FHIR conforms to the Genomics IG. Where ergonomics and FHIR conformance trade off, the contract picks the ergonomic shape: top-level arrays, snake_case and camelCase field names paired through pydantic aliases, and UUID-based cross-references inside a single response.
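To make the UUID-based cross-references concrete, here is a stdlib-only sketch that resolves each CDx flag to the variant or biomarker that triggered it. The `id` and `triggeredBy` field names, the `biomarkers` array name, and the UUID value are assumptions for illustration; check the published schemas for the real property names.

```python
from typing import Dict, List, Tuple


def index_by_id(response: dict) -> Dict[str, dict]:
    """Map each variant and biomarker id to its record within one response."""
    index: Dict[str, dict] = {}
    for key in ("variants", "biomarkers"):
        for item in response.get(key, []):
            index[item["id"]] = item
    return index


def resolve_triggers(response: dict) -> List[Tuple[dict, dict]]:
    """Pair every CDx flag with the variant or biomarker record that triggered it."""
    index = index_by_id(response)
    return [(flag, index[flag["triggeredBy"]]) for flag in response.get("cdxFlags", [])]


# Synthetic, illustrative payload; "id" and "triggeredBy" are assumed field names.
variant_id = "7d9f0c1e-0000-4000-8000-000000000001"
response = {
    "variants": [{"id": variant_id, "gene": {"symbol": "EGFR"}}],
    "biomarkers": [],
    "cdxFlags": [{"drug": "osimertinib", "triggeredBy": variant_id}],
}
pairs = resolve_triggers(response)
```

A dangling reference raises `KeyError` here, which is a reasonable default for a sketch: a flag that points at nothing inside its own response is a producer bug worth surfacing.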
Using it
TypeScript:
import type { NgsInterpretationResponse, CdxFlag } from "@unmiri/ngs-interpretation-types";
function tierIaFdaCdx(response: NgsInterpretationResponse): CdxFlag[] {
  return response.cdxFlags.filter(
    (f) => f.evidence?.ampAscoCapTier === "I-A" && f.approvalRegime === "FDA"
  );
}
Python:
from unmiri_ngs_interpretation import NgsInterpretationResponse

# payload is the raw JSON string (or bytes) of one interpretation response
response = NgsInterpretationResponse.model_validate_json(payload)
for variant in response.variants:
    if variant.evidence and variant.evidence.amp_asco_cap_tier == "I-A":
        print(variant.gene.symbol, variant.hgvs_protein)
Validate any payload:
pip install jsonschema pydantic
python3 scripts/validate.py
What's not in scope
- No implementation. This is the API surface, not the parser, not the knowledge graph, not the LLM glue. The schema does not constrain how a payload is produced.
- No PHI. Patient identifiers, dates of birth, medical record numbers, and report identifiers are not modeled. Identity belongs to the calling system.
- No FHIR Bundle assembly. Producing FHIR resources is a separate rendering concern.
- No clinical advice. The contract describes the data; clinical judgment belongs to qualified clinicians.
- No licensed third-party content. Examples and descriptions in the open contract reference only public sources (FDA labels, ClinVar, ClinicalTrials.gov, openFDA, peer-reviewed publications) and the AMP/ASCO/CAP 2017 academic tier system. Proprietary KBs (OncoKB, COSMIC, NCCN) plug in through evidence.externalLevels under each implementation's own licensing terms.
Versioning and roadmap
The release is tagged 0.1.0 and is explicitly unstable. We expect breaking changes during the 0.x series as design partners push back on the shape with real-world vendor-format quirks we have not yet seen.
Each payload carries the schema version it was produced against in audit.schemaVersion, so consumers can pin against a specific version and migrate deliberately. A 1.0 release will commit to semver-compatible evolution from then forward.
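A consumer-side pin on audit.schemaVersion could look like the sketch below. It treats any minor bump in the 0.x series as potentially breaking and, from 1.0 onward, accepts any version with the same major number. That policy and the helper names are our reading of the paragraph above, not something the repo ships.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse "0.1.0" or "v0.1.0" into (major, minor, patch)."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def is_compatible(payload_version: str, pinned_version: str) -> bool:
    """Decide whether a payload's schema version is safe for a pinned consumer."""
    payload = parse_version(payload_version)
    pinned = parse_version(pinned_version)
    if payload[0] != pinned[0]:
        return False  # a major bump is always treated as breaking
    if pinned[0] == 0:
        # 0.x series: breaking changes are expected, so require the same minor.
        return payload[1] == pinned[1]
    return True  # 1.0 and later: same major is assumed compatible under semver
```

A consumer would call it as `is_compatible(payload["audit"]["schemaVersion"], "0.1.0")` and route incompatible payloads to a migration path instead of parsing them.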
The next two areas we plan to work on, in order:
- Vendor-coverage gaps. Guardant360 and Natera Signatera are not represented in the example set yet. Adding them will likely surface representation gaps in the contract, especially for MRD-style assays whose output is closer to a longitudinal series than a single-timepoint genomic profile.
- Conformance test suite. A standalone repo that asserts a producer's output conforms to the contract under a fixed corpus of inputs. Useful for vendors who want to claim schema compatibility without an integration with UNMIRI specifically.
If you are building product against multi-vendor NGS reports, we would like to hear what the schema gets wrong. Open an issue at github.com/unmirihealth/unmiri-ngs-fhir-schema/issues.
Frequently asked questions
- What does the schema cover?
- Eight resource shapes: an audit envelope (engine version, knowledge-base versions, watermark, optional reasoning trace), specimen (tumor type, histology, cellularity, ctDNA tumor fraction), variant (gene, HGVS, transcript, assembly, type, consequence, VAF, copy number, evidence tier), biomarker (TMB, MSI, MMR, HRD, PD-L1, ER/PR/HER2 with antibody clones), CDx flag (companion-diagnostic match anchored to FDA approval), trial match (ClinicalTrials.gov match with eligibility hints), contraindication (negative therapy implication), and the top-level NgsInterpretationResponse that composes them. Each shape is defined as a JSON Schema with matching TypeScript and pydantic representations.
- Why not just use FHIR Genomics directly?
- FHIR Genomics is the right alignment target for the output of an NGS interpretation pipeline, but a literal FHIR Bundle is verbose and indirect for the engineering layer that produces it. The contract is shaped to map cleanly to a FHIR Genomics Bundle (DiagnosticReport rooted, with Observation profiles for variant, biomarker, therapeutic-implication, and a Provenance for the audit envelope) while staying ergonomic in product code. Implementations that need a literal FHIR Bundle can render one deterministically from this contract; the inverse direction works when the source FHIR conforms to the Genomics IG.
- What's the license, and can I use it commercially?
- Apache 2.0. Commercial use is permitted. We chose Apache rather than MIT specifically for the explicit patent grant, which matters for healthcare software where downstream consumers want defensive clarity. The license covers the schemas, the TypeScript types, the pydantic models, and the example payloads. Public-source knowledge bases referenced in the contract (ClinVar, ClinicalTrials.gov, openFDA) carry their own attribution requirements, which the schema documents but does not relicense. Proprietary or licensed sources (OncoKB, COSMIC, NCCN, others) are not part of the open contract; implementations that integrate them are responsible for their own licensing terms and plug values into evidence.externalLevels.
- Is the schema stable?
- Not yet. The release is tagged 0.1.0 and is explicitly unstable. We expect breaking changes during the 0.x series as design partners and contributors push back on the shape with real-world vendor-format quirks. Each payload carries the schema version it was produced against in audit.schemaVersion, so consumers can pin against a specific version and migrate deliberately. A 1.0 release will commit to semver-compatible evolution from then forward.
- Why is there no PHI in the schema?
- The interpretation contract is stateless. Identity belongs to the calling system, under its own privacy controls. The schema intentionally omits patient identifiers, dates of birth, medical record numbers, and report identifiers. The audit envelope captures vendor-issued report IDs as opaque strings only when present in the source. This separation keeps the contract usable in environments where PHI handling has not yet been resolved and prevents accidental PHI exposure in interpretation logs or analytics pipelines.
- How do the example payloads relate to real reports?
- The eight examples are synthetic constructs, watermarked Synthetic data — demonstration only, and shaped to exercise the schema across the cancer types and finding patterns most commonly encountered in cross-vendor NGS interpretation. Distinctive variant names and combinations are invented; resemblance to a real case or to a vendor template is coincidental. They are not derived from any patient-level dataset. Examples include EGFR L858R NSCLC, MET exon 14 NSCLC, NTRK1 fusion (tumor-agnostic), liquid biopsy with ERBB2 G776delinsVC, KRAS G12C NSCLC, dMMR/MSI-H endometrial, TNBC with PD-L1, and HR+ breast with PIK3CA M1043I plus TMB-H.
Umair Khan
Founder and CTO, UNMIRI
Building UNMIRI, a precision oncology infrastructure company with four product surfaces: cross-vendor NGS interpretation, genomics-aware decision support, oncology literature intelligence, and a free cross-vendor unification tool for clinicians. Writing here on architecture, clinical data, and HIPAA-ready AI.