MedMapper.

ICD-10-CM · phrase to code

Clinical text, standardized.

MedMapper turns clinical shorthand into ICD-10-CM codes and shows you the confidence and the tier that resolved each one. Sub-millisecond on the common path. Self-hosted, open source, retargetable to any code system.

runs live · no signup · the real Tier 1 and 2 pipeline

classify · live
Dictionary · fuzzy match
NLP · scispaCy · UMLS
LLM 🔒 · self-host only

The public demo runs Tier 1 (dictionary) and Tier 2 (NLP). Tier 3 (LLM) is locked here to keep it free. It's one env var away when you self-host.

the pipeline

Three tiers. Each one stops when it's sure.

A request enters at Tier 1 and moves down only when a tier isn't confident enough. Most requests never leave the dictionary. The slow, expensive tiers run for the cases that actually need them.
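The cascade can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MedMapper's actual code: the `Match` type, the `classify` function, the two stub tiers, and the 0.85 accept threshold are all made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Match:
    code: str
    confidence: float
    tier_used: str


# A tier takes text and returns a Match, or None when it has no answer.
Tier = Callable[[str], Optional[Match]]


def classify(text: str, tiers: list[Tier], accept: float = 0.85) -> Optional[Match]:
    """Run tiers in order and stop at the first confident match."""
    best: Optional[Match] = None
    for tier in tiers:
        match = tier(text)
        if match is None:
            continue
        if match.confidence >= accept:
            return match  # confident: the cascade stops here
        if best is None or match.confidence > best.confidence:
            best = match  # keep the strongest unconfident answer as a fallback
    return best


# Hypothetical stub tiers standing in for the real dictionary and NLP stages.
def dictionary_tier(text: str) -> Optional[Match]:
    known = {"hypertension": "I10"}
    code = known.get(text.lower())
    return Match(code, 1.0, "dictionary") if code else None


def nlp_tier(text: str) -> Optional[Match]:
    return Match("N18.9", 0.87, "nlp") if "ckd" in text.lower() else None
```

Calling `classify("hypertension", [dictionary_tier, nlp_tier])` returns from the first stub without ever invoking the second, which is the property the cascade is built around.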

tier 01 · sub-ms

Dictionary

A fuzzy match against a SQLite term-and-code table. Anything MedMapper has seen before is answered here, in under a millisecond.

stops the cascade when the match is confident
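A rough sketch of a fuzzy lookup against a SQLite term table follows. The `terms(term, code)` schema, the `lookup` function, and the 0.9 threshold are assumptions for illustration, and `difflib.SequenceMatcher` stands in for whichever fuzzy matcher MedMapper really uses.

```python
import sqlite3
from difflib import SequenceMatcher
from typing import Optional


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a hypothetical (term, code) schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE terms (term TEXT PRIMARY KEY, code TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO terms (term, code) VALUES (?, ?)",
        [("hypertension", "I10"), ("type 2 diabetes", "E11.9")],
    )
    return conn


def lookup(conn: sqlite3.Connection, text: str, accept: float = 0.9) -> Optional[tuple[str, float]]:
    """Return (code, score) for the closest term, or None if nothing is close enough."""
    query = text.strip().lower()
    best: Optional[tuple[str, float]] = None
    for term, code in conn.execute("SELECT term, code FROM terms"):
        score = SequenceMatcher(None, query, term).ratio()
        if best is None or score > best[1]:
            best = (code, score)
    if best is not None and best[1] >= accept:
        return best
    return None  # not confident: let the next tier try
```

This version scans every row, which is fine for a sketch but would not hold up against a full codebook; a real implementation would need an index or a candidate-narrowing step first.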

tier 02 · ~40 ms

NLP

scispaCy with the UMLS entity linker. It reads concepts the dictionary has never met: abbreviations, shorthand, phrasing it hasn't been taught.

runs only if Tier 1 wasn't sure

tier 03 · optional

LLM 🔒

Any OpenAI-compatible endpoint. It handles the long tail the first two miss. Off in this demo to keep it free; one env var when you self-host.

runs only if Tier 2 wasn't sure

auto-learn: when Tier 2 resolves a new term with high confidence, MedMapper writes it back into the dictionary. The next request for that term is answered by Tier 1, in under a millisecond.

auto-learn

It gets faster the more you use it.

Clinical vocabularies are long-tailed but repetitive. The same shorthand turns up again and again across a service. MedMapper learns each term once, then never spends Tier 2 on it again.

There is no training run and no model to retrain. The dictionary is a table, and Tier 2 writes to it the moment it's confident. Your busiest terms collapse to a sub-millisecond lookup on their own.
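The write-back loop might look something like the sketch below. Everything named here is hypothetical: the `terms` table, the `slow_nlp` stand-in for Tier 2, and the 0.85 learning threshold. The dictionary lookup is an exact match to keep the example short, whereas the real Tier 1 is described as fuzzy.

```python
import sqlite3
from typing import Optional

LEARN_THRESHOLD = 0.85  # assumed value, not taken from MedMapper


def open_dictionary() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE terms (term TEXT PRIMARY KEY, code TEXT NOT NULL)")
    return conn


def dictionary_lookup(conn: sqlite3.Connection, text: str) -> Optional[str]:
    row = conn.execute(
        "SELECT code FROM terms WHERE term = ?", (text.lower(),)
    ).fetchone()
    return row[0] if row else None


def slow_nlp(text: str) -> tuple[str, float]:
    """Stand-in for Tier 2: always returns one fixed code and confidence."""
    return ("N18.9", 0.87)


def classify(conn: sqlite3.Connection, text: str) -> tuple[str, str]:
    """Return (code, tier_used), learning confident Tier 2 answers."""
    code = dictionary_lookup(conn, text)
    if code is not None:
        return code, "dictionary"
    code, confidence = slow_nlp(text)
    if confidence >= LEARN_THRESHOLD:
        # Write the term back so the next request is a plain table lookup.
        conn.execute(
            "INSERT OR REPLACE INTO terms (term, code) VALUES (?, ?)",
            (text.lower(), code),
        )
        conn.commit()
    return code, "nlp"
```

Calling `classify` twice with the same phrase shows the effect: the first call reports `"nlp"`, the second reports `"dictionary"`.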

first request · 41.6 ms

ckd 2/2 htn

resolved by Tier 2 · NLP

written back into the dictionary
every request after · 0.6 ms

ckd 2/2 htn

resolved by Tier 1 · dictionary

any code system

Not just ICD-10.

MedMapper doesn't care which system you're coding to. The reference codebook is a CSV with two columns. Swap the file and the whole service retargets.

It ships with the ICD-10-CM FY2025 codebook. Point REFERENCE_PATH at your own list and set the coding system. The tiers, the cascade, and the auto-learn loop work exactly the same.

  • ICD-10-CM
  • SNOMED CT
  • CPT
  • ICD-11
  • your taxonomy
data/reference/codes.csv
code,description
N18.9,"Chronic kidney disease, unspecified"
I10,"Essential (primary) hypertension"
E11.9,"Type 2 diabetes mellitus without complications"
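Reading a two-column codebook like this takes only the standard library. The sketch below assumes the header row is exactly `code,description` as shown; the `load_codebook` and `load_from_env` helpers are hypothetical names, and how MedMapper itself reads REFERENCE_PATH is not shown in this page.

```python
import csv
import io
import os
from typing import TextIO


def load_codebook(source: TextIO) -> dict[str, str]:
    """Map each code to its description from a code,description CSV."""
    reader = csv.DictReader(source)
    return {row["code"]: row["description"] for row in reader}


def load_from_env(default: str = "data/reference/codes.csv") -> dict[str, str]:
    """Load the file named by REFERENCE_PATH, falling back to a default path."""
    path = os.environ.get("REFERENCE_PATH", default)
    with open(path, newline="", encoding="utf-8") as handle:
        return load_codebook(handle)


# The three sample rows from above, parsed from an in-memory string.
SAMPLE = '''code,description
N18.9,"Chronic kidney disease, unspecified"
I10,"Essential (primary) hypertension"
E11.9,"Type 2 diabetes mellitus without complications"
'''
codebook = load_codebook(io.StringIO(SAMPLE))
```

The quoted descriptions matter: `csv` keeps the comma inside "Chronic kidney disease, unspecified" as part of the field rather than splitting on it.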

the api

Built for engineers.

A FastAPI service with a typed JSON contract. Send text, get a code, a confidence, the tier that resolved it, and how long it took.

request
POST /api/v1/classify
Content-Type: application/json

{ "text": "ckd 2/2 htn" }
200 · response
{
  "result": {
    "code": "N18.9",
    "description": "Chronic kidney disease, unspecified",
    "confidence": 0.87,
    "tier_used": "nlp",
    "match_method": "umls_entity_link"
  },
  "alternatives": [ … ],
  "processing_time_ms": 41.6
}
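On the client side, the response can be parsed into a typed object. This is a sketch based only on the fields visible in the sample response; the `ClassifyResult` class and `parse_response` function are hypothetical, and `alternatives` is left out because its shape is not shown here.

```python
import json
from dataclasses import dataclass


@dataclass
class ClassifyResult:
    code: str
    description: str
    confidence: float
    tier_used: str
    match_method: str
    processing_time_ms: float


def parse_response(body: str) -> ClassifyResult:
    """Pull the documented fields out of a classify response body."""
    payload = json.loads(body)
    result = payload["result"]
    return ClassifyResult(
        code=result["code"],
        description=result["description"],
        confidence=result["confidence"],
        tier_used=result["tier_used"],
        match_method=result["match_method"],
        processing_time_ms=payload["processing_time_ms"],
    )


# The sample response, with an empty alternatives list as a placeholder.
SAMPLE_BODY = """
{
  "result": {
    "code": "N18.9",
    "description": "Chronic kidney disease, unspecified",
    "confidence": 0.87,
    "tier_used": "nlp",
    "match_method": "umls_entity_link"
  },
  "alternatives": [],
  "processing_time_ms": 41.6
}
"""
parsed = parse_response(SAMPLE_BODY)
```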
Interactive explorers
Swagger UI at /docs and ReDoc at /redoc on every running instance.
Batch endpoint
POST up to 100 items at once and get per-item results and timings back.
Tunable confidence
Set accept and reject thresholds per tier. Cap the cascade with max_tier.
Auth on writes
Reads stay open. Writes take an Authorization: Bearer key when you set one.
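Because the batch endpoint takes at most 100 items, a client with a longer list has to split it first. A minimal helper, assuming nothing about the batch endpoint beyond that limit (the `chunk` name is made up for this example):

```python
from typing import Iterator

MAX_BATCH = 100  # the per-request limit stated above


def chunk(items: list[str], size: int = MAX_BATCH) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


phrases = [f"phrase {i}" for i in range(250)]
batches = list(chunk(phrases))
```

Each element of `batches` can then be sent as one batch request, with order preserved across the slices.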

honest limits

When not to use this.

A short list of jobs MedMapper is the wrong tool for. Knowing where it stops is part of trusting where it works.

  • One known phrase to one known code

    Use grep or a lookup table. You don't need a pipeline for a constant.

  • Reasoning over multi-page chart notes

    That's chart review. MedMapper is built for phrase-to-code, not document understanding.

  • Crosswalking ICD-10 to SNOMED

    Mapping between coding systems isn't in v0.2. MedMapper maps text to one system at a time.

self-host · MIT

No PHI leaves your network.

MedMapper runs as containers in your own infrastructure. Clinical text is classified locally and never reaches us or any third party. Clone it, bring it up with Docker, and curl your first classification in about three minutes. The first boot pulls the ~1.2 GB scispaCy model once, which takes roughly ten minutes; every restart after that takes seconds.

Tier 1 + Tier 2 run out of the box. Add an LLM endpoint to switch on Tier 3.

bash
git clone https://github.com/okraks/medmapper.git
cd medmapper
docker compose up -d
your first classify
curl -s -X POST http://localhost:8000/api/v1/classify \
  -H "Content-Type: application/json" \
  -d '{"text": "hypertension"}'
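The same call can be made from Python with only the standard library. This sketch assumes the service is listening on localhost:8000 as in the curl example; the `build_classify_request` and `classify` helpers are hypothetical names, not part of MedMapper.

```python
import json
import urllib.request


def build_classify_request(
    text: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the POST request for /api/v1/classify without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/classify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text: str, base_url: str = "http://localhost:8000") -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_classify_request(text, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With an instance running, `classify("hypertension")` returns the response as a dictionary, so the code is at `["result"]["code"]`.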

hosted version

Prefer not to run it?

A managed MedMapper is in the works. Same engine, scaled and kept current, nothing to deploy. Leave your email and you'll hear from us when it opens.

No spam. One email when it opens.