The interface used to human-audit a random sample of the session-LLM extractions behind the paper.
The machine already confirmed every quote is a verbatim substring of its case
(hallucination rate 0.000). The step it can't do is the semantic one: does the green quote actually
justify the extracted value? Read each, pick a verdict. Everything is local —
your verdicts autosave in this browser and never leave your machine; use Export to download them
as a JSON record (and Import to resume on another device). This is a static page: there is no
server, by design.