Can AI Extract Echo Report Data as Accurately as Expert Annotation?

A large language model achieved 92.5% exact-match agreement with expert annotation when extracting structured cardiovascular data from free-text echocardiography reports, according to a finalist abstract for the 2026 Arthur E. Weyman Young Investigator’s Award Competition. Researchers from Johns Hopkins University in Baltimore, Maryland, evaluated whether GPT-5 mini could abstract 55 cardiovascular fields from de-identified reports in the MIMIC-III EchoNotes dataset. Fifty reports were annotated by a board-certified echocardiographer and, separately, extracted by GPT-5 mini. A blinded independent cardiologist then adjudicated 193 field-level disagreements. With human annotation as the reference, precision ranged from 96% to 98% by category and recall from 85% to 95%. In blinded review of discordant fields, the model’s extraction was judged superior in 101 of 171 comparisons, or about 60%. The model also identified 120 additional clinical values that were present in the source reports but not documented by human annotators. Researcher Robert B. Barrett said the additional values varied in significance: the model tended to over-extract normal or trivial findings, and some missed annotations were caused by human workflow issues.
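For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how exact-match agreement, precision, and recall could be computed at the field level, with the human annotation as the reference. The abstract's actual scoring rules are not given here, so the definitions below are assumptions, and the function name `field_metrics`, the field names, and the sample values are hypothetical, not study data.

```python
from typing import Dict, Optional

# Maps a field name to its recorded value; None means "not recorded".
Fields = Dict[str, Optional[str]]


def field_metrics(reference: Fields, extracted: Fields) -> Dict[str, float]:
    """Score one report's extraction against a reference annotation.

    Assumed definitions (illustrative, not taken from the abstract):
      - exact match: both sides agree, including both leaving a field empty
      - true positive: the model gives a value equal to the reference value
      - false positive: the model gives a value the reference lacks or disputes
      - false negative: the reference has a value the model missed or changed
    """
    names = set(reference) | set(extracted)
    exact = tp = fp = fn = 0
    for name in names:
        ref = reference.get(name)
        ext = extracted.get(name)
        if ref == ext:
            exact += 1
        if ext is not None:
            if ext == ref:
                tp += 1
            else:
                fp += 1
        if ref is not None and ext != ref:
            fn += 1
    return {
        "exact_match": exact / len(names) if names else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical five-field example (made-up values, not from the study).
reference = {
    "lvef": "55%",
    "mitral_regurgitation": "mild",
    "tricuspid_regurgitation": "trivial",
    "aortic_stenosis": None,
    "left_atrial_size": "normal",
}
extracted = {
    "lvef": "55%",
    "mitral_regurgitation": "moderate",
    "tricuspid_regurgitation": "trivial",
    "aortic_stenosis": "none",
    "left_atrial_size": None,
}
scores = field_metrics(reference, extracted)
print(scores)
```

Under these assumed definitions, a value the model extracts that the annotator left blank counts against precision, even when the value really is in the report. That is one reason a separate blinded adjudication of discordant fields, as described above, is informative.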