Team's prediction task compares GPT-4o with classic machine learning

A team from the Department of Biomedical Informatics, including research fellow Congning Ni, PhD, and Associate Professor Zhijun Yin, PhD, compared GPT-4o, the large language model (LLM) from San Francisco-based OpenAI, with four traditional machine learning (ML) models for predicting which patients would discontinue their home cancer medications before planned treatment completion. On data drawn from electronic health records and pharmacy surveys of 2,364 cancer patients, the LLM achieved an F1 score of 87%, while the closest ML model scored 83%. To interpret the ML model, the team used SHAP, and to interpret GPT-4o, they used a new method called mimic-SHAP. The two models agreed on the top features, body mass index and age, but for secondary features, the LLM leaned more on patients’ prior conditions, while the ML model focused on drug exposures and health care procedures. The findings were reported in the e-book series “Studies in Health Technology and Informatics.” The study was supported by National Institutes of Health award R37CA237452.