Interpretable and Lightweight Predictive Modeling for Congestive Heart Failure Using ICU Electronic Health Records

Authors

  • Ziwei Wang, Carnegie Mellon University, Pittsburgh, United States
  • Haoyun Zhang, University of Pennsylvania, Philadelphia, United States
  • Di Zhu, Santa Clara University, Santa Clara, United States
  • Chen Xie, University of Massachusetts Amherst, Amherst, United States

DOI:

https://doi.org/10.71465/fair724

Keywords:

congestive heart failure, disease prediction, MIMIC-III, tabular machine learning, XGBoost, logistic regression, small language models, interpretability

Abstract

Congestive Heart Failure (CHF) is a prevalent cardiovascular condition and a major contributor to hospitalizations and mortality worldwide. Early identification of CHF risk from electronic health records (EHR) can support proactive clinical monitoring and intervention. This paper presents an interpretable and lightweight predictive modeling workflow for CHF prediction using structured ICU data from the MIMIC-III database. A patient-level dataset of approximately 44,000 adult ICU patients with 115 demographic and laboratory-derived features is used to evaluate classical tabular machine learning models, including logistic regression, stochastic gradient descent classifiers, Random Forest, Gradient Boosting, and XGBoost. Tree-based ensemble models achieve the strongest performance, with XGBoost reaching an accuracy of 0.858 and sensitivity of 0.861 for CHF detection. The study also examines interpretability through logistic regression coefficients and feature-importance analysis, and compares these models with a prompted small language model baseline. The findings suggest that compact and interpretable machine learning models provide an effective and deployable approach for disease risk prediction using structured EHR data, especially in resource-constrained clinical environments.
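The abstract reports accuracy and sensitivity for CHF detection and interprets logistic regression through its coefficients. The sketch below shows how these quantities are conventionally computed. It assumes binary labels with 1 = CHF and 0 = no CHF. The function names, toy labels, and feature names are hypothetical illustrations, not the authors' code or data. Sensitivity is the recall on the CHF class, TP / (TP + FN). A logistic regression coefficient beta corresponds to an odds ratio of exp(beta) per one-unit increase in that feature, with the other features held fixed.

```python
import math
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, TN, FP, FN for binary labels (assumed: 1 = CHF, 0 = no CHF)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 0:
            counts["tn"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        else:  # t == 1 and p == 0: a missed CHF case
            counts["fn"] += 1
    return counts


def accuracy(c: Dict[str, int]) -> float:
    # Fraction of all patients classified correctly: (TP + TN) / N
    return (c["tp"] + c["tn"]) / sum(c.values())


def sensitivity(c: Dict[str, int]) -> float:
    # Recall on the positive (CHF) class: TP / (TP + FN)
    positives = c["tp"] + c["fn"]
    return c["tp"] / positives if positives else 0.0


def odds_ratios(coefficients: Dict[str, float]) -> Dict[str, float]:
    # exp(beta) is the multiplicative change in the odds of the positive class
    # for a one-unit increase in the feature, holding the others fixed
    return {name: math.exp(beta) for name, beta in coefficients.items()}


if __name__ == "__main__":
    # Hypothetical toy labels, not MIMIC-III data
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
    c = confusion_counts(y_true, y_pred)
    print(accuracy(c))     # 0.875
    print(sensitivity(c))  # 0.75
    # Hypothetical feature names and coefficients
    print(odds_ratios({"feature_a": 0.0, "feature_b": math.log(2.0)}))
```

Sensitivity is reported alongside accuracy because a screening model that misses many CHF cases can still score well on accuracy when the negative class dominates. In practice, a library such as scikit-learn provides equivalent metric functions.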


Published

2026-03-24