Interpretable and Lightweight Predictive Modeling for Congestive Heart Failure Using ICU Electronic Health Records
DOI:
https://doi.org/10.71465/fair724
Keywords:
congestive heart failure, disease prediction, MIMIC-III, tabular machine learning, XGBoost, logistic regression, small language models, interpretability
Abstract
Congestive Heart Failure (CHF) is a prevalent cardiovascular condition and a major contributor to hospitalizations and mortality worldwide. Early identification of CHF risk from electronic health records (EHR) can support proactive clinical monitoring and intervention. This paper presents an interpretable and lightweight predictive modeling workflow for CHF prediction using structured ICU data from the MIMIC-III database. A patient-level dataset of approximately 44,000 adult ICU patients with 115 demographic and laboratory-derived features is used to evaluate classical tabular machine learning models, including logistic regression, stochastic gradient descent classifiers, Random Forest, Gradient Boosting, and XGBoost. Tree-based ensemble models achieve the strongest performance, with XGBoost reaching an accuracy of 0.858 and sensitivity of 0.861 for CHF detection. The study also examines interpretability through logistic regression coefficients and feature-importance analysis, and compares these models with a prompted small language model baseline. The findings suggest that compact and interpretable machine learning models provide an effective and deployable approach for disease risk prediction using structured EHR data, especially in resource-constrained clinical environments.
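The abstract reports accuracy and sensitivity and interprets logistic regression coefficients. The following is a minimal, dependency-free Python sketch of how those quantities are defined, assuming binary labels coded 1 = CHF and 0 = no CHF. The helper names (`confusion_counts`, `accuracy`, `sensitivity`, `odds_ratio`) and the toy labels are hypothetical illustrations, not the authors' evaluation code or data; in practice, library functions such as scikit-learn's `accuracy_score` and `recall_score` compute the same metrics.

```python
import math
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn) for binary labels, where 1 = CHF and 0 = no CHF."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth not in (0, 1) or pred not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0: a missed CHF case
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all patients classified correctly: (TP + TN) / N."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Recall on the positive (CHF) class: TP / (TP + FN)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    if tp + fn == 0:
        raise ValueError("sensitivity is undefined when there are no positive cases")
    return tp / (tp + fn)


def odds_ratio(coef: float) -> float:
    """Multiplicative change in the odds of CHF per one-unit increase in a feature,
    given a fitted logistic regression coefficient for that feature."""
    return math.exp(coef)


if __name__ == "__main__":
    # Hypothetical toy labels, not MIMIC-III data.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 1, 0, 1, 1, 1, 0]
    print(f"accuracy={accuracy(y_true, y_pred):.3f}")  # accuracy=0.750
    print(f"sensitivity={sensitivity(y_true, y_pred):.3f}")  # sensitivity=1.000
    print(f"odds ratio for coef=0.5: {odds_ratio(0.5):.3f}")  # odds ratio for coef=0.5: 1.649
```

Sensitivity ignores false positives, which is why the toy predictions reach a sensitivity of 1.000 but an accuracy of only 0.750; reporting both metrics, as the abstract does, makes it harder for a model that simply flags many patients as CHF to look strong.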
License
Copyright (c) 2026 Ziwei Wang, Haoyun Zhang, Di Zhu, Chen Xie (Authors)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.