Predicting hospital readmissions in diabetes patients: A comparative study of machine learning models

Alekhya Gandra

doi:10.53730/ijhs.v8n3.15189

Authors

Alekhya Gandra
alugandra04@gmail.com
Atlanta, Georgia, United States

Keywords:

Diabetes, healthcare analytics, hospital readmission, machine learning, predictive modelling

Abstract

This study addresses the high rate of hospital readmission among diabetes patients, which increases healthcare costs and strains resources. The objective is to use machine learning (ML) to predict readmissions so that healthcare providers can identify high-risk patients for early intervention. Six ML models (Logistic Regression, Random Forest, Gradient Boosting, XGBoost, LightGBM, and CatBoost) were applied to the Diabetes 130-US hospitals dataset, which includes patient demographics, clinical data, and discharge information. The models were evaluated on accuracy, precision, recall, and AUC-ROC. CatBoost performed best, achieving an AUC of 0.70 and an accuracy of 64.2%. The most important predictive features were the number of inpatient visits, medications prescribed, and length of hospital stay. These results highlight the potential of machine learning for predicting hospital readmissions and for providing actionable insights to improve patient outcomes. Future research should explore integrating real-time health data from wearables and examine the role of social determinants to further improve predictive accuracy and optimize healthcare resources.
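As an illustration of the four evaluation metrics named in the abstract, the following is a minimal sketch in plain Python. It is not the study's code: the function names (`confusion_counts`, `auc_roc`, `evaluate`), the 0.5 decision threshold, and the toy labels and scores are all assumptions made for this example, with 1 standing for "readmitted".

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 = readmitted."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and AUC-ROC from predicted scores."""
    # The threshold is an assumption; it affects every metric except AUC.
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "auc_roc": auc_roc(y_true, scores),
    }


# Made-up labels and model scores, for illustration only.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.3, 0.1, 0.5]
metrics = evaluate(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy, precision, and recall depend on the chosen threshold, whereas AUC-ROC summarizes ranking quality across all thresholds. This may be why a model can show a moderate AUC alongside a modest accuracy on a dataset such as this one.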

