Predicting hospital readmissions in diabetes patients: A comparative study of machine learning models
Keywords:
Diabetes, healthcare analytics, hospital readmission, machine learning, predictive modelling
Abstract
This study addresses the high hospital readmission rates among diabetes patients, which increase healthcare costs and strain resources. The objective is to use machine learning (ML) to predict readmissions so that healthcare providers can identify high-risk patients for early intervention. Six ML models (Logistic Regression, Random Forest, Gradient Boosting, XGBoost, LightGBM, and CatBoost) were applied to the Diabetes 130-US hospitals dataset, which includes patient demographics, clinical data, and discharge information. The models were evaluated on accuracy, precision, recall, and AUC-ROC. CatBoost performed best, achieving an AUC-ROC of 0.70 and an accuracy of 64.2%. The most important predictive features were the number of inpatient visits, the medications prescribed, and the length of hospital stay. These results highlight the potential of ML for predicting hospital readmissions and provide actionable insights for improving patient outcomes. Future research should explore integrating real-time health data from wearables and examine the role of social determinants of health to further improve predictive accuracy and optimize the use of healthcare resources.
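To make the evaluation metrics named in the abstract concrete, the following is a minimal sketch of how accuracy, precision, recall, and AUC-ROC can be computed for a binary readmission classifier. It is not the authors' code: the labels, scores, the 0.5 decision threshold, and all function names are hypothetical and chosen only for illustration. AUC-ROC is computed here through its rank interpretation, the probability that a randomly chosen readmitted patient receives a higher score than a randomly chosen non-readmitted one.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 = readmitted."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true readmission labels and model-predicted risk scores.
y_true = [1, 0, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.6, 0.2, 0.8, 0.1, 0.55]
# Assumed decision threshold of 0.5; in practice this would be tuned.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy :", accuracy(y_true, y_pred))
print("precision:", precision(y_true, y_pred))
print("recall   :", recall(y_true, y_pred))
print("AUC-ROC  :", auc_roc(y_true, scores))
```

Unlike accuracy, precision, and recall, AUC-ROC does not depend on the decision threshold, which is one reason it is often reported alongside them when comparing models.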
Copyright (c) 2024 International Journal of Health Sciences
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Articles published in the International Journal of Health Sciences (IJHS) are available under Creative Commons Attribution Non-Commercial No Derivatives Licence (CC BY-NC-ND 4.0). Authors retain copyright in their work and grant IJHS right of first publication under CC BY-NC-ND 4.0. Users have the right to read, download, copy, distribute, print, search, or link to the full texts of articles in this journal, and to use them for any other lawful purpose.
Articles published in IJHS can be copied, communicated and shared in their published form for non-commercial purposes provided full attribution is given to the author and the journal. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
This copyright notice applies to articles published in IJHS volumes 4 onwards. Please read about the copyright notices for previous volumes under Journal History.