Minimum relevant features to obtain an explainable AI system for predicting breast cancer in WDBC

https://doi.org/10.53730/ijhs.v6nS9.12538

Authors

  • Agarwal Rashi Department of Computer Science and Engineering, Harcourt Butler Technical University, Kanpur
  • Revanth Madamala Computer Science Department, University of Southern California, Los Angeles, CA, U.S.A

Keywords

XAI, SHAP, LIME, Skope Rules, feature selection, breast cancer, WDBC, Decision Tree, Ensemble

Abstract

The ability to explain why a machine learning model produces a certain prediction in comprehensible terms is becoming increasingly crucial, as it provides accountability and confidence in the algorithm's decision-making process. Complex models are difficult to interpret, and various approaches have been proposed to address this. In tree ensemble methods, the problem is typically handled by assigning importance levels to input features, either globally or for a specific prediction. We show that current feature attribution approaches are inconclusive, and develop solutions using SHAP (SHapley Additive exPlanations) values, LIME (Local Interpretable Model-Agnostic Explanations), and the Skope Rules package. In this work we employ feature selection methods based on SHAP and LIME, applied to the Breast Cancer Wisconsin (Diagnostic) data set (WDBC). In the suggested method, level-1 features are chosen using Decision Tree entropy values; level-2 features are then selected from this reduced set based on the SHAP and LIME reports. The selected features are evaluated on a standalone Decision Tree (DT) model and on an ensemble of a DT and a Support Vector Machine (SVM). Experiments suggest that the ensemble performs better than the standalone DT. We also use the Skope Rules package to generate global rules for generalization.
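To make the pipeline concrete, the sketch below implements a two-level selection scheme of the kind the abstract describes, using scikit-learn together with the shap, lime, and skope-rules packages. It is a minimal illustration, not the authors' code: the importance cutoff, the top-8 SHAP ranking, the train/test split, and all hyperparameters are assumptions, and compatible versions of the three explainability packages are assumed to be installed.

```python
# Minimal sketch of a two-level feature-selection pipeline on WDBC.
# All cutoffs and hyperparameters are illustrative assumptions, not the
# settings reported in the paper.
import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer
from skrules import SkopeRules  # pip install skope-rules
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

data = load_breast_cancer()  # the WDBC data set: 569 samples, 30 features
X_tr, X_te, y_tr, y_te = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42, stratify=data.target)

# Level 1: keep only the features the entropy-based tree actually splits on.
dt = DecisionTreeClassifier(criterion="entropy", random_state=42).fit(X_tr, y_tr)
level1 = np.where(dt.feature_importances_ > 0)[0]

# Level 2: rank the level-1 survivors by mean |SHAP| value; keep the top 8.
sv = shap.TreeExplainer(dt).shap_values(X_te)
sv = np.asarray(sv[1] if isinstance(sv, list) else sv)  # older shap: per-class list
if sv.ndim == 3:                                        # newer shap: (n, features, classes)
    sv = sv[:, :, 1]
mean_abs = np.abs(sv).mean(axis=0)
level2 = level1[np.argsort(mean_abs[level1])[::-1][:8]]
names2 = [data.feature_names[i].replace(" ", "_") for i in level2]

# Compare a plain DT with a soft-voting DT+SVM ensemble on the reduced set.
dt2 = DecisionTreeClassifier(criterion="entropy", random_state=42)
ens = VotingClassifier(
    [("dt", DecisionTreeClassifier(criterion="entropy", random_state=42)),
     ("svm", SVC(probability=True, random_state=42))], voting="soft")
for name, model in [("DT", dt2), ("DT+SVM ensemble", ens)]:
    model.fit(X_tr[:, level2], y_tr)
    print(name, "accuracy:", model.score(X_te[:, level2], y_te))

# LIME report for one test case, to cross-check the SHAP ranking.
lime_exp = LimeTabularExplainer(X_tr[:, level2], feature_names=names2,
                                class_names=["malignant", "benign"])  # 0/1 in WDBC
print(lime_exp.explain_instance(X_te[0, level2], ens.predict_proba,
                                num_features=5).as_list())

# Global, human-readable rules over the reduced set via Skope Rules.
# (Underscored names above: skope-rules evaluates rules via pandas.query,
# which chokes on spaces in column names.)
sk = SkopeRules(feature_names=names2, precision_min=0.9, recall_min=0.1,
                random_state=42)
sk.fit(X_tr[:, level2], y_tr)
for rule, (prec, rec, _) in sk.rules_[:3]:
    print(f"{rule}  (precision={prec:.2f}, recall={rec:.2f})")
```

A soft-voting ensemble is one simple way to combine a DT with an SVM; since the abstract does not specify the combination scheme, that choice is also an assumption of this sketch.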

References

Agarwal R, H, P, SS (2021a) Deep Learning augmented with Contour Detection Designed for Diagnosis of COVID-19 in X-rays. Design Engineering (Toronto) 8:10445–10461

Agarwal R, Hariharan S, Nagabhushana Rao M, Agarwal A (2021) Weed Identification using K-Means Clustering with Color Spaces Features in Multi-Spectral Images Taken by UAV. In: 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS. IEEE, pp 7047–7050

Agarwal RAA (2021b) Decision Support System designed to detect yellow mosaic in Pigeon pea using Computer Vision. Design Engineering (Toronto) 8:832–844

Baehrens D, Schroeter T, Harmeling S, et al (2009) How to Explain Individual Classification Decisions

Camburu O-M, Giunchiglia E, Foerster J, et al (2019) Can I Trust the Explainer? Verifying Post-hoc Explanatory Methods

Chen L-C, Zhu Y, Papandreou G, et al (2018) Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

Hsu C-W, Lin C-J (2002) A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks 13:415–425. https://doi.org/10.1109/72.991427

Dabkowski P, Gal Y (2017) Real Time Image Saliency for Black Box Classifiers

Fong R, Vedaldi A (2017) Interpretable Explanations of Black Boxes by Meaningful Perturbation. https://doi.org/10.1109/ICCV.2017.371

Frank A, Asuncion A (2010) UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science

Gilpin LH, Bau D, Yuan BZ, et al (2018) Explaining Explanations: An Overview of Interpretability of Machine Learning

Goncharuk N, Lysenko O, Nataliya P, Kovyda N, Tsekhmister Y (2022) Management of psychological changes at pregnant women during the COVID-19 pandemic. International Journal of Health Sciences 6(2):870–879. https://doi.org/10.53730/ijhs.v6n2.8528

Gosiewska A, Biecek P (2019) Do Not Trust Additive Explanations

Katuwal GJ, Chen R (2016a) Machine Learning Model Interpretability for Precision Medicine

Kim K (2003) Financial time series forecasting using support vector machines. Neurocomputing 55:307–319. https://doi.org/10.1016/S0925-2312(03)00372-2

Lakkaraju H, Bastani O (2019) “How do I fool you?”: Manipulating User Trust via Misleading Black Box Explanations

Lei J, G'Sell M, Rinaldo A, et al (2016) Distribution-Free Predictive Inference For Regression

Letham B, Rudin C, McCormick TH, Madigan D (2015) Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. https://doi.org/10.1214/15-AOAS848

Lundberg S, Lee S-I (2017a) A Unified Approach to Interpreting Model Predictions

Lundberg SM, Erion G, Chen H, et al (2020) From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence 2:56–67. https://doi.org/10.1038/s42256-019-0138-9

Malvia S, Bagadi SA, Dubey US, Saxena S (2017) Epidemiology of breast cancer in Indian women. Asia-Pacific Journal of Clinical Oncology 13:289–295. https://doi.org/10.1111/ajco.12661

Mohseni S, Zarei N, Ragan ED (2018) A Multidisciplinary Survey and Framework for Design and Evaluation of Explainable AI Systems

Nurwahidah H, Maswan M, Fathoni A (2020) The effect of deep breathing technique and lo’i sto combination on decreasing the symptoms of asthma patients at the area of public health center of Penana’e. International Journal of Social Sciences and Humanities 4(1):140–150. https://doi.org/10.29332/ijssh.v4n1.431

Rahnama AHA, Boström H (2019) A study of data and label shift in the LIME framework

Ribeiro MT, Singh S, Guestrin C (2016a) "Why Should I Trust You?": Explaining the Predictions of Any Classifier

Robnik-Sikonja M, Kononenko I (2008) Explaining Classifications For Individual Instances. IEEE Transactions on Knowledge and Data Engineering 20:589–600. https://doi.org/10.1109/TKDE.2007.190734

Ross AS, Hughes MC, Doshi-Velez F (2017) Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence. International Joint Conferences on Artificial Intelligence Organization, California, pp 2662–2670

Simonyan K, Vedaldi A, Zisserman A (2013) Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

Singh SD, Henley SJ, Ryerson AB (2017) Surveillance for Cancer Incidence and Mortality - United States, 2013. MMWR Surveillance Summaries 66:1–36. https://doi.org/10.15585/mmwr.ss6604a1

Sundararajan M, Taly A, Yan Q (2017) Axiomatic Attribution for Deep Networks

Suryasa IW, Rodríguez-Gámez M, Koldoris T (2021) Get vaccinated when it is your turn and follow the local guidelines. International Journal of Health Sciences 5(3):x–xv. https://doi.org/10.53730/ijhs.v5n3.2938

Valdes G, Luna JM, Eaton E, et al (2016) MediBoost: a Patient Stratification Tool for Interpretable Decision Making in the Era of Precision Medicine. Scientific Reports 6:37854. https://doi.org/10.1038/srep37854

Widodo A, Yang B-S, Han T (2007) Combination of independent component analysis and support vector machines for intelligent faults diagnosis of induction motors. Expert Systems with Applications 32:299–312. https://doi.org/10.1016/j.eswa.2005.11.031

Zeiler MD, Fergus R (2013) Visualizing and Understanding Convolutional Networks

Zhou B, Khosla A, Lapedriza A, et al (2015) Learning Deep Features for Discriminative Localization

Published

06-09-2022

How to Cite

Rashi, A., & Madamala, R. (2022). Minimum relevant features to obtain an explainable AI system for predicting breast cancer in WDBC. International Journal of Health Sciences, 6(S9), 1312–1326. https://doi.org/10.53730/ijhs.v6nS9.12538

Section

Peer Review Articles