A hybrid cluster based classification model for high dimensional disease prediction databases

P. Ramya; T. Bhaskar Reddy

doi:10.53730/ijhs.v6nS9.13989

Authors

P. Ramya, Research Scholar, Dept. of Computer Science & Technology, Sri Krishnadevaraya University, Ananthapuram, India
T. Bhaskar Reddy, Professor, Dept. of Computer Science & Technology, Sri Krishnadevaraya University, Ananthapuram, India

Keywords:

hybrid cluster, high dimensional, databases

Abstract

As biomedical databases continue to grow, identifying the crucial features for a classification task becomes increasingly difficult because of data size and sparsity. Traditional feature subset selection models rely on fixed-size dimensions for feature ranking and classification, which makes them poorly suited to handling sparsity, missing values, and imbalance when selecting crucial features. To improve disease prediction, this article proposes a hybrid ensemble feature selection method combined with an advanced cluster-based classification model. The model uses an ensemble of ranked features to classify the disease with high accuracy and true positive rate. We also introduce a novel cluster-based classification model to improve the effectiveness of tree pruning and classification. We ran simulation experiments on various training datasets to evaluate prediction accuracy. The results show that the proposed gene-chemical disease clustering-based classification framework outperforms traditional methods and classification models in terms of statistical metrics and optimization.
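The idea of classifying from an ensemble of feature rankings can be illustrated with a minimal sketch. The abstract does not name the individual rankers or the aggregation rule, so everything below is an assumption made for illustration: the two scoring functions (`variance_score`, `mean_gap_score`), the rank-sum aggregation in `ensemble_rank`, and the use of `None` to mark missing values are hypothetical and are not the authors' method. The cluster-based classification step is not shown.

```python
from statistics import mean, pvariance


def _column(rows, j):
    """Collect (value, label) pairs for feature j, skipping missing values (None)."""
    return [(r[j], label) for r, label in rows if r[j] is not None]


def variance_score(pairs):
    """Hypothetical ranker 1: population variance of the observed values."""
    values = [v for v, _ in pairs]
    return pvariance(values) if len(values) > 1 else 0.0


def mean_gap_score(pairs):
    """Hypothetical ranker 2: absolute gap between class means (labels 0 and 1)."""
    pos = [v for v, label in pairs if label == 1]
    neg = [v for v, label in pairs if label == 0]
    if not pos or not neg:
        return 0.0
    return abs(mean(pos) - mean(neg))


def ensemble_rank(X, y, scorers=(variance_score, mean_gap_score)):
    """Return feature indices ordered best-first by summed rank across scorers.

    Assumed aggregation rule: each scorer ranks all features (0 = best),
    and the per-feature ranks are added up; a lower total is better.
    """
    rows = list(zip(X, y))
    n_features = len(X[0])
    total_rank = [0] * n_features
    for scorer in scorers:
        scores = [scorer(_column(rows, j)) for j in range(n_features)]
        order = sorted(range(n_features), key=lambda j: -scores[j])
        for rank, j in enumerate(order):
            total_rank[j] += rank
    return sorted(range(n_features), key=lambda j: total_rank[j])


# Toy data: 4 samples, 3 features, one missing value in feature 1.
X = [
    [1.0, 5.0, 0.1],
    [1.1, None, 0.9],
    [0.9, 1.0, 0.2],
    [1.0, 1.2, 0.8],
]
y = [1, 1, 0, 0]
ranking = ensemble_rank(X, y)
```

In this toy data, feature 1 has both the largest variance and the largest class-mean gap, so it is ranked first by both scorers. Aggregating ranks rather than raw scores keeps scorers with different scales comparable, which is one common reason to combine rankers this way.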

