An Efficient multi-class SVM and Bayesian network based biomedical document ranking and classification framework using Gene-disease and ICD drug discovery databases

V. Shiva Narayana Reddy; Divya Midhunchakkaravarthy

doi:10.53730/ijhs.v6nS3.5551

Authors

V. Shiva Narayana Reddy
Shiva_reddy5017@yahoo.com
Research Scholar, Lincoln University College, Malaysia
Divya Midhunchakkaravarthy
Professor, Lincoln University College, Malaysia

Keywords:

gene data, machine learning, classification, keyphrase ranking, drug discovery

Abstract

Biomedical document feature extraction and ranking play an essential role in real-time key-phrase extraction and document ranking. The International Classification of Diseases (ICD-10) is a list of medical terms such as disease symptoms, abnormal findings and disease signs. Most conventional methods find, extract and rank biomedical disease patterns together with gene terms in order to rank a phrase or document. However, in these methods the contextual disease patterns used for document ranking and summarization are independent of gene entities, disease entities and drug discovery codes. Conventional term-weighting measures and word embedding models, such as gain ratio, entropy, chi-square, probabilistic measures and GloVe, are used to find the essential key terms and their relationships from static gene-disease databases. The main objective of the proposed work is to optimize the word embedding model together with key-phrase ranking and classification. Most biomedical applications use a pre-trained gene-disease database with a limited number of gene names for key-phrase ranking and classification. In this work, an integrated gene-disease database and ICD drug database codes are used to train the model with the optimized SVM classification model and a Bayesian estimation model.
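To make the conventional term-scoring measures mentioned above concrete, the following is a minimal sketch of chi-square and gain-ratio (entropy-based) scoring of terms against document classes. It is not the authors' optimized SVM and Bayesian estimation model. The toy documents, the use of the ICD-10 codes C50 and E11 as class tags, and all function names (`chi_square`, `entropy`, `gain_ratio`, `rank_terms`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a set of lower-cased tokens.
DOCS = [
    {"brca1", "mutation", "breast", "tumor"},
    {"brca1", "breast", "cancer"},
    {"insulin", "glucose", "diabetes"},
    {"glucose", "diabetes", "obesity"},
]
# Illustrative class tags only (ICD-10 C50: breast neoplasm, E11: type 2 diabetes).
LABELS = ["C50", "C50", "E11", "E11"]


def chi_square(docs, labels, term, target):
    """Chi-square statistic of a 2x2 table: term present/absent vs. class target/other."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term, in_class = term in tokens, label == target
        if has_term and in_class:
            a += 1  # term present, document in target class
        elif has_term:
            b += 1  # term present, other class
        elif in_class:
            c += 1  # term absent, target class
        else:
            d += 1  # term absent, other class
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return len(docs) * (a * d - b * c) ** 2 / denom


def entropy(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    n = len(values)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(values).values())


def gain_ratio(docs, labels, term):
    """Information gain of splitting on term presence, divided by the split information."""
    with_term = [lab for tokens, lab in zip(docs, labels) if term in tokens]
    without_term = [lab for tokens, lab in zip(docs, labels) if term not in tokens]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (with_term, without_term))
    info_gain = entropy(labels) - conditional
    split_info = entropy([term in tokens for tokens in docs])
    return info_gain / split_info if split_info > 0 else 0.0


def rank_terms(docs, labels, target):
    """Rank the vocabulary by chi-square for `target`, breaking ties alphabetically."""
    vocab = sorted(set().union(*docs))
    scored = [(t, chi_square(docs, labels, t, target), gain_ratio(docs, labels, t))
              for t in vocab]
    return sorted(scored, key=lambda row: (-row[1], row[0]))


if __name__ == "__main__":
    for term, chi2, gr in rank_terms(DOCS, LABELS, "C50"):
        print(f"{term:10s} chi2={chi2:.3f} gain_ratio={gr:.3f}")
```

For the target class C50, brca1 and breast rank at the top, but diabetes and glucose tie with them: chi-square is symmetric in positive and negative association, so a term that never occurs in the class scores as high as one that always does. Checking the sign of `a * d - b * c` is one simple way to keep only terms positively associated with the class.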

