An Efficient multi-class SVM and Bayesian network based biomedical document ranking and classification framework using Gene-disease and ICD drug discovery databases
Keywords:
gene data, machine learning, classification, keyphrase ranking, drug discovery
Abstract
Feature extraction and ranking of biomedical documents play an essential role in real-time key-phrase extraction and ranking. The International Classification of Diseases (ICD-10) is a list of medical terms covering disease symptoms, abnormal findings and disease signs. Most conventional methods find, extract and rank biomedical disease patterns together with gene terms in order to rank a phrase or document. However, the contextual disease patterns in these methods are independent of the gene entities, disease entities and drug discovery codes used for document ranking and summarization. Conventional word embedding models such as gain ratio, entropy, GloVe, chi-square and probabilistic measures are used to find the essential key terms and their relationships using static gene-disease databases. The main objective of the proposed work is to optimize the word embedding model along with key-phrase ranking and classification. Most biomedical applications use a pre-trained gene-disease database with a limited number of gene names for key-phrase ranking and classification. In this work, an integrated gene-disease database and ICD drug database codes are used to train the model using an optimized SVM classification model and a Bayesian estimation model.
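As a rough illustration of the conventional term-scoring measures named in the abstract, the following minimal sketch ranks terms by entropy-based gain ratio over a toy labelled corpus. It is not the authors' optimized model: the corpus, the class labels and the function names (`entropy`, `gain_ratio`, `rank_terms`) are all hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_ratio(term, docs, labels):
    """Gain ratio of the binary feature 'term occurs in the document'."""
    present = [lab for doc, lab in zip(docs, labels) if term in doc]
    absent = [lab for doc, lab in zip(docs, labels) if term not in doc]
    n = len(labels)
    # Information gain: class entropy minus the entropy left after the split.
    remainder = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the present/absent partition itself.
    split_info = entropy(["present"] * len(present) + ["absent"] * len(absent))
    return info_gain / split_info if split_info > 0 else 0.0


def rank_terms(docs, labels):
    """Return (term, score) pairs, best first; ties broken alphabetically."""
    vocab = sorted(set().union(*docs))
    scores = {t: gain_ratio(t, docs, labels) for t in vocab}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus: each document is a set of tokens, and each
# label is an illustrative disease class (not taken from the paper).
DOCS = [
    {"brca1", "breast", "carcinoma", "tamoxifen"},
    {"brca1", "ovarian", "carcinoma"},
    {"cftr", "cystic", "fibrosis", "ivacaftor"},
    {"cftr", "lung", "fibrosis"},
]
LABELS = ["cancer", "cancer", "cf", "cf"]

if __name__ == "__main__":
    for term, score in rank_terms(DOCS, LABELS):
        print(f"{term:10s} {score:.3f}")
```

In this toy corpus, terms that separate the two classes perfectly (such as the gene names) score higher than terms that occur in a single document. A static score of this kind depends only on term and label counts, which is the limitation the abstract attributes to conventional measures.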
Copyright (c) 2022 International journal of health sciences

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.