Hybrid multi-document text summarization via categorization based on BERT deep learning models

https://doi.org/10.53730/ijhs.v6nS1.6095

Authors

  • S. Sudha Lakshmi, Research Scholar, Dept. of Computer Science, SPMVV, Tirupati, India
  • M. Usha Rani, Professor, Dept. of Computer Science, SPMVV, Tirupati, India

Keywords:

Text Summarization, Category_id Score based categorization, BERT, Deep Learning

Abstract

Text summarization is the process of using a system to condense a document, or a collection of documents, into brief paragraphs or sentences. This paper applies text categorization with BERT, a state-of-the-art deep learning language model that performs significantly better than previous language models, to improve the summarization task. Multi-document summarization (MDS) is bottlenecked by a lack of training data and by the varied categories of documents. To address this, the proposed hybrid summarization framework B-HEATS (BERT-based Hybrid Extractive Abstractive Text Summarization) combines an extractive summary obtained via categorization with an abstractive summary produced by a deep learning RNN-LSTM-CNN architecture used to fine-tune BERT. The result is a higher-quality summary for multiple documents that also overcomes the out-of-vocabulary (OOV) problem. The output layer of BERT is replaced with the RNN-LSTM-CNN architecture for fine-tuning, which improves the summarization model. Compared with existing models on ROUGE metrics, the proposed automatic text summarization achieves an R1 score of 43.61, an R2 score of 22.64, an R3 score of 44.95, and an RL score of 44.27 on the benchmark DUC datasets.
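
For readers unfamiliar with the ROUGE-N scores quoted in the abstract, the sketch below shows how n-gram overlap between a candidate summary and a single reference summary is commonly computed. It is an illustration only, not the authors' evaluation code: the function names `ngrams` and `rouge_n`, the whitespace tokenization, and the example sentences are all assumptions, and the standard ROUGE toolkit additionally handles stemming, multiple references, and ROUGE-L. Scores such as 43.61 are presumably the same ratios expressed on a 0-100 scale.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N for one candidate and one reference summary.

    Assumption: lowercased whitespace tokenization, no stemming or
    stopword removal. Returns precision, recall, and F1 in [0, 1].
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the DUC datasets.
    candidate = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    for n in (1, 2):
        print(f"ROUGE-{n}:", rouge_n(candidate, reference, n))
```

In this toy example the two sentences share five of six unigrams and three of five bigrams, which illustrates why R2 scores are typically lower than R1 scores for the same summary.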

Published

20-04-2022

How to Cite

Lakshmi, S. S., & Rani, M. U. (2022). Hybrid multi-document text summarization via categorization based on BERT deep learning models. International Journal of Health Sciences, 6(S1), 5346–5369. https://doi.org/10.53730/ijhs.v6nS1.6095

Section

Peer Review Articles
