Implementation of NLP based automatic text summarization using spacy
Keywords:
Empirical Methods, Text Summarization, Extraction, Abstraction, Reinforcement Learning, Supervised, Unsupervised, NLP, Spacy Algorithms
Abstract
The amount of data on the Internet has grown exponentially over the past decade, so we need ways to convert this mass of raw information into a form that people can readily understand. Text summarization is a common research technique for handling large volumes of text. Automatic summarization reduces a document to its key ideas by producing a shortened version that preserves the important information. Summarization methods are divided into extractive and abstractive approaches. Extractive summarization lightens the summarization burden by selecting a subset of relevant sentences from the original text, with the importance of each sentence computed from linguistic and statistical features. Although many methods exist, researchers in natural language processing (NLP) have been particularly drawn to the extractive approach. In this work, extractive and abstractive methods for summarizing text were examined. This paper uses spaCy to analyze these methods, resulting in fewer iterations and a more focused summary.
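As a rough illustration of the extractive idea described in the abstract (score each sentence from statistical features, then keep the highest-scoring subset), the sketch below ranks sentences by normalized word frequency. It is not the authors' implementation: to stay dependency-free it swaps spaCy's tokenizer, sentence segmenter, and stop-word list for simple regular expressions and a tiny hand-written stop-word set, and the names `STOP_WORDS`, `split_sentences`, `tokenize`, and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; spaCy ships a much fuller one).
STOP_WORDS = {
    "a", "an", "the", "is", "are", "was", "of", "to", "in", "on",
    "and", "that", "it", "for", "this", "with", "as", "by",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))
    if not freq:
        return []
    top = max(freq.values())
    # Sentence score = sum of its words' frequencies, normalized by the top frequency.
    scored = [
        (sum(freq[w] / top for w in tokenize(s)), i)
        for i, s in enumerate(sentences)
    ]
    # Highest score first; ties go to the earlier sentence.
    best = sorted(scored, key=lambda p: (-p[0], p[1]))[:n_sentences]
    return [sentences[i] for _, i in sorted(best, key=lambda p: p[1])]
```

Calling `summarize(text, n_sentences=1)` returns a one-element list holding the single sentence whose words are most frequent across the document. In a spaCy-based pipeline, the regex helpers would presumably be replaced by the library's own sentence and token objects.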
Copyright (c) 2022 International journal of health sciences

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.