Invasive weed optimization with stacked long short term memory for PDF malware detection and classification

https://doi.org/10.53730/ijhs.v6nS5.9540

Authors

  • P. Pandi Chandran Department of Computer and Information Science, Faculty of Science, Annamalai University, Chidambaram, 608002, India
  • Hema Rajini. N Department of CSE, Alagappa Chettiar Government College of Engineering and Technology, Karaikudi, 630003, India
  • M. Jeyakarthic Department of Computer and Information Science, Annamalai University, Chidambaram, 608002, India

Keywords:

PDFs, Malware detection, Outlier removal, Machine learning, Deep learning, Invasive weed optimization

Abstract

Owing to their versatility and widespread adoption, PDF documents are frequently exploited by cybercriminals to launch attacks, and they have long served as an effective vehicle for spreading malware. Automated detection and classification of PDF malware is therefore essential for security. Recent advances in artificial intelligence (AI) and deep learning (DL) make such automated detection feasible. In this context, this article develops an Invasive Weed Optimization with Stacked Long Short-Term Memory (IWO-S-LSTM) technique for PDF malware detection and classification. The presented IWO-S-LSTM model focuses on recognizing and classifying the different kinds of malware found in PDF documents. The input data are first pre-processed in two stages: categorical encoding and null value removal. Next, an autoencoder (AE) based outlier detection approach is used to remove outliers. The S-LSTM model is then used to detect and classify PDF malware, and finally the IWO algorithm is applied to fine-tune the hyperparameters of the S-LSTM model. To evaluate the IWO-S-LSTM model, a series of simulations was executed on two benchmark datasets. The experimental outcomes showed that the IWO-S-LSTM technique outperformed the other approaches.
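
The abstract does not state the IWO settings, which S-LSTM hyperparameters are tuned, or the fitness function used. The following is a minimal, standard-library Python sketch of the commonly used IWO loop (fitness-proportional seeding, Gaussian seed dispersal whose spread shrinks over the iterations, and competitive exclusion), shown tuning two hyperparameters. Everything concrete here is a hypothetical placeholder rather than the authors' setup: the function names `iwo_minimize` and `surrogate_val_loss`, the choice of log learning rate and number of LSTM units, their bounds, and all IWO settings. In a real pipeline the objective would train the S-LSTM and return its validation loss.

```python
import random


def iwo_minimize(objective, bounds, pop_init=5, pop_max=15, seeds_min=0, seeds_max=5,
                 iterations=100, sigma_init=1.0, sigma_final=0.001, modulation=3, seed=0):
    """Generic invasive weed optimization (sketch). Returns (best_cost, best_params)."""
    rng = random.Random(seed)

    def clip(x):
        # Keep every coordinate inside its [lo, hi] bound.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    # Initial colony: weeds scattered uniformly over the search space.
    colony = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_init)]
    colony = [(objective(w), w) for w in colony]

    for it in range(iterations):
        # Dispersal std decays nonlinearly from sigma_init to sigma_final.
        # Assumption: sigma is a fraction of each dimension's range, so that
        # hyperparameters on different scales can share one schedule.
        sigma = ((iterations - it) / iterations) ** modulation * (sigma_init - sigma_final) + sigma_final
        costs = [c for c, _ in colony]
        best, worst = min(costs), max(costs)
        offspring = []
        for cost, weed in colony:
            # Fitter weeds (lower cost) produce more seeds, linearly between seeds_min and seeds_max.
            ratio = 1.0 if worst == best else (worst - cost) / (worst - best)
            n_seeds = int(seeds_min + ratio * (seeds_max - seeds_min))
            for _ in range(n_seeds):
                child = clip([v + rng.gauss(0.0, sigma * (hi - lo))
                              for v, (lo, hi) in zip(weed, bounds)])
                offspring.append((objective(child), child))
        # Competitive exclusion: only the pop_max best plants survive.
        colony = sorted(colony + offspring, key=lambda p: p[0])[:pop_max]
    return colony[0]


def surrogate_val_loss(params):
    """Hypothetical stand-in for S-LSTM validation loss (optimum: lr=1e-3, 64 units)."""
    log_lr, units = params
    return (log_lr + 3.0) ** 2 + ((round(units) - 64) / 64.0) ** 2


best_cost, (best_log_lr, best_units) = iwo_minimize(
    surrogate_val_loss, bounds=[(-5.0, -1.0), (16.0, 256.0)])
print(f"lr={10 ** best_log_lr:.2e}, units={round(best_units)}, surrogate loss={best_cost:.4f}")
```

Searching over the base-10 logarithm of the learning rate is a common design choice because learning rates span several orders of magnitude; integer hyperparameters such as the number of units are simply rounded inside the objective.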

Published

23-06-2022

How to Cite

Chandran, P. P., Hema, R. N., & Jeyakarthic, M. (2022). Invasive weed optimization with stacked long short term memory for PDF malware detection and classification. International Journal of Health Sciences, 6(S5), 4187–4204. https://doi.org/10.53730/ijhs.v6nS5.9540

Section

Peer Review Articles