Malware Detection using Deep Neural Networks on Imbalanced Data

  • Mohammed Abdulkreem Mohammed
  • Drai Ahmed Smait
  • Mustafa Al-Tahai
  • Israa S. Kamil
  • Kadhum Al-Majdi
  • Shahad K. Khaleel
Keywords: Tomek Links, SMOTE, convolutional neural networks, imbalanced data, Malware detection

Abstract

Cybercriminals use malware, particularly malicious JavaScript, to make online applications one of their main targets for impersonation. Detecting such dangerous code in real time is therefore crucial to preventing harmful actions. By categorizing the salient characteristics of malicious code, this study proposes an effective technique for identifying previously unknown malicious JavaScript using an interceptor on the client side. A feature subset was generated using the wrapper approach for dimensionality reduction. In this paper, we propose an approach to the malware detection task on imbalanced data. Our approach combines two imbalance-handling techniques, the Synthetic Minority Over-sampling Technique (SMOTE) and Tomek Links, to augment the data, and then applies a Deep Neural Network (DNN) to classify the scripts. The experiments show that the approach performs effectively, achieving an accuracy of 94.00%.
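The resampling idea named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names (`nearest`, `smote`, `remove_tomek_links`), the toy feature vectors, the choice to run SMOTE before Tomek Link cleaning, and the choice to drop only the majority-class member of each link are all assumptions made for illustration. The abstract does not specify these details.

```python
import math
import random


def nearest(i, points, candidates, k=1):
    """Return the k candidate indices closest to points[i], excluding i."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def smote(X, y, minority, n_new, k=3, seed=0):
    """Append n_new synthetic minority samples, each interpolated between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    idx = [i for i, label in enumerate(y) if label == minority]
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(idx)
        j = rng.choice(nearest(i, X, idx, k))
        gap = rng.random()  # position along the segment from X[i] to X[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return X + synthetic, y + [minority] * n_new


def remove_tomek_links(X, y, majority):
    """Drop majority samples that belong to a Tomek link: a pair of
    opposite-class samples that are each other's nearest neighbour."""
    everyone = range(len(X))
    nn = [nearest(i, X, everyone)[0] for i in everyone]
    drop = {i for i in everyone
            if y[i] == majority and y[nn[i]] != majority and nn[nn[i]] == i}
    keep = [i for i in everyone if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): label 0 = benign (majority), 1 = malicious (minority).
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [0.0, 0.4], [0.4, 0.0],
     [0.9, 1.0], [1.0, 0.9], [1.0, 1.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

# Oversample the minority class up to the majority count, then clean the
# class boundary by removing majority samples that form Tomek links.
X_bal, y_bal = smote(X, y, minority=1, n_new=y.count(0) - y.count(1))
X_clean, y_clean = remove_tomek_links(X_bal, y_bal, majority=0)
print(y_bal.count(0), y_bal.count(1), len(y_clean))
```

The cleaned `X_clean`, `y_clean` would then be passed to whichever classifier is being trained. In practice a maintained library would normally replace this hand-written version, which uses a brute-force neighbour search and is only suitable for small examples.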

Published
2022-10-02
How to Cite
Mohammed, M. A., Smait, D. A., Al-Tahai, M., Kamil, I. S., Al-Majdi, K., & Khaleel, S. K. (2022). Malware Detection using Deep Neural Networks on Imbalanced Data. Majlesi Journal of Electrical Engineering. Retrieved from http://mjee.iaumajlesi.ac.ir/index/index.php/ee/article/view/4890
Section
Articles
