Supervised Learning to Detect DDoS Attacks
By Indra Dewaji, as part of a Research Methodology assignment, University of York, November 2021
This report discusses Network Intrusion Detection Systems (NIDS) for detecting DDoS attacks in backscatter darknet traffic. It analyzes the solution proposed by E. Balkanli et al., a supervised learning approach that detects backscatter using the open-source systems Bro and Corsaro together with two machine learning classifiers, Classification and Regression Tree (CART) and Naïve Bayes [1].
A DDoS attack is a security attack that prevents users from accessing a system by disrupting normal traffic to the targeted server [2]. One example of such an attack occurred during the US presidential election, when attackers targeted the personal email of campaign staffers [3].
Backscatter is a side effect of DDoS attacks in which the attacker spoofs the source address of the packets sent to the victim. The receiver cannot tell whether a received packet is legitimate, so it sends its normal response to the spoofed address [1].
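The mechanism can be illustrated with a small simulation. This is only a sketch under stated assumptions: integers stand in for IP addresses, and the address ranges, the victim address, and the function names are all hypothetical. It shows a victim answering packets with forged sources and a monitor seeing only the replies that land in unused (dark) address space.

```python
import random

# Hypothetical address space: integers stand in for IP addresses.
DARKNET = range(1000, 1100)     # unused (dark) addresses watched by a monitor
ADDRESS_SPACE = range(0, 2000)  # every address an attacker might forge

def spoofed_flood(victim, n_packets, rng):
    """Attacker sends packets to the victim with randomly forged source addresses."""
    return [{"src": rng.choice(ADDRESS_SPACE), "dst": victim, "flags": "SYN"}
            for _ in range(n_packets)]

def victim_replies(packets):
    """The victim cannot tell forged from genuine packets, so it answers all of them."""
    return [{"src": p["dst"], "dst": p["src"], "flags": "SYN-ACK"} for p in packets]

def darknet_view(replies):
    """Replies addressed to dark space are the backscatter a monitor observes."""
    return [r for r in replies if r["dst"] in DARKNET]

rng = random.Random(0)  # fixed seed so the run is repeatable
attack = spoofed_flood(victim=1500, n_packets=5000, rng=rng)
backscatter = darknet_view(victim_replies(attack))
print(len(backscatter), "backscatter packets observed out of", len(attack))
```

Every observed packet carries the victim's address as its source, which is why backscatter reveals that an attack is under way without the monitor ever seeing the attack traffic itself.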
The research by E. Balkanli et al. used the open-source systems Bro and Corsaro as tools to analyze malicious activity in backscatter traffic. Both are categorized as NIDS: they detect security threats by passively inspecting network traffic [1]. The detection results generated by Bro and Corsaro were then processed by two supervised machine learning methods, the CART decision tree and Naïve Bayes.
The research aims to evaluate how effectively the CART decision tree and Naïve Bayes detect DDoS attacks in backscatter traffic when trained on a small dataset and without using IP addresses or port numbers [1].
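As a rough illustration of training without IP addresses and port numbers, the sketch below removes those fields from a traffic record before it reaches a classifier. The record layout and field names are hypothetical and are not the feature set used in [1].

```python
# Fields that the classifier is not allowed to see (hypothetical names).
EXCLUDED = {"src_ip", "dst_ip", "src_port", "dst_port"}

def to_features(record):
    """Return the record without address and port fields, in a stable key order."""
    return {key: record[key] for key in sorted(record) if key not in EXCLUDED}

# Hypothetical record; the addresses come from documentation-only ranges.
record = {
    "src_ip": "192.0.2.10", "dst_ip": "198.51.100.7",
    "src_port": 80, "dst_port": 40213,
    "protocol": "tcp", "tcp_flags": "SA", "packet_count": 3, "mean_size": 44.0,
}
features = to_features(record)
print(features)
```

A model trained this way has to rely on how the traffic behaves rather than on where it comes from.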
The data used by E. Balkanli et al. were taken from the publicly available CAIDA backscatter data since 2008 [1]. These data served as the training dataset for supervised machine learning. Machine learning is a way for a machine to learn from past data in order to solve a problem. Supervised learning uses labelled datasets to build a model for prediction [5]. Each record in a labelled dataset is assigned to one of the defined classes.
Suppose we have an attribute set A = {A1, A2, ..., Am}, where m is the size of A, and a classification set C = {C1, C2, ..., Cn}, where n is the size of C. Let D be a dataset that relates set A to set C. The objective of machine learning is to learn from D how attribute values map to classes and to predict the class of new records [5]. The number of records available to train the machine is a critical factor for accurate prediction; in general, more records give better predictions [1].
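To make the roles of A, C and D concrete, here is a minimal sketch of a Naïve Bayes classifier trained on a tiny labelled dataset. It assumes categorical attributes and Laplace smoothing, which may differ from the variant evaluated in [1], and the attribute values and labels are invented for illustration only.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(D):
    """D is a list of (attributes, label) pairs; attributes is a tuple of categorical values."""
    class_counts = Counter(label for _, label in D)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value -> count
    values_seen = defaultdict(set)       # attribute index -> distinct values in D
    for attrs, label in D:
        for i, value in enumerate(attrs):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def predict(model, attrs):
    """Return the class with the highest log posterior for the given attributes."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior of the class
        for i, value in enumerate(attrs):
            k = len(values_seen[i] | {value})  # distinct values, counting an unseen one
            # Laplace smoothing keeps unseen values from zeroing the product.
            score += math.log((value_counts[(i, label)][value] + 1) / (n + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy dataset D: attributes are (protocol, tcp_flags); classes are the labels.
D = [
    (("tcp", "SA"), "backscatter"), (("tcp", "SA"), "backscatter"),
    (("tcp", "RA"), "backscatter"), (("icmp", "-"), "backscatter"),
    (("tcp", "S"), "normal"), (("udp", "-"), "normal"),
    (("tcp", "S"), "normal"), (("tcp", "A"), "normal"),
]
model = train_naive_bayes(D)
print(predict(model, ("tcp", "SA")))
print(predict(model, ("tcp", "S")))
```

With only eight records the estimated probabilities are coarse, which is one way to see why the number of training records matters.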
CART is represented as a binary tree. Each internal node represents a single input variable and a split point on that variable.
Figure 1. CART Sample Illustration [6]
The binary tree splits the data into classes and reduces computing time, and irrelevant attributes do not affect the prediction [1].
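The following is a simplified sketch of how a CART-style classifier can grow such a tree, assuming numeric features and the Gini impurity as the split criterion. It omits pruning and stops when no split lowers the impurity, so it is not a full CART implementation. The three features (packet count, mean packet size, and a deliberately irrelevant value) and the labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return (impurity, feature, threshold) for the best binary split, or None."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send records to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build(rows, labels, depth=0, max_depth=3):
    """Grow a binary tree: internal nodes are dicts, leaves are class labels."""
    split = best_split(rows, labels) if depth < max_depth and gini(labels) > 0 else None
    if split is None or split[0] >= gini(labels):
        return Counter(labels).most_common(1)[0][0]  # majority-class leaf
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f, "threshold": t,
        "left": build([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def classify(tree, row):
    """Walk from the root to a leaf, going left when the value is <= the threshold."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Toy data: (packet_count, mean_size, irrelevant_value)
rows = [(1, 40, 7), (2, 44, 3), (1, 40, 9), (3, 44, 1),
        (20, 600, 8), (35, 900, 2), (15, 512, 5), (40, 1200, 4)]
labels = ["backscatter"] * 4 + ["normal"] * 4
tree = build(rows, labels)
print(tree)
print(classify(tree, (2, 40, 100)), classify(tree, (30, 800, 100)))
```

On this toy data the tree never splits on the third feature, so changing its value leaves the prediction unchanged.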
According to the evaluation by E. Balkanli et al., Bro's processing time is high because the logs it generates are very detailed and it has no specific rule for detecting DDoS attacks. Corsaro performed better, detecting backscatter traffic with 99% accuracy. The decision tree also performed better than Naïve Bayes [1].
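For readers who want to see how a figure such as accuracy is derived, the sketch below computes accuracy, detection rate and false positive rate from true and predicted labels. The labels are toy values chosen for illustration, not results from [1], and the choice of metrics is an assumption rather than the exact set reported there.

```python
def evaluate(y_true, y_pred, positive="backscatter"):
    """Compute accuracy, detection rate and false positive rate for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # attacks caught
    fn = sum(t == positive and p != positive for t, p in pairs)  # attacks missed
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    tn = sum(t != positive and p != positive for t, p in pairs)  # correctly ignored
    return {
        "accuracy": (tp + tn) / len(pairs),
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }

# Toy labels: 4 backscatter records and 6 normal records.
y_true = ["backscatter"] * 4 + ["normal"] * 6
y_pred = ["backscatter"] * 3 + ["normal"] * 6 + ["backscatter"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Reporting the detection rate and false positive rate alongside accuracy helps when one class is much rarer than the other, since accuracy alone can look high even if most attacks are missed.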
[1] E. Balkanli, J. Alves and A. N. Zincir-Heywood, "Supervised learning to detect DDoS attacks," 2014 IEEE Symposium on Computational Intelligence in Cyber Security (CICS), 2014, pp. 1-8, doi: 10.1109/CICYBS.2014.7013367.
[2] E. Balkanli and A. N. Zincir-Heywood, "On the analysis of backscatter traffic," 39th Annual IEEE Conference on Local Computer Networks Workshops, 2014, pp. 671-678, doi: 10.1109/LCNW.2014.6927719.
[3] S. Huntley, "How we're tackling evolving online threats," Threat Analysis Group, Google, Oct. 16, 2020. https://blog.google/threat-analysis-group/how-were-tackling-evolving-online-threats/
[4] N. Furutani, T. Ban, J. Nakazato, J. Shimamura, J. Kitazono and S. Ozawa, "Detection of DDoS Backscatter Based on Traffic Features of Darknet TCP Packets," 2014 Ninth Asia Joint Conference on Information Security, 2014, pp. 39-43, doi: 10.1109/AsiaJCIS.2014.23.
[5] M. Iqbal and Z. Yan, "Supervised Machine Learning Approaches: A Survey," International Journal of Soft Computing, vol. 5, pp. 946-952, 2015, doi: 10.21917/ijsc.2015.0133.
[6] J. Brownlee, "Classification and Regression Trees for Machine Learning," Apr. 8, 2016. https://machinelearningmastery.com/classification-and-regression-trees-for-machine-learning/