In the last six months of 2018, there were over 181 million ransomware attacks, marking a 229% increase from the same time frame in 2017. While most corporations are making progress in finding elegant solutions to counteract malware attacks, university networks remain vulnerable targets due to a lack of appropriate security mechanisms. In 2017, at least five universities in the United States were victims of ransomware attacks.
Constructing intrusion detection systems using machine learning is an area that has enticed researchers for a long time. Data acquisition is one of the biggest challenges in building such a system. Data on malware attacks is generally not made public by organizations due to privacy concerns. As a result, many of the cybersecurity solutions that have been built thus far are based on attack simulations generated by cybersecurity experts.
In our project, we use connection logs acquired from the cybersecurity team at the University of Virginia. To label the connection logs, we collect data on malicious attacks using low-interaction honeypots (systems designed to appear as vulnerable targets in order to lure attackers) deployed on subnetworks at the University of Virginia.
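As a rough illustration of how honeypot data could drive labelling, the sketch below marks a connection as malicious when its source address was also seen contacting a honeypot. The matching rule, the `Connection` fields, and the example addresses are all assumptions made for illustration; the actual labelling criteria used in the project are not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Hypothetical, simplified view of one connection-log record."""
    src_ip: str
    dst_ip: str
    dst_port: int


def label_connections(connections, honeypot_sources):
    """Pair each connection with 'malicious' or 'benign'.

    Assumption: a source IP that contacted a honeypot is treated as malicious.
    """
    return [
        (conn, "malicious" if conn.src_ip in honeypot_sources else "benign")
        for conn in connections
    ]


# Source addresses observed by the honeypots (documentation-range IPs).
honeypot_sources = {"203.0.113.7"}

connections = [
    Connection("203.0.113.7", "192.0.2.10", 22),
    Connection("198.51.100.4", "192.0.2.11", 443),
]

labels = [label for _, label in label_connections(connections, honeypot_sources)]
```

A real pipeline would likely also match on time windows or ports, since address reuse can make an IP-only rule produce noisy labels.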
One issue that plagues intrusion detection systems is class imbalance, which arises from the overwhelming majority of benign traffic on most networks. In cybersecurity, the cost of misclassifying minority class samples (false negatives, when malicious traffic is treated as the positive class) is much higher than the cost of misclassifying majority class samples. We remedy class imbalance using SVM-SMOTE (Support Vector Machines - Synthetic Minority Over-sampling Technique), which creates synthetic samples to balance the class labels in the dataset. SVM-SMOTE is an oversampling technique that uses the support vectors of an SVM to locate the borderline between classes and generates new minority class instances near it, helping to establish the boundaries between classes.
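The interpolation step that SMOTE-style methods rely on can be sketched in a few lines. Note that this is plain SMOTE-like interpolation between a minority sample and one of its nearest minority neighbours, not the SVM-guided variant: SVM-SMOTE additionally restricts the seed samples to those near the SVM decision boundary. The function names and the toy feature values are hypothetical.

```python
import math
import random


def nearest_neighbors(point, others, k):
    """Return the k points in `others` closest to `point` (Euclidean distance)."""
    return sorted(others, key=lambda o: math.dist(point, o))[:k]


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbor = rng.choice(nearest_neighbors(base, others, k))
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical 2-D features for the few malicious connections in a dataset.
malicious = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(malicious, n_new=6)
```

Each synthetic point lies on a line segment between two real minority samples, so it stays inside the region the minority class already occupies. In practice, a maintained implementation such as the `SVMSMOTE` class in the imbalanced-learn package would be a more sensible choice than hand-written code.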
Our hypothesis is that detecting malicious activity using honeypots, an approach that has been shown to work on partially simulated datasets, will effectively predict malicious attacks on active network data. In our project, we label the university network connections as ‘benign’ or ‘malicious’ using data captured by the honeypots. We treat the labelled data with SVM-SMOTE to resolve class imbalance and build a machine learning classifier. We then apply deep learning and ensemble learning to the dataset to further improve the model's performance. The proposed system is expected to improve the overall accuracy of detecting malicious activity and minimize false positives.
Applied Machine Learning Conference.