A survey on the application of deep learning for code injection detection
Abstract Code injection is one of the top cyber security attack vectors in the modern world. To overcome the limitations of conventional signature-based detection techniques, and to complement them when appropriate, multiple machine learning approaches have been proposed. Existing surveys of these approaches focus predominantly on general intrusion detection, which can then be narrowed to specific vulnerabilities. In addition, among the machine learning steps, data preprocessing, although highly critical in the data analysis process, appears to be the least researched in the context of Network Intrusion Detection, and in code injection in particular. The goal of this survey is to fill that gap by analysing and classifying the existing machine learning techniques applied to code injection attack detection, with special attention to deep learning. Our analysis reveals that the way the input data is preprocessed considerably impacts the performance and attack detection rate. The proposed full preprocessing cycle demonstrates how various machine-learning-based approaches for detecting code injection attacks take advantage of different input data preprocessing techniques. The most used machine learning methods and preprocessing stages have also been identified.
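As an illustration of the kind of preprocessing stages the survey focuses on, here is a minimal sketch of tokenizing and vectorizing request payloads before classification; all payloads, the vocabulary, and the function names are hypothetical, not taken from any surveyed work:

```python
import re
from collections import Counter

def tokenize(payload: str):
    """Lowercase a request payload and split it into word and symbol
    tokens, a typical first preprocessing stage before vectorization."""
    return re.findall(r"[a-z0-9_]+|[^\sa-z0-9_]", payload.lower())

def vectorize(tokens, vocab):
    """Map a token list to a fixed-length count vector over a vocabulary."""
    counts = Counter(tokens)
    return [counts.get(term, 0) for term in vocab]

# Illustrative payloads and vocabulary (not from any real dataset)
vocab = ["or", "'", "=", "-"]
benign = vectorize(tokenize("id=42&name=alice"), vocab)
attack = vectorize(tokenize("id=42' OR '1'='1' --"), vocab)
```

The resulting count vectors make injection markers (quotes, comment dashes, boolean keywords) explicit, which is the kind of signal a downstream classifier then learns from.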
Semi-WTC: A Practical Semi-supervised Framework for Attack Categorization through Weight-Task Consistency
Supervised learning has been widely used for attack categorization, requiring
high-quality data and labels. However, the data is often imbalanced and it is
difficult to obtain sufficient annotations. Moreover, supervised models are
subject to real-world deployment issues, such as defending against unseen
artificial attacks. To tackle the challenges, we propose a semi-supervised
fine-grained attack categorization framework consisting of an encoder and a
two-branch structure; this framework can be generalized to different
supervised models. The multilayer perceptron with residual connection is used
as the encoder to extract features and reduce the complexity. The Recurrent
Prototype Module (RPM) is proposed to train the encoder effectively in a
semi-supervised manner. To alleviate the data imbalance problem, we introduce
the Weight-Task Consistency (WTC) into the iterative process of RPM by
assigning larger weights to classes with fewer samples in the loss function. In
addition, to cope with new attacks in real-world deployment, we propose an
Active Adaption Resampling (AAR) method, which can better discover the
distribution of unseen sample data and adapt the parameters of the encoder.
Experimental results show that our model outperforms the state-of-the-art
semi-supervised attack detection methods with a 3% improvement in
classification accuracy and a 90% reduction in training time.

Comment: Tech report
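The idea behind assigning larger loss weights to classes with fewer samples can be sketched with inverse-frequency weighting; this is an illustrative simplification, not the paper's exact Weight-Task Consistency formulation, and all labels and names below are hypothetical:

```python
import math
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: rarer classes receive larger weights,
    counteracting class imbalance in the loss function."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}

def weighted_nll(probs, labels, weights):
    """Class-weighted negative log-likelihood over predicted probabilities."""
    return sum(-weights[y] * math.log(p)
               for p, y in zip(probs, labels)) / len(labels)

# Imbalanced toy label set: 8 benign samples vs. 2 attack samples
labels = ["benign"] * 8 + ["dos"] * 2
w = class_weights(labels)
# The minority class "dos" gets 4x the weight of "benign" here,
# so misclassifying rare attacks is penalized more heavily.
loss = weighted_nll([0.9] * 8 + [0.6] * 2, labels, w)
```

With such weighting, a model can no longer reach a low loss by simply predicting the majority class well.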
Dynamic Analysis of Executables to Detect and Characterize Malware
Ensuring the integrity of systems that process sensitive information and
control many aspects of everyday life is essential. We examine the use of
machine learning algorithms to detect malware using the system calls generated
by executables, which alleviates attempts at obfuscation since the behavior is monitored
rather than the bytes of an executable. We examine several machine learning
techniques for detecting malware including random forests, deep learning
techniques, and liquid state machines. The experiments examine the effects of
concept drift on each algorithm to understand how well the algorithms
generalize to novel malware samples by testing them on data that was collected
after the training data. The results suggest that each of the examined machine
learning algorithms is a viable solution for detecting malware, achieving between
90% and 95% class-averaged accuracy (CAA). In real-world scenarios, the
performance evaluation on an operational network may not match the performance
achieved in training. Namely, the CAA may be about the same, but the values for
precision and recall over the malware can change significantly. We structure
experiments to highlight these caveats and offer insights into expected
performance in operational environments. In addition, we use the induced models
to gain a better understanding about what differentiates the malware samples
from the goodware, which can further be used as a forensics tool to understand
what the malware (or goodware) was doing to provide directions for
investigation and remediation.

Comment: 9 pages, 6 Tables, 4 Figures
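A common way to turn the system-call traces described above into features for classifiers such as random forests is sliding-window n-gram counts; the sketch below assumes hypothetical traces and call names, not data from the paper:

```python
from collections import Counter

def syscall_ngrams(trace, n=2):
    """Count sliding-window n-grams over a system-call trace.
    Such counts are a common behavioral feature representation,
    harder to obfuscate than the raw bytes of an executable."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))

# Hypothetical traces captured via dynamic analysis
good_grams = syscall_ngrams(["open", "read", "read", "close"])
mal_grams = syscall_ngrams(["open", "write", "exec", "connect", "write"])
```

Each trace becomes a sparse count vector over observed call pairs, and differences between the goodware and malware distributions are exactly what an induced model can surface for forensic interpretation.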