745 research outputs found
Unsupervised Intrusion Detection with Cross-Domain Artificial Intelligence Methods
Cybercrime is a major concern for corporations, business owners, governments and citizens, and it continues to grow in spite of increasing investments in security and fraud prevention. The main challenges in this research field are: being able to detect unknown attacks, and reducing the false positive ratio. The aim of this research work was to target both problems by leveraging four artificial intelligence techniques.
The first technique is a novel unsupervised learning method based on skip-gram modeling. It was designed, developed and tested against a public dataset with popular intrusion patterns. A high accuracy and a low false positive rate were achieved without prior knowledge of attack patterns.
The second technique is a novel unsupervised learning method based on topic modeling. It was applied to three related domains (network attacks, payments fraud, IoT malware traffic). A high accuracy was achieved in the three scenarios, even though the malicious activity significantly differs from one domain to the other.
The third technique is a novel unsupervised learning method based on deep autoencoders, with feature selection performed by a supervised method, random forest. Obtained results showed that this technique can outperform other similar techniques.
The fourth technique is based on an MLP neural network, and is applied to alert reduction in fraud prevention. This method automates manual reviews previously done by human experts, without significantly impacting accuracy
Learning Fast and Slow: PROPEDEUTICA for Real-time Malware Detection
In this paper, we introduce and evaluate PROPEDEUTICA, a novel methodology
and framework for efficient and effective real-time malware detection,
leveraging the best of conventional machine learning (ML) and deep learning
(DL) algorithms. In PROPEDEUTICA, all software processes in the system start
execution subjected to a conventional ML detector for fast classification. If a
piece of software receives a borderline classification, it is subjected to
further analysis via more performance expensive and more accurate DL methods,
via our newly proposed DL algorithm DEEPMALWARE. Further, we introduce delays
to the execution of software subjected to deep learning analysis as a way to
"buy time" for DL analysis and to rate-limit the impact of possible malware in
the system. We evaluated PROPEDEUTICA with a set of 9,115 malware samples and
877 commonly used benign software samples from various categories for the
Windows OS. Our results show that the false positive rate for conventional ML
methods can reach 20%, and for modern DL methods it is usually below 6%.
However, the classification time for DL can be 100X longer than conventional ML
methods. PROPEDEUTICA improved the detection F1-score from 77.54% (conventional
ML method) to 90.25%, and reduced the detection time by 54.86%. Further, the
percentage of software subjected to DL analysis was approximately 40% on
average. Further, the application of delays in software subjected to ML reduced
the detection time by approximately 10%. Finally, we found and discussed a
discrepancy between the detection accuracy offline (analysis after all traces
are collected) and on-the-fly (analysis in tandem with trace collection). Our
insights show that conventional ML and modern DL-based malware detectors in
isolation cannot meet the needs of efficient and effective malware detection:
high accuracy, low false positive rate, and short classification time.Comment: 17 pages, 7 figure
Trustworthiness in Social Big Data Incorporating Semantic Analysis, Machine Learning and Distributed Data Processing
This thesis presents several state-of-the-art approaches constructed for the purpose of (i) studying the trustworthiness of users in Online Social Network platforms, (ii) deriving concealed knowledge from their textual content, and (iii) classifying and predicting the domain knowledge of users and their content. The developed approaches are refined through proof-of-concept experiments, several benchmark comparisons, and appropriate and rigorous evaluation metrics to verify and validate their effectiveness and efficiency, and hence, those of the applied frameworks
NLP Methods in Host-based Intrusion Detection Systems: A Systematic Review and Future Directions
Host based Intrusion Detection System (HIDS) is an effective last line of
defense for defending against cyber security attacks after perimeter defenses
(e.g., Network based Intrusion Detection System and Firewall) have failed or
been bypassed. HIDS is widely adopted in the industry as HIDS is ranked among
the top two most used security tools by Security Operation Centers (SOC) of
organizations. Although effective and efficient HIDS is highly desirable for
industrial organizations, the evolution of increasingly complex attack patterns
causes several challenges resulting in performance degradation of HIDS (e.g.,
high false alert rate creating alert fatigue for SOC staff). Since Natural
Language Processing (NLP) methods are better suited for identifying complex
attack patterns, an increasing number of HIDS are leveraging the advances in
NLP that have shown effective and efficient performance in precisely detecting
low footprint, zero day attacks and predicting the next steps of attackers.
This active research trend of using NLP in HIDS demands a synthesized and
comprehensive body of knowledge of NLP based HIDS. Thus, we conducted a
systematic review of the literature on the end to end pipeline of the use of
NLP in HIDS development. For the end to end NLP based HIDS development
pipeline, we identify, taxonomically categorize and systematically compare the
state of the art of NLP methods usage in HIDS, attacks detected by these NLP
methods, datasets and evaluation metrics which are used to evaluate the NLP
based HIDS. We highlight the relevant prevalent practices, considerations,
advantages and limitations to support the HIDS developers. We also outline the
future research directions for the NLP based HIDS development
- …