51 research outputs found

    Malicious PDF detection Based on Machine Learning with Enhanced Feature Set

    Get PDF
    PDF is one of the most popular document file formats due to its flexibility, platform independence and ability to embed different types of content. Over the years, PDF has become a popular attack vector for spreading malware and compromising computer systems. Existing signature-based defense systems have extremely high recall rates, but quickly become obsolete and ineffective against zero-day attacks, which makes them easy to circumvent by malicious PDF files. Recently, Machine Learning (ML) has emerged as a viable tool to improve discovery of previously unseen attacks. Hence, in this paper we present enhanced ML-based models for the detection of malicious PDF documents. We develop an approach for ML-based detection with static features derived from PDF documents leveraging existing tools and propose new, previously unused features to enhance the performance of the ML-based classifiers. Our investigative study is conducted on the recently published Evasive-PDFMal2022 dataset, which was used to evaluate seven ML classifiers based on our proposed method. The EvasivePDFMal2022 dataset consists of 4,468 benign samples and 5,557 malicious PDF samples. The results of the experiments show that our proposed approach with the enhanced features enabled improved accuracies in five out of seven of the classifiers that were evaluated. The results demonstrate the potential of the new features to increase the robustness of feature-based PDF malware detection

    Cyber Attack Surface Mapping For Offensive Security Testing

    Get PDF
    Security testing consists of automated processes, like Dynamic Application Security Testing (DAST) and Static Application Security Testing (SAST), as well as manual offensive security testing, like Penetration Testing and Red Teaming. This nonautomated testing is frequently time-constrained and difficult to scale. Previous literature suggests that most research is spent in support of improving fully automated processes or in finding specific vulnerabilities, with little time spent improving the interpretation of the scanned attack surface critical to nonautomated testing. In this work, agglomerative hierarchical clustering is used to compress the Internet-facing hosts of 13 representative companies as collected by the Shodan search engine, resulting in an average 89% reduction in attack surface complexity. The work is then extended to map network services and also analyze the characteristics of the Log4Shell security vulnerability and its impact on attack surface mapping. The results highlighted outliers indicative of possible anti-patterns as well as opportunities to improve how testers and tools map the web attack surface. Ultimately the work is extended to compress web attack surfaces based on security relevant features, demonstrating via accuracy measurements not only that this compression is feasible but can also be automated. In the process a framework is created which could be extended in future work to compress other attack surfaces, including physical structures/campuses for physical security testing and even humans for social engineering tests

    Cyberthreats, Attacks and Intrusion Detection in Supervisory Control and Data Acquisition Networks

    Get PDF
    Supervisory Control and Data Acquisition (SCADA) systems are computer-based process control systems that interconnect and monitor remote physical processes. There have been many real world documented incidents and cyber-attacks affecting SCADA systems, which clearly illustrate critical infrastructure vulnerabilities. These reported incidents demonstrate that cyber-attacks against SCADA systems might produce a variety of financial damage and harmful events to humans and their environment. This dissertation documents four contributions towards increased security for SCADA systems. First, a set of cyber-attacks was developed. Second, each attack was executed against two fully functional SCADA systems in a laboratory environment; a gas pipeline and a water storage tank. Third, signature based intrusion detection system rules were developed and tested which can be used to generate alerts when the aforementioned attacks are executed against a SCADA system. Fourth, a set of features was developed for a decision tree based anomaly based intrusion detection system. The features were tested using the datasets developed for this work. This dissertation documents cyber-attacks on both serial based and Ethernet based SCADA networks. Four categories of attacks against SCADA systems are discussed: reconnaissance, malicious response injection, malicious command injection and denial of service. In order to evaluate performance of data mining and machine learning algorithms for intrusion detection systems in SCADA systems, a network dataset to be used for benchmarking intrusion detection systemswas generated. This network dataset includes different classes of attacks that simulate different attack scenarios on process control systems. This dissertation describes four SCADA network intrusion detection datasets; a full and abbreviated dataset for both the gas pipeline and water storage tank systems. Each feature in the dataset is captured from network flow records. This dataset groups two different categories of features that can be used as input to an intrusion detection system. First, network traffic features describe the communication patterns in a SCADA system. This research developed both signature based IDS and anomaly based IDS for the gas pipeline and water storage tank serial based SCADA systems. The performance of both types of IDS were evaluates by measuring detection rate and the prevalence of false positives

    Federated learning for distributed intrusion detection systems in public networks

    Get PDF
    Abstract. The rapid integration of technologies such as IoT devices, cloud, and edge computing has led to a progressively interconnected network of intelligent environments, services, and public infrastructures. This evolution highlights the critical need for sophisticated and self-governing Intrusion Detection Systems (IDS) to enhance trust and ensure the security and integrity of these interconnected environments. Furthermore, the advancement of AI-based Intrusion Detection Systems hinges on the effective utilization of high-quality data for model training. A considerable number of datasets created in controlled lab environments have recently been released, which has significantly facilitated researchers in developing and evaluating resilient Machine Learning models. However, a substantial portion of the architectures and datasets available are now considered outdated. As a result, the principal aim of this thesis is to contribute to the enhancement of knowledge concerning the creation of contemporary testbed architectures specifically designed for defense systems. The main objective of this study is to propose an innovative testbed infrastructure design, capitalizing on the broad connectivity panOULU public network, to facilitate the analysis and evaluation of AI-based security applications within a public network setting. The testbed incorporates a variety of distributed computing paradigms including edge, fog, and cloud computing. It simplifies the adoption of technologies like Software-Defined Networking, Network Function Virtualization, and Service Orchestration by leveraging the capabilities of the VMware vSphere platform. In the learning phase, a custom-developed application uses information from the attackers to automatically classify incoming data as either normal or malicious. This labeled data is then used for training machine learning models within a federated learning framework (FED-ML). The trained models are validated using previously unseen network data (test data). The entire procedure, from collecting network traffic to labeling data, and from training models within the federated architecture, operates autonomously, removing the necessity for human involvement. The development and implementation of FED-ML models in this thesis may contribute towards laying the groundwork for future-forward, AI-oriented cybersecurity measures. The dataset and testbed configuration showcased in this research could improve our understanding of the challenges associated with safeguarding public networks, especially those with heterogeneous environments comprising various technologies

    Cyber Security and Critical Infrastructures 2nd Volume

    Get PDF
    The second volume of the book contains the manuscripts that were accepted for publication in the MDPI Special Topic "Cyber Security and Critical Infrastructure" after a rigorous peer-review process. Authors from academia, government and industry contributed their innovative solutions, consistent with the interdisciplinary nature of cybersecurity. The book contains 16 articles, including an editorial that explains the current challenges, innovative solutions and real-world experiences that include critical infrastructure and 15 original papers that present state-of-the-art innovative solutions to attacks on critical systems

    Detecting and Modeling Polymorphic Shellcode

    Get PDF
    In this thesis, we address the problem of modeling and detecting polymorphic engines shellcode. By polymorphic engines, we mean programs having the ability to transform any piece of malware into many instances consisting of different code but having the same functionality as the original malware. Typically, polymorphic engines work by encrypting the target malware using various encryption techniques and providing a decryption module in order to execute the newly encrypted instance. Moreover, those engines have the ability to mutate their decryption routine making them unique from one instance to another and hard to detect. Our analysis focuses on polymorphic shellcode, which is shellcode that uses a polymorphic engine to mutate while keeping the original function of the code the same. We propose a new concept of signatures, shape signatures, which cope with the highly mutated nature of those engines. Those signatures try to identify the constant part as well as the mutated part of the deciphering routines. This combination is able to cope with the highly mutated nature of those engines in a much more efficient way compared to traditional signatures used in most intrusion detection systems. The second part of the thesis aims at modeling those polymorphic engines by showing that they exhibit commo

    Unsupervised Intrusion Detection with Cross-Domain Artificial Intelligence Methods

    Get PDF
    Cybercrime is a major concern for corporations, business owners, governments and citizens, and it continues to grow in spite of increasing investments in security and fraud prevention. The main challenges in this research field are: being able to detect unknown attacks, and reducing the false positive ratio. The aim of this research work was to target both problems by leveraging four artificial intelligence techniques. The first technique is a novel unsupervised learning method based on skip-gram modeling. It was designed, developed and tested against a public dataset with popular intrusion patterns. A high accuracy and a low false positive rate were achieved without prior knowledge of attack patterns. The second technique is a novel unsupervised learning method based on topic modeling. It was applied to three related domains (network attacks, payments fraud, IoT malware traffic). A high accuracy was achieved in the three scenarios, even though the malicious activity significantly differs from one domain to the other. The third technique is a novel unsupervised learning method based on deep autoencoders, with feature selection performed by a supervised method, random forest. Obtained results showed that this technique can outperform other similar techniques. The fourth technique is based on an MLP neural network, and is applied to alert reduction in fraud prevention. This method automates manual reviews previously done by human experts, without significantly impacting accuracy

    Detecting worm mutations using machine learning

    Get PDF
    Worms are malicious programs that spread over the Internet without human intervention. Since worms generally spread faster than humans can respond, the only viable defence is to automate their detection. Network intrusion detection systems typically detect worms by examining packet or flow logs for known signatures. Not only does this approach mean that new worms cannot be detected until the corresponding signatures are created, but that mutations of known worms will remain undetected because each mutation will usually have a different signature. The intuitive and seemingly most effective solution is to write more generic signatures, but this has been found to increase false alarm rates and is thus impractical. This dissertation investigates the feasibility of using machine learning to automatically detect mutations of known worms. First, it investigates whether Support Vector Machines can detect mutations of known worms. Support Vector Machines have been shown to be well suited to pattern recognition tasks such as text categorisation and hand-written digit recognition. Since detecting worms is effectively a pattern recognition problem, this work investigates how well Support Vector Machines perform at this task. The second part of this dissertation compares Support Vector Machines to other machine learning techniques in detecting worm mutations. Gaussian Processes, unlike Support Vector Machines, automatically return confidence values as part of their result. Since confidence values can be used to reduce false alarm rates, this dissertation determines how Gaussian Process compare to Support Vector Machines in terms of detection accuracy. For further comparison, this work also compares Support Vector Machines to K-nearest neighbours, known for its simplicity and solid results in other domains. The third part of this dissertation investigates the automatic generation of training data. Classifier accuracy depends on good quality training data -- the wider the training data spectrum, the higher the classifier's accuracy. This dissertation describes the design and implementation of a worm mutation generator whose output is fed to the machine learning techniques as training data. This dissertation then evaluates whether the training data can be used to train classifiers of sufficiently high quality to detect worm mutations. The findings of this work demonstrate that Support Vector Machines can be used to detect worm mutations, and that the optimal configuration for detection of worm mutations is to use a linear kernel with unnormalised bi-gram frequency counts. Moreover, the results show that Gaussian Processes and Support Vector Machines exhibit similar accuracy on average in detecting worm mutations, while K-nearest neighbours consistently produces lower quality predictions. The generated worm mutations are shown to be of sufficiently high quality to serve as training data. Combined, the results demonstrate that machine learning is capable of accurately detecting mutations of known worms

    Security Issues of Mobile and Smart Wearable Devices

    Get PDF
    Mobile and smart devices (ranging from popular smartphones and tablets to wearable fitness trackers equipped with sensing, computing and networking capabilities) have proliferated lately and redefined the way users carry out their day-to-day activities. These devices bring immense benefits to society and boast improved quality of life for users. As mobile and smart technologies become increasingly ubiquitous, the security of these devices becomes more urgent, and users should take precautions to keep their personal information secure. Privacy has also been called into question as so many of mobile and smart devices collect, process huge quantities of data, and store them on the cloud as a matter of fact. Ensuring confidentiality, integrity, and authenticity of the information is a cybersecurity challenge with no easy solution. Unfortunately, current security controls have not kept pace with the risks posed by mobile and smart devices, and have proven patently insufficient so far. Thwarting attacks is also a thriving research area with a substantial amount of still unsolved problems. The pervasiveness of smart devices, the growing attack vectors, and the current lack of security call for an effective and efficient way of protecting mobile and smart devices. This thesis deals with the security problems of mobile and smart devices, providing specific methods for improving current security solutions. Our contributions are grouped into two related areas which present natural intersections and corresponds to the two central parts of this document: (1) Tackling Mobile Malware, and (2) Security Analysis on Wearable and Smart Devices. In the first part of this thesis, we study methods and techniques to assist security analysts to tackle mobile malware and automate the identification of malicious applications. We provide threefold contributions in tackling mobile malware: First, we introduce a Secure Message Delivery (SMD) protocol for Device-to-Device (D2D) networks, with primary objective of choosing the most secure path to deliver a message from a sender to a destination in a multi-hop D2D network. Second, we illustrate a survey to investigate concrete and relevant questions concerning Android code obfuscation and protection techniques, where the purpose is to review code obfuscation and code protection practices. We evaluate efficacy of existing code de-obfuscation tools to tackle obfuscated Android malware (which provide attackers with the ability to evade detection mechanisms). Finally, we propose a Machine Learning-based detection framework to hunt malicious Android apps by introducing a system to detect and classify newly-discovered malware through analyzing applications. The proposed system classifies different types of malware from each other and helps to better understanding how malware can infect devices, the threat level they pose and how to protect against them. Our designed system leverages more complete coverage of apps’ behavioral characteristics than the state-of-the-art, integrates the most performant classifier, and utilizes the robustness of extracted features. The second part of this dissertation conducts an in-depth security analysis of the most popular wearable fitness trackers on the market. Our contributions are grouped into four central parts in this domain: First, we analyze the primitives governing the communication between fitness tracker and cloud-based services. In addition, we investigate communication requirements in this setting such as: (i) Data Confidentiality, (ii) Data Integrity, and (iii) Data Authenticity. Second, we show real-world demos on how modern wearable devices are vulnerable to false data injection attacks. Also, we document successful injection of falsified data to cloud-based services that appears legitimate to the cloud to obtain personal benefits. Third, we circumvent End-to-End protocol encryption implemented in the most advanced and secure fitness trackers (e.g., Fitbit, as the market leader) through Hardware-based reverse engineering. Last but not least, we provide guidelines for avoiding similar vulnerabilities in future system designs

    Data-Driven Techniques For Vulnerability Assessments

    Get PDF
    Security vulnerabilities have been puzzling researchers and practitioners for decades.As highlighted by the recent WannaCry and NotPetya ransomware campaigns, which resulted in billions of dollars of losses, weaponized exploits against vulnerabilities remain one of the main tools for cybercrime. The upward trend in the number of vulnerabilities reported annually and technical challenges in the way of remediation lead to large exposure windows for the vulnerable populations. On the other hand, due to sustained efforts in application and operating system security, few vulnerabilities are exploited in real-world attacks. Existing metrics for severity assessments err on the side of caution and overestimate the risk posed by vulnerabilities, further affecting remediation efforts that rely on prioritization. In this dissertation we show that severity assessments can be improved by taking into account public information about vulnerabilities and exploits.The disclosure of vulnerabilities is followed by artifacts such as social media discussions, write-ups and proof-of-concepts, containing technical information related to the vulnerabilities and their exploitation. These artifacts can be mined to detect active exploits or predict their development. However, we first need to understand: What features are required for different tasks? What biases are present in public data and how are data-driven systems affected? What security threats do these systems face when deployed operationally? We explore the questions by first collecting vulnerability-related posts on social media and analyzing the community and the content of their discussions.This analysis reveals that victims of attacks often share their experience online, and we leverage this finding to build an early detector of exploits active in the wild. Our detector significantly improves on the precision of existing severity metrics and can detect active exploits a median of 5 days earlier than a commercial intrusion prevention product. Next, we investigate the utility of various artifacts in predicting the development of functional exploits. We engineer features causally linked to the ease of exploitation, highlight trade-offs between timeliness and predictive utility of various artifacts, and characterize the biases that affect the ground truth for exploit prediction tasks. Using these insights, we propose a machine learning-based system that continuously collects artifacts and predicts the likelihood of exploits being developed against these vulnerabilities. We demonstrate our system's practical utility through its ability to highlight critical vulnerabilities and predict imminent exploits. Lastly, we explore the adversarial threats faced by data-driven security systems that rely on inputs of unknown provenance.We propose a framework for defining algorithmic threat models and for exploring adversaries with various degrees of knowledge and capabilities. Using this framework, we model realistic adversaries that could target our systems, design data poisoning attacks to measure their robustness, and highlight promising directions for future defenses against such attacks
    corecore