1,229 research outputs found

    Data Stream Clustering for Real-Time Anomaly Detection: An Application to Insider Threats

    Insider threat detection is an emergent concern for academia, industry, and governments due to the growing number of insider incidents in recent years. The continuous streaming of unbounded data coming from various sources in an organisation, typically at high velocity, leads to a typical Big Data computational problem. The malicious insider threat refers to anomalous behaviour(s) (outliers) that deviate from the normal baseline of a data stream. The absence of previously logged activities executed by users shapes the insider threat detection mechanism into an unsupervised anomaly detection approach over a data stream. A common shortcoming of existing data mining approaches to detecting insider threats is the high number of false alarms/positives (FPs). To handle the Big Data issue and to address this shortcoming, we propose a streaming anomaly detection approach, namely Ensemble of Random subspace Anomaly detectors In Data Streams (E-RAIDS), for insider threat detection. E-RAIDS learns an ensemble of p established outlier detection techniques [Micro-cluster-based Continuous Outlier Detection (MCOD) or Anytime Outlier Detection (AnyOut)] which employ clustering over continuous data streams. Each of the p models learns from a random feature subspace to detect local outliers, which might not be detected over the whole feature space. E-RAIDS introduces an aggregate component that combines the results from the p feature subspaces in order to decide whether to generate an alarm at each window iteration. The merit of E-RAIDS is that it defines a survival factor and a vote factor to address the high number of FPs. Experiments on E-RAIDS-MCOD and E-RAIDS-AnyOut are carried out on synthetic data sets including malicious insider threat scenarios generated at Carnegie Mellon University, to test the effectiveness of voting feature subspaces and the capability to detect (more than one)-behaviour-all-threat in real time. 
The results show that E-RAIDS-MCOD reports the highest F1 measure and a lower number of false alarms (0) than E-RAIDS-AnyOut, and that it detects approximately all the insider threats in real time
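
The voting idea in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the E-RAIDS implementation: the base detector is a simple per-feature z-score test swapped in for MCOD/AnyOut, the survival factor is left out, and the names `random_subspaces`, `subspace_flags`, `window_alarm` and the parameter `z_threshold` are hypothetical. The only parts taken from the abstract are the use of p random feature subspaces and a vote factor that decides whether a window raises an alarm.

```python
import random
from statistics import mean, pstdev


def random_subspaces(n_features, p, size, seed=0):
    """Draw p random feature-index subsets, each with `size` features."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_features), size)) for _ in range(p)]


def subspace_flags(window, subspace, z_threshold=3.0):
    """Stand-in base detector (an assumption, NOT MCOD or AnyOut).

    A point is flagged if, on any feature of the subspace, it lies more
    than z_threshold standard deviations from the window mean.
    """
    flags = [False] * len(window)
    for f in subspace:
        column = [row[f] for row in window]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # constant feature: nothing can deviate
        for i, value in enumerate(column):
            if abs(value - mu) / sigma > z_threshold:
                flags[i] = True
    return flags


def window_alarm(window, subspaces, vote_factor, z_threshold=3.0):
    """Return (alarm, votes) for one window.

    Each subspace casts one vote if its detector flags at least one point;
    the alarm is raised when the votes reach vote_factor.
    """
    votes = sum(any(subspace_flags(window, s, z_threshold)) for s in subspaces)
    return votes >= vote_factor, votes
```

Under these assumptions, raising `vote_factor` trades recall for fewer false alarms, since an anomaly visible in only one subspace no longer triggers an alarm on its own.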

    Graph Mining for Cybersecurity: A Survey

    The explosive growth of cyber attacks, such as malware, spam, and intrusions, has had severe consequences for society. Securing cyberspace has become an utmost concern for organizations and governments. Traditional Machine Learning (ML) based methods are extensively used in detecting cyber threats, but they hardly model the correlations between real-world cyber entities. In recent years, with the proliferation of graph mining techniques, many researchers have investigated these techniques for capturing correlations between cyber entities and achieving high performance. It is imperative to summarize existing graph-based cybersecurity solutions to provide a guide for future studies. Therefore, as a key contribution of this paper, we provide a comprehensive review of graph mining for cybersecurity, including an overview of cybersecurity tasks, the typical graph mining techniques, and the general process of applying them to cybersecurity, as well as various solutions for different cybersecurity tasks. For each task, we probe into relevant methods and highlight the graph types, graph approaches, and task levels in their modeling. Furthermore, we collect open datasets and toolkits for graph-based cybersecurity. Finally, we outline potential directions for future research in this field

    Insider threat identification using the simultaneous neural learning of multi-source logs

    Insider threat detection has drawn increasing attention in recent years. In order to capture a malicious insider's digital footprints, which are scattered across a wide range of audit data sources over a long period of time, existing approaches often leverage a scoring mechanism to orchestrate alerts generated from multiple sub-detectors, or require domain knowledge-based feature engineering to conduct a one-off analysis across multiple types of data. These approaches result in high deployment complexity and incur additional costs for engaging security experts. In this paper, we present a novel approach that works with a variety of security logs. The security logs are transformed into texts in the same format and then arranged as a corpus. Using a model trained by Word2vec on the corpus, we can approximate the posterior probabilities of insider behaviours. Accordingly, we label the transformed events as suspicious if their behavioural probabilities are smaller than a given threshold, and a user is labelled as malicious if he/she is associated with multiple suspicious events. The experiments are undertaken with the Carnegie Mellon University (CMU) CERT Programs insider threat database v6.2, and they not only demonstrate that the proposed approach is effective and scalable in practical applications but also provide guidance for tuning the parameters and thresholds
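
The two labelling rules in the abstract above (an event is suspicious below a probability threshold, a user is malicious when tied to multiple suspicious events) can be sketched as below. This is only an illustration of the thresholding step: the probabilities are assumed to have been produced already by some model, Word2vec training is not shown, and the function name `label_users`, the tuple layout, and the default of two events for "multiple" are assumptions rather than details from the paper.

```python
from collections import Counter


def label_users(event_probs, prob_threshold, min_suspicious_events=2):
    """Apply the two thresholding rules to already-scored events.

    event_probs: iterable of (user, event_id, probability) tuples, where
    the probability is assumed to come from a previously trained model.
    Returns (suspicious_events, malicious_users).
    """
    # Rule 1: an event is suspicious if its probability is below the threshold.
    suspicious = [(user, event) for user, event, p in event_probs
                  if p < prob_threshold]
    # Rule 2: a user is malicious if tied to enough suspicious events
    # (min_suspicious_events=2 is an assumed reading of "multiple").
    counts = Counter(user for user, _ in suspicious)
    malicious = {user for user, n in counts.items()
                 if n >= min_suspicious_events}
    return suspicious, malicious
```

Both `prob_threshold` and `min_suspicious_events` are the kind of parameters the abstract says need tuning; the values used in any call are illustrative.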

    Graph based Anomaly Detection and Description: A Survey

    Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have been of focus recently. As objects in graphs have long-range correlations, a suite of novel technology has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised vs. (semi-)supervised approaches, for static vs. dynamic graphs, for attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field

    Mathematical models for insider threat mitigation

    The world is rapidly undergoing a massive digital transformation in which every human will have no choice but to rely on the confidentiality, integrity, and availability of information systems. At the same time, there are increasing numbers of malicious attackers who are constantly trying to compromise information systems for financial or political gain. Given the threat landscape and its sophistication, the traditional approach of fortifying the castle will not provide sufficient protection to information systems. This formidable threat can only be restrained by a new approach, which looks both inwards and outwards for potential attacks. It is well established that humans are the weakest link when it comes to information security controls, although the same humans are considered the most valued assets. A trusted custodian with malicious intent can inflict enormous damage on critical information assets. Often these attacks go unnoticed for a considerable period and will have caused irreversible damage to the organisation by the time they are discovered. In the recent past, there have been well publicised data compromises in the media which have damaged the reputations of governments and organisations and in some cases endangered human life. While some of these leaks can be classified as whistleblowing in the public interest, they are very real examples of information compromises in the context of information security. High profile leaks by Edward Snowden and Bradley (Chelsea) Manning are perfect examples of the potential damage from an insider. Furthermore, most malicious insider activities go unnoticed or unpublicised as a damage control measure by the affected organisations. While a great deal of research and investment is going into insider threat prevention, these attacks are on the rise at an alarming rate. 
A comprehensive study of publicly available insider threat cases, academic literature, and technical reports reveals the need for a multifaceted view of the problem. The insider threat problem can no longer be treated only as a technical data driven problem but requires the analysis of associated factors, a combination of technical and human behavioural aspects going beyond the traditional technology driven approaches. Furthermore, there is no universally agreed comprehensive feature set, as the majority of the proposed models are bound to a single threat scenario or conducted on a specific system. In order to overcome this limitation, this thesis introduces a precise user profile model integrating insider threat related parameters from technical, behavioural, psychological, and organisational paradigms. The proposed user profile model is a combination of: a comprehensive insider threat detection and prediction feature set; a collection of various techniques for feature specific user behaviour comparisons; and a framework for quantifying user behaviour as a numerical value. The unpredictability of malicious attackers and the complexity of malicious actions necessitate the careful analysis of network, system and user parameters correlated with the insider threat problem. Also, unearthing the hidden evidence requires the analysis of an enormous amount of data generated from heterogeneous input streams. This creates a high dimensional, heterogeneous data analysis problem for distinguishing suspicious users from benign users, which in turn creates the need to identify an appropriate means for data representation and feature extraction. Since traditional graph theory and new approaches in the field of complex networks provide the means of representing high dimensional, heterogeneous data, the feasibility of using graphs for data representation and feature extraction is investigated, going beyond traditional data mining techniques. 
Unattributed graphs are introduced to represent users’ device usage data, web access data, and organisational hierarchy. A graph based feature extraction technique based on subgraphs generated from neighbourhoods of different orders is introduced. A graph based approach to capture inter-user relationships using web access data is presented. Various insider threat models proposed in the literature, including intrusion detection based approaches, system call based approaches, honeypot based approaches and stream mining approaches, end up with high false positive rates. More recently, machine learning approaches for distinguishing suspicious users from normal users have become more common. However, the application of graph based anomaly detection techniques to the insider threat problem is relatively rare in the academic literature as well as uncommon in the commercial world. Therefore, we focused our attention on graph based anomaly detection techniques for differentiating suspicious users from benign users. This thesis introduces two distinct insider threat detection frameworks. The first is a hybrid insider threat detection framework based on a graph theoretic feature extraction mechanism and an unsupervised anomaly detection algorithm. The second is built on an attributed graph clustering mechanism integrated with an outlier ranking mechanism. Finally, a comprehensive theoretical and commercially viable framework for insider threat mitigation integrating user profiling and threat detection is introduced
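
As a rough illustration of feature extraction from subgraphs over neighbourhoods of different orders, the sketch below computes, for one user node, the node and edge counts of the induced k-hop subgraph for k = 1..max_order. The abstract does not say which subgraph properties the thesis uses, so the choice of node and edge counts, the undirected adjacency-dict representation, and the names `k_hop_nodes` and `neighbourhood_features` are all assumptions.

```python
from collections import deque


def k_hop_nodes(adj, source, k):
    """Return all nodes within k hops of source (breadth-first search)."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond the k-th order neighbourhood
        for neighbour in adj.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return set(depth)


def neighbourhood_features(adj, source, max_order):
    """Build a feature vector [nodes_1, edges_1, ..., nodes_K, edges_K].

    adj is an undirected graph given as a symmetric dict of neighbour sets.
    The features (assumed for illustration) are the node and edge counts of
    the subgraph induced by each k-hop neighbourhood of source.
    """
    features = []
    for k in range(1, max_order + 1):
        nodes = k_hop_nodes(adj, source, k)
        # Each undirected edge is seen from both endpoints, hence // 2.
        edges = sum(1 for u in nodes for v in adj.get(u, ()) if v in nodes) // 2
        features.extend([len(nodes), edges])
    return features
```

For a user-device graph, a vector like this could then be passed to any unsupervised anomaly detector; that pairing is only suggested by the abstract's description of the first framework, not specified by it.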