3,485 research outputs found

    HYPA: Efficient Detection of Path Anomalies in Time Series Data on Networks

    Full text link
    The unsupervised detection of anomalies in time series data has important applications in user behavioral modeling, fraud detection, and cybersecurity. Anomaly detection has, in fact, been extensively studied in categorical sequences. However, we often have access to time series data that represent paths through networks. Examples include transaction sequences in financial networks, click streams of users in networks of cross-referenced documents, or travel itineraries in transportation networks. To reliably detect anomalies, we must account for the fact that such data contain a large number of independent observations of paths constrained by a graph topology. Moreover, the heterogeneity of real systems rules out frequency-based anomaly detection techniques, which do not account for highly skewed edge and degree statistics. To address this problem, we introduce HYPA, a novel framework for the unsupervised detection of anomalies in large corpora of variable-length temporal paths in a graph. HYPA provides an efficient analytical method to detect paths with anomalous frequencies that result from nodes being traversed in unexpected chronological order.Comment: 11 pages with 8 figures and supplementary material. To appear at SIAM Data Mining (SDM 2020

    Unsupervised Learning of Distributional Properties can Supplement Human Labeling and Increase Active Learning Efficiency in Anomaly Detection

    Full text link
    Exfiltration of data via email is a serious cybersecurity threat for many organizations. Detecting data exfiltration (anomaly) patterns typically requires labeling, most often done by a human annotator, to reduce the high number of false alarms. Active Learning (AL) is a promising approach for labeling data efficiently, but it needs to choose an efficient order in which cases are to be labeled, and there are uncertainties as to what scoring procedure should be used to prioritize cases for labeling, especially when detecting rare cases of interest is crucial. We propose an adaptive AL sampling strategy that leverages the underlying prior data distribution, as well as model uncertainty, to produce batches of cases to be labeled that contain instances of rare anomalies. We show that (1) the classifier benefits from a batch of representative and informative instances of both normal and anomalous examples, (2) unsupervised anomaly detection plays a useful role in building the classifier in the early stages of training when relatively little labeling has been done thus far. Our approach to AL for anomaly detection outperformed existing AL approaches on three highly unbalanced UCI benchmarks and on one real-world redacted email data set

    Moving from a "human-as-problem" to a "human-as-solution" cybersecurity mindset

    Get PDF
    Cybersecurity has gained prominence, with a number of widely publicised security incidents, hacking attacks and data breaches reaching the news over the last few years. The escalation in the numbers of cyber incidents shows no sign of abating, and it seems appropriate to take a look at the way cybersecurity is conceptualised and to consider whether there is a need for a mindset change.To consider this question, we applied a "problematization" approach to assess current conceptualisations of the cybersecurity problem by government, industry and hackers. Our analysis revealed that individual human actors, in a variety of roles, are generally considered to be "a problem". We also discovered that deployed solutions primarily focus on preventing adverse events by building resistance: i.e. implementing new security layers and policies that control humans and constrain their problematic behaviours. In essence, this treats all humans in the system as if they might well be malicious actors, and the solutions are designed to prevent their ill-advised behaviours. Given the continuing incidences of data breaches and successful hacks, it seems wise to rethink the status quo approach, which we refer to as "Cybersecurity, Currently". In particular, we suggest that there is a need to reconsider the core assumptions and characterisations of the well-intentioned human's role in the cybersecurity socio-technical system. Treating everyone as a problem does not seem to work, given the current cyber security landscape.Benefiting from research in other fields, we propose a new mindset i.e. "Cybersecurity, Differently". This approach rests on recognition of the fact that the problem is actually the high complexity, interconnectedness and emergent qualities of socio-technical systems. The "differently" mindset acknowledges the well-intentioned human's ability to be an important contributor to organisational cybersecurity, as well as their potential to be "part of the solution" rather than "the problem". In essence, this new approach initially treats all humans in the system as if they are well-intentioned. The focus is on enhancing factors that contribute to positive outcomes and resilience. We conclude by proposing a set of key principles and, with the help of a prototypical fictional organisation, consider how this mindset could enhance and improve cybersecurity across the socio-technical system

    Unsupervised Anomaly Detection in Stream Data with Online Evolving Spiking Neural Networks

    Get PDF
    Unsupervised anomaly discovery in stream data is a research topic with many practical applications. However, in many cases, it is not easy to collect enough training data with labeled anomalies for supervised learning of an anomaly detector in order to deploy it later for identification of real anomalies in streaming data. It is thus important to design anomalies detectors that can correctly detect anomalies without access to labeled training data. Our idea is to adapt the Online evolving Spiking Neural Network (OeSNN) classifier to the anomaly detection task. As a result, we offer an Online evolving Spiking Neural Network for Unsupervised Anomaly Detection algorithm (OeSNN-UAD), which, unlike OeSNN, works in an unsupervised way and does not separate output neurons into disjoint decision classes. OeSNN-UAD uses our proposed new two-step anomaly detection method. Also, we derive new theoretical properties of neuronal model and input layer encoding of OeSNN, which enable more effective and efficient detection of anomalies in our OeSNN-UAD approach. The proposed OeSNN-UAD detector was experimentally compared with state-of-the-art unsupervised and semi-supervised detectors of anomalies in stream data from the Numenta Anomaly Benchmark and Yahoo Anomaly Datasets repositories. Our approach outperforms the other solutions provided in the literature in the case of data streams from the Numenta Anomaly Benchmark repository. Also, in the case of real data files of the Yahoo Anomaly Benchmark repository, OeSNN-UAD outperforms other selected algorithms, whereas in the case of Yahoo Anomaly Benchmark synthetic data files, it provides competitive results to the results recently reported in the literature.P. MaciÄ…g acknowledges financial Support of the Faculty of the Electronics and Information Technology of the Warsaw University of Technology, Poland (Grant No. II/2019/GD/1). J.L. Lobo and J. Del Ser would like to thank the Basque Government, Spain for their support through the ELKARTEK and EMAITEK funding programs. J. Del Ser also acknowledges funding support from the Consolidated Research Group MATHMODE (IT1294-19) given by the Department of Education of the Basque Governmen

    GPS Anomaly Detection And Machine Learning Models For Precise Unmanned Aerial Systems

    Get PDF
    The rapid development and deployment of 5G/6G networks have brought numerous benefits such as faster speeds, enhanced capacity, improved reliability, lower latency, greater network efficiency, and enablement of new applications. Emerging applications of 5G impacting billions of devices and embedded electronics also pose cyber security vulnerabilities. This thesis focuses on the development of Global Positioning Systems (GPS) Based Anomaly Detection and corresponding algorithms for Unmanned Aerial Systems (UAS). Chapter 1 provides an overview of the thesis background and its objectives. Chapter 2 presents an overview of the 5G architectures, their advantages, and potential cyber threat types. Chapter 3 addresses the issue of GPS dropouts by taking the use case of the Dallas-Fort Worth (DFW) airport. By analyzing data from surveillance drones in the (DFW) area, its message frequency, and statistics on time differences between GPS messages were examined. Chapter 4 focuses on modeling and detecting false data injection (FDI) on GPS. Specifically, three scenarios, including Gaussian noise injection, data duplication, data manipulation are modeled. Further, multiple detection schemes that are Clustering-based and reinforcement learning techniques are deployed and detection accuracy were investigated. Chapter 5 shows the results of Chapters 3 and 4. Overall, this research provides a categorization and possible outlier detection to minimize the GPS interference for UAS enhancing the security and reliability of UAS operations

    Integrating Data Mining Techniques for Fraud Detection in Financial Control Processes

    Get PDF
    Detecting fraud in financial control processes poses significant challenges due to the complex nature of financial transactions and the evolving tactics employed by fraudsters. This paper investigates the integration of data mining techniques, specifically the combination of Benford's Law and machine learning algorithms, to create an enhanced framework for fraud detection. The paper highlights the importance of combating fraudulent activities and the potential of data mining techniques to bolster detection efforts. The literature review explores existing methodologies and their limitations, emphasizing the suitability of Benford's Law for fraud detection. However, shortcomings in practical implementation necessitate improvements for its effective utilization in financial control. Consequently, the article proposes a methodology that combines informative statistical features revealed by Benford’s law tests and subsequent clustering to overcome its limitations. The results present findings from a financial audit conducted on a road-construction company, showcasing representations of primary, advanced, and associated Benford’s law tests. Additionally, by applying clustering techniques, a distinct class of suspicious transactions is successfully identified, highlighting the efficacy of the integrated approach. This class represents only a small proportion of the entire sample, thereby significantly reducing the labor costs of specialists for manual audit of transactions. In conclusion, this paper underscores the comprehensive understanding that can be achieved through the integration of Benford's Law and other data mining techniques in fraud detection, emphasizing their potential to automate and scale fraud detection efforts in financial control processes

    A Family of Joint Sparse PCA Algorithms for Anomaly Localization in Network Data Streams

    Get PDF
    Determining anomalies in data streams that are collected and transformed from various types of networks has recently attracted significant research interest. Principal Component Analysis (PCA) is arguably the most widely applied unsupervised anomaly detection technique for networked data streams due to its simplicity and efficiency. However, none of existing PCA based approaches addresses the problem of identifying the sources that contribute most to the observed anomaly, or anomaly localization. In this paper, we first proposed a novel joint sparse PCA method to perform anomaly detection and localization for network data streams. Our key observation is that we can detect anomalies and localize anomalous sources by identifying a low dimensional abnormal subspace that captures the abnormal behavior of data. To better capture the sources of anomalies, we incorporated the structure of the network stream data in our anomaly localization framework. Also, an extended version of PCA, multidimensional KLE, was introduced to stabilize the localization performance. We performed comprehensive experimental studies on four real-world data sets from different application domains and compared our proposed techniques with several state-of-the-arts. Our experimental studies demonstrate the utility of the proposed methods
    • …
    corecore