609 research outputs found

    Neural Computing for Event Log Quality Improvement

    Get PDF
    Department of Management EngineeringAn event log is a vital part used for process mining such as process discovery, conformance checking or enhancement. Like any other data, the initial event logs can be too coarse resulting in severe data mining mistakes. Traditional statistical reconstruction methods work poorly with event logs, because of the complex interrelations among attributes, events and cases. As such, machine learning approaches appear more suitable for reconstructing or repairing event logs. However, there is very limited work on exploiting neural networks to do this task. This thesis focuses on two issues that may arise in the coarse event logs, incorrect attribute values and missing attribute values. We are interested in exploring the application of different kinds of autoencoders on the task of reconstructing event logs since this architecture suits the problem of unsupervised learning, such as the ones we are considering. When repairing an event log, in fact, one cannot assume that a training set with true labels is available for model training. We also propose the techniques for preprocessing and training the event logs data. In order to provide an insight on how feasible and applicable our work is, we have carried out experiments using real-life datasets. Regarding the first issue, we train autoencoders under purely unsupervised manner to deal with the problem of anomaly detection without using any prior knowledge of the domain. We focus on developing algorithms that can capture the general pattern and sequence aspect of the data. In order to solve the second issue, we develop models that should not only learn the representation and underlying true distribution of the data but also be able to generate the realistic and reliable output that has the characteristic of the logs.ope

    The New Abnormal: Network Anomalies in the AI Era

    Get PDF
    Anomaly detection aims at finding unexpected patterns in data. It has been used in several problems in computer networks, from the detection of port scans and DDoS attacks to the monitoring of time-series collected from Internet monitoring systems. Data-driven approaches and machine learning have seen widespread application on anomaly detection too, and this trend has been accelerated by the recent developments on Artificial Intelligence research. This chapter summarizes ongoing recent progresses on anomaly detection research. In particular, we evaluate how developments on AI algorithms bring new possibilities for anomaly detection. We cover new representation learning techniques such as Generative Artificial Networks and Autoencoders, as well as techniques that can be used to improve models learned with machine learning algorithms, such as reinforcement learning. We survey both research works and tools implementing AI algorithms for anomaly detection. We found that the novel algorithms, while successful in other fields, have hardly been applied to networking problems. We conclude the chapter with a case study that illustrates a possible research direction

    Clustering of LMS Use Strategies with Autoencoders

    Get PDF
    Learning Management Systems provide teachers with many functionalities to offer materials to students, interact with them and manage their courses. Recognizing teachers’ instructing styles from their course designs would allow recommendations and best practices to be made. We propose a method that determines teaching style in an unsupervised way from the course structure and use patterns. We define a course classification approach based on deep learning and clustering. We first use an autoencoder to reduce the dimensionality of the input data, while extracting the most important characteristics; thus, we obtain a latent representation of the courses. We then apply clustering techniques to the latent data to group courses based on their use patterns. The results show that this technique improves the clustering performance while avoiding the manual data pre-processing work. Furthermore, the obtained model defines seven course typologies that are clearly related to different use patterns of Learning Management Systems

    Design and Deployment of an Access Control Module for Data Lakes

    Get PDF
    Nowadays big data is considered an extremely valued asset for companies, which are discovering new avenues to use it for their business profit. However, an organization’s ability to effectively extract valuable information from data is based on its knowledge management infrastructure. Thus, most organizations are transitioning from data warehouse (DW) storages to data lake (DL) infrastructures, from which further insights are derived. The present work is carried out as part of a cybersecurity project in a financial institution that manages vast volumes and variety of data that is kept in a data lake. Although DL is presented as the answer to the current big data scenario, this infrastructure presents certain flaws on authentication and access control. Preceding work on DL access control points out that the main goal is to avoid fraudulent behaviors derived from user’s access, such as secondary use1, that could result in business data being exposed to third parties. To overcome the risk, traditional mechanisms attempt to identify these behaviors based on rules, however, they cannot reveal all different kinds of fraud because they only look for known patterns of misuse. The present work proposes a novel access control system for data lakes, assisted by Oracle’s database audit trail and based on anomaly detection mechanisms, that automatically looks for events that do not conform the normal or expected behavior. Thus, the overall aim of this project is to develop and deploy an automated system for identifying abnormal accesses to the DL, which can be separated into four subgoals: explore the different technologies that could be applied in the domain of anomaly detection, design the solution, deploy it, and evaluate the results. For the purpose, feature engineering is performed, and four different unsupervised ML models are built and evaluated. According to the quality of the results, the better model is finally productionalized with Docker. To conclude, although anomaly detection has been a lasting yet active research area for several decades, there are still some unique problem complexities and challenges that leave the way open for the proposed solution to be further improved.Doble Grado en Ingeniería Informática y Administración de Empresa

    Online anomaly detection using statistical leverage for streaming business process events

    Full text link
    While several techniques for detecting trace-level anomalies in event logs in offline settings have appeared recently in the literature, such techniques are currently lacking for online settings. Event log anomaly detection in online settings can be crucial for discovering anomalies in process execution as soon as they occur and, consequently, allowing to promptly take early corrective actions. This paper describes a novel approach to event log anomaly detection on event streams that uses statistical leverage. Leverage has been used extensively in statistics to develop measures to identify outliers and it has been adapted in this paper to the specific scenario of event stream data. The proposed approach has been evaluated on both artificial and real event streams.Comment: 12 pages, 4 figures, conference (Proceedings of the 1st International Workshop on Streaming Analytics for Process Mining (SA4PM 2020) in conjunction with International Conference on Process Mining, Accepted for publication (Sep 2020)

    A survey on the application of deep learning for code injection detection

    Get PDF
    Abstract Code injection is one of the top cyber security attack vectors in the modern world. To overcome the limitations of conventional signature-based detection techniques, and to complement them when appropriate, multiple machine learning approaches have been proposed. While analysing these approaches, the surveys focus predominantly on the general intrusion detection, which can be further applied to specific vulnerabilities. In addition, among the machine learning steps, data preprocessing, being highly critical in the data analysis process, appears to be the least researched in the context of Network Intrusion Detection, namely in code injection. The goal of this survey is to fill in the gap through analysing and classifying the existing machine learning techniques applied to the code injection attack detection, with special attention to Deep Learning. Our analysis reveals that the way the input data is preprocessed considerably impacts the performance and attack detection rate. The proposed full preprocessing cycle demonstrates how various machine-learning-based approaches for detection of code injection attacks take advantage of different input data preprocessing techniques. The most used machine learning methods and preprocessing stages have been also identified

    A Survey on Explainable Anomaly Detection

    Full text link
    In the past two decades, most research on anomaly detection has focused on improving the accuracy of the detection, while largely ignoring the explainability of the corresponding methods and thus leaving the explanation of outcomes to practitioners. As anomaly detection algorithms are increasingly used in safety-critical domains, providing explanations for the high-stakes decisions made in those domains has become an ethical and regulatory requirement. Therefore, this work provides a comprehensive and structured survey on state-of-the-art explainable anomaly detection techniques. We propose a taxonomy based on the main aspects that characterize each explainable anomaly detection technique, aiming to help practitioners and researchers find the explainable anomaly detection method that best suits their needs.Comment: Paper accepted by the ACM Transactions on Knowledge Discovery from Data (TKDD) for publication (preprint version
    corecore