
    Anomaly Detection for Big Data Technologies

    The main goal of this research is to contribute to automated performance anomaly detection for large-scale and complex distributed systems, especially for Big Data applications within cloud computing. The main points that we will investigate are:
    - Automated detection of anomalous performance behaviors by finding the relevant performance metrics with which to characterize the behavior of systems.
    - Performance anomaly localization: pinpointing the cause of a performance anomaly due to internal or external faults.
    - Investigation of the possibility of anomaly prediction. Failure prediction aims to determine the possible occurrence of catastrophic events in the near future and will enable system developers to utilize effective monitoring solutions to guarantee system availability.
    - Assessment of the potential of hybrid methods that combine machine learning with traditional methods used for performance anomaly detection.
    The topic of this research proposal will offer me the opportunity to pursue more deeply my interest in the field of performance anomaly detection and prediction by investigating and using novel optimization strategies. In addition, this research provides a very interesting case of applying anomaly detection techniques in a large-scale Big Data and cloud computing environment. Among the various Big Data technologies, in-memory processing technology such as Apache Spark has become widely adopted by industry as a result of its speed, generality, ease of use, and compatibility with other Big Data systems. Although Spark continues to mature, there is still a shortage of comprehensive performance analyses that are built specifically for Spark and used to detect performance anomalies. This raises my interest in addressing the challenge by investigating new hybrid learning techniques for anomaly detection in large-scale and complex systems, especially for in-memory processing Big Data platforms within cloud computing.

    Real-time big data processing for anomaly detection : a survey

    The advent of connected devices and the omnipresence of the Internet have paved the way for intruders to attack networks, leading to cyber-attacks, financial loss, information theft in healthcare, and cyber war. Hence, network security analytics has become an important area of concern and has gained intensive attention among researchers of late, specifically in the domain of network anomaly detection, which is considered crucial for network security. However, preliminary investigations have revealed that the existing approaches to detecting anomalies in networks are not effective enough, particularly for detecting them in real time. The inefficacy of current approaches is mainly due to the amassment of massive volumes of data through the connected devices. Therefore, it is crucial to propose a framework that effectively handles real-time big data processing and detects anomalies in networks. In this regard, this paper attempts to address the issue of detecting anomalies in real time. Accordingly, this paper surveys the state-of-the-art real-time big data processing technologies related to anomaly detection and the vital characteristics of the associated machine learning algorithms. The paper begins with an explanation of the essential context and taxonomy of real-time big data processing, anomaly detection, and machine learning algorithms, followed by a review of big data processing technologies. Finally, the identified research challenges of real-time big data processing in anomaly detection are discussed. © 2018 Elsevier Lt

    ANOMALY DETECTION IN IT AUDIT : The possibilities and potential in the domain of IT Audit

    IT Audit is dealing with a continuous increase in complexity and workload. Regulations get stricter, while IT plays an increasingly important role in companies. New technologies like anomaly detection can play a role in supporting IT Audit decisions. Anomaly detection has recently seen use in many domains, including financial audit, for example in fraud detection. Yet IT Audit does not make use of this technology as of now. This research looks at the possible roles that anomaly detection can play in this domain. It starts by attempting to bring the existing literature on both domains closer together and then identifies variables that influence successful anomaly detection implementation in IT Audit. Exploratory interviews led to different approaches to implementation. IT Audit currently works with random samples to offer reasonable assurance on a statistical basis. As anomaly detection requires more data than the samples can provide, the potential benefits and consequences of utilizing the entire data population in an audit are researched. As controls are unique to each client, IT Audit tasks have been grouped per common IT risk. For each risk, the potential of anomaly detection is determined based on four variables: the impact of erroneous instances going undetected, the time spent on the audit task, the frequency of the task, and the external pressure. Interviews with IT Audit professionals have been used to go through the IT risks with the highest potential and determine the challenges. For each challenge, solutions have been discussed, as well as their feasibility. Two use-cases have been formulated based on the interviews. The first use-case aims to use anomaly detection to detect multiple manage change risks, by looking at the full data population of changes at big clients working in standardized systems. The second use-case aims to discover SoD concerns and could be combined with financial audit data to discover fraud. Unsupervised deep learning methods are most likely to succeed. Prior research indicates deep autoencoder neural networks as a suitable method. The biggest challenges for implementation turned out to lie in the current audit methodology, rather than in development. The current sample approach is based on the notion that testing the full data population would not be possible while remaining within time and budget norms. New techniques, such as anomaly detection, might mean this notion is outdated, but the methods cannot be created and optimized due to the current constraints.

    A Survey on Big Data for Network Traffic Monitoring and Analysis

    Network Traffic Monitoring and Analysis (NTMA) represents a key component for network management, especially to guarantee the correct operation of large-scale networks such as the Internet. As the complexity of Internet services and the volume of traffic continue to increase, it becomes difficult to design scalable NTMA applications. Applications such as traffic classification and policing require real-time and scalable approaches. Anomaly detection and security mechanisms must quickly identify and react to unpredictable events while processing millions of heterogeneous events. Lastly, the system has to collect, store, and process massive sets of historical data for post-mortem analysis. Those are precisely the challenges faced by general big data approaches: Volume, Velocity, Variety, and Veracity. This survey brings together NTMA and big data. We catalog previous work on NTMA that adopts big data approaches to understand to what extent the potential of big data is being explored in NTMA. This survey mainly focuses on approaches and technologies to manage the big NTMA data, and additionally briefly discusses big data analytics (e.g., machine learning) for the sake of NTMA. Finally, we provide guidelines for future work, discussing lessons learned and research directions.

    NEMICO: Mining network data through cloud-based data mining techniques

    Thanks to the rapid advances in Internet-based applications and in data acquisition and storage technologies, petabyte-sized network data collections are becoming more and more common, thus prompting the need for scalable data analysis solutions. By leveraging today’s ubiquitous many-core computer architectures and the increasingly popular cloud computing paradigm, the applicability of data mining algorithms to these large volumes of network data can be scaled up to gain interesting insights. This paper proposes NEMICO, a comprehensive Big Data mining system targeted at network traffic flow analyses (e.g., traffic flow characterization, anomaly detection, multiple-level pattern mining). NEMICO comprises new approaches that contribute to a paradigm shift in distributed data mining by addressing the most challenging issues related to Big Data, such as data sparsity, horizontal scaling, and parallel computation.

    Perspectives on anomaly and event detection in exascale systems

    Proceeding of: IEEE 5th International Conference on Big Data Security on Cloud (BigDataSecurity), 27-29 May 2019, Washington, USA. The design and implementation of exascale systems is nowadays an important challenge. Such a system is expected to combine HPC with Big Data methods and technologies to allow the execution of scientific workloads which are not tractable at present. In this paper we focus on an event and anomaly detection framework which is crucial in giving a global overview of an exascale system (which in turn is necessary for the successful implementation and exploitation of the system). We propose an architecture for such a framework and show how it can be used to handle failures during job execution. This work has received funding from the EC-funded H2020 ASPIDE project (Agreement 801091). This work was supported with hardware resources by the Romanian grant BID (PN-III-P1-PFE-28).