7 research outputs found

    Dataset for anomaly detection in a production wireless mesh community network

    Wireless community networks (WCNs) have proliferated around the world. Cheap off-the-shelf WiFi devices have enabled this network paradigm, in which users build their own network infrastructure as a do-it-yourself alternative to traditional network operators. Because users are responsible for administering their own nodes, the network is very dynamic: networking devices are rebooted frequently, and users join and leave the network. In addition, the unplanned deployment makes the network very heterogeneous, with both high- and low-capacity links. Anomaly detection in such a dynamic scenario is therefore challenging. In this paper we provide a dataset gathered from a production WCN. The data was obtained from a central server that collects measurements from the mesh nodes that form the network; in total, 63 different nodes were encountered during data collection. The WCN is used daily to access the Internet by 17 subscribers of the local ISP available on the mesh. The dataset gathers a large set of features related not only to traffic but also to other parameters such as CPU and memory. Furthermore, we provide the network topology of each sample in terms of the adjacency matrix, routing table and routing metrics. The data includes a known, unprovoked gateway failure, so the dataset can be used to investigate the performance of unsupervised machine learning algorithms for fault detection in WCNs. To our knowledge, this is the first dataset that allows fault detection to be investigated on a production WCN. This work has received funding through the DiPET CHIST-ERA project under grant agreement PCI2019-111850-2; Spanish grant PID2019-106774RB-C21; the Romanian DIPET project (62652/15.11.2019) funded via PN 124/2020; and has been partially supported by the EU research project SERRANO (101017168) and hardware resources courtesy of the Romanian Ministry of Research and Innovation UEFISCDI COCO research project PN III-P4-ID-PCE-2020-0407. Peer reviewed. Postprint (published version).
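
    As a quick illustration of how a dataset organized like the one described might be consumed, the sketch below assembles a feature matrix from per-sample traffic/non-traffic features and per-sample adjacency matrices. The file names, schema and derived topology feature are assumptions made for illustration, not the released dataset's actual layout.
```python
# Illustrative sketch only: file names and schema are hypothetical.
import numpy as np
import pandas as pd

features = pd.read_csv("wcn_features.csv")      # hypothetical: one row per sample
adjacency = np.load("wcn_adjacency.npy")        # hypothetical: shape (samples, nodes, nodes)

# Derive a simple topology feature from each sample's adjacency matrix:
# the mean node degree of the mesh at that point in time.
mean_degree = adjacency.sum(axis=2).mean(axis=1)

# Combine the traffic/non-traffic features with the topology feature.
X = np.column_stack([features.drop(columns=["timestamp"]).to_numpy(), mean_degree])
print("feature matrix shape:", X.shape)
```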

    Perspectives on anomaly and event detection in exascale systems

    Proceedings of: IEEE 5th International Conference on Big Data Security on Cloud (BigDataSecurity), 27-29 May 2019, Washington, USA. The design and implementation of exascale systems is an important challenge today. Such systems are expected to combine HPC with Big Data methods and technologies to allow the execution of scientific workloads that are not tractable at present. In this paper we focus on an event and anomaly detection framework, which is crucial for providing a global overview of an exascale system (in turn necessary for the successful implementation and exploitation of the system). We propose an architecture for such a framework and show how it can be used to handle failures during job execution. This work has received funding from the EC-funded H2020 ASPIDE project (Agreement 801091). This work was supported with hardware resources by the Romanian grant BID (PN-III-P1-PFE-28).
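
    As a rough illustration of the collect-detect-handle flow such a framework implies (not the architecture proposed in the paper), here is a minimal Python skeleton; the event type, detection rule and handler are placeholders.
```python
# Illustrative skeleton only, not the paper's implementation.
from dataclasses import dataclass

@dataclass
class Event:
    node: str
    metric: str
    value: float

def detect(event: Event, threshold: float = 0.95) -> bool:
    # Placeholder rule: flag metric values above a fixed threshold.
    return event.value > threshold

def handle(event: Event) -> None:
    # A real framework might requeue the affected job or drain the node here.
    print(f"anomaly on {event.node}: {event.metric}={event.value}")

# Toy stream of monitoring events collected during job execution.
for ev in (Event("node-17", "mem_utilization", 0.99),
           Event("node-03", "mem_utilization", 0.42)):
    if detect(ev):
        handle(ev)
```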

    Monitoring of exascale data processing

    Proceedings of: 2019 IEEE International Conference on Advanced Scientific Computing (ICASC). Exascale systems are a hot topic of research in computer science. In contrast to current Cloud, Big Data and HPC systems, they will routinely contain hundreds of thousands of nodes generating millions of events. At this scale, hardware faults and anomalous behaviour are not only more likely but to be expected. In this paper we describe the architecture of an exascale monitoring solution coupled with an event detection component. The latter component is extremely important for handling the multitude of potential events. We describe the major missing research that needs to be done to make event detection feasible in real-world exascale systems. This work has received funding from the EC-funded H2020 ASPIDE project (Agreement 801091: Exascale programming models for extreme data processing). This work was supported with hardware resources by the Romanian grant BID (PN-III-P1-PFE-28: Big Data Science).
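
    A minimal sketch of one way an event detection component could screen a node-level metric stream, using a rolling z-score over a fixed window; this is an illustrative assumption, not the monitoring architecture described in the paper.
```python
# Illustrative sketch: rolling z-score event detection over a metric stream.
from collections import deque
from statistics import mean, stdev

def zscore_events(stream, window=60, threshold=3.0):
    """Yield (index, value) for points that deviate strongly from the recent window."""
    recent = deque(maxlen=window)
    for i, value in enumerate(stream):
        if len(recent) >= 2:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                yield i, value
        recent.append(value)

# Toy CPU-load signal with one spike around index 100.
signal = [0.30, 0.32, 0.31, 0.29, 0.30] * 20 + [0.95] + [0.30, 0.31] * 10
print(list(zscore_events(signal)))   # the spike is flagged
```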

    A scalable platform for monitoring data intensive applications

    Latest advances in information technology and widespread growth in different areas are producing large amounts of data. Consequently, in the past decade a large number of distributed platforms for storing and processing large datasets have been proposed. Whether in development or in production, monitoring the applications running on these platforms is not an easy task; dedicated tools and platforms have been proposed for this purpose, yet none are specially designed for Big Data frameworks. In this paper we present a distributed, scalable, highly available platform able to collect, store, query and process monitoring data obtained from multiple Big Data frameworks. Alongside the architecture, we show experimentally that the proposed solution is scalable and can handle a substantial quantity of monitoring data. This work has received funding from the EC-funded project H2020 DICE (Agreement 644869), which aims at providing a toolchain that makes the task of developing Big Data applications less daunting, and from the H2020 ASPIDE project (Agreement 801091). This work was partially supported by grants from the Romanian Ministry of Research and Innovation: grant Acronim (PNIII-P4-ID-PCE-2016-0842) and grant BID (PNIII-P1-PDI-PFE-2018-028).
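
    For illustration only, a minimal sketch of the node-level side of such a collect-and-store pipeline: a small agent samples a few host metrics and ships them to a central ingest endpoint. The endpoint URL, payload schema and sampling period are assumptions, not the platform's actual API.
```python
# Illustrative sketch: the ingest endpoint and payload schema are assumptions.
import time

import psutil     # third-party: pip install psutil
import requests   # third-party: pip install requests

INGEST_URL = "http://monitoring-master:9200/metrics"   # hypothetical ingest endpoint

def sample() -> dict:
    # Gather a minimal set of host metrics for one sample.
    return {
        "timestamp": time.time(),
        "cpu_percent": psutil.cpu_percent(interval=1),
        "mem_percent": psutil.virtual_memory().percent,
    }

def run(period_s: float = 10.0) -> None:
    # Push one sample per period to the central collector.
    while True:
        requests.post(INGEST_URL, json=sample(), timeout=5)
        time.sleep(period_s)

if __name__ == "__main__":
    run()
```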

    Anomaly detection for fault detection in wireless community networks using machine learning

    Machine learning (ML) has received increasing attention in computer science in recent years and many types of methods have been proposed. In computer networks, little attention has been paid to the use of ML for fault detection, mainly because of the lack of datasets, which in turn stems from the reluctance of network operators to share data about their infrastructure and network failures. In this paper, we attempt to fill this gap by using anomaly detection techniques to discern hardware failure events in wireless community networks. For this purpose we use four unsupervised ML approaches based on different principles. We have built a dataset from a production wireless community network, gathering traffic and non-traffic features, e.g. CPU and memory. For the numerical analysis, we investigated the ability of the different ML approaches to detect an unprovoked gateway failure that occurred during data collection. Our numerical results show that all the tested approaches detect the gateway failure better when non-traffic features are also considered. When properly tuned, all ML methods are effective at detecting the failure. Nonetheless, using decision boundaries and other analysis techniques we observe significantly different behavior among the ML methods. This work has received funding through the DiPET CHIST-ERA project under grant agreement PCI2019-111850-2; Spanish grant PID2019-106774RB-C21; the Romanian DIPET project (62652/15.11.2019) funded via PN 124/2020; and has been partially supported by the EU research project SERRANO (101017168) and hardware resources courtesy of the Romanian Ministry of Research and Innovation UEFISCDI COCO research project PN III-P4-ID-PCE-2020-0407. Peer reviewed. Postprint (published version).
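
    A minimal sketch of the kind of comparison described: several unsupervised detectors fitted on the same feature matrix (here synthetic) and their flagged samples counted. The four scikit-learn methods below are illustrative stand-ins; the abstract does not name the approaches actually used in the paper.
```python
# Illustrative comparison of unsupervised detectors; data and method choice are assumptions.
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))          # placeholder for real WCN samples
X[250:260] += 4.0                      # injected "failure" window for illustration

detectors = {
    "isolation_forest": IsolationForest(random_state=0),
    "one_class_svm": OneClassSVM(nu=0.05),
    "local_outlier_factor": LocalOutlierFactor(),
    "elliptic_envelope": EllipticEnvelope(random_state=0),
}

Xs = StandardScaler().fit_transform(X)
for name, det in detectors.items():
    labels = det.fit_predict(Xs)       # -1 marks samples flagged as anomalous
    flagged = np.flatnonzero(labels == -1)
    print(f"{name}: {len(flagged)} flagged samples")
```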

    On Processing Extreme Data

    Extreme Data is an incarnation of the Big Data concept distinguished by the massive amounts of data that must be queried, communicated and analyzed in near real time using a very large number of memory or storage elements and exascale computing systems. Immediate examples are scientific data produced at rates of hundreds of gigabits per second that must be stored, filtered and analyzed; the millions of images per day that must be analyzed in parallel; and the billion social data posts queried in real time on an in-memory database. Traditional disks and commercial storage cannot handle the extreme scale of such application data. Given the need to improve current concepts and technologies, we focus in this paper on the needs of data-intensive applications running on systems composed of up to millions of computing elements (exascale systems). We propose a methodology to advance the state of the art. The starting point is the definition of new programming paradigms, APIs, runtime tools and methodologies for expressing data-intensive tasks on exascale systems. This will pave the way for the exploitation of massive parallelism over a simplified model of the system architecture, promoting high performance and efficiency and offering powerful operations and mechanisms for processing extreme data sources at high speed and/or in real time.
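
    As a toy illustration of expressing a data-intensive task as independent work over data partitions, scaled down to a single machine; this is not the programming paradigm or API proposed by the authors, just a sketch of the general parallel-over-partitions idea.
```python
# Illustrative sketch: a per-partition filter + reduction executed in parallel.
from multiprocessing import Pool

def process_partition(partition):
    # Placeholder per-partition computation (e.g., filter then reduce).
    return sum(x for x in partition if x > 0.5)

if __name__ == "__main__":
    # Toy partitions standing in for chunks of an extreme data source.
    partitions = [[i / 1000 for i in range(start, start + 1000)]
                  for start in range(0, 10_000, 1000)]
    with Pool() as pool:
        partials = pool.map(process_partition, partitions)
    print("global result:", sum(partials))
```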