329 research outputs found

    On the Nature and Types of Anomalies: A Review

    Full text link
    Anomalies are occurrences in a dataset that are in some way unusual and do not fit the general patterns. The concept of the anomaly is generally ill-defined and perceived as vague and domain-dependent. Moreover, despite some 250 years of publications on the topic, no comprehensive and concrete overviews of the different types of anomalies have hitherto been published. By means of an extensive literature review this study therefore offers the first theoretically principled and domain-independent typology of data anomalies, and presents a full overview of anomaly types and subtypes. To concretely define the concept of the anomaly and its different manifestations, the typology employs five dimensions: data type, cardinality of relationship, anomaly level, data structure and data distribution. These fundamental and data-centric dimensions naturally yield 3 broad groups, 9 basic types and 61 subtypes of anomalies. The typology facilitates the evaluation of the functional capabilities of anomaly detection algorithms, contributes to explainable data science, and provides insights into relevant topics such as local versus global anomalies.Comment: 38 pages (30 pages content), 10 figures, 3 tables. Preprint; review comments will be appreciated. Improvements in version 2: Explicit mention of fifth anomaly dimension; Added section on explainable anomaly detection; Added section on variations on the anomaly concept; Various minor additions and improvement

    Towards Data Sharing across Decentralized and Federated IoT Data Analytics Platforms

    Get PDF
    In the past decade the Internet-of-Things concept has overwhelmingly entered all of the fields where data are produced and processed, thus, resulting in a plethora of IoT platforms, typically cloud-based, that centralize data and services management. In this scenario, the development of IoT services in domains such as smart cities, smart industry, e-health, automotive, are possible only for the owner of the IoT deployments or for ad-hoc business one-to-one collaboration agreements. The realization of "smarter" IoT services or even services that are not viable today envisions a complete data sharing with the usage of multiple data sources from multiple parties and the interconnection with other IoT services. In this context, this work studies several aspects of data sharing focusing on Internet-of-Things. We work towards the hyperconnection of IoT services to analyze data that goes beyond the boundaries of a single IoT system. This thesis presents a data analytics platform that: i) treats data analytics processes as services and decouples their management from the data analytics development; ii) decentralizes the data management and the execution of data analytics services between fog, edge and cloud; iii) federates peers of data analytics platforms managed by multiple parties allowing the design to scale into federation of federations; iv) encompasses intelligent handling of security and data usage control across the federation of decentralized platforms instances to reduce data and service management complexity. The proposed solution is experimentally evaluated in terms of performances and validated against use cases. Further, this work adopts and extends available standards and open sources, after an analysis of their capabilities, fostering an easier acceptance of the proposed framework. We also report efforts to initiate an IoT services ecosystem among 27 cities in Europe and Korea based on a novel methodology. We believe that this thesis open a viable path towards a hyperconnection of IoT data and services, minimizing the human effort to manage it, but leaving the full control of the data and service management to the users' will

    Optics and virtualization as data center network infrastructure

    Get PDF
    The emerging cloud services have motivated a fresh look at the design of data center network infrastructure in multiple layers. To transfer the huge amount of data generated by many data intensive applications, data center network has to be fast, scalable and power efficient. To support flexible and efficient sharing in cloud services, service providers deploy a virtualization layer as part of the data center infrastructure. This thesis explores the design and performance analysis of data center network infrastructure in both physical network and virtualization layer. On the physical network design front, we present a hybrid packet/circuit switched network architecture which uses circuit switched optics to augment traditional packet-switched Ethernet in modern data centers. We show that this technique has substantial potential to improve bisection bandwidth and application performance in a cost-effective manner. To push the adoption of optical circuits in real cloud data centers, we further explore and address the circuit control issues in shared data center environments. On the virtualization layer, we present an analytical study on the network performance of virtualized data centers. Using Amazon EC2 as an experiment platform, we quantify the impact of virtualization on network performance in commercial cloud. Our findings provide valuable insights to both cloud users in moving legacy application into cloud and service providers in improving the virtualization infrastructure to support better cloud services

    TOWARDS A HOLISTIC EFFICIENT STACKING ENSEMBLE INTRUSION DETECTION SYSTEM USING NEWLY GENERATED HETEROGENEOUS DATASETS

    Get PDF
    With the exponential growth of network-based applications globally, there has been a transformation in organizations\u27 business models. Furthermore, cost reduction of both computational devices and the internet have led people to become more technology dependent. Consequently, due to inordinate use of computer networks, new risks have emerged. Therefore, the process of improving the speed and accuracy of security mechanisms has become crucial.Although abundant new security tools have been developed, the rapid-growth of malicious activities continues to be a pressing issue, as their ever-evolving attacks continue to create severe threats to network security. Classical security techniquesfor instance, firewallsare used as a first line of defense against security problems but remain unable to detect internal intrusions or adequately provide security countermeasures. Thus, network administrators tend to rely predominantly on Intrusion Detection Systems to detect such network intrusive activities. Machine Learning is one of the practical approaches to intrusion detection that learns from data to differentiate between normal and malicious traffic. Although Machine Learning approaches are used frequently, an in-depth analysis of Machine Learning algorithms in the context of intrusion detection has received less attention in the literature.Moreover, adequate datasets are necessary to train and evaluate anomaly-based network intrusion detection systems. There exist a number of such datasetsas DARPA, KDDCUP, and NSL-KDDthat have been widely adopted by researchers to train and evaluate the performance of their proposed intrusion detection approaches. Based on several studies, many such datasets are outworn and unreliable to use. Furthermore, some of these datasets suffer from a lack of traffic diversity and volumes, do not cover the variety of attacks, have anonymized packet information and payload that cannot reflect the current trends, or lack feature set and metadata.This thesis provides a comprehensive analysis of some of the existing Machine Learning approaches for identifying network intrusions. Specifically, it analyzes the algorithms along various dimensionsnamely, feature selection, sensitivity to the hyper-parameter selection, and class imbalance problemsthat are inherent to intrusion detection. It also produces a new reliable dataset labeled Game Theory and Cyber Security (GTCS) that matches real-world criteria, contains normal and different classes of attacks, and reflects the current network traffic trends. The GTCS dataset is used to evaluate the performance of the different approaches, and a detailed experimental evaluation to summarize the effectiveness of each approach is presented. Finally, the thesis proposes an ensemble classifier model composed of multiple classifiers with different learning paradigms to address the issue of detection accuracy and false alarm rate in intrusion detection systems

    Analysis of new control applications

    Get PDF
    This document reports the results of the activities performed during the first year of the CRUTIAL project, within the Work Package 1 "Identification and description of Control System Scenarios". It represents the outcome of the analysis of new control applications in the Power System and the identification of critical control system scenarios to be explored by the CRUTIAL project

    Characterizing the IoT ecosystem at scale

    Get PDF
    Internet of Things (IoT) devices are extremely popular with home, business, and industrial users. To provide their services, they typically rely on a backend server in- frastructure on the Internet, which collectively form the IoT Ecosystem. This ecosys- tem is rapidly growing and offers users an increasing number of services. It also has been a source and target of significant security and privacy risks. One notable exam- ple is the recent large-scale coordinated global attacks, like Mirai, which disrupted large service providers. Thus, characterizing this ecosystem yields insights that help end-users, network operators, policymakers, and researchers better understand it, obtain a detailed view, and keep track of its evolution. In addition, they can use these insights to inform their decision-making process for mitigating this ecosystem’s security and privacy risks. In this dissertation, we characterize the IoT ecosystem at scale by (i) detecting the IoT devices in the wild, (ii) conducting a case study to measure how deployed IoT devices can affect users’ privacy, and (iii) detecting and measuring the IoT backend infrastructure. To conduct our studies, we collaborated with a large European Internet Service Provider (ISP) and a major European Internet eXchange Point (IXP). They rou- tinely collect large volumes of passive, sampled data, e.g., NetFlow and IPFIX, for their operational purposes. These data sources help providers obtain insights about their networks, and we used them to characterize the IoT ecosystem at scale. We start with IoT devices and study how to track and trace their activity in the wild. We developed and evaluated a scalable methodology to accurately detect and monitor IoT devices with limited, sparsely sampled data in the ISP and IXP. Next, we conduct a case study to measure how a myriad of deployed devices can affect the privacy of ISP subscribers. Unfortunately, we found that the privacy of a substantial fraction of IPv6 end-users is at risk. We noticed that a single device at home that encodes its MAC address into the IPv6 address could be utilized as a tracking identifier for the entire end-user prefix—even if other devices use IPv6 privacy extensions. Our results showed that IoT devices contribute the most to this privacy leakage. Finally, we focus on the backend server infrastructure and propose a methodology to identify and locate IoT backend servers operated by cloud services and IoT vendors. We analyzed their IoT traffic patterns as observed in the ISP. Our analysis sheds light on their diverse operational and deployment strategies. The need for issuing a priori unknown network-wide queries against large volumes of network flow capture data, which we used in our studies, motivated us to develop Flowyager. It is a system built on top of existing traffic capture utilities, and it relies on flow summarization techniques to reduce (i) the storage and transfer cost of flow captures and (ii) query response time. We deployed a prototype of Flowyager at both the IXP and ISP.Internet-of-Things-Geräte (IoT) sind aus vielen Haushalten, Büroräumen und In- dustrieanlagen nicht mehr wegzudenken. Um ihre Dienste zu erbringen, nutzen IoT- Geräte typischerweise auf eine Backend-Server-Infrastruktur im Internet, welche als Gesamtheit das IoT-Ökosystem bildet. Dieses Ökosystem wächst rapide an und bie- tet den Nutzern immer mehr Dienste an. Das IoT-Ökosystem ist jedoch sowohl eine Quelle als auch ein Ziel von signifikanten Risiken für die Sicherheit und Privatsphäre. Ein bemerkenswertes Beispiel sind die jüngsten groß angelegten, koordinierten globa- len Angriffe wie Mirai, durch die große Diensteanbieter gestört haben. Deshalb ist es wichtig, dieses Ökosystem zu charakterisieren, eine ganzheitliche Sicht zu bekommen und die Entwicklung zu verfolgen, damit Forscher, Entscheidungsträger, Endnutzer und Netzwerkbetreibern Einblicke und ein besseres Verständnis erlangen. Außerdem können alle Teilnehmer des Ökosystems diese Erkenntnisse nutzen, um ihre Entschei- dungsprozesse zur Verhinderung von Sicherheits- und Privatsphärerisiken zu verbes- sern. In dieser Dissertation charakterisieren wir die Gesamtheit des IoT-Ökosystems indem wir (i) IoT-Geräte im Internet detektieren, (ii) eine Fallstudie zum Einfluss von benutzten IoT-Geräten auf die Privatsphäre von Nutzern durchführen und (iii) die IoT-Backend-Infrastruktur aufdecken und vermessen. Um unsere Studien durchzuführen, arbeiten wir mit einem großen europäischen Internet- Service-Provider (ISP) und einem großen europäischen Internet-Exchange-Point (IXP) zusammen. Diese sammeln routinemäßig für operative Zwecke große Mengen an pas- siven gesampelten Daten (z.B. als NetFlow oder IPFIX). Diese Datenquellen helfen Netzwerkbetreibern Einblicke in ihre Netzwerke zu erlangen und wir verwendeten sie, um das IoT-Ökosystem ganzheitlich zu charakterisieren. Wir beginnen unsere Analysen mit IoT-Geräten und untersuchen, wie diese im Inter- net aufgespürt und verfolgt werden können. Dazu entwickelten und evaluierten wir eine skalierbare Methodik, um IoT-Geräte mit Hilfe von eingeschränkten gesampelten Daten des ISPs und IXPs präzise erkennen und beobachten können. Als Nächstes führen wir eine Fallstudie durch, in der wir messen, wie eine Unzahl von eingesetzten Geräten die Privatsphäre von ISP-Nutzern beeinflussen kann. Lei- der fanden wir heraus, dass die Privatsphäre eines substantiellen Teils von IPv6- Endnutzern bedroht ist. Wir entdeckten, dass bereits ein einzelnes Gerät im Haus, welches seine MAC-Adresse in die IPv6-Adresse kodiert, als Tracking-Identifikator für das gesamte Endnutzer-Präfix missbraucht werden kann — auch wenn andere Geräte IPv6-Privacy-Extensions verwenden. Unsere Ergebnisse zeigten, dass IoT-Geräte den Großteil dieses Privatsphäre-Verlusts verursachen. Abschließend fokussieren wir uns auf die Backend-Server-Infrastruktur und wir schla- gen eine Methodik zur Identifizierung und Lokalisierung von IoT-Backend-Servern vor, welche von Cloud-Diensten und IoT-Herstellern betrieben wird. Wir analysier- ten Muster im IoT-Verkehr, der vom ISP beobachtet wird. Unsere Analyse gibt Auf- schluss über die unterschiedlichen Strategien, wie IoT-Backend-Server betrieben und eingesetzt werden. Die Notwendigkeit a-priori unbekannte netzwerkweite Anfragen an große Mengen von Netzwerk-Flow-Daten zu stellen, welche wir in in unseren Studien verwenden, moti- vierte uns zur Entwicklung von Flowyager. Dies ist ein auf bestehenden Netzwerkverkehrs- Tools aufbauendes System und es stützt sich auf die Zusammenfassung von Verkehrs- flüssen, um (i) die Kosten für Archivierung und Transfer von Flow-Daten und (ii) die Antwortzeit von Anfragen zu reduzieren. Wir setzten einen Prototypen von Flowyager sowohl im IXP als auch im ISP ein

    Architectures and GPU-Based Parallelization for Online Bayesian Computational Statistics and Dynamic Modeling

    Get PDF
    Recent work demonstrates that coupling Bayesian computational statistics methods with dynamic models can facilitate the analysis of complex systems associated with diverse time series, including those involving social and behavioural dynamics. Particle Markov Chain Monte Carlo (PMCMC) methods constitute a particularly powerful class of Bayesian methods combining aspects of batch Markov Chain Monte Carlo (MCMC) and the sequential Monte Carlo method of Particle Filtering (PF). PMCMC can flexibly combine theory-capturing dynamic models with diverse empirical data. Online machine learning is a subcategory of machine learning algorithms characterized by sequential, incremental execution as new data arrives, which can give updated results and predictions with growing sequences of available incoming data. While many machine learning and statistical methods are adapted to online algorithms, PMCMC is one example of the many methods whose compatibility with and adaption to online learning remains unclear. In this thesis, I proposed a data-streaming solution supporting PF and PMCMC methods with dynamic epidemiological models and demonstrated several successful applications. By constructing an automated, easy-to-use streaming system, analytic applications and simulation models gain access to arriving real-time data to shorten the time gap between data and resulting model-supported insight. The well-defined architecture design emerging from the thesis would substantially expand traditional simulation models' potential by allowing such models to be offered as continually updated services. Contingent on sufficiently fast execution time, simulation models within this framework can consume the incoming empirical data in real-time and generate informative predictions on an ongoing basis as new data points arrive. In a second line of work, I investigated the platform's flexibility and capability by extending this system to support the use of a powerful class of PMCMC algorithms with dynamic models while ameliorating such algorithms' traditionally stiff performance limitations. Specifically, this work designed and implemented a GPU-enabled parallel version of a PMCMC method with dynamic simulation models. The resulting codebase readily has enabled researchers to adapt their models to the state-of-art statistical inference methods, and ensure that the computation-heavy PMCMC method can perform significant sampling between the successive arrival of each new data point. Investigating this method's impact with several realistic PMCMC application examples showed that GPU-based acceleration allows for up to 160x speedup compared to a corresponding CPU-based version not exploiting parallelism. The GPU accelerated PMCMC and the streaming processing system can complement each other, jointly providing researchers with a powerful toolset to greatly accelerate learning and securing additional insight from the high-velocity data increasingly prevalent within social and behavioural spheres. The design philosophy applied supported a platform with broad generalizability and potential for ready future extensions. The thesis discusses common barriers and difficulties in designing and implementing such systems and offers solutions to solve or mitigate them

    Towards secure message systems

    Get PDF
    Message systems, which transfer information from sender to recipient via communication networks, are indispensable to our modern society. The enormous user base of message systems and their critical role in information delivery make it the top priority to secure message systems. This dissertation focuses on securing the two most representative and dominant messages systems---e-mail and instant messaging (IM)---from two complementary aspects: defending against unwanted messages and ensuring reliable delivery of wanted messages.;To curtail unwanted messages and protect e-mail and instant messaging users, this dissertation proposes two mechanisms DBSpam and HoneyIM, which can effectively thwart e-mail spam laundering and foil malicious instant message spreading, respectively. DBSpam exploits the distinct characteristics of connection correlation and packet symmetry embedded in the behavior of spam laundering and utilizes a simple statistical method, Sequential Probability Ratio Test, to detect and break spam laundering activities inside a customer network in a timely manner. The experimental results demonstrate that DBSpam is effective in quickly and accurately capturing and suppressing e-mail spam laundering activities and is capable of coping with high speed network traffic. HoneyIM leverages the inherent characteristic of spreading of IM malware and applies the honey-pot technology to the detection of malicious instant messages. More specifically, HoneyIM uses decoy accounts in normal users\u27 contact lists as honey-pots to capture malicious messages sent by IM malware and suppresses the spread of malicious instant messages by performing network-wide blocking. The efficacy of HoneyIM has been validated through both simulations and real experiments.;To improve e-mail reliability, that is, prevent losses of wanted e-mail, this dissertation proposes a collaboration-based autonomous e-mail reputation system called CARE. CARE introduces inter-domain collaboration without central authority or third party and enables each e-mail service provider to independently build its reputation database, including frequently contacted and unacquainted sending domains, based on the local e-mail history and the information exchanged with other collaborating domains. The effectiveness of CARE on improving e-mail reliability has been validated through a number of experiments, including a comparison of two large e-mail log traces from two universities, a real experiment of DNS snooping on more than 36,000 domains, and extensive simulation experiments in a large-scale environment

    Event detection in high throughput social media

    Get PDF

    Urban Deformation Monitoring using Persistent Scatterer Interferometry and SAR tomography

    Get PDF
    This book focuses on remote sensing for urban deformation monitoring. In particular, it highlights how deformation monitoring in urban areas can be carried out using Persistent Scatterer Interferometry (PSI) and Synthetic Aperture Radar (SAR) Tomography (TomoSAR). Several contributions show the capabilities of Interferometric SAR (InSAR) and PSI techniques for urban deformation monitoring. Some of them show the advantages of TomoSAR in un-mixing multiple scatterers for urban mapping and monitoring. This book is dedicated to the technical and scientific community interested in urban applications. It is useful for choosing the appropriate technique and gaining an assessment of the expected performance. The book will also be useful to researchers, as it provides information on the state-of-the-art and new trends in this fiel
    • …
    corecore