398 research outputs found

    BGP Anomaly Detection with Balanced Datasets

    Get PDF
    We use machine learning techniques to build predictive models for anomaly detection in the Border Gateway Protocol (BGP). Imbalanced datasets of network anomalies pose limitations to building predictive models for anomaly detection. In order to achieve better classification performance measures, we use resampling methods to balance classes in the datasets. We use undersampling, oversampling and combination techniques to change class distributions of the datasets. In this paper we build predictive models based on preprocessed network anomaly datasets of known Internet network anomalies and observe improvement in classifier performance measures compared to those reported in our previous work. We propose to use resampling combination techniques on datasets along with Decision Tree and NaĂŻve Bayes classifiers in order to achieve the best trade-off between (1) the F-measure and the length of model training time, and (2) avoiding overfitting and loss of information

    BGP Hijacking Classification

    Get PDF
    Recent reports show that BGP hijacking has increased substantially. BGP hijacking allows malicious ASes to obtain IP prefixes for spamming as well as intercepting or blackholing traffic. While systems to prevent hijacks are hard to deploy and require the cooperation of many other organizations, techniques to detect hijacks have been a popular area of study. In this paper, we classify detected hijack events in order to document BGP detectors output and understand the nature of reported events. We introduce four categories of BGP hijack: typos, prepending mistakes, origin changes, and forged AS paths. We leverage AS hegemony-a measure of dependency in AS relationship-to identify forged AS paths in a fast and efficient way. Besides, we utilize heuristic approaches to find common operators\u27 mistakes such as typos and AS prepending mistakes. The proposed approach classifies our collected ground truth into four categories with 95.71% accuracy. We characterize publicly reported alarms (e.g. BGPMon) with our trained classifier and find 4%, 1%, and 2% of typos, prepend mistakes, and BGP hijacking with a forged AS path, respectively

    The BGP Visibility Toolkit: detecting anomalous internet routing behavior

    Get PDF
    In this paper, we propose the BGP Visibility Toolkit, a system for detecting and analyzing anomalous behavior in the Internet. We show that interdomain prefix visibility can be used to single out cases of erroneous demeanors resulting from misconfiguration or bogus routing policies. The implementation of routing policies with BGP is a complicated process, involving fine-tuning operations and interactions with the policies of the other active ASes. Network operators might end up with faulty configurations or unintended routing policies that prevent the success of their strategies and impact their revenues. As part of the Visibility Toolkit, we propose the BGP Visibility Scanner, a tool which identifies limited visibility prefixes in the Internet. The tool enables operators to provide feedback on the expected visibility status of prefixes. We build a unique set of ground-truth prefixes qualified by their ASes as intended or unintended to have limited visibility. Using a machine learning algorithm, we train on this unique dataset an alarm system that separates with 95% accuracy the prefixes with unintended limited visibility. Hence, we find that visibility features are generally powerful to detect prefixes which are suffering from inadvertent effects of routing policies. Limited visibility could render a whole prefix globally unreachable. This points towards a serious problem, as limited reachability of a non-negligible set of prefixes undermines the global connectivity of the Internet. We thus verify the correlation between global visibility and global connectivity of prefixes.This work was sup-ported in part by the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant 317647 (Leone)

    On the cyber security issues of the internet infrastructure

    Get PDF
    The Internet network has received huge attentions by the research community. At a first glance, the network optimization and scalability issues dominate the efforts of researchers and vendors. Many results have been obtained in the last decades: the Internet’s architecture is optimized to be cheap, robust and ubiquitous. In contrast, such a network has never been perfectly secure. During all its evolution, the security threats of the Internet persist as a transversal and endless topic. Nowadays, the Internet network hosts a multitude of mission critical activities. The electronic voting systems and financial services are carried out through it. Governmental institutions, financial and business organizations depend on the performance and the security of the Internet. This role confers to the Internet network a critical characterization. At the same time, the Internet network is a vector of malicious activities, like Denial of Service attacks; many reports of attacks can be found in both academic outcomes and daily news. In order to mitigate this wide range of issues, many research efforts have been carried out in the past decades; unfortunately, the complex architecture and the scale of the Internet make hard the evaluation and the adoption of such proposals. In order to improve the security of the Internet, the research community can benefit from sharing real network data. Unfortunately, privacy and security concerns inhibit the release of these data: its suffices to imagine the big amount of private information (e.g., political preferences or religious belief) it is possible to get while reading the Internet packets exchanged between users and web services. This scenario motivates my research, and represents the context of this dissertation which contributes to the analysis of the security issues of the Internet infrastructures and describes relevant security proposals. In particular, the main outcomes described in this dissertation are: • the definition of a secure routing protocol for the Internet network able to provide cryptographic guarantees against false route announcement and invalid path attack; • the definition of a new obfuscation technique that allow the research community to publicly release their real network flows with formal guarantees of security and privacy; • the evidence of a new kind of leakage of sensitive informations obtained hacking the models used by sundry Machine Learning Algorithms

    CHASING THE UNKNOWN: A PREDICTIVE MODEL TO DEMYSTIFY BGP COMMUNITY SEMANTICS

    Get PDF
    The Border Gateway Protocol (BGP) specifies an optional communities attribute for traffic engineering, route manipulation, remotely-triggered blackholing, and other services. However, communities have neither unifying semantics nor cryptographic protections and often propagate much farther than intended. Consequently, Autonomous System (AS) operators are free to define their own community values. This research is a proof-of-concept for a machine learning approach to prediction of community semantics; it attempts a quantitative measurement of semantic predictability between different AS semantic schemata. Ground-truth community semantics data were collated and manually labeled according to a unified taxonomy of community services. Various classification algorithms, including a feed-forward Multi-Layer Perceptron and a Random Forest, were used as the estimator for a One-vs-All multi-class model and trained according to a feature set engineered from this data. The best model's performance on the test set indicates as much as 89.15% of these semantics can be accurately predicted according to a proposed standard taxonomy of community services. This model was additionally applied to historical BGP data from various route collectors to estimate the taxonomic distribution of communities transiting the control plane.http://archive.org/details/chasingtheunknow1094566047Outstanding ThesisCivilian, CyberCorps - Scholarship For ServiceApproved for public release. distribution is unlimite

    Machine Learning and Big Data Methodologies for Network Traffic Monitoring

    Get PDF
    Over the past 20 years, the Internet saw an exponential grown of traffic, users, services and applications. Currently, it is estimated that the Internet is used everyday by more than 3.6 billions users, who generate 20 TB of traffic per second. Such a huge amount of data challenge network managers and analysts to understand how the network is performing, how users are accessing resources, how to properly control and manage the infrastructure, and how to detect possible threats. Along with mathematical, statistical, and set theory methodologies machine learning and big data approaches have emerged to build systems that aim at automatically extracting information from the raw data that the network monitoring infrastructures offer. In this thesis I will address different network monitoring solutions, evaluating several methodologies and scenarios. I will show how following a common workflow, it is possible to exploit mathematical, statistical, set theory, and machine learning methodologies to extract meaningful information from the raw data. Particular attention will be given to machine learning and big data methodologies such as DBSCAN, and the Apache Spark big data framework. The results show that despite being able to take advantage of mathematical, statistical, and set theory tools to characterize a problem, machine learning methodologies are very useful to discover hidden information about the raw data. Using DBSCAN clustering algorithm, I will show how to use YouLighter, an unsupervised methodology to group caches serving YouTube traffic into edge-nodes, and latter by using the notion of Pattern Dissimilarity, how to identify changes in their usage over time. By using YouLighter over 10-month long races, I will pinpoint sudden changes in the YouTube edge-nodes usage, changes that also impair the end users’ Quality of Experience. I will also apply DBSCAN in the deployment of SeLINA, a self-tuning tool implemented in the Apache Spark big data framework to autonomously extract knowledge from network traffic measurements. By using SeLINA, I will show how to automatically detect the changes of the YouTube CDN previously highlighted by YouLighter. Along with these machine learning studies, I will show how to use mathematical and set theory methodologies to investigate the browsing habits of Internauts. By using a two weeks dataset, I will show how over this period, the Internauts continue discovering new websites. Moreover, I will show that by using only DNS information to build a profile, it is hard to build a reliable profiler. Instead, by exploiting mathematical and statistical tools, I will show how to characterize Anycast-enabled CDNs (A-CDNs). I will show that A-CDNs are widely used either for stateless and stateful services. That A-CDNs are quite popular, as, more than 50% of web users contact an A-CDN every day. And that, stateful services, can benefit of A-CDNs, since their paths are very stable over time, as demonstrated by the presence of only a few anomalies in their Round Trip Time. Finally, I will conclude by showing how I used BGPStream an open-source software framework for the analysis of both historical and real-time Border Gateway Protocol (BGP) measurement data. By using BGPStream in real-time mode I will show how I detected a Multiple Origin AS (MOAS) event, and how I studies the black-holing community propagation, showing the effect of this community in the network. Then, by using BGPStream in historical mode, and the Apache Spark big data framework over 16 years of data, I will show different results such as the continuous growth of IPv4 prefixes, and the growth of MOAS events over time. All these studies have the aim of showing how monitoring is a fundamental task in different scenarios. In particular, highlighting the importance of machine learning and of big data methodologies

    A Neural Network Approach to Border Gateway Protocol Peer Failure Detection and Prediction

    Get PDF
    The size and speed of computer networks continue to expand at a rapid pace, as do the corresponding errors, failures, and faults inherent within such extensive networks. This thesis introduces a novel approach to interface Border Gateway Protocol (BGP) computer networks with neural networks to learn the precursor connectivity patterns that emerge prior to a node failure. Details of the design and construction of a framework that utilizes neural networks to learn and monitor BGP connection states as a means of detecting and predicting BGP peer node failure are presented. Moreover, this framework is used to monitor a BGP network and a suite of tests are conducted to establish that this neural network approach as a viable strategy for predicting BGP peer node failure. For all performed experiments both of the proposed neural network architectures succeed in memorizing and utilizing the network connectivity patterns. Lastly, a discussion of this framework\u27s generic design is presented to acknowledge how other types of networks and alternate machine learning techniques can be accommodated with relative ease
    • …
    corecore