418 research outputs found

    Adaptive Traffic Fingerprinting for Darknet Threat Intelligence

    Full text link
    Darknet technology such as Tor has been used by various threat actors for organising illegal activities and data exfiltration. As such, there is a case for organisations to block such traffic, or to try and identify when it is used and for what purposes. However, anonymity in cyberspace has always been a domain of conflicting interests. While it gives enough power to nefarious actors to masquerade their illegal activities, it is also the cornerstone to facilitate freedom of speech and privacy. We present a proof of concept for a novel algorithm that could form the fundamental pillar of a darknet-capable Cyber Threat Intelligence platform. The solution can reduce anonymity of users of Tor, and considers the existing visibility of network traffic before optionally initiating targeted or widespread BGP interception. In combination with server HTTP response manipulation, the algorithm attempts to reduce the candidate data set to eliminate client-side traffic that is most unlikely to be responsible for server-side connections of interest. Our test results show that MITM manipulated server responses lead to expected changes received by the Tor client. Using simulation data generated by shadow, we show that the detection scheme is effective with false positive rate of 0.001, while sensitivity detecting non-targets was 0.016+-0.127. Our algorithm could assist collaborating organisations willing to share their threat intelligence or cooperate during investigations.Comment: 26 page

    Machine Learning and Big Data Methodologies for Network Traffic Monitoring

    Get PDF
    Over the past 20 years, the Internet saw an exponential grown of traffic, users, services and applications. Currently, it is estimated that the Internet is used everyday by more than 3.6 billions users, who generate 20 TB of traffic per second. Such a huge amount of data challenge network managers and analysts to understand how the network is performing, how users are accessing resources, how to properly control and manage the infrastructure, and how to detect possible threats. Along with mathematical, statistical, and set theory methodologies machine learning and big data approaches have emerged to build systems that aim at automatically extracting information from the raw data that the network monitoring infrastructures offer. In this thesis I will address different network monitoring solutions, evaluating several methodologies and scenarios. I will show how following a common workflow, it is possible to exploit mathematical, statistical, set theory, and machine learning methodologies to extract meaningful information from the raw data. Particular attention will be given to machine learning and big data methodologies such as DBSCAN, and the Apache Spark big data framework. The results show that despite being able to take advantage of mathematical, statistical, and set theory tools to characterize a problem, machine learning methodologies are very useful to discover hidden information about the raw data. Using DBSCAN clustering algorithm, I will show how to use YouLighter, an unsupervised methodology to group caches serving YouTube traffic into edge-nodes, and latter by using the notion of Pattern Dissimilarity, how to identify changes in their usage over time. By using YouLighter over 10-month long races, I will pinpoint sudden changes in the YouTube edge-nodes usage, changes that also impair the end users’ Quality of Experience. I will also apply DBSCAN in the deployment of SeLINA, a self-tuning tool implemented in the Apache Spark big data framework to autonomously extract knowledge from network traffic measurements. By using SeLINA, I will show how to automatically detect the changes of the YouTube CDN previously highlighted by YouLighter. Along with these machine learning studies, I will show how to use mathematical and set theory methodologies to investigate the browsing habits of Internauts. By using a two weeks dataset, I will show how over this period, the Internauts continue discovering new websites. Moreover, I will show that by using only DNS information to build a profile, it is hard to build a reliable profiler. Instead, by exploiting mathematical and statistical tools, I will show how to characterize Anycast-enabled CDNs (A-CDNs). I will show that A-CDNs are widely used either for stateless and stateful services. That A-CDNs are quite popular, as, more than 50% of web users contact an A-CDN every day. And that, stateful services, can benefit of A-CDNs, since their paths are very stable over time, as demonstrated by the presence of only a few anomalies in their Round Trip Time. Finally, I will conclude by showing how I used BGPStream an open-source software framework for the analysis of both historical and real-time Border Gateway Protocol (BGP) measurement data. By using BGPStream in real-time mode I will show how I detected a Multiple Origin AS (MOAS) event, and how I studies the black-holing community propagation, showing the effect of this community in the network. Then, by using BGPStream in historical mode, and the Apache Spark big data framework over 16 years of data, I will show different results such as the continuous growth of IPv4 prefixes, and the growth of MOAS events over time. All these studies have the aim of showing how monitoring is a fundamental task in different scenarios. In particular, highlighting the importance of machine learning and of big data methodologies

    Matrix profile data mining for BGP anomaly detection

    Get PDF
    The Border Gateway Protocol (BGP), acting as the communication protocol that binds the Internet, remains vulnerable despite Internet security advancements. This is not surprising, as the Internet was not designed to be resilient to cyber-attacks, therefore the detection of anomalous activity was not of prime importance to the Internet creators. Detection of BGP anomalies can potentially provide network operators with an early warning system to focus on protecting networks, systems, and infrastructure from significant impact, improve security posture and resilience, while ultimately contributing to a secure global Internet environment. In this paper, we present a novel technique for the detection of BGP anomalies in different events. This research uses publicly available datasets of BGP messages collected from the repositories, Route Views and Réseaux IP Européens (RIPE). Our contribution is the application of a time series data mining approach, Matrix Profile (MP), to detect BGP anomalies in all categories of BGP events. Advantages of the MP detection technique compared to extant approaches include that it is domain agnostic, is assumption-free, requires few parameters, does not require training data, and is scalable and storage efficient. The single hyper-parameter analyzed in MP shows it is robust to change. Our results indicate the MP detection scheme is competitive against existing detection schemes. A novel BGP anomaly detection scheme is also proposed for further research and validation

    Border Gateway Protocol Anomaly Detection Using Machine Learning Techniques

    Get PDF
    As the primary protocol used to exchange routing information between network domains, Border Gateway Protocol (BGP) plays a central role in the functioning of the Internet. Border Gateway Protocol is a standardized router protocol used to initiate and maintain communication between domains, or autonomous systems, on the Internet. This protocol can exhibit anomalous behavior caused by improper provisioning, malicious attacks, traffic or equipment failure, and network operator error. At large internet service providers, many BGP issues are not immediately seen or explicitly monitored by network operations centers. This possible blind spot is due to the enormous number of BGP handshakes that occur throughout the network along with the fact that there are many of these sub-interfaces associated to a single physical connection. We will present machine learning methods for anomaly detection using unsupervised learning techniques and create a data pipeline to quickly collect and trigger on these anomalies when they occur. Clustering techniques including k-means and DBSCAN were successfully implemented and able to detect known anomalies for historical events. This approach could incur soft savings by triggering early detection warnings of anomalous BGP events, but human intervention may still be required in order to address possible false positives

    CHASING THE UNKNOWN: A PREDICTIVE MODEL TO DEMYSTIFY BGP COMMUNITY SEMANTICS

    Get PDF
    The Border Gateway Protocol (BGP) specifies an optional communities attribute for traffic engineering, route manipulation, remotely-triggered blackholing, and other services. However, communities have neither unifying semantics nor cryptographic protections and often propagate much farther than intended. Consequently, Autonomous System (AS) operators are free to define their own community values. This research is a proof-of-concept for a machine learning approach to prediction of community semantics; it attempts a quantitative measurement of semantic predictability between different AS semantic schemata. Ground-truth community semantics data were collated and manually labeled according to a unified taxonomy of community services. Various classification algorithms, including a feed-forward Multi-Layer Perceptron and a Random Forest, were used as the estimator for a One-vs-All multi-class model and trained according to a feature set engineered from this data. The best model's performance on the test set indicates as much as 89.15% of these semantics can be accurately predicted according to a proposed standard taxonomy of community services. This model was additionally applied to historical BGP data from various route collectors to estimate the taxonomic distribution of communities transiting the control plane.http://archive.org/details/chasingtheunknow1094566047Outstanding ThesisCivilian, CyberCorps - Scholarship For ServiceApproved for public release. distribution is unlimite

    Why (and How) Networks Should Run Themselves

    Full text link
    The proliferation of networked devices, systems, and applications that we depend on every day makes managing networks more important than ever. The increasing security, availability, and performance demands of these applications suggest that these increasingly difficult network management problems be solved in real time, across a complex web of interacting protocols and systems. Alas, just as the importance of network management has increased, the network has grown so complex that it is seemingly unmanageable. In this new era, network management requires a fundamentally new approach. Instead of optimizations based on closed-form analysis of individual protocols, network operators need data-driven, machine-learning-based models of end-to-end and application performance based on high-level policy goals and a holistic view of the underlying components. Instead of anomaly detection algorithms that operate on offline analysis of network traces, operators need classification and detection algorithms that can make real-time, closed-loop decisions. Networks should learn to drive themselves. This paper explores this concept, discussing how we might attain this ambitious goal by more closely coupling measurement with real-time control and by relying on learning for inference and prediction about a networked application or system, as opposed to closed-form analysis of individual protocols
    • …
    corecore