    Deep Mining Port Scans from Darknet

    TCP/UDP port scanning or sweeping is one of the most common techniques used by attackers to discover accessible and potentially vulnerable hosts and applications. Although extracting and distinguishing different port scanning strategies is a challenging task, identifying dependencies among probed ports is essential for profiling attacker behaviors, with the ultimate goal of better mitigating them. In this paper, we propose an approach that tracks port scanning behavior patterns across multiple probed ports and identifies intrinsic properties of the observed groups of ports. Our method is fully automated and based on graph modeling and data mining techniques, including text mining. It provides security analysts and operators with relevant information about services that are jointly targeted by attackers. This helps assess the attacker's strategy, such as understanding the types of applications or environments she targets. We applied our method to data collected through a large Internet telescope (or Darknet).
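
    The abstract does not detail the graph model, so the following is only a minimal sketch of one plausible construction, assuming packets are available as hypothetical (source IP, destination port) pairs: ports are nodes, and an edge between two ports is weighted by the number of distinct sources that probed both. The function name `port_cooccurrence` is illustrative, not taken from the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations


def port_cooccurrence(packets):
    """Build a weighted port co-occurrence graph (hypothetical construction).

    packets: iterable of (src_ip, dst_port) pairs.
    Returns a Counter mapping (port_a, port_b) with port_a < port_b to the
    number of distinct sources that probed both ports.
    """
    # Collect the set of ports each source probed (duplicates collapse).
    ports_by_src = defaultdict(set)
    for src, port in packets:
        ports_by_src[src].add(port)

    # Every pair of ports probed by the same source adds one to that edge.
    edges = Counter()
    for ports in ports_by_src.values():
        for a, b in combinations(sorted(ports), 2):
            edges[(a, b)] += 1
    return edges
```

    Heavily weighted edges then point to groups of services that scanners tend to probe together; the paper's text-mining step is not reproduced here.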

    Sensing the Noise: Uncovering Communities in Darknet Traffic

    Darknets are ranges of IP addresses advertised without answering any traffic. Darknets help to uncover interesting network events, such as misconfigurations and network scans. Interpreting darknet traffic helps defend against cyber-attacks: for example, malware often reaches darknets when scanning the Internet for vulnerable devices. The traffic reaching darknets is, however, voluminous and noisy, which calls for efficient ways to represent the data and highlight possibly important events. This paper evaluates a methodology to summarize packets reaching darknets. We represent the darknet activity as a graph that captures remote hosts contacting the ports of darknet nodes, as well as the frequency at which each port is reached. On these representations, we apply community detection algorithms to search for patterns that could represent coordinated activity. By highlighting such activities, we are able, for example, to group together IP addresses that predominantly contact specific targets or, vice versa, to identify targets that are frequently contacted together to exploit the vulnerabilities of a given service. From the community detection results, the network analyst can recognize, for example, that a group of hosts has been infected by a botnet and is currently scanning the network in search of vulnerable services (e.g., SSH and Telnet, among the most commonly targeted). Such information is impossible to obtain when analyzing the behavior of single sources, or packets one by one. All in all, our work is a first step towards a comprehensive aggregation methodology to automate the analysis of darknet traffic, a fundamental aspect for recognizing coordinated and anomalous events.
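
    This abstract does not name a specific community detection algorithm; as a stand-in, the sketch below applies label propagation, one simple community detection algorithm, to a weighted bipartite graph of remote hosts and darknet ports, with edge weights counting packets. The input format and the function names `build_bipartite` and `label_propagation` are hypothetical.

```python
from collections import Counter, defaultdict


def build_bipartite(records):
    """records: iterable of (src_ip, dst_port) packets (hypothetical format).

    Nodes are tagged tuples ("ip", addr) or ("port", number); edge weights
    count how many packets each host sent to each port.
    """
    adj = defaultdict(Counter)
    for src, port in records:
        s, p = ("ip", src), ("port", port)
        adj[s][p] += 1
        adj[p][s] += 1
    return adj


def label_propagation(adj, max_iter=100):
    """Asynchronous label propagation over a weighted graph.

    Each node starts with its own label; nodes are visited in sorted order
    and adopt the label with the largest total edge weight among neighbours.
    Ties go to the smallest label, so the result is deterministic.
    """
    labels = {node: i for i, node in enumerate(sorted(adj))}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            votes = Counter()
            for nbr, weight in adj[node].items():
                votes[labels[nbr]] += weight
            best = max(votes.values())
            new_label = min(lab for lab, v in votes.items() if v == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:  # converged: no node changed its label this pass
            break
    return labels
```

    Nodes sharing a label form one community, mixing hosts and the ports they jointly target; label propagation is chosen here only for brevity and may differ from the algorithms the paper evaluates.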

    i-DarkVec: Incremental Embeddings for Darknet Traffic Analysis

    Darknets are probes listening to traffic reaching IP addresses that host no services. Traffic reaching a darknet results from the actions of Internet scanners, botnets, and possibly misconfigured hosts. This peculiar nature makes darknets a valuable instrument to discover malicious online activities, e.g., identifying coordinated actions performed by bots or scanners. However, the massive amount of packets and sources that darknets observe makes it hard to extract meaningful insights, calling for scalable tools to automatically identify and group sources that share similar behaviour. Here we present i-DarkVec, a methodology to learn meaningful representations of darknet traffic. i-DarkVec leverages Natural Language Processing techniques (e.g., Word2Vec) to capture the co-occurrence patterns that emerge when scanners or bots launch coordinated actions. As in NLP problems, the embeddings learned with i-DarkVec enable several new machine learning tasks on darknet traffic, such as identifying clusters of senders engaged in similar activities. We extensively test i-DarkVec and explore its design space in a case study using real darknets. We show that, with a proper definition of services, the learned embeddings can be used to (i) solve the classification problem of associating unknown sources' IP addresses with the correct classes of coordinated actors and (ii) automatically identify clusters of previously unknown sources performing similar attacks and scans, easing the security analyst's job. i-DarkVec leverages a novel incremental embedding learning approach that is scalable and robust to traffic changes, making it applicable to dynamic and large-scale scenarios.

    DarkVec: Automatic analysis of darknet traffic with word embeddings

    Darknets are passive probes listening to traffic reaching IP addresses that host no services. Traffic reaching them is unsolicited by nature and often induced by scanners, malicious senders and misconfigured hosts. Its peculiar nature makes it a valuable source of information to learn about malicious activities. However, the massive amount of packets and sources that reach darknets makes it hard to extract meaningful insights. In particular, multiple senders contact the darknet while performing similar and coordinated tasks, which are often commanded by common controllers (botnets, crawlers, etc.). How to automatically identify and group the senders that share similar behaviors remains an open problem. Here we introduce DarkVec, a methodology to identify clusters of senders (i.e., IP addresses) engaged in similar activities on darknets. DarkVec leverages word embedding techniques (e.g., Word2Vec) to capture the co-occurrence patterns of sources hitting the darknets. We extensively test DarkVec and explore its design space in a case study using one month of darknet data. We show that, with a proper definition of service, the generated embeddings can easily be used to (i) associate unknown senders' IP addresses with the correct known labels (more than 96% accuracy), and (ii) identify new attack and scan groups of previously unknown senders. We contribute DarkVec source code and datasets to the community, also to stimulate the use of word embeddings to automatically learn patterns on generic traffic traces.
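
    As a rough illustration of how word embeddings can apply to darknet traffic, the sketch below treats each sender IP as a "word" and, per service, orders senders by arrival time to form a "sentence"; it then generates the (center, context) pairs that skip-gram Word2Vec trains on. The port-to-service mapping `SERVICES` and the helper names are hypothetical, and DarkVec's actual service definition and corpus construction may differ.

```python
from collections import defaultdict

# Hypothetical port-to-service mapping; DarkVec's real service definition may differ.
SERVICES = {22: "ssh", 23: "telnet", 2323: "telnet", 80: "http", 8080: "http"}


def build_sentences(packets):
    """packets: iterable of (timestamp, src_ip, dst_port), in any order.

    Returns {service: [src_ip, ...]}, each list ordered by arrival time.
    Ports missing from SERVICES fall into a catch-all "other" service.
    """
    sentences = defaultdict(list)
    for _ts, src, port in sorted(packets):
        sentences[SERVICES.get(port, "other")].append(src)
    return dict(sentences)


def skipgram_pairs(sentence, window=2):
    """Yield (center, context) pairs within +/- window positions,
    i.e. the training examples a skip-gram Word2Vec model consumes."""
    for i, center in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, sentence[j]
```

    Senders that repeatedly appear close together in the same service's sentence end up with similar vectors. In practice, the per-service sentences could be passed directly to an off-the-shelf Word2Vec implementation (for example, gensim's `Word2Vec` with `sg=1` for skip-gram) rather than generating the pairs by hand.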

    The New Abnormal: Network Anomalies in the AI Era

    Anomaly detection aims at finding unexpected patterns in data. It has been used in several problems in computer networks, from the detection of port scans and DDoS attacks to the monitoring of time series collected from Internet monitoring systems. Data-driven approaches and machine learning have seen widespread application in anomaly detection too, and this trend has been accelerated by recent developments in Artificial Intelligence research. This chapter summarizes recent progress in anomaly detection research. In particular, we evaluate how developments in AI algorithms bring new possibilities for anomaly detection. We cover new representation learning techniques such as Generative Adversarial Networks and Autoencoders, as well as techniques that can be used to improve models learned with machine learning algorithms, such as reinforcement learning. We survey both research works and tools implementing AI algorithms for anomaly detection. We found that these novel algorithms, while successful in other fields, have hardly been applied to networking problems. We conclude the chapter with a case study that illustrates a possible research direction.

    Internet and Tor Traffic Classification Using Machine Learning

    Privacy has always been a concern on the Internet. A new wave of privacy networks emerged in 2002 when the Tor Project was released to the public. The core principle of Tor, popularly known as onion routing, was developed by the United States Naval Research Laboratory in the mid-1990s. It was further developed by the Defense Advanced Research Projects Agency (DARPA). The project, which started as an attempt to create a secure communication network for U.S. intelligence, was soon released as a general-purpose anonymous network. These anonymous networks run with the help of volunteers who provide the network's physical infrastructure, while the software fills the gaps using encryption algorithms. Fundamentally, the volunteers together with the encryption algorithms are the network. Once part of such a network, a user's identity and activity are invisible. Users remain completely anonymous on the network if they follow a few steps and rules. As of December 2017, there were more than 3 million Tor users according to the Tor Project's website. Today, the anonymous web is used by people of all kinds: some just want to make sure nobody can spy on them, while others also use it to buy and sell goods, so that it functions as a censorship-resistant peer-to-peer network. In this thesis, we propose a novel approach to identifying traffic without sacrificing the privacy of Tor nodes or clients. We recorded traffic on our own Tor exit and middle nodes to train Decision Tree classifiers that identify and differentiate between different types of traffic. Our classifiers can accurately differentiate between regular Internet and Tor traffic, and they can also be combined for more detailed classification. These classifiers can be used to selectively drop traffic on a Tor node, giving users more control while also providing scope for censorship.
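
    The thesis trains Decision Tree classifiers, but the abstract does not list the features or the training library. The sketch below shows only the core step of a CART-style tree: choosing the threshold on one numeric feature that minimizes weighted Gini impurity. The feature (a hypothetical mean packet size per flow) and the class labels are illustrative assumptions.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0.0 for empty input)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Find the 'value <= threshold' split on one feature with the lowest
    weighted Gini impurity. Returns (threshold, weighted_gini); threshold is
    None if no split improves on the unsplit node."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        # Only split between distinct feature values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best
```

    A full tree applies this search recursively over all features and resulting subsets; libraries such as scikit-learn's `DecisionTreeClassifier`, whose default criterion is `"gini"`, do this internally.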

    Data-driven curation, learning and analysis for inferring evolving IoT botnets in the wild

    The insecurity of the Internet-of-Things (IoT) paradigm continues to wreak havoc in consumer and critical infrastructure realms. Several challenges impede addressing IoT security at large, including the lack of IoT-centric data that can be collected, analyzed and correlated, due to the highly heterogeneous nature of such devices and their widespread deployment in Internet-wide environments. To this end, this paper explores macroscopic, passive empirical data to shed light on this evolving threat phenomenon. It not only aims at classifying and inferring Internet-scale compromised IoT devices by solely observing such one-way network traffic, but also endeavors to uncover, track and report on orchestrated "in the wild" IoT botnets. First, to enable effective use of such data, a novel probabilistic model is designed and developed to cleanse such traffic of noise samples (i.e., misconfiguration traffic). Subsequently, several shallow and deep learning models are evaluated to ultimately design and develop a multi-window convolutional neural network trained on active and passive measurements to accurately identify compromised IoT devices. Finally, to infer orchestrated and unsolicited activities generated by well-coordinated IoT botnets, hierarchical agglomerative clustering is applied over a set of innovative and efficient network feature sets. By analyzing 3.6 TB of recent darknet traffic, the proposed approach uncovers 440,000 compromised IoT devices and generates evidence-based artifacts related to 350 IoT botnets. While some of these detected botnets correspond to previously documented campaigns such as Hide and Seek, Hajime and Fbot, others illustrate evolving threats, such as botnets with cryptojacking capabilities and those targeting industrial control system communication and control services.
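
    To make the clustering step concrete, the sketch below implements a naive hierarchical agglomerative clustering that repeatedly merges the two closest clusters until the closest pair is farther apart than a distance threshold. The per-source feature vectors, the threshold, and the function name `agglomerative` are hypothetical; the paper's actual feature sets and linkage criterion are not specified in the abstract.

```python
import math


def agglomerative(points, threshold, linkage="single"):
    """Naive bottom-up clustering of numeric feature vectors.

    points: list of equal-length tuples (hypothetical per-source features).
    Merging stops once the closest pair of clusters is farther apart than
    `threshold`. Returns clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def cluster_dist(c1, c2):
        ds = [math.dist(points[a], points[b]) for a in c1 for b in c2]
        if linkage == "single":   # closest pair of members
            return min(ds)
        if linkage == "average":  # mean pairwise distance
            return sum(ds) / len(ds)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > 1:
        # Find the closest pair of clusters (first pair wins on ties).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_dist(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i stays valid
    return [sorted(c) for c in clusters]
```

    This cubic-time loop is only for illustration; real pipelines would typically rely on an optimized implementation such as SciPy's `scipy.cluster.hierarchy.linkage`.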

    Strengthening Privacy and Cybersecurity through Anonymization and Big Data

    The abstract is in the attachment.