Neural combinatorial optimization beyond the TSP: Existing architectures under-represent graph structure
Recent years have witnessed the promise that reinforcement learning, coupled with Graph Neural Network (GNN) architectures, could learn to solve hard combinatorial optimization problems: given raw input data and an evaluator to guide the process, the idea is to automatically learn a policy able to return feasible, high-quality outputs. Recent works have shown promising results, but these were mainly evaluated on the travelling salesman problem (TSP) and similar abstract variants such as the Split Delivery Vehicle Routing Problem (SDVRP). In this paper, we analyze how and whether recent neural architectures can be applied to graph problems of practical importance. We thus set out to systematically "transfer" these architectures to the Power and Channel Allocation Problem (PCAP), which has practical relevance for, e.g., radio resource allocation in wireless networks. Our experimental results suggest that existing architectures (i) are still incapable of capturing graph structural features and (ii) are not suitable for problems where the actions on the graph change the graph attributes. On a positive note, we show that augmenting the structural representation of problems with Distance Encoding is a promising step toward the still-ambitious goal of learning multi-purpose autonomous solvers.
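As a concrete illustration of the Distance Encoding idea mentioned above, the following minimal Python sketch augments each node's features with its truncated shortest-path distances to a set of anchor nodes. The function name, the anchor choice, and the normalization are our own illustrative assumptions, not the paper's implementation.

    # Illustrative Distance Encoding augmentation (assumes networkx, numpy).
    import networkx as nx
    import numpy as np

    def distance_encoding(G, anchors, max_dist=10):
        """Return an |V| x |anchors| matrix of truncated shortest-path
        distances from every node to each anchor node."""
        nodes = list(G.nodes())
        enc = np.full((len(nodes), len(anchors)), max_dist, dtype=float)
        for j, a in enumerate(anchors):
            lengths = nx.single_source_shortest_path_length(G, a, cutoff=max_dist)
            for i, v in enumerate(nodes):
                if v in lengths:
                    enc[i, j] = lengths[v]
        return enc / max_dist  # normalize before concatenating to node features

    G = nx.karate_club_graph()
    de = distance_encoding(G, anchors=[0, 33])
    print(de.shape)  # (34, 2): two extra structural features per node

In practice these columns would be concatenated to whatever raw node attributes the problem provides, giving the GNN explicit structural signal it otherwise struggles to recover.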
You, the Web and Your Device: Longitudinal Characterization of Browsing Habits
Understanding how people interact with the web is key for a variety of
applications, e.g., from the design of effective web pages to the definition of
successful online marketing campaigns. Browsing behavior has been traditionally
represented and studied by means of clickstreams, i.e., graphs whose vertices
are web pages, and edges are the paths followed by users. Obtaining large and
representative data to extract clickstreams is however challenging. The
evolution of the web raises the question of whether browsing behavior is
changing and, consequently, whether properties of clickstreams are changing.
This paper presents a longitudinal study of clickstreams from 2013 to 2016. We
evaluate
an anonymized dataset of HTTP traces captured in a large ISP, where thousands
of households are connected. We first propose a methodology to identify actual
URLs requested by users from the massive set of requests automatically fired by
browsers when rendering web pages. Then, we characterize web usage patterns and
clickstreams, taking into account both the temporal evolution and the impact of
the device used to explore the web. Our analyses precisely quantify various
aspects of clickstreams and uncover interesting patterns, such as the typical
short paths followed by people while navigating the web, the fast-increasing
trend in browsing from mobile devices, and the different roles of search engines
and social networks in promoting content. Finally, we contribute a dataset of
anonymized clickstreams to the community to foster new studies (anonymized
clickstreams are available to the public at
http://bigdata.polito.it/clickstream).
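The step of isolating user-requested URLs from the flood of automatically fired requests lends itself to a short sketch. The Python fragment below shows one plausible heuristic of this flavor (HTML-only responses plus a minimum "think time" between a page and its referer); the field names and the threshold are hypothetical, not the paper's actual methodology.

    # Plausible (hypothetical) heuristic for flagging user-initiated page
    # requests in an HTTP trace: keep HTML responses whose referer is empty
    # or was itself requested long enough before (a minimum "think time").
    from datetime import timedelta

    MIN_THINK_TIME = timedelta(seconds=2)  # assumed threshold, tunable

    def user_clicks(records):
        """records: dicts with 'ts' (datetime), 'url', 'referer',
        'content_type'; the field names are illustrative."""
        last_seen = {}   # url -> timestamp of its last request
        clicks = []
        for r in sorted(records, key=lambda x: x["ts"]):
            if "text/html" not in r.get("content_type", ""):
                continue  # skip embedded objects, scripts, images
            ref_ts = last_seen.get(r.get("referer"))
            if ref_ts is None or r["ts"] - ref_ts >= MIN_THINK_TIME:
                clicks.append(r)  # likely a deliberate navigation step
            last_seen[r["url"]] = r["ts"]
        return clicks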
Cross-network transferable neural models for WLAN interference estimation
Airtime interference is a key performance indicator for WLANs, measuring, for
a given time period, the percentage of time during which a node is forced to
wait for other transmissions before transmitting or receiving. Being able to
accurately estimate interference resulting from a given state change (e.g.,
channel, bandwidth, power) would allow better control of WLAN resources,
assessing the impact of a given configuration before actually implementing it.
In this paper, we adopt a principled approach to interference estimation in
WLANs. We first use real data to characterize the factors that impact it, and
derive a set of relevant synthetic workloads for a controlled comparison of
various deep learning architectures in terms of accuracy, generalization and
robustness to outlier data. We find, unsurprisingly, that Graph Convolutional
Networks (GCNs) yield the best performance overall, leveraging the graph
structure inherent to campus WLANs. We notice that, unlike e.g. LSTMs, they
struggle to learn the behavior of specific nodes unless additionally given the
node indexes as input. We finally verify the generalization capabilities of
GCN models by applying trained models to operational deployments unseen at
training time.
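A minimal sketch of the two ingredients discussed above: a GCN propagation step and one-hot node indexes appended to the input features so the model can learn node-specific behavior. Sizes and data are random stand-ins, not the paper's workloads.

    # Single GCN layer (Kipf & Welling propagation rule) with one-hot node
    # indexes appended to the features. Purely illustrative, numpy only.
    import numpy as np

    def gcn_layer(A, X, W):
        """A: adjacency (n x n), X: features (n x d), W: weights (d x h)."""
        A_hat = A + np.eye(A.shape[0])                 # add self-loops
        d = A_hat.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
        return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0)  # ReLU

    n, d, h = 5, 3, 8
    A = (np.random.rand(n, n) > 0.5).astype(float)
    A = np.triu(A, 1); A = A + A.T                     # symmetric, no self-loops
    X = np.random.randn(n, d)
    X_id = np.hstack([X, np.eye(n)])                   # append one-hot node indexes
    W = np.random.randn(d + n, h)
    H = gcn_layer(A, X_id, W)
    print(H.shape)  # (5, 8)

The one-hot columns let the layer assign each node its own parameters, which is how the indexing trick compensates for the GCN's otherwise node-anonymous message passing.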
The New Abnormal: Network Anomalies in the AI Era
Anomaly detection aims at finding unexpected patterns in data. It has been used in several problems in computer networks, from the detection of port scans and DDoS attacks to the monitoring of time series collected from Internet monitoring systems. Data-driven approaches and machine learning have also been widely applied to anomaly detection, and this trend has been accelerated by recent developments in Artificial Intelligence research. This chapter summarizes recent progress in anomaly detection research. In particular, we evaluate how developments in AI algorithms bring new possibilities for anomaly detection. We cover new representation learning techniques such as Generative Adversarial Networks and Autoencoders, as well as techniques that can be used to improve models learned with machine learning algorithms, such as reinforcement learning. We survey both research works and tools implementing AI algorithms for anomaly detection. We find that these novel algorithms, while successful in other fields, have hardly been applied to networking problems. We conclude the chapter with a case study that illustrates a possible research direction.
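To make the autoencoder technique mentioned above concrete, here is a minimal reconstruction-error sketch; it uses scikit-learn's MLPRegressor as a stand-in autoencoder and synthetic features, so treat it as an illustration of the principle rather than a networking-ready detector.

    # Autoencoder-style anomaly detection: learn to reconstruct "normal"
    # feature vectors, flag samples with high reconstruction error.
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    normal = rng.normal(0, 1, size=(1000, 16))       # stand-in "normal" features
    ae = MLPRegressor(hidden_layer_sizes=(4,),       # bottleneck narrower than input
                      max_iter=2000, random_state=0)
    ae.fit(normal, normal)                           # target = input

    def anomaly_scores(model, X):
        return ((model.predict(X) - X) ** 2).mean(axis=1)

    threshold = np.quantile(anomaly_scores(ae, normal), 0.99)
    test = np.vstack([rng.normal(0, 1, (5, 16)),     # normal-like samples
                      rng.normal(5, 1, (5, 16))])    # shifted, anomalous samples
    print(anomaly_scores(ae, test) > threshold)      # expected: last 5 flagged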
i-DarkVec: Incremental Embeddings for Darknet Traffic Analysis
Darknets are probes listening to traffic reaching IP addresses that host no services. Traffic reaching a darknet results from the actions of internet scanners, botnets, and possibly misconfigured hosts. This peculiar nature of darknet traffic makes darknets a valuable instrument to discover malicious online activities, e.g., identifying coordinated actions performed by bots or scanners. However, the massive amount of packets and sources that darknets observe makes it hard to extract meaningful insights, calling for scalable tools to automatically identify and group sources that share similar behavior.
We here present i-DarkVec, a methodology to learn meaningful representations of Darknet traffic. i-DarkVec leverages Natural Language Processing techniques (e.g., Word2Vec) to capture the co-occurrence patterns that emerge when scanners or bots launch coordinated actions. As in NLP problems, the embeddings learned with i-DarkVec enable several new machine learning tasks on the darknet traffic, such as identifying clusters of senders engaged in similar activities.
We extensively test i-DarkVec and explore its design space in a case study using real darknets. We show that with a proper definition of services, the learned embeddings can be used to (i) solve the classification problem to associate unknown sources’ IP addresses to the correct classes of coordinated actors and (ii) automatically identify clusters of previously unknown sources performing similar attacks and scans, easing the security analyst’s job. i-DarkVec leverages a novel incremental embedding learning approach that is scalable and robust to traffic changes, making it applicable to dynamic and large-scale scenarios.
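A minimal sketch of the incremental learning step, assuming gensim's Word2Vec; the way "documents" are built from darknet traffic here (source IPs grouped per contacted service) is a simplification of the paper's recipe.

    # Incremental embedding learning in the spirit of i-DarkVec: update the
    # Word2Vec model with each new day of traffic instead of retraining.
    from gensim.models import Word2Vec

    day1 = [["10.0.0.1", "10.0.0.2", "10.0.0.3"],   # e.g., senders hitting port 23
            ["10.0.0.2", "10.0.0.4"]]               # e.g., senders hitting port 445
    model = Word2Vec(day1, vector_size=32, window=5, min_count=1, epochs=20)

    day2 = [["10.0.0.3", "10.0.0.5", "10.0.0.1"]]   # next day's traffic
    model.build_vocab(day2, update=True)            # add new senders incrementally
    model.train(day2, total_examples=len(day2), epochs=model.epochs)

    print(model.wv.most_similar("10.0.0.1", topn=2))  # senders with similar activity

The incremental update is what keeps the approach tractable at scale: only the new day's co-occurrences are processed, while previously learned sender vectors are refined rather than discarded.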
DarkVec: automatic analysis of darknet traffic with word embeddings
Darknets are passive probes listening to traffic reaching IP addresses that host no services. Traffic reaching them is unsolicited by nature and often induced by scanners, malicious senders and misconfigured hosts. Its peculiar nature makes it a valuable source of information to learn about malicious activities. However, the massive amount of packets and sources that reach darknets makes it hard to extract meaningful insights. In particular, multiple senders contact the darknet while performing similar and coordinated tasks, which are often commanded by common controllers (botnets, crawlers, etc.). How to automatically identify and group those senders that share similar behaviors remains an open problem.
We here introduce DarkVec, a methodology to identify clusters of senders (i.e., IP addresses) engaged in similar activities on darknets. DarkVec leverages word embedding techniques (e.g., Word2Vec) to capture the co-occurrence patterns of sources hitting the darknets. We extensively test DarkVec and explore its design space in a case study using one month of darknet data. We show that with a proper definition of service, the generated embeddings can be easily used to (i) associate unknown senders' IP addresses to the correct known labels (more than 96% accuracy), and (ii) identify new attack and scan groups of previously unknown senders. We contribute the DarkVec source code and datasets to the community, also to stimulate the use of word embeddings to automatically learn patterns on generic traffic traces.
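The label-association step described above can be sketched as a k-nearest-neighbor vote in embedding space; the embeddings and class names below are random stand-ins, not DarkVec's actual output.

    # Associate unlabeled senders to known classes via k-NN over embeddings.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(1)
    known_emb = np.vstack([rng.normal(0, 1, (50, 32)),   # e.g., one scanner group
                           rng.normal(3, 1, (50, 32))])  # e.g., another group
    known_lbl = ["group-a"] * 50 + ["group-b"] * 50

    knn = KNeighborsClassifier(n_neighbors=5, metric="cosine")
    knn.fit(known_emb, known_lbl)
    unknown = rng.normal(3, 1, (3, 32))                  # unlabeled senders
    print(knn.predict(unknown))                          # expected: all "group-b"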
Cross-network Embeddings Transfer for Traffic Analysis
Artificial Intelligence (AI) approaches have emerged as powerful tools to improve traffic analysis for network monitoring and management. However, the lack of large labeled datasets and the ever-changing networking scenarios make a fundamental difference compared to other domains where AI is thriving. We believe the ability to transfer the specific knowledge acquired in one network (or dataset) to a different network (or dataset) would be fundamental to speed up the adoption of AI-based solutions for traffic analysis and other networking applications (e.g., cybersecurity). We here propose and evaluate different options to transfer the knowledge built from a provider network, owning data and labels, to a customer network that desires to label its traffic but lacks labels. We formulate this problem as a domain adaptation problem that we solve with embedding alignment techniques and canonical transfer learning approaches. We present a thorough experimental analysis to assess the performance considering both supervised (e.g., classification) and unsupervised (e.g., novelty detection) downstream tasks related to darknet and honeypot traffic. Our experiments identify the proper transfer techniques for reusing models obtained in one network within a different network. We believe our contribution opens new opportunities and business models where network providers can successfully share their knowledge and AI models with customers.
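One classic embedding-alignment option of the kind evaluated above is orthogonal Procrustes: map the customer's embedding space onto the provider's using anchor entities present in both spaces. The sketch below assumes such anchors exist and uses synthetic data.

    # Orthogonal Procrustes alignment between two embedding spaces.
    import numpy as np

    def procrustes_align(X_src, X_tgt):
        """Find orthogonal W minimizing ||X_src @ W - X_tgt||_F."""
        U, _, Vt = np.linalg.svd(X_src.T @ X_tgt)
        return U @ Vt

    rng = np.random.default_rng(2)
    provider = rng.normal(size=(100, 32))            # provider-side anchor embeddings
    R = np.linalg.qr(rng.normal(size=(32, 32)))[0]   # unknown rotation between spaces
    customer = provider @ R                          # customer-side anchor embeddings
    W = procrustes_align(customer, provider)
    print(np.allclose(customer @ W, provider, atol=1e-6))  # True: spaces aligned

Once W is estimated from the anchors, any provider-side model (classifier, novelty detector) can be applied to customer embeddings mapped through W, which is the transfer scenario the abstract describes.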
Enlightening the Darknets: Augmenting Darknet Visibility with Active Probes
Darknets collect unsolicited traffic reaching unused address spaces. They provide insights into malicious activities, such as the rise of botnets and DDoS attacks. However, darknets provide a shallow view, as traffic is never responded to. Here we quantify how their visibility increases when traffic is answered by interactive responders with increasing levels of interaction. We consider four deployments: darknets, simple responders, vertical responders bound to specific ports, and a honeypot that responds to all protocols on any port. We contrast these alternatives by analyzing the traffic attracted by each deployment and characterizing how traffic changes throughout the responder lifecycle on the darknet. We show that the deployment of responders increases the value of darknet data by revealing patterns that would otherwise be unobservable. We measure Side-Scan phenomena where, once a host starts responding, it attracts traffic to other ports and neighboring addresses. Responding also uncovers attacks that darknets alone would not observe, e.g., large-scale activity on non-standard ports. Finally, we observe how quickly senders can identify and attack new responders. The “enlightened” part of a darknet brings several benefits and offers opportunities to increase the visibility of sender patterns. This information gain is worth taking advantage of, and we therefore recommend that organizations consider this option.
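For intuition, a responder at the low end of the interaction spectrum can be as small as the following Python sketch: accept any TCP connection on one port, log the sender, and reply with a generic banner. The port and banner are arbitrary illustrations, not the deployment studied above.

    # Minimal "simple responder": enough interaction to elicit follow-up
    # traffic from scanners that a passive darknet would never see.
    import socket

    def simple_responder(port=2323, banner=b"220 service ready\r\n"):
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("0.0.0.0", port))
        srv.listen(5)
        while True:
            conn, peer = srv.accept()
            print("contact from", peer)   # log the sender for later analysis
            conn.sendall(banner)          # minimal interaction: a banner reply
            conn.close()

    if __name__ == "__main__":
        simple_responder()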