12 research outputs found

    Extreme bandits

    Get PDF
    International audienceIn many areas of medicine, security, and life sciences, we want to allocate limited resources to different sources in order to detect extreme values. In this paper, we study an efficient way to allocate these resources sequentially under limited feedback. While sequential design of experiments is well studied in bandit theory, the most commonly optimized property is the regret with respect to the maximum mean reward. However, in other problems such as network intrusion detection, we are interested in detecting the most extreme value output by the sources. Therefore, in our work we study extreme regret which measures the efficiency of an algorithm compared to the oracle policy selecting the source with the heaviest tail. We propose the ExtremeHunter algorithm, provide its analysis, and evaluate it empirically on synthetic and real-world experiments

    Unsupervised Machine Learning for Networking:Techniques, Applications and Research Challenges

    Get PDF
    While machine learning and artificial intelligence have long been applied in networking research, the bulk of such works has focused on supervised learning. Recently, there has been a rising trend of employing unsupervised machine learning using unstructured raw network data to improve network performance and provide services such as traffic engineering, anomaly detection, Internet traffic classification, and quality of service optimization. The interest in applying unsupervised learning techniques in networking emerges from their great success in other fields such as computer vision, natural language processing, speech recognition, and optimal control (e.g., for developing autonomous self-driving cars). Unsupervised learning is interesting since it can unconstrain us from the need of labeled data and manual handcrafted feature engineering thereby facilitating flexible, general, and automated methods of machine learning. The focus of this survey paper is to provide an overview of the applications of unsupervised learning in the domain of networking. We provide a comprehensive survey highlighting the recent advancements in unsupervised learning techniques and describe their applications in various learning tasks in the context of networking. We also provide a discussion on future directions and open research issues, while also identifying potential pitfalls. While a few survey papers focusing on the applications of machine learning in networking have previously been published, a survey of similar scope and breadth is missing in literature. Through this paper, we advance the state of knowledge by carefully synthesizing the insights from these survey papers while also providing contemporary coverage of recent advances

    Unsupervised Machine Learning for Networking:Techniques, Applications and Research Challenges

    Get PDF
    While machine learning and artificial intelligence have long been applied in networking research, the bulk of such works has focused on supervised learning. Recently there has been a rising trend of employing unsupervised machine learning using unstructured raw network data to improve network performance and provide services such as traffic engineering, anomaly detection, Internet traffic classification, and quality of service optimization. The interest in applying unsupervised learning techniques in networking emerges from their great success in other fields such as computer vision, natural language processing, speech recognition, and optimal control (e.g., for developing autonomous self-driving cars). Unsupervised learning is interesting since it can unconstrain us from the need of labeled data and manual handcrafted feature engineering thereby facilitating flexible, general, and automated methods of machine learning. The focus of this survey paper is to provide an overview of the applications of unsupervised learning in the domain of networking. We provide a comprehensive survey highlighting the recent advancements in unsupervised learning techniques and describe their applications for various learning tasks in the context of networking. We also provide a discussion on future directions and open research issues, while also identifying potential pitfalls. While a few survey papers focusing on the applications of machine learning in networking have previously been published, a survey of similar scope and breadth is missing in literature. Through this paper, we advance the state of knowledge by carefully synthesizing the insights from these survey papers while also providing contemporary coverage of recent advances

    Improving Pan-African research and education networks through traffic engineering: A LISP/SDN approach

    Get PDF
    The UbuntuNet Alliance, a consortium of National Research and Education Networks (NRENs) runs an exclusive data network for education and research in east and southern Africa. Despite a high degree of route redundancy in the Alliance's topology, a large portion of Internet traffic between the NRENs is circuitously routed through Europe. This thesis proposes a performance-based strategy for dynamic ranking of inter-NREN paths to reduce latencies. The thesis makes two contributions: firstly, mapping Africa's inter-NREN topology and quantifying the extent and impact of circuitous routing; and, secondly, a dynamic traffic engineering scheme based on Software Defined Networking (SDN), Locator/Identifier Separation Protocol (LISP) and Reinforcement Learning. To quantify the extent and impact of circuitous routing among Africa's NRENs, active topology discovery was conducted. Traceroute results showed that up to 75% of traffic from African sources to African NRENs went through inter-continental routes and experienced much higher latencies than that of traffic routed within Africa. An efficient mechanism for topology discovery was implemented by incorporating prior knowledge of overlapping paths to minimize redundancy during measurements. Evaluation of the network probing mechanism showed a 47% reduction in packets required to complete measurements. An interactive geospatial topology visualization tool was designed to evaluate how NREN stakeholders could identify routes between NRENs. Usability evaluation showed that users were able to identify routes with an accuracy level of 68%. NRENs are faced with at least three problems to optimize traffic engineering, namely: how to discover alternate end-to-end paths; how to measure and monitor performance of different paths; and how to reconfigure alternate end-to-end paths. This work designed and evaluated a traffic engineering mechanism for dynamic discovery and configuration of alternate inter-NREN paths using SDN, LISP and Reinforcement Learning. A LISP/SDN based traffic engineering mechanism was designed to enable NRENs to dynamically rank alternate gateways. Emulation-based evaluation of the mechanism showed that dynamic path ranking was able to achieve 20% lower latencies compared to the default static path selection. SDN and Reinforcement Learning were used to enable dynamic packet forwarding in a multipath environment, through hop-by-hop ranking of alternate links based on latency and available bandwidth. The solution achieved minimum latencies with significant increases in aggregate throughput compared to static single path packet forwarding. Overall, this thesis provides evidence that integration of LISP, SDN and Reinforcement Learning, as well as ranking and dynamic configuration of paths could help Africa's NRENs to minimise latencies and to achieve better throughputs

    Large-Scale Networks: Algorithms, Complexity and Real Applications

    Get PDF
    Networks have broad applicability to real-world systems, due to their ability to model and represent complex relationships. The discovery and forecasting of insightful patterns from networks are at the core of analytical intelligence in government, industry, and science. Discoveries and forecasts, especially from large-scale networks commonly available in the big-data era, strongly rely on fast and efficient network algorithms. Algorithms for dealing with large-scale networks are the first topic of research we focus on in this thesis. We design, theoretically analyze and implement efficient algorithms and parallel algorithms, rigorously proving their worst-case time and space complexities. Our main contributions in this area are novel, parallel algorithms to detect k-clique communities, special network groups which are widely used to understand complex phenomena. The proposed algorithms have a space complexity which is the square root of that of the current state-of-the-art. Time complexity achieved is optimal, since it is inversely proportional to the number of processing units available. Extensive experiments were conducted to confirm the efficiency of the proposed algorithms, even in comparison to the state-of-the-art. We experimentally measured a linear speedup, substantiating the optimal performances attained. The second focus of this thesis is the application of networks to discover insights from real-world systems. We introduce novel methodologies to capture cross correlations in evolving networks. We instantiate these methodologies to study the Internet, one of the most, if not the most, pervasive modern technological system. We investigate the dynamics of connectivity among Internet companies, those which interconnect to ensure global Internet access. We then combine connectivity dynamics with historical worldwide stock markets data, and produce graphical representations to visually identify high correlations. We find that geographically close Internet companies offering similar services are driven by common economic factors. We also provide evidence on the existence and nature of hidden factors governing the dynamics of Internet connectivity. Finally, we propose network models to effectively study the Internet Domain Name System (DNS) traffic, and leverage these models to obtain rankings of Internet domains as well as to identify malicious activities

    Improved Detection for Advanced Polymorphic Malware

    Get PDF
    Malicious Software (malware) attacks across the internet are increasing at an alarming rate. Cyber-attacks have become increasingly more sophisticated and targeted. These targeted attacks are aimed at compromising networks, stealing personal financial information and removing sensitive data or disrupting operations. Current malware detection approaches work well for previously known signatures. However, malware developers utilize techniques to mutate and change software properties (signatures) to avoid and evade detection. Polymorphic malware is practically undetectable with signature-based defensive technologies. Today’s effective detection rate for polymorphic malware detection ranges from 68.75% to 81.25%. New techniques are needed to improve malware detection rates. Improved detection of polymorphic malware can only be accomplished by extracting features beyond the signature realm. Targeted detection for polymorphic malware must rely upon extracting key features and characteristics for advanced analysis. Traditionally, malware researchers have relied on limited dimensional features such as behavior (dynamic) or source/execution code analysis (static). This study’s focus was to extract and evaluate a limited set of multidimensional topological data in order to improve detection for polymorphic malware. This study used multidimensional analysis (file properties, static and dynamic analysis) with machine learning algorithms to improve malware detection. This research demonstrated improved polymorphic malware detection can be achieved with machine learning. This study conducted a number of experiments using a standard experimental testing protocol. This study utilized three advanced algorithms (Metabagging (MB), Instance Based k-Means (IBk) and Deep Learning Multi-Layer Perceptron) with a limited set of multidimensional data. Experimental results delivered detection results above 99.43%. In addition, the experiments delivered near zero false positives. The study’s approach was based on single case experimental design, a well-accepted protocol for progressive testing. The study constructed a prototype to automate feature extraction, assemble files for analysis, and analyze results through multiple clustering algorithms. The study performed an evaluation of large malware sample datasets to understand effectiveness across a wide range of malware. The study developed an integrated framework which automated feature extraction for multidimensional analysis. The feature extraction framework consisted of four modules: 1) a pre-process module that extracts and generates topological features based on static analysis of machine code and file characteristics, 2) a behavioral analysis module that extracts behavioral characteristics based on file execution (dynamic analysis), 3) an input file construction and submission module, and 4) a machine learning module that employs various advanced algorithms. As with most studies, careful attention was paid to false positive and false negative rates which reduce their overall detection accuracy and effectiveness. This study provided a novel approach to expand the malware body of knowledge and improve the detection for polymorphic malware targeting Microsoft operating systems

    Online learning on the programmable dataplane

    Get PDF
    This thesis makes the case for managing computer networks with datadriven methods automated statistical inference and control based on measurement data and runtime observations—and argues for their tight integration with programmable dataplane hardware to make management decisions faster and from more precise data. Optimisation, defence, and measurement of networked infrastructure are each challenging tasks in their own right, which are currently dominated by the use of hand-crafted heuristic methods. These become harder to reason about and deploy as networks scale in rates and number of forwarding elements, but their design requires expert knowledge and care around unexpected protocol interactions. This makes tailored, per-deployment or -workload solutions infeasible to develop. Recent advances in machine learning offer capable function approximation and closed-loop control which suit many of these tasks. New, programmable dataplane hardware enables more agility in the network— runtime reprogrammability, precise traffic measurement, and low latency on-path processing. The synthesis of these two developments allows complex decisions to be made on previously unusable state, and made quicker by offloading inference to the network. To justify this argument, I advance the state of the art in data-driven defence of networks, novel dataplane-friendly online reinforcement learning algorithms, and in-network data reduction to allow classification of switchscale data. Each requires co-design aware of the network, and of the failure modes of systems and carried traffic. To make online learning possible in the dataplane, I use fixed-point arithmetic and modify classical (non-neural) approaches to take advantage of the SmartNIC compute model and make use of rich device local state. I show that data-driven solutions still require great care to correctly design, but with the right domain expertise they can improve on pathological cases in DDoS defence, such as protecting legitimate UDP traffic. In-network aggregation to histograms is shown to enable accurate classification from fine temporal effects, and allows hosts to scale such classification to far larger flow counts and traffic volume. Moving reinforcement learning to the dataplane is shown to offer substantial benefits to stateaction latency and online learning throughput versus host machines; allowing policies to react faster to fine-grained network events. The dataplane environment is key in making reactive online learning feasible—to port further algorithms and learnt functions, I collate and analyse the strengths of current and future hardware designs, as well as individual algorithms

    Secure Connectivity With Persistent Identities

    Get PDF
    In the current Internet the Internet Protocol address is burdened with two roles. It serves as the identifier and the locator for the host. As the host moves its identity changes with its locator. The research community thinks that the Future Internet will include identifier-locator split in some form. Identifier-locator split is seen as the solution to multiple problems. However, identifier-locator split introduces multiple new problems to the Internet. In this dissertation we concentrate on: the feasibility of using identifier-locator split with legacy applications, securing the resolution steps, using the persistent identity for access control, improving mobility in environments using multiple address families and so improving the disruption tolerance for connectivity. The proposed methods achieve theoretical and practical improvements over the earlier state of the art. To raise the overall awareness, our results have been published in interdisciplinary forums.Nykypäivän Internetissä IP-osoite on kuormitettu kahdella eri roolilla. IP toimii päätelaitteen osoitteena, mutta myös usein sen identiteetinä. Tällöin laitteen identiteetti muuttuu laitteen liikkuessa, koska laitteen osoite vaihtuu. Tutkimusyhteisön mielestä paikan ja identiteetin erottaminen on välttämätöntä tulevaisuuden Internetissä. Paikan ja identiteetin erottaminen tuo kuitenkin esiin joukon uusia ongelmia. Tässä väitöskirjassa keskitytään selvittämään paikan ja identiteetin erottamisen vaikutusta olemassa oleviin verkkoa käyttäviin sovelluksiin, turvaamaan nimien muuntaminen osoitteiksi, helpottamaan pitkäikäisten identiteettien käyttöä pääsyvalvonnassa ja parantamaan yhteyksien mahdollisuuksia selviytyä liikkumisesta usean osoiteperheen ympäristöissä. Väitöskirjassa ehdotetut menetelmät saavuttavat sekä teoreettisia että käytännön etuja verrattuna aiempiin kirjallisuudessa esitettyihin menetelmiin. Saavutetut tulokset on julkaistu eri osa-alojen foorumeilla
    corecore