    Track: Tracerouting in SDN networks with arbitrary network functions

    The centralization of the control plane in software-defined networking (SDN) makes network troubleshooting challenging, because packets are ultimately forwarded by distributed data planes. Existing path tracing tools largely rely on packet tags to probe network paths among SDN-enabled switches. However, network functions (NFs) or middleboxes, which are ubiquitous in today's networks, can drop packets or alter their tags, an action that can break the probing mechanism. In addition, sending probing packets through network functions could corrupt their internal states, putting the correctness of the servicing logic at risk (e.g., incorrect load balancing decisions). In this paper, we present a novel troubleshooting tool, Track, for SDN-enabled networks with arbitrary NFs. Track can discover the forwarding path, including NFs, taken by any packet, without changing the forwarding rules in switches or the internal states of NFs. We have implemented Track on the RYU controller. Our extensive experimental results show that Track achieves 95.08% and 100% accuracy in discovering forwarding paths with and without NFs, respectively, and can efficiently generate traces within 3 milliseconds per hop.

    Using query transformation to improve Gnutella search performance

    Gnutella peers independently choose how objects are named as well as how they are queried. Using a long-term analysis of the files shared and queries issued, we show that this flexibility leads to a mismatch between the way objects are named and the way users issue search queries. Thirty percent of the failed queries contained keywords that were not present in any file name, while the remaining queries failed because no file name contained all the keywords in a particular query. Our earlier analysis of files shared in the popular iTunes music file sharing system showed that standardizing the file names to make them easier to search is not a viable alternative. Instead, we transform the queries to better match the objects available in the system. We investigated spell correction (using file name information from the neighborhood) as well as removing query keywords. We consider the results from the transformed query to be relevant to the intent of the original query if the transformed query used many of the original keywords and the number of matching files closely matched the number of matches for typical successful queries. Our approach is practical and uses information available within the immediate neighborhood of an ultra-peer. An overlay-agnostic analysis shows that our transformation improves success rates from 45% to between 72.5% and 91.2%. Using our Hybrid mechanism as a Gnutella middleware, our transformation produced relevant results for about 61% of the failed queries. Keywords: unstructured peer-to-peer, query transformation.
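
The two transformations named in this abstract, spell correction against neighborhood file names and keyword removal, can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the 0.8 similarity cutoff, the use of Python's `difflib`, and the names `build_vocabulary` and `transform_query` are all assumptions made for illustration.

```python
import difflib
import re


def build_vocabulary(file_names):
    """Collect the keywords that appear in neighborhood file names."""
    words = set()
    for name in file_names:
        words.update(re.findall(r"[a-z0-9]+", name.lower()))
    return sorted(words)  # sorted for deterministic matching


def transform_query(query, vocabulary, cutoff=0.8):
    """Spell-correct keywords against the vocabulary; drop unmatched ones."""
    kept = []
    for keyword in re.findall(r"[a-z0-9]+", query.lower()):
        if keyword in vocabulary:
            kept.append(keyword)  # keyword already matches a file name term
            continue
        close = difflib.get_close_matches(keyword, vocabulary, n=1, cutoff=cutoff)
        if close:
            kept.append(close[0])  # hypothetical spell-correction step
        # otherwise the keyword is removed from the query
    return kept


if __name__ == "__main__":
    vocab = build_vocabulary(["Madonna - Like a Prayer.mp3"])
    print(transform_query("madona prayer remix", vocab))
```

Here the misspelled `madona` is corrected to `madonna`, and `remix` is dropped because no nearby file name contains a similar term. A real system would also need the relevance check the abstract describes, which this sketch omits.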

    A P2P based usage control enforcement scheme resilient to re-injection attacks

    Exploiting Similarity for Multi-Source Downloads Using File Handprints

    Many contemporary approaches for speeding up large file transfers attempt to download chunks of a data object from multiple sources. Systems such as BitTorrent quickly locate sources that have an exact copy of the desired object, but they are unable to use sources that serve similar but non-identical objects. Other systems automatically exploit cross-file similarity by identifying sources for each chunk of the object. These systems, however, require a number of lookups proportional to the number of chunks in the object, and a mapping from each unique chunk in every identical and similar object to its corresponding sources. Thus, the number of lookups and mappings in such a system can be quite large, limiting its scalability. This paper presents a hybrid system that provides the best of both approaches, locating identical and similar sources for data objects using a constant number of lookups and inserting a constant number of mappings per object. We first demonstrate through extensive data analysis that similarity does exist among objects of popular file types, and that making use of it can sometimes substantially improve download times. Next, we describe handprinting, a technique that allows clients to locate similar sources using a constant number of lookups and mappings. Finally, we describe the design, implementation, and evaluation of Similarity-Enhanced Transfer (SET), a system that uses this technique to download objects. Our experimental evaluation shows that by using sources of similar objects, SET is able to significantly outperform an equivalently configured BitTorrent.
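
The abstract does not spell out how a handprint is built, so the following is only a sketch of one way to get a constant number of lookups per object. It assumes fixed-size chunking and a handprint made of the k smallest chunk hashes. The chunk size, the value of k, and the names `handprint` and `shares_handprint_entry` are illustrative choices, not details taken from the paper.

```python
import hashlib


def chunk_hashes(data, chunk_size=4):
    """Hash each fixed-size chunk (real systems may chunk differently)."""
    return {
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    }


def handprint(data, k=3, chunk_size=4):
    """Deterministic constant-size sample: the k smallest chunk hashes."""
    return sorted(chunk_hashes(data, chunk_size))[:k]


def shares_handprint_entry(a, b, k=3, chunk_size=4):
    """True if the two objects have at least one handprint hash in common."""
    return bool(set(handprint(a, k, chunk_size)) & set(handprint(b, k, chunk_size)))


if __name__ == "__main__":
    original = b"aaaabbbbccccddddeeeeffff"
    similar = b"aaaabbbbccccddddeeeeXXXX"    # one chunk differs
    unrelated = b"111122223333444455556666"  # no chunk in common
    print(shares_handprint_entry(original, similar))
    print(shares_handprint_entry(original, unrelated))
```

Because every object is sampled by the same rule, objects that share most of their chunks tend to share handprint entries. A client then needs only k lookups per object, regardless of how many chunks the object has.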

    Exploiting similarity for multi-source downloads using file handprints

    Failure-awareness and dynamic adaptation in data scheduling

    Over the years, scientific applications have become more complex and more data intensive. Large-scale simulations and scientific experiments in areas such as physics, biology, astronomy, and earth sciences in particular demand highly distributed resources to satisfy their heavy computational requirements. Increasing data requirements and the distributed nature of the resources have made I/O the major bottleneck for end-to-end application performance. Existing systems fail to address issues such as reliability, scalability, and efficiency in dealing with wide area data access, retrieval, and processing. In this study, we explore data-intensive distributed computing and study challenges in data placement in distributed environments. After analyzing different application scenarios, we develop new data scheduling methodologies and identify the key attributes for reliability, adaptability, and performance optimization of distributed data placement tasks. Inspired by techniques used in microprocessor and operating system architectures, we extend and adapt some of the known low-level data handling and optimization techniques to distributed computing. The two major contributions of this work are (i) a failure-aware data placement paradigm for increased fault tolerance, and (ii) adaptive scheduling of data placement tasks for improved end-to-end performance. The failure-aware data placement includes early error detection, error classification, and use of this information in scheduling decisions to prevent and recover from possible future errors. The adaptive scheduling approach includes dynamically tuning data transfer parameters over wide area networks for efficient utilization of available network capacity and optimized end-to-end data transfer performance.

    High performance stride-based network payload inspection

    There are two main drivers for network payload inspection: detection of malicious data, attacks, and viruses in Network Intrusion Detection Systems (NIDS), and content detection in Data Leakage Prevention Systems (DLPS) or Copyright Infringement Detection Systems (CIDS). Network attacks are becoming more and more prevalent. Traditional network firewalls can only check the packet header and fail to detect attacks hidden in the packet payload. Therefore, NIDS with a Deep Packet Inspection (DPI) function have been developed and widely deployed. By checking each byte of a packet against the pattern set, which is called pattern matching, a NIDS is able to detect attack code hidden in the payload. The pattern set is usually organized as a Deterministic Finite Automaton (DFA). The processing time of a DFA is proportional to the length of the input string, but its memory cost is quite large. Meanwhile, the link bandwidth and the traffic of the Internet are rapidly increasing, and the attack signature database keeps growing due to the diversification of attacks. Consequently, there is a strong demand for high-performance NIDS with low storage cost. Traditional software-based and hardware-based pattern matching algorithms have difficulty satisfying the processing speed requirement, so high-performance network payload inspection methods are needed to enable deep packet inspection at line rate. In this thesis, Stride Finite Automata (StriFA), a novel finite automata family that accelerates both string matching and regular expression matching, is presented. Compared with conventional finite automata, which scan the entire traffic stream to locate malicious information, StriFA only needs to scan samples of the traffic stream to find suspicious information, thus increasing the matching speed and reducing memory requirements.
    Technologies such as instant messaging software (Skype, MSN) or BitTorrent file sharing allow convenient sharing of information between managers, employees, customers, and partners. This, however, leads to two major kinds of security risk when exchanging data between different people: first, leakage of sensitive data from a company and, second, distribution of copyright-infringing products in Peer-to-Peer (P2P) networks. Traditional DFA-based DPI solutions cannot be used to inspect file distribution in P2P networks because the data may be delivered out of order. To address this problem, a hybrid finite automaton called Skip-Stride-Neighbor Finite Automaton (S2NFA) is proposed. It combines the benefits of the following three structures: 1) Skip-FA, which is used to solve the out-of-order data scanning problem; 2) Stride-DFA, which is introduced to reduce the memory usage of Skip-FA; 3) Neighbor-DFA, which is based on the characteristics of Stride-DFA and achieves a low false positive rate at the cost of a small increase in memory consumption.

    Leveraging content properties to optimize distributed storage systems

    Cloud service providers, social networks, and data-management companies are witnessing a tremendous increase in the amount of data they receive every day. All this data creates new opportunities to expand human knowledge in fields like healthcare, urban planning, and human behavior, and to improve offered services like search, recommendation, and many others. It is not by accident that many academics, as well as the public media, refer to our era as the Big Data era. But these huge opportunities come with the requirement for better data management systems that, on the one hand, can safely accommodate this huge and constantly increasing volume of data and, on the other, serve it in a timely and useful manner so that applications can benefit from processing it. This document focuses on these two challenges that come with Big Data. In more detail, we study (i) backup storage systems as a means to safeguard data against a number of factors that may render it unavailable and (ii) data placement strategies on geographically distributed storage systems, with the goal of reducing user-perceived latency while using network and storage resources efficiently. Throughout our study, data is placed at the centre of our design choices, as we try to leverage content properties for both placement and efficient storage.

    Incremental parallel and distributed systems

    Incremental computation strives for efficient successive runs of applications by re-executing only those parts of the computation that are affected by a given input change, instead of recomputing everything from scratch. To realize the benefits of incremental computation, researchers and practitioners are developing new systems where the application programmer can provide an efficient update mechanism for changing application data. Unfortunately, most of the existing solutions are limiting because they not only depart from existing programming models, but also require programmers to devise an incremental update mechanism (or a dynamic algorithm) on a per-application basis. In this thesis, we present incremental parallel and distributed systems that enable existing real-world applications to automatically benefit from efficient incremental updates. Our approach requires neither a departure from current models of programming, nor the design and implementation of dynamic algorithms. To achieve these goals, we have designed and built the following incremental systems: (i) Incoop, a system for incremental MapReduce computation; (ii) Shredder, a GPU-accelerated system for incremental storage; (iii) Slider, a stream processing platform for incremental sliding window analytics; and (iv) iThreads, a threading library for parallel incremental computation. Our experience with these systems shows that significant performance gains can be achieved for existing applications without requiring any additional effort from programmers.
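
The core idea of re-executing only the parts of a computation affected by an input change can be shown with a toy memoized word count. This is a simplified illustration, not the design of Incoop or of any other system in the thesis. The class `IncrementalWordCount`, the `map_runs` counter, and content-hash keyed memoization are assumptions chosen for the example.

```python
import hashlib
from collections import Counter


class IncrementalWordCount:
    """Word count that reuses per-chunk results across successive runs."""

    def __init__(self):
        self._memo = {}    # content hash of a chunk -> its word counts
        self.map_runs = 0  # number of chunks actually (re)processed

    def _map(self, chunk):
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if key not in self._memo:
            self.map_runs += 1  # only unseen chunk contents are recomputed
            self._memo[key] = Counter(chunk.split())
        return self._memo[key]

    def run(self, chunks):
        total = Counter()
        for chunk in chunks:
            total += self._map(chunk)  # combine memoized partial results
        return total


if __name__ == "__main__":
    job = IncrementalWordCount()
    job.run(["a b", "b c", "c d"])
    print(job.map_runs)  # three chunks processed on the first run
    job.run(["a b", "b c", "c e"])
    print(job.map_runs)  # only the changed chunk adds one more run
```

The application code (`Counter(chunk.split())`) is unchanged between runs. The reuse comes entirely from the surrounding framework, which is the property the abstract emphasizes.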