19 research outputs found

    Algorithms and Architectures for Network Search Processors

    The continuous growth in the Internet’s size, the amount of data traffic, and the complexity of processing this traffic gives rise to new challenges in building high-performance network devices. One of the most fundamental tasks performed by these devices is searching the network data for predefined keys. Address lookup, packet classification, and deep packet inspection are some of the operations which involve table lookups and searching. These operations are typically part of the packet forwarding mechanism, and can create a performance bottleneck. Therefore, fast and resource efficient algorithms are required. One of the most commonly used techniques for such searching operations is the Ternary Content Addressable Memory (TCAM). While TCAM can offer very fast search speeds, it is costly and consumes a large amount of power. Hence, designing cost-effective, power-efficient, and high-speed search techniques has received a great deal of attention in the research and industrial community. In this thesis, we propose a generic search technique based on Bloom filters. A Bloom filter is a randomized data structure used to represent a set of bit-strings compactly and support set membership queries. We demonstrate techniques to convert the search process into table lookups. The resulting table data structures are kept in the off-chip memory and their Bloom filter representations are kept in the on-chip memory. An item needs to be looked up in the off-chip table only when it is found in the on-chip Bloom filters. By filtering the off-chip memory accesses in this fashion, the search operations can be significantly accelerated. Our approach involves a unique combination of algorithmic and architectural techniques that outperform some of the current techniques in terms of cost-effectiveness, speed, and power-efficiency.
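
The filtering idea in this abstract can be illustrated with a minimal software sketch. It is not the thesis's hardware design: the class names `BloomFilter` and `FilteredTable`, the SHA-256-based hashing, and the default sizes are all assumptions chosen for illustration. A Python `dict` stands in for the off-chip table, and a small bit array stands in for the on-chip Bloom filter.

```python
import hashlib


class BloomFilter:
    """Compact set representation; may report false positives, never false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class FilteredTable:
    """A table (stand-in for off-chip memory) guarded by a Bloom filter (stand-in for on-chip memory)."""

    def __init__(self, entries, num_bits=1024, num_hashes=4):
        self.table = dict(entries)
        self.filter = BloomFilter(num_bits, num_hashes)
        self.off_chip_accesses = 0  # counts how often the slow table is consulted
        for key in self.table:
            self.filter.add(key)

    def lookup(self, key):
        # Consult the table only when the filter reports a possible match.
        if not self.filter.might_contain(key):
            return None
        self.off_chip_accesses += 1
        return self.table.get(key)  # a false positive still yields None here
```

Keys that are not in the table are usually rejected by the filter without touching the table at all; the occasional false positive costs one wasted table access but never a wrong answer.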

    Hardware Acceleration for Unstructured Big Data and Natural Language Processing.

    The confluence of the rapid growth in electronic data in recent years and the renewed interest in domain-specific hardware accelerators presents exciting technical opportunities. Traditional scale-out solutions for processing the vast amounts of text data have been shown to be energy- and cost-inefficient. In contrast, custom hardware accelerators can provide higher throughputs, lower latencies, and significant energy savings. In this thesis, I present a set of hardware accelerators for unstructured big-data processing and natural language processing. The first accelerator, called HAWK, aims to speed up the processing of ad hoc queries against large in-memory logs. HAWK is motivated by the observation that traditional software-based tools for processing large text corpora use memory bandwidth inefficiently due to software overheads, and, thus, fall far short of peak scan rates possible on modern memory systems. HAWK is designed to process data at a constant rate of 32 GB/s, faster than most extant memory systems. I demonstrate that HAWK outperforms state-of-the-art software solutions for text processing, almost by an order of magnitude in many cases. HAWK occupies an area of 45 sq-mm in its Pareto-optimal configuration and consumes 22 W of power, well within the area and power envelopes of modern CPU chips. The second accelerator I propose aims to speed up similarity measurement calculations for semantic search in the natural language processing space. By leveraging the latency hiding concepts of multi-threading and simple scheduling mechanisms, my design maximizes functional unit utilization. This similarity measurement accelerator provides speedups of 36x-42x over optimized software running on server-class cores, while requiring 56x-58x lower energy, and only 1.3% of the area.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/116712/1/prateekt_1.pd

    Fine-grained reasoning about the security and usability trade-off in modern security tools

    Defense techniques detect or prevent attacks based on their ability to model the attacks. Any defense technique must strike a balance between security and usability. Attacks that exploit the weak points in security tools are very powerful and can go undetected. One source of these weak points is security being compromised for usability reasons: if a security tool completely secures a system against attacks, the system becomes unusable because of the many false alarms or the very restrictive policies the tool creates; if the tool instead does not secure the system against certain attacks, those attacks will easily succeed. The key contribution of this dissertation is that it digs deeply into modern security tools and reasons about the inherent security and usability trade-offs by identifying the low-level factors that contribute to known issues. This is accomplished by implementing full systems and then testing those systems in realistic scenarios. The thesis that this dissertation tests is that we can reason about security and usability trade-offs in fine-grained ways by building and testing full systems. Furthermore, this dissertation provides practical solutions and suggestions for reaching a good balance between security and usability. We study two modern security tools, Dynamic Information Flow Tracking (DIFT) and antivirus (AV) software, chosen for their importance and wide usage. DIFT is a powerful technique that is used in various aspects of security systems. It works by tagging certain inputs and propagating the tags along with the inputs in the target system.
However, current DIFT systems do not track implicit information flow, because if all DIFT propagation rules are applied directly and conservatively, the target system fills up with tagged data (a problem called overtagging) and the tags become useless: they tell us very little about the actual information flow of the system. So, current DIFT systems give up some security for usability. In this dissertation, we reason about the sources of the overtagging problem and provide practical ways to deal with it, whereas previous approaches have focused on abstract descriptions of the main causes of the problem based on limited experiments. The second security tool we consider in this dissertation is AV software. AV is a very important tool that protects systems against worms and viruses by scanning data against a database of signatures. Despite its importance and wide usage, AV has received little attention from the security research community. In this dissertation, we examine AV internals and reason about the possibility of creating timing channel attacks against AV software. The attacker could infer information about the AV based only on the time the AV spends scanning benign inputs. The other aspect of AV this dissertation explores is the low-level performance impact of AV on systems. Even though the performance overhead of AV is a well-known issue, the exact reasons behind this overhead are not well studied. In this dissertation, we design a methodology that uses Event Tracing for Windows (ETW), a technology that accounts for all OS events, to reason about AV performance impact from the OS point of view. We show that the main performance impact of the AV on a task is the additional time the task spends waiting on events.
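
The distinction between explicit and implicit flows that this abstract relies on can be made concrete with a toy sketch. It is not taken from the dissertation: the `Tagged` wrapper, the helper `_unwrap`, and the function `copy_via_branch` are hypothetical names, and real DIFT systems propagate tags at the instruction or memory level rather than on Python objects.

```python
class Tagged:
    """A value paired with a one-bit taint tag."""

    def __init__(self, value, tainted=False):
        self.value = value
        self.tainted = tainted

    def __add__(self, other):
        # Explicit (data) flow: the result is tainted if either operand is.
        other_value, other_tainted = _unwrap(other)
        return Tagged(self.value + other_value, self.tainted or other_tainted)


def _unwrap(x):
    """Return (value, tainted) for tagged and plain values alike."""
    if isinstance(x, Tagged):
        return x.value, x.tainted
    return x, False


def copy_via_branch(secret_bit):
    """Implicit flow: the secret reaches the result only through control flow."""
    if secret_bit.value == 1:
        return Tagged(1)  # fresh constant, so no tag is propagated
    return Tagged(0)


explicit = Tagged(5, tainted=True) + 3
leaked = copy_via_branch(Tagged(1, tainted=True))
print(explicit.value, explicit.tainted)  # the tag follows the data
print(leaked.value, leaked.tainted)      # the value is copied, the tag is lost
```

Tainting everything assigned under a tainted branch condition would close this gap, but applied conservatively it is the kind of rule that leads to the overtagging described above.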

    Accelerating Mobile Security Processing with Parallel Computing (Accéleration des traitements de la sécurité mobile avec le calcul parallèle)

    Accelerating mobile security processing has become one of the most important problems, given the exponential growth and the significant impact of attacks targeting these platforms. It is important to protect sensitive information on mobile phones by deploying malware detection systems and encrypting data in order to maintain a higher level of security. To detect malicious applications, an antivirus analyzes a large data stream and compares it against a database of malware signatures. Unfortunately, as the number of threats continually increases, the number of malicious-code signatures increases proportionally. This makes the detection process more complex for mobile phones, especially since they are limited in memory, battery, and processing capacity. While the security situation of these systems is worsening, the parallel computing capability of mobile phones keeps improving with the evolution of mobile graphics processing units (GPUs). In this thesis, we focus on how to take advantage of the evolving parallel processing capabilities of mobile devices to accelerate malware detection and cryptographic processing on Android phones. To this end, we designed and implemented a parallel architecture for mobile devices that exploits the computing capabilities of mobile GPUs and distributed processing on clusters. A series of computation and memory optimization techniques is proposed to increase detection efficiency and execution throughput.
The results of this research lead us to conclude that mobile GPUs can be used effectively to accelerate malware detection for mobile phones as well as cryptographic processing. The results also show that the proposed local architecture on mobile phones can be extended to a cluster architecture in order to obtain a higher processing speedup when the mobile phone's resources are busy.

    Hardware acceleration for power efficient deep packet inspection

    The rapid growth of the Internet leads to a massive spread of malicious attacks such as viruses and malware, making the safety of online activity a major concern. The use of Network Intrusion Detection Systems (NIDS) is an effective method to safeguard the Internet. One key procedure in NIDS is Deep Packet Inspection (DPI). DPI can examine the contents of a packet and take actions on the packets based on predefined rules. In this thesis, DPI is mainly discussed in the context of security applications. However, DPI can also be used for bandwidth management and network surveillance. DPI inspects the whole packet payload, and due to this and the complexity of the inspection rules, DPI algorithms consume significant amounts of resources including time, memory and energy. The aim of this thesis is to design hardware accelerated methods for memory and energy efficient high-speed DPI. The patterns in packet payloads, especially complex patterns, can be efficiently represented by regular expressions, which can be translated into Deterministic Finite Automata (DFA). DFA algorithms are fast but consume very large amounts of memory with certain kinds of regular expressions. In this thesis, memory efficient algorithms are proposed based on compressing the transitions of the DFAs. In this work, Bloom filters are used to implement DPI on an FPGA for hardware acceleration with the design of a parallel architecture. Furthermore, aiming at a balance between power and performance, an energy efficient adaptive Bloom filter is designed with the capability of adjusting the number of active hash functions according to current workload. In addition, a method is given for implementation on both two-stage and multi-stage platforms. Nevertheless, false positives still prevent the Bloom filter from being used extensively; a cache-based counting Bloom filter is presented in this work to eliminate the false positives for fast and precise matching.
Finally, in future work, models will be built for routers and DPI in order to estimate the effect of power savings; these models will also analyze the latency impact of dynamic frequency adaptation to current traffic. In addition, a low-power DPI system will be designed with a single or multiple DPI engines. Results and evaluation of the low-power DPI model and system will be produced in future work.
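
One way to picture a Bloom filter with an adjustable number of active hash functions is the sketch below. It is an illustrative assumption, not the FPGA design from the thesis: the class name `AdaptiveBloomFilter`, the `active_hashes` parameter, and the SHA-256-based hashing are all hypothetical. Every key is inserted with all `max_hashes` functions, so a query that probes only the first few of them can never miss a stored key; it simply does less work per query and accepts more false positives.

```python
import hashlib


class AdaptiveBloomFilter:
    """Bloom filter whose queries may use only a prefix of the hash functions."""

    def __init__(self, num_bits=4096, max_hashes=6):
        self.num_bits = num_bits
        self.max_hashes = max_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _position(self, key, i):
        # i-th hash function, derived by salting SHA-256 with the index i.
        digest = hashlib.sha256(bytes([i]) + key).digest()
        return int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        # Always set the bits for every hash function.
        for i in range(self.max_hashes):
            self.bits[self._position(key, i)] = 1

    def might_contain(self, key, active_hashes):
        # Probe only the first active_hashes functions (clamped to a valid range).
        k = max(1, min(active_hashes, self.max_hashes))
        return all(self.bits[self._position(key, i)] for i in range(k))
```

How `active_hashes` should follow the workload is a policy decision that the abstract does not spell out, so the sketch leaves it to the caller.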

    FPGA-based High Throughput Regular Expression Pattern Matching for Network Intrusion Detection Systems

    Network speeds and bandwidths have improved over time. However, the frequency of network attacks and illegal accesses has increased along with them. Such attacks are capable of compromising the privacy and confidentiality of network resources belonging to even the most secure networks. Currently, software solutions running on general-purpose processors have become inadequate for detecting network attacks at current network speeds. Hardware-based platforms are designed to cope with rising network speeds measured in several gigabits per second (Gbps). Such hardware-based platforms are capable of detecting several attacks at once, and a good candidate is the Field-programmable Gate Array (FPGA). The FPGA is a hardware platform that can be used to perform deep packet inspection of network packet contents at high speed. As such, this thesis focused on studying designs that were implemented with FPGAs. Furthermore, all the FPGA-based designs studied in this thesis have attempted to sustain a steadier growth in throughput and throughput efficiency. Throughput efficiency is defined as the concurrent throughput of a regular expression matching engine circuit divided by the average number of look-up tables (LUTs) utilised by each state of the engine's automaton. The implemented FPGA-based design was built upon the concept of equivalence classification. The concept helped to reduce the overall table size of the inputs needed to drive the various Nondeterministic Finite Automata (NFA) matching engines. Compared with other approaches, the design sustained a throughput of up to 11.48 Gbps, and recorded an overall reduction in the number of pattern matching engines required by up to 75%. Also, the overall memory required by the design was reduced by about 90% when synthesised on the target FPGA platform.
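
The general idea behind equivalence classification of input symbols can be sketched as follows. This is a generic illustration and not the thesis's FPGA implementation: the function name `equivalence_classes` and the dictionary encoding of the NFA are assumptions. Two input symbols fall into the same class when they trigger identical transitions from every state, so the matching engines only need one input line per class instead of one per symbol.

```python
def equivalence_classes(transitions, alphabet):
    """Group symbols that behave identically in every state.

    transitions maps (state, symbol) to a frozenset of next states;
    missing entries mean "no transition".
    Returns a dict mapping each symbol to a class index.
    """
    states = sorted({state for (state, _) in transitions})
    signature_to_class = {}
    symbol_to_class = {}
    for symbol in alphabet:
        # The signature is the symbol's column in the transition table.
        signature = tuple(
            transitions.get((state, symbol), frozenset()) for state in states
        )
        class_id = signature_to_class.setdefault(signature, len(signature_to_class))
        symbol_to_class[symbol] = class_id
    return symbol_to_class


# NFA for the literal pattern "ab": state 0 --a--> 1 --b--> 2
nfa = {
    (0, "a"): frozenset({1}),
    (1, "b"): frozenset({2}),
}
classes = equivalence_classes(nfa, "abcd")
print(classes)
```

In this small example `c` and `d` never cause a transition, so they share a class and the four-symbol alphabet collapses to three classes.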