
19 research outputs found

Algorithms and Architectures for Network Search Processors

Author: Dharmapurikar Sarang
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2006

The continuous growth in the Internet’s size, the amount of data traffic, and the complexity of processing this traffic gives rise to new challenges in building high-performance network devices. One of the most fundamental tasks performed by these devices is searching the network data for predefined keys. Address lookup, packet classification, and deep packet inspection are some of the operations which involve table lookups and searching. These operations are typically part of the packet forwarding mechanism, and can create a performance bottleneck. Therefore, fast and resource-efficient algorithms are required. One of the most commonly used techniques for such searching operations is the Ternary Content Addressable Memory (TCAM). While TCAM can offer very fast search speeds, it is costly and consumes a large amount of power. Hence, designing cost-effective, power-efficient, and high-speed search techniques has received a great deal of attention in the research and industrial community. In this thesis, we propose a generic search technique based on Bloom filters. A Bloom filter is a randomized data structure used to represent a set of bit-strings compactly and support set membership queries. We demonstrate techniques to convert the search process into table lookups. The resulting table data structures are kept in the off-chip memory and their Bloom filter representations are kept in the on-chip memory. An item needs to be looked up in the off-chip table only when it is found in the on-chip Bloom filters. By filtering the off-chip memory accesses in this fashion, the search operations can be significantly accelerated. Our approach involves a unique combination of algorithmic and architectural techniques that outperform some of the current techniques in terms of cost-effectiveness, speed, and power-efficiency.
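The filtering idea in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's hardware design: the class names `BloomFilter` and `FilteredTable`, the filter size, the number of hash functions, and the use of SHA-256 as a hash source are all assumptions chosen for readability. A dictionary stands in for the off-chip table, and a counter records how often it is touched.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; sizes and hash choice are illustrative assumptions."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key: bytes):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


class FilteredTable:
    """Hypothetical stand-in: a dict plays the role of the off-chip table."""

    def __init__(self):
        self.filter = BloomFilter()  # models the on-chip summary
        self.table = {}              # models the off-chip table
        self.table_accesses = 0      # counts simulated off-chip lookups

    def insert(self, key: bytes, value) -> None:
        self.filter.add(key)
        self.table[key] = value

    def lookup(self, key: bytes):
        if not self.filter.might_contain(key):
            return None              # filtered out, no off-chip access
        self.table_accesses += 1     # possible hit (or false positive)
        return self.table.get(key)


routes = FilteredTable()
routes.insert(b"10.0.0.1", "port1")
hit = routes.lookup(b"10.0.0.1")
miss = routes.lookup(b"192.168.1.9")
```

A key that is absent from the filter never reaches the table, so only true members and occasional false positives cost an off-chip access.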

CiteSeerX

Washington University St. Louis: Open Scholarship

Hardware Acceleration for Unstructured Big Data and Natural Language Processing.

Author: Tandon Prateek
Publication date: 01/01/2015

The confluence of the rapid growth in electronic data in recent years, and the renewed interest in domain-specific hardware accelerators presents exciting technical opportunities. Traditional scale-out solutions for processing the vast amounts of text data have been shown to be energy- and cost-inefficient. In contrast, custom hardware accelerators can provide higher throughputs, lower latencies, and significant energy savings. In this thesis, I present a set of hardware accelerators for unstructured big-data processing and natural language processing. The first accelerator, called HAWK, aims to speed up the processing of ad hoc queries against large in-memory logs. HAWK is motivated by the observation that traditional software-based tools for processing large text corpora use memory bandwidth inefficiently due to software overheads, and, thus, fall far short of peak scan rates possible on modern memory systems. HAWK is designed to process data at a constant rate of 32 GB/s, faster than most extant memory systems. I demonstrate that HAWK outperforms state-of-the-art software solutions for text processing, almost by an order of magnitude in many cases. HAWK occupies an area of 45 sq-mm in its Pareto-optimal configuration and consumes 22 W of power, well within the area and power envelopes of modern CPU chips. The second accelerator I propose aims to speed up similarity measurement calculations for semantic search in the natural language processing space. By leveraging the latency hiding concepts of multi-threading and simple scheduling mechanisms, my design maximizes functional unit utilization. This similarity measurement accelerator provides speedups of 36x-42x over optimized software running on server-class cores, while requiring 56x-58x lower energy, and only 1.3% of the area. PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/116712/1/prateekt_1.pd

Deep Blue Documents at the University of Michigan

Fine-grained reasoning about the security and usability trade-off in modern security tools

Author: Al-Saleh Mohammed I
Publication venue: UNM Digital Repository
Publication date: 01/07/2011

Defense techniques detect or prevent attacks based on their ability to model the attacks. Any defense technique must strike a balance between security and usability. Attacks that exploit the weak points in security tools are very powerful and thus can go undetected. One source of those weak points is security being compromised for usability reasons: if a security tool completely secures a system against attacks, the system becomes unusable because of the large number of false alarms or the very restrictive policies it creates; if the tool instead chooses not to secure the system against certain attacks, those attacks will simply and easily succeed. The key contribution of this dissertation is that it digs deeply into modern security tools and reasons about the inherent security and usability trade-offs based on identifying the low-level, contributing factors to known issues. This is accomplished by implementing full systems and then testing those systems in realistic scenarios. The thesis that this dissertation tests is that we can reason about security and usability trade-offs in fine-grained ways by building and testing full systems. Furthermore, this dissertation provides practical solutions and suggestions to reach a good balance between security and usability. We study two modern security tools, Dynamic Information Flow Tracking (DIFT) and Antivirus (AV) software, for their importance and wide usage. DIFT is a powerful technique that is used in various aspects of security systems. It works by tagging certain inputs and propagating the tags along with the inputs in the target system.
However, current DIFT systems do not track implicit information flow because if all DIFT propagation rules are directly applied in a conservative way, the target system will be full of tagged data (a problem called overtagging) and thus useless because the tags tell us very little about the actual information flow of the system. So, current DIFT systems drop some security for usability. In this dissertation, we reason about the sources of the overtagging problem and provide practical ways to deal with it, while previous approaches have focused on abstract descriptions of the main causes of the problem based on limited experiments. The second security tool we consider in this dissertation is antivirus (AV) software. AV is a very important tool that protects systems against worms and viruses by scanning data against a database of signatures. Despite its importance and wide usage, AV has received little attention from the security research community. In this dissertation, we examine the AV internals and reason about the possibility of creating timing channel attacks against AV software. The attacker could infer information about the AV based only on the scanning time the AV spends to scan benign inputs. The other aspect of AV this dissertation explores is the low-level AV performance impact on systems. Even though the performance overhead of AV is a well known issue, the exact reasons behind this overhead are not well-studied. In this dissertation, we design a methodology that utilizes Event Tracing for Windows technology (ETW), a technology that accounts for all OS events, to reason about AV performance impact from the OS point of view. We show that the main performance impact of the AV on a task is the longer waiting time the task spends waiting on events
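The tag propagation and implicit-flow gap described in this abstract can be shown with a toy Python sketch. It does not model any real DIFT system: the `Tagged` class and the `copy_via_branch` function are hypothetical names, and a single boolean stands in for the tag. Addition propagates the tag (an explicit flow), while a value copied through a branch condition arrives untagged (an implicit flow).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A value carrying one taint bit (illustrative assumption)."""
    value: int
    tainted: bool = False

    def __add__(self, other: "Tagged") -> "Tagged":
        # Explicit flow: the result is tagged if either operand is tagged.
        return Tagged(self.value + other.value, self.tainted or other.tainted)


def copy_via_branch(secret: Tagged) -> Tagged:
    """Copy a byte-sized secret using only control flow.

    No tagged data is ever assigned to `result`, so a tracker that
    follows only explicit data flow leaves the copy untagged.
    """
    result = Tagged(0)
    for candidate in range(256):
        if secret.value == candidate:    # the branch depends on the secret
            result = Tagged(candidate)   # but an untagged constant is stored
    return result


secret = Tagged(42, tainted=True)
explicit = secret + Tagged(1)
leaked = copy_via_branch(secret)
```

Tagging every value written under a tainted branch would close this gap, but, as the abstract notes, applying such rules conservatively everywhere leads to overtagging.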

Accelerating mobile security processing with parallel computing (original title: Accéleration des traitements de la sécurité mobile avec le calcul parallèle)

Author: Abdellatif Manel
Publication venue: École de technologie supérieure

Accelerating mobile security processing has become one of the most important problems, given the exponential growth and major impact of attacks targeting these platforms. It is important to protect sensitive information on mobile phones by deploying malware detection systems and encrypting data in order to maintain a higher level of security. To detect malicious applications, an antivirus analyzes a large data stream and compares it against a database of malware signatures. Unfortunately, as the number of threats grows continuously, the number of malicious code signatures grows proportionally. This makes the detection process more complex for mobile phones, especially since they are limited in memory, battery, and processing capacity. While the security situation of these systems worsens, the parallel computing capability of mobile phones keeps improving with the evolution of mobile graphics processing units (GPUs). In this thesis, we focus on how to take advantage of the evolving parallel processing capabilities of mobile devices to accelerate malware detection and cryptographic processing on Android phones. To this end, we designed and implemented a parallel architecture for mobile devices that exploits the computing capabilities of mobile GPUs and distributed processing on clusters. A series of computation and memory optimization techniques is proposed to increase detection efficiency and execution throughput.
The results of this research lead us to conclude that mobile GPUs can be used effectively to accelerate malware detection on mobile phones as well as cryptographic processing. The results also show that the local architecture proposed for mobile phones can be extended to a cluster architecture to obtain a higher processing speedup when the mobile phone's resources are busy.

Espace ÉTS

Hardware acceleration for power efficient deep packet inspection

Author: Zhou Yachao
Publication venue: Dublin City University. Research Institute for Networks and Communications Engineering (RINCE)
Publication date: 01/11/2012

The rapid growth of the Internet leads to a massive spread of malicious attacks such as viruses and malware, making the safety of online activity a major concern. The use of Network Intrusion Detection Systems (NIDS) is an effective method to safeguard the Internet. One key procedure in NIDS is Deep Packet Inspection (DPI). DPI can examine the contents of a packet and take actions on the packets based on predefined rules. In this thesis, DPI is mainly discussed in the context of security applications. However, DPI can also be used for bandwidth management and network surveillance. DPI inspects the whole packet payload, and due to this and the complexity of the inspection rules, DPI algorithms consume significant amounts of resources including time, memory and energy. The aim of this thesis is to design hardware accelerated methods for memory and energy efficient high-speed DPI. The patterns in packet payloads, especially complex patterns, can be efficiently represented by regular expressions, which can be translated into Deterministic Finite Automata (DFA). DFA algorithms are fast but consume very large amounts of memory with certain kinds of regular expressions. In this thesis, memory efficient algorithms are proposed based on the transition compressions of the DFAs. In this work, Bloom filters are used to implement DPI on an FPGA for hardware acceleration with the design of a parallel architecture. Furthermore, aiming at a balance between power and performance, an energy efficient adaptive Bloom filter is designed with the capability of adjusting the number of active hash functions according to current workload. In addition, a method is given for implementation on both two-stage and multi-stage platforms. Nevertheless, false positives still prevent the Bloom filter from being used more widely; a cache-based counting Bloom filter is presented in this work to get rid of the false positives for fast and precise matching.
Finally, in future work, in order to estimate the effect of power savings, models will be built for routers and DPI, which will also analyze the latency impact of dynamic frequency adaptation to current traffic. In addition, a low power DPI system will be designed with a single or multiple DPI engines. Results and evaluation of the low power DPI model and system will be produced in future work.
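The adaptive Bloom filter described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual design, so everything here is an assumption: the class name `AdaptiveBloomFilter`, the sizes, the SHA-256 hash source, and the choice to insert with all hash functions but query with only the first few. Under that choice, using fewer active hashes saves work per query and never causes a false negative, at the cost of more false positives.

```python
import hashlib


class AdaptiveBloomFilter:
    """Bloom filter whose queries can use a subset of its hash functions."""

    def __init__(self, num_bits=4096, max_hashes=8):
        self.num_bits = num_bits
        self.max_hashes = max_hashes
        self.active_hashes = max_hashes  # how many hashes a query evaluates
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _position(self, index: int, key: bytes) -> int:
        digest = hashlib.sha256(bytes([index]) + key).digest()
        return int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: bytes) -> None:
        # Always set all max_hashes bits, so any prefix of them can be
        # checked later without risking a false negative.
        for i in range(self.max_hashes):
            self.bits[self._position(i, key)] = 1

    def set_active_hashes(self, count: int) -> None:
        # Hypothetical knob: a controller would lower this under heavy
        # load and raise it when spare capacity is available.
        self.active_hashes = max(1, min(count, self.max_hashes))

    def might_contain(self, key: bytes) -> bool:
        return all(
            self.bits[self._position(i, key)]
            for i in range(self.active_hashes)
        )


signatures = [b"GET /admin", b"cmd.exe", b"/etc/passwd"]
abf = AdaptiveBloomFilter()
for sig in signatures:
    abf.add(sig)

abf.set_active_hashes(2)  # reduced-effort mode
found_low = [abf.might_contain(sig) for sig in signatures]
abf.set_active_hashes(8)  # full-accuracy mode
found_high = [abf.might_contain(sig) for sig in signatures]
```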

DCU Online Research Access Service

Accelerating digital forensic searching through GPGPU parallel processing techniques

Author: Bayne Ethan
Publication date: 01/02/2017

Abertay Research Portal


GPU-Acceleration of In-Memory Data Analytics

Author: Sitaridi Evangelia
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2016

Hardware advances strongly influence database system design. The flattening speed of CPU cores makes many-core accelerators, such as GPUs, a vital alternative to explore for processing the ever-increasing amounts of data. GPUs have a significantly higher degree of parallelism than multi-core CPUs but their cores are simpler. As a result, they do not face the power constraints limiting the parallelism of CPUs. Their trade-off, however, is the increased implementation complexity. This thesis adapts and redesigns data analytics operators to better exploit the GPU's special memory and threading model. Due to the increasing memory capacity and also the user's need for fast interaction with the data, we focus on in-memory analytics. Our techniques span different steps of the data processing pipeline: (1) Data preprocessing, (2) Query compilation, and (3) Algorithmic optimization of the operators. Our data preprocessing techniques adapt the data layout for numeric and string columns to maximize the achieved GPU memory bandwidth. Our query compilation techniques compute the optimal execution plan for conjunctive filters. We formulate memory divergence for string matching algorithms and suggest how to eliminate it. Finally, we parallelize decompression algorithms in our compression framework Gompresso to fit more data into the limited GPU memory. Gompresso achieves high speed-ups on GPUs over multi-core CPU state-of-the-art libraries and is suitable for any massively parallel processor.

Columbia University Academic Commons

FPGA-based High Throughput Regular Expression Pattern Matching for Network Intrusion Detection Systems

Author: Modi Bala
Publication date: 01/02/2015

Network speeds and bandwidths have improved over time. However, the frequency of network attacks and illegal accesses has also increased over the same period. Such attacks are capable of compromising the privacy and confidentiality of network resources belonging to even the most secure networks. Currently, general-purpose processor based software solutions used for detecting network attacks have become inadequate in coping with the current network speeds. Hardware-based platforms are designed to cope with the rising network speeds measured in several gigabits per second (Gbps). Such hardware-based platforms are capable of detecting several attacks at once, and a good candidate is the Field-programmable Gate Array (FPGA). The FPGA is a hardware platform that can be used to perform deep packet inspection of network packet contents at high speed. As such, this thesis focused on studying designs that were implemented with FPGAs. Furthermore, all the FPGA-based designs studied in this thesis have attempted to sustain a more steady growth in throughput and throughput efficiency. Throughput efficiency is defined as the concurrent throughput of a regular expression matching engine circuit divided by the average number of look up tables (LUTs) utilised by each state of the engine's automata. The implemented FPGA-based design was built upon the concept of equivalence classification. The concept helped to reduce the overall table size of the inputs needed to drive the various Nondeterministic Finite Automata (NFA) matching engines. Compared with other approaches, the design sustained a throughput of up to 11.48 Gbps, and recorded an overall reduction in the number of pattern matching engines required by up to 75%. Also, the overall memory required by the design was reduced by about 90% when synthesised on the target FPGA platform.
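One common reading of equivalence classification for pattern matching can be sketched in Python as follows. The abstract does not specify the thesis's exact scheme, so this is an assumption: input bytes that belong to exactly the same character sets across all patterns are grouped into one class, and an input table is then indexed by class instead of by all 256 byte values. The function name `equivalence_classes` and the example character sets are hypothetical.

```python
def equivalence_classes(char_sets, alphabet_size=256):
    """Group input symbols that no pattern character set can tell apart.

    char_sets: iterable of sets of byte values used on automaton transitions.
    Returns (class_of, num_classes), where class_of[byte] is a class id.
    """
    char_sets = list(char_sets)
    class_of = [0] * alphabet_size
    ids = {}  # membership signature -> class id
    for symbol in range(alphabet_size):
        # Two symbols are equivalent when they are members of exactly
        # the same character sets.
        signature = tuple(symbol in s for s in char_sets)
        class_of[symbol] = ids.setdefault(signature, len(ids))
    return class_of, len(ids)


# Illustrative character sets, as might appear in patterns like [abc]+[0-9].
letters = set(b"abc")
digits = set(b"0123456789")
class_of, num_classes = equivalence_classes([letters, digits])
```

In this example the 256 possible input bytes collapse into three classes (letters, digits, everything else), which is the kind of input-table reduction the abstract attributes to the technique.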

Kent Academic Repository