Search CORE

23 research outputs found

High-performance software packet processing

Author: Fu Qiaobin
Publication venue
Publication date: 30/01/2021
Field of study

In today’s Internet, it is highly desirable to have fast and scalable software packet processing solutions for network applications that run on commodity hardware. The advent of cloud computing drives the continued rapid growth of Internet traffic. Moreover, the development of emerging networking techniques, such as Network Function Virtualization, significantly shapes the need for implementing the network functions in software. Finally, with the advancement of modern platforms as well as software frameworks for packet processing, network applications have potential to process 100+ Gbps network traffic on a single commodity server. Representative frameworks include the Click modular router, the RouteBricks scalable routing architecture, and BUFFALO, the software-based Ethernet switch. Beneath this general-purpose routing and switching functionality lie a broad set of network applications, many of which are handled with custom methods to provide cost-effectiveness and flexibility. This thesis considers two long-standing networking applications, IP lookup and distributed denial-of-service (DDoS) mitigation, and proposes efficient software-based methods drawing from this new perspective. In this thesis, we first introduce several optimization techniques to accelerate network applications by taking advantage of modern CPU features. Then, we explore the IP lookup problem to find the longest matching prefix of an IP address in a set of prefixes. An ideal IP lookup algorithm should achieve small constant IP lookup time, and on-chip memory usage. However, no prior IP lookup algorithm achieves both requirements at the same time. We propose SAIL, a splitting approach to IP lookup, and a suite of algorithms for IP lookup based on SAIL framework. We conducted extensive experiments to evaluate our algorithms, and experimental results show that our SAIL algorithms are much faster than well-known IP lookup algorithms. Next, we switch our focus to DDoS, an attempt to disrupt the legitimate traffic of a victim by sending a flood of Internet traffic from different sources. Our solution is Gatekeeper, the first open-source and deployable DDoS mitigation system. We present a series of optimization techniques, including use of modern platforms, group prefetching, coroutines, and hashing, to accelerate Gatekeeper. Experimental results show that these optimization techniques significantly improve its performance over alternative baseline solutions.2022-01-30T00:00:00

Boston University Institutional Repository (OpenBU)

GPU Accelerated protocol analysis for large and long-term traffic traces

Author: Nottingham Alastair Timothy
Publication venue: Faculty of Science, Computer Science
Publication date: 01/01/2016
Field of study

This thesis describes the design and implementation of GPF+, a complete general packet classification system developed using Nvidia CUDA for Compute Capability 3.5+ GPUs. This system was developed with the aim of accelerating the analysis of arbitrary network protocols within network traffic traces using inexpensive, massively parallel commodity hardware. GPF+ and its supporting components are specifically intended to support the processing of large, long-term network packet traces such as those produced by network telescopes, which are currently difficult and time consuming to analyse. The GPF+ classifier is based on prior research in the field, which produced a prototype classifier called GPF, targeted at Compute Capability 1.3 GPUs. GPF+ greatly extends the GPF model, improving runtime flexibility and scalability, whilst maintaining high execution efficiency. GPF+ incorporates a compact, lightweight registerbased state machine that supports massively-parallel, multi-match filter predicate evaluation, as well as efficient arbitrary field extraction. GPF+ tracks packet composition during execution, and adjusts processing at runtime to avoid redundant memory transactions and unnecessary computation through warp-voting. GPF+ additionally incorporates a 128-bit in-thread cache, accelerated through register shuffling, to accelerate access to packet data in slow GPU global memory. GPF+ uses a high-level DSL to simplify protocol and filter creation, whilst better facilitating protocol reuse. The system is supported by a pipeline of multi-threaded high-performance host components, which communicate asynchronously through 0MQ messaging middleware to buffer, index, and dispatch packet data on the host system. The system was evaluated using high-end Kepler (Nvidia GTX Titan) and entry level Maxwell (Nvidia GTX 750) GPUs. The results of this evaluation showed high system performance, limited only by device side IO (600MBps) in all tests. GPF+ maintained high occupancy and device utilisation in all tests, without significant serialisation, and showed improved scaling to more complex filter sets. Results were used to visualise captures of up to 160 GB in seconds, and to extract and pre-filter captures small enough to be easily analysed in applications such as Wireshark

South East Academic Libraries System (SEALS)

Rhodes Repository (SEALS)

Runtime Systems for Persistent Memories

Author: Gogte Vaibhav
Publication venue
Publication date: 01/01/2019
Field of study

Emerging persistent memory (PM) technologies promise the performance of DRAM with the durability of disk. However, several challenges remain in existing hardware, programming, and software systems that inhibit wide-scale PM adoption. This thesis focuses on building efficient mechanisms that span hardware and operating systems, and programming languages for integrating PMs in future systems. First, this thesis proposes a mechanism to solve low-endurance problem in PMs. PMs suffer from limited write endurance---PM cells can be written only 10^7-10^9 times before they wear out. Without any wear management, PM lifetime might be as low as 1.1 months. This thesis presents Kevlar, an OS-based wear-management technique for PM, that requires no new hardware. Kevlar uses existing virtual memory mechanisms to remap pages, enabling it to perform both wear leveling---shuffling pages in PM to even wear; and wear reduction---transparently migrating heavily written pages to DRAM. Crucially, Kevlar avoids the need for hardware support to track wear at fine grain. It relies on a novel wear-estimation technique that builds upon Intel's Precise Event Based Sampling to approximately track processor cache contents via a software-maintained Bloom filter and estimate write-back rates at fine grain. Second, this thesis proposes a persistency model for high-level languages to enable integration of PMs in to future programming systems. Prior works extend language memory models with a persistency model prescribing semantics for updates to PM. These approaches require high-overhead mechanisms, are restricted to certain synchronization constructs, provide incomplete semantics, and/or may recover to state that cannot arise in fault-free program execution. This thesis argues for persistency semantics that guarantee failure atomicity of synchronization-free regions (SFRs) --- program regions delimited by synchronization operations. The proposed approach provides clear semantics for the PM state that recovery code may observe and extends C++11's "sequential consistency for data-race-free" guarantee to post-failure recovery code. To this end, this thesis investigates two designs for failure-atomic SFRs that vary in performance and the degree to which commit of persistent state may lag execution. Finally, this thesis proposes StrandWeaver, a hardware persistency model that minimally constrains ordering on PM operations. Several language-level persistency models have emerged recently to aid programming recoverable data structures in PM. The language-level persistency models are built upon hardware primitives that impose stricter ordering constraints on PM operations than the persistency models require. StrandWeaver manages PM order within a strand, a logically independent sequence of PM operations within a thread. PM operations that lie on separate strands are unordered and may drain concurrently to PM. StrandWeaver implements primitives under strand persistency to allow programmers to improve concurrency and relax ordering constraints on updates as they drain to PM. Furthermore, StrandWeaver proposes mechanisms that map persistency semantics in high-level language persistency models to the primitives implemented by StrandWeaver.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/155100/1/vgogte_1.pd

Deep Blue Documents at the University of Michigan

Memory Safety Acceleration on RISC-V for C Programming Language

Author: Dow Kenny
Publication venue: UNSW, Sydney
Publication date: 01/01/2023
Field of study

Memory corruption vulnerabilities can lead to memory attacks. Three of the top ten most dangerous weaknesses in computer security are memory-related. Memory attack is one of a computer system’s oldest but everlasting problems. Companies and the government lost billions of dollars due to memory security breaches. Memory safety is paramount to securing memory systems. Pointer-based memory safety protection has been shown as a promising solution covering both out-of-bounds and use-after-free errors. However, pointer-based memory safety relies on additional information (metadata) to check validity when a pointer is dereferenced. Such operations on the metadata introduce significant performance overhead to the system. Existing hardware/software implementations are primarily limited to proprietary closed-source microprocessors, simulation-only studies, or require changes to the input source code. In order to provide the need for memory security, we created a memory-safe RISC-V platform with low-performance overhead. In this thesis, a novel hardware/software co-design methodology consisting of a RISC-V based processor is extended with new instructions and microarchitecture enhancements, enabling complete memory safety in the C programming language and faster memory safety checks. Furthermore, a compiler is instrumented to provide security operations considering the changes to the processor. Moreover, a design exploration framework is proposed to provide an in-depth search for optimal hardware/software configuration for application-specific workloads regarding performance overhead, security coverage, area cost, and critical path latency. The entire system is realized by enhancing a RISC-V Rocket-chip system-on-chip (SoC). The resultant processor SoC is implemented on an FPGA and evaluated with applications from SPEC 2006 (for generic applications), MiBench (for embedded applications), and Olden benchmark suites for performance. The system, including the RISC-V CHISEL, compiler, profiling and analysis tool-chain, is fully available and open-source to the public

UNSWorks

IP address lookup using bit-shuffled trie

Author: Bando
Broder
Brown
Chang
Degermark
Eatherton
Gupta
Hsieh
Jiang
Kuon
Lampson
Lim
Martinez
Nilsson
Pao
Pao
Pao
Ravikumar
Song
Srinivasan
Sun
Taylor
Wadvogel
Publication venue: 'Elsevier BV'
Publication date
Field of study

Crossref

A Framework for Evolutionary Optimization Applications in Water Distribution Systems

Author: Morley Mark S.
Publication venue: School of Engineering, Computing and Mathematics
Publication date: 21/03/2013
Field of study

The application of optimization to Water Distribution Systems encompasses the use of computer-based techniques to problems of many different areas of system design, maintenance and operational management. As well as laying out the configuration of new WDS networks, optimization is commonly needed to assist in the rehabilitation or reinforcement of existing network infrastructure in which alternative scenarios driven by investment constraints and hydraulic performance are used to demonstrate a cost-benefit relationship between different network intervention strategies. Moreover, the ongoing operation of a WDS is also subject to optimization, particularly with respect to the minimization of energy costs associated with pumping and storage and the calibration of hydraulic network models to match observed field data. Increasingly, Evolutionary Optimization techniques, of which Genetic Algorithms are the best-known examples, are applied to aid practitioners in these facets of design, management and operation of water distribution networks as part of Decision Support Systems (DSS). Evolutionary Optimization employs processes akin to those of natural selection and “survival of the fittest” to manipulate a population of individual solutions, which, over time, “evolve” towards optimal solutions. Such algorithms are characterized, however, by large numbers of function evaluations. This, coupled with the computational complexity associated with the hydraulic simulation of water networks incurs significant computational overheads, can limit the applicability and scalability of this technology in this domain. Accordingly, this thesis presents a methodology for applying Genetic Algorithms to Water Distribution Systems. A number of new procedures are presented for improving the performance of such algorithms when applied to complex engineering problems. These techniques approach the problem of minimising the impact of the inherent computational complexity of these problems from a number of angles. A novel genetic representation is presented which combines the algorithmic simplicity of the classical binary string of the Genetic Algorithm with the performance advantages inherent in an integer-based representation. Further algorithmic improvements are demonstrated with an intelligent mutation operator that “learns” which genes have the greatest impact on the quality of a solution and concentrates the mutation operations on those genes. A technique for implementing caching of solutions – recalling the results for solutions that have already been calculated - is demonstrated to reduce runtimes for Genetic Algorithms where applied to problems with significant computation complexity in their evaluation functions. A novel reformulation of the Genetic Algorithm for implementing robust stochastic optimizations is presented which employs the caching technology developed to produce an multiple-objective optimization methodology that demonstrates dramatically improved quality of solutions for given runtime of the algorithm. These extensions to the Genetic Algorithm techniques are coupled with a supporting software library that represents a standardized modelling architecture for the representation of connected networks. This library gives rise to a system for distributing the computational load of hydraulic simulations across a network of computers. This methodology is established to provide a viable, scalable technique for accelerating evolutionary optimization applications.Engineering and Physical Sciences Research Council, UK

Open Research Exeter

Practical Private Information Retrieval

Author: Olumofin Femi George
Publication venue: 'University of Waterloo'
Publication date: 01/01/2011
Field of study

In recent years, the subject of online privacy has been attracting much interest, especially as more Internet users than ever are beginning to care about the privacy of their online activities. Privacy concerns are even prompting legislators in some countries to demand from service providers a more privacy-friendly Internet experience for their citizens. These are welcomed developments and in stark contrast to the practice of Internet censorship and surveillance that legislators in some nations have been known to promote. The development of Internet systems that are able to protect user privacy requires private information retrieval (PIR) schemes that are practical, because no other efficient techniques exist for preserving the confidentiality of the retrieval requests and responses of a user from an Internet system holding unencrypted data. This thesis studies how PIR schemes can be made more relevant and practical for the development of systems that are protective of users' privacy. Private information retrieval schemes are cryptographic constructions for retrieving data from a database, without the database (or database administrator) being able to learn any information about the content of the query. PIR can be applied to preserve the confidentiality of queries to online data sources in many domains, such as online patents, real-time stock quotes, Internet domain names, location-based services, online behavioural profiling and advertising, search engines, and so on. In this thesis, we study private information retrieval and obtain results that seek to make PIR more relevant in practice than all previous treatments of the subject in the literature, which have been mostly theoretical. We also show that PIR is the most computationally efficient known technique for providing access privacy under realistic computation powers and network bandwidths. Our result covers all currently known varieties of PIR schemes. We provide a more detailed summary of our contributions below: Our first result addresses an existing question regarding the computational practicality of private information retrieval schemes. We show that, unlike previously argued, recent lattice-based computational PIR schemes and multi-server information-theoretic PIR schemes are much more computationally efficient than a trivial transfer of the entire PIR database from the server to the client (i.e., trivial download). Our result shows the end-to-end response times of these schemes are one to three orders of magnitude (10--1000 times) smaller than the trivial download of the database for realistic computation powers and network bandwidths. This result extends and clarifies the well-known result of Sion and Carbunar on the computational practicality of PIR. Our second result is a novel approach for preserving the privacy of sensitive constants in an SQL query, which improves substantially upon the earlier work. Specifically, we provide an expressive data access model of SQL atop of the existing rudimentary index- and keyword-based data access models of PIR. The expressive SQL-based model developed results in between 7 and 480 times improvement in query throughput than previous work. We then provide a PIR-based approach for preserving access privacy over large databases. Unlike previously published access privacy approaches, we explore new ideas about privacy-preserving constraint-based query transformations, offline data classification, and privacy-preserving queries to index structures much smaller than the databases. This work addresses an important open problem about how real systems can systematically apply existing PIR schemes for querying large databases. In terms of applications, we apply PIR to solve user privacy problem in the domains of patent database query and location-based services, user and database privacy problems in the domain of the online sales of digital goods, and a scalability problem for the Tor anonymous communication network. We develop practical tools for most of our techniques, which can be useful for adding PIR support to existing and new Internet system designs

University of Waterloo's Institutional Repository

43rd International Symposium on Mathematical Foundations of Computer Science: MFCS 2018, August 27-31, 2018, Liverpool, United Kingdom

Author: International Symposium on Mathematical Foundations of Computer Science <43. 2018, Liverpool>
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Publication date: 01/08/2018
Field of study

Digitale Bibliothek Thüringen

Recommended from our members

Computational Methods in Multi-Messenger Astrophysics using Gravitational Waves and High Energy Neutrinos

Author: Countryman Stefan Trklja
Publication venue
Publication date: 01/01/2023
Field of study

This dissertation seeks to describe advancements made in computational methods for multi-messenger astrophysics (MMA) using gravitational waves GW and neutrinos during Advanced LIGO (aLIGO)’s first through third observing runs (O1-O3) and, looking forward, to describe novel computational techniques suited to the challenges of both the burgeoning MMA field and high-performance computing as a whole. The first two chapters provide an overview of MMA as it pertains to gravitational wave/high energy neutrino (GWHEN) searches, including a summary of expected astrophysical sources as well as GW, neutrino, and gamma-ray detectors used in their detection. These are followed in the third chapter by an in-depth discussion of LIGO’s timing system, particularly the diagnostic subsystem, describing both its role in MMA searches and the author’s contributions to the system itself. The fourth chapter provides a detailed description of the Low-Latency Algorithm for Multi-messenger Astrophysics (LLAMA), the GWHEN pipeline developed by the author and used in O2 and O3. Relevant past multi-messenger searches are described first, followed by the O2 and O3 analysis methods, the pipeline’s performance, scientific results, and finally, an in-depth account of the library’s structure and functionality. In particular, the author’s high-performance multi-order coordinates (MOC) HEALPix image analysis library, HPMOC, is described. HPMOC increases performance of HEALPix image manipulations by several orders of magnitude vs. naive single-resolution approaches while presenting a simple high-level interface and should prove useful for diverse future MMA searches. The performance improvements it provides for LLAMA are also covered. The final chapter of this dissertation builds on the approaches taken in developing HPMOC, presenting several novel methods for efficiently storing and analyzing large data sets, with applications to MMA and other data-intensive fields. A family of depth-first multi-resolution ordering of HEALPix images — DEPTH9, DEPTH19, and DEPTH40 — is defined, along with algorithms and use cases where it can improve on current approaches, including high-speed streaming calculations suitable for serverless compute or FPGAs. For performance-constrained analyses on HEALPix data (e.g. image analysis in multi-messenger search pipelines) using SIMD processors, breadth-first data structures can provide short-circuiting calculations in a data-parallel way on compressed data; a simple compression method is described with application to further improving LLAMA performance. A new storage scheme and associated algorithms for efficiently compressing and contracting tensors of varying sparsity is presented; these demuxed tensors (D-Tensors) have equivalent asymptotic time and space complexity to optimal representations of both dense and sparse matrices, and could be used as a universal drop-in replacement to reduce code complexity and developer effort while improving performance of existing non-optimized numerical code. Finally, the big bucket hash table (B-Table), a novel type of hash table making guarantees on data layout (vs. load factor), is described, along with optimizations it allows for (like hardware acceleration, online rebuilds, and hard realtime applications) that are not possible with existing hash table approaches. These innovations are presented in the hope that some will prove useful for improving future MMA searches and other data-intensive applications

Columbia University Academic Commons