8 research outputs found
Extending the PCIe Interface with Parallel Compression/Decompression Hardware for Energy and Performance Optimization
PCIe is a high-performing interface used to move data from a central host PC to an accelerator such as Field Programmable Gate Arrays (FPGA). This interface allows a system to perform fast data transfers in High-Performance Computing (HPC) and provide a performance boost. However, HPC systems normally require large datasets, and in these situations PCIe can become a bottleneck. To address this issue, we propose an open-source hardware compression/decompression system that can be used to adapt with continuously-streamed data with low latency and high throughput. We implement a compressor and decompressor engines on FPGA, scale up with multiple engines working in parallel, and evaluate the energy reduction and performance with different numbers of multiple engines. To alleviate the performance bottleneck in the processor acting as a controller, we propose a hardware scheduler to fairly distribute the datasets among the engines. Our design reduces the transmission time in PCIe, and the results show an energy reduction of up to 48% in the PCIe transfers, thanks to the decrease in the number of bits that have to be transmitted. The overhead in terms of latency is maintained to a minimum and user selectable depending on the tolerances of the intended application
CHERI: A hybrid capability-system architecture for scalable software compartmentalization
CHERI extends a conventional RISC Instruction-
Set Architecture, compiler, and operating system to support
fine-grained, capability-based memory protection to mitigate
memory-related vulnerabilities in C-language TCBs. We describe
how CHERI capabilities can also underpin a hardware-software
object-capability model for application compartmentalization
that can mitigate broader classes of attack. Prototyped as an
extension to the open-source 64-bit BERI RISC FPGA softcore
processor, FreeBSD operating system, and LLVM compiler,
we demonstrate multiple orders-of-magnitude improvement in
scalability, simplified programmability, and resulting tangible
security benefits as compared to compartmentalization based on
pure Memory-Management Unit (MMU) designs. We evaluate
incrementally deployable CHERI-based compartmentalization
using several real-world UNIX libraries and applications.We thank our colleagues Ross Anderson, Ruslan Bukin,
Gregory Chadwick, Steve Hand, Alexandre Joannou, Chris
Kitching, Wojciech Koszek, Bob Laddaga, Patrick Lincoln,
Ilias Marinos, A Theodore Markettos, Ed Maste, Andrew W.
Moore, Alan Mujumdar, Prashanth Mundkur, Colin Rothwell,
Philip Paeps, Jeunese Payne, Hassen Saidi, Howie Shrobe, and
Bjoern Zeeb, our anonymous reviewers, and shepherd Frank
Piessens, for their feedback and assistance. This work is part of
the CTSRD and MRC2 projects sponsored by the Defense Advanced
Research Projects Agency (DARPA) and the Air Force
Research Laboratory (AFRL), under contracts FA8750-10-C-
0237 and FA8750-11-C-0249. The views, opinions, and/or
findings contained in this paper are those of the authors and
should not be interpreted as representing the official views
or policies, either expressed or implied, of the Department
of Defense or the U.S. Government. We acknowledge the EPSRC
REMS Programme Grant [EP/K008528/1], Isaac Newton
Trust, UK Higher Education Innovation Fund (HEIF), Thales
E-Security, and Google, Inc.This is the author accepted manuscript. The final version is available at http://dx.doi.org/10.1109/SP.2015.
A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services
Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we designed and built a composable, reconfigurable hardware fabric based on field programmable gate arrays (FPGA). Each server in the fabric contains one FPGA, and all FPGAs within a 48-server rack are interconnected over a low-latency, high-bandwidth network. We describe a medium-scale deployment of this fabric on a bed of 1632 servers, and measure its effectiveness in accelerating the ranking component of the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% at a desirable latency distribution or reduces tail latency by 29% at a fixed throughput. In other words, the reconfigurable fabric enables the same throughput using only half the number of servers
Compressed kNN: K-Nearest Neighbors with Data Compression
The kNN (k-nearest neighbors) classification algorithm is one of the most widely used non-parametric classification methods, however it is limited due to memory consumption related to the size of the dataset, which makes them impractical to apply to large volumes of data. Variations of this method have been proposed, such as condensed KNN which divides the training dataset into clusters to be classified, other variations reduce the input dataset in order to apply the algorithm. This paper presents a variation of the kNN algorithm, of the type structure less NN, to work with categorical data. Categorical data, due to their nature, can be compressed in order to decrease the memory requirements at the time of executing the classification. The method proposes a previous phase of compression of the data to then apply the algorithm on the compressed data. This allows us to maintain the whole dataset in memory which leads to a considerable reduction of the amount of memory required. Experiments and tests carried out on known datasets show the reduction in the volume of information stored in memory and maintain the accuracy of the classification. They also show a slight decrease in processing time because the information is decompressed in real time (on-the-fly) while the algorithm is running
Content-aware compression for big textual data analysis
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements
Understanding and Optimizing Flash-based Key-value Systems in Data Centers
Flash-based key-value systems are widely deployed in today’s data centers for providing high-speed data processing services. These systems deploy flash-friendly data structures, such as slab and Log Structured Merge(LSM) tree, on flash-based Solid State Drives(SSDs) and provide efficient solutions in caching and storage scenarios. With the rapid evolution of data centers, there appear plenty of challenges and opportunities for future optimizations.
In this dissertation, we focus on understanding and optimizing flash-based key-value systems from the perspective of workloads, software, and hardware as data centers evolve. We first propose an on-line compression scheme, called SlimCache, considering the unique characteristics of key-value workloads, to virtually enlarge the cache space, increase the hit ratio, and improve the cache performance. Furthermore, to appropriately configure increasingly complex modern key-value data systems, which can have more than 50 parameters with additional hardware and system settings, we quantitatively study and compare five multi-objective optimization methods for auto-tuning the performance of an LSM-tree based key-value store in terms of throughput, the 99th percentile tail latency, convergence time, real-time system throughput, and the iteration process, etc. Last but not least, we conduct an in-depth, comprehensive measurement work on flash-optimized key-value stores with recently emerging 3D XPoint SSDs. We reveal several unexpected bottlenecks in the current key-value store design and present three exemplary case studies to showcase the efficacy of removing these bottlenecks with simple methods on 3D XPoint SSDs. Our experimental results show that our proposed solutions significantly outperform traditional methods. Our study also contributes to providing system implications for auto-tuning the key-value system on flash-based SSDs and optimizing it on revolutionary 3D XPoint based SSDs
Recommended from our members
System Design and Implementation for Hybrid Network Function Virtualization
With the application of virtualization technology in computer networks, many new research areas and techniques have been explored, such as network function virtualization (NFV). A significant benefit of virtualization is that it reduces the cost of a network system and increases its flexibility. Due to the increasing complexity of the network environment and constantly improving network scale and bandwidth, it is imperative to aim for higher performance, extensibility, and flexibility in the future network systems. In this dissertation, hybrid NFV platforms applying virtualization technology are proposed. We further explore the techniques used to improve the performance, scalability and resilience of these systems.
In the first part of this dissertation, we describe a new heterogeneous hardware-software NFV platform that provides scalability and programmability while supporting significant hardware-level parallelism and reconfiguration. Our computing platform takes advantage of both field-programmable gate arrays (FPGAs) and microprocessors to implement numerous virtual network functions (VNFs) that can be dynamically customized to specific network flow needs. Traffic management and hardware reconfiguration functions are performed by a global coordinator which allows for the rapid sharing of network function states and continuous evaluation of network function needs. With the help of state sharing mechanism offered by the coordinator, customer-defined VNF instances can be easily migrated between heterogeneous middleboxes as the network environment changes. A resource allocation algorithm dynamically assesses resource deployments as network flows and conditions are updated.
In the second part of this thesis document, we explore a new session-level approach for NFV that implements distributed agents in heterogeneous middleboxes to steer packets belonging to different sessions through session-specific service chains. Our session-level approach supports inter-domain service chaining with both FPGA- and processor-based middleboxes, dynamic reconfiguration of service chains for ongoing sessions, and the application of session-level approaches for UDP-based protocols. To demonstrate our approach, we establish inter-domain service chains for QUIC sessions, and reconfigure the service chains across a range of FPGA- and processor-based middleboxes. We show that our session-level approach can successfully reconfigure service chains for individual QUIC sessions. Compared with software implementations, the distributed agents implemented on FPGAs show better performance in various test scenarios
Shortest Route at Dynamic Location with Node Combination-Dijkstra Algorithm
Abstract— Online transportation has become a basic
requirement of the general public in support of all activities to go
to work, school or vacation to the sights. Public transportation
services compete to provide the best service so that consumers
feel comfortable using the services offered, so that all activities
are noticed, one of them is the search for the shortest route in
picking the buyer or delivering to the destination. Node
Combination method can minimize memory usage and this
methode is more optimal when compared to A* and Ant Colony
in the shortest route search like Dijkstra algorithm, but can’t
store the history node that has been passed. Therefore, using
node combination algorithm is very good in searching the
shortest distance is not the shortest route. This paper is
structured to modify the node combination algorithm to solve the
problem of finding the shortest route at the dynamic location
obtained from the transport fleet by displaying the nodes that
have the shortest distance and will be implemented in the
geographic information system in the form of map to facilitate
the use of the system.
Keywords— Shortest Path, Algorithm Dijkstra, Node
Combination, Dynamic Location (key words