8 research outputs found

Extending the PCIe Interface with Parallel Compression/Decompression Hardware for Energy and Performance Optimization

Author: Mohd Amiruddin Zainol, Jose Luis Nunez-Yanez
Publication venue: Auricle Global Society of Education and Research
Publication date: 26/02/2018
PCIe is a high-performance interface used to move data from a central host PC to an accelerator such as a Field Programmable Gate Array (FPGA). This interface allows a system to perform fast data transfers in High-Performance Computing (HPC) and provides a performance boost. However, HPC systems normally require large datasets, and in these situations PCIe can become a bottleneck. To address this issue, we propose an open-source hardware compression/decompression system that can adapt to continuously streamed data with low latency and high throughput. We implement compressor and decompressor engines on an FPGA, scale up with multiple engines working in parallel, and evaluate the energy reduction and performance with different numbers of engines. To alleviate the performance bottleneck in the processor acting as a controller, we propose a hardware scheduler that distributes the datasets fairly among the engines. Our design reduces the transmission time over PCIe, and the results show an energy reduction of up to 48% in the PCIe transfers, thanks to the decrease in the number of bits that have to be transmitted. The latency overhead is kept to a minimum and is user-selectable, depending on the tolerances of the intended application.
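The idea of spreading a data stream across several compression engines can be illustrated in software. The following is only a rough analogy, not the authors' FPGA design: it assumes `zlib` as a stand-in compressor, a thread pool as the set of "engines", and hypothetical helper names (`split_chunks`, `compress_parallel`, `decompress_parallel`) and chunk sizes chosen for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_chunks(data, chunk_size):
    """Cut the byte stream into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def compress_parallel(data, engines=4, chunk_size=64 * 1024):
    """Compress each chunk independently, using up to `engines` worker threads."""
    chunks = split_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=engines) as pool:
        # pool.map returns results in input order, so blocks stay in sequence.
        return list(pool.map(zlib.compress, chunks))


def decompress_parallel(blocks, engines=4):
    """Decompress the blocks in parallel and reassemble the original stream."""
    with ThreadPoolExecutor(max_workers=engines) as pool:
        return b"".join(pool.map(zlib.decompress, blocks))


if __name__ == "__main__":
    payload = b"sensor-reading;" * 10000
    blocks = compress_parallel(payload, engines=4, chunk_size=4096)
    sent = sum(len(b) for b in blocks)
    print(f"{len(payload)} bytes in, {sent} bytes to transfer")
```

Because each chunk is compressed on its own, any engine can take any chunk, which is what makes a simple fair distribution possible; the cost is a somewhat lower compression ratio than compressing the whole stream at once.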

International Journal on Future Revolution in Computer Science & Communication Engineering

CHERI: A hybrid capability-system architecture for scalable software compartmentalization

Author: Anderson J, Chisnall D, Dave N, Davis B, Gudka K, Laurie B, Moore SW, Murdoch SJ, Neumann PG, Norton R, Roe M, Son S, Vadera M, Watson RNM, Woodruff J
Publication venue: Proceedings - IEEE Symposium on Security and Privacy
Publication date: 01/01/2015
CHERI extends a conventional RISC Instruction-Set Architecture, compiler, and operating system to support fine-grained, capability-based memory protection to mitigate memory-related vulnerabilities in C-language TCBs. We describe how CHERI capabilities can also underpin a hardware-software object-capability model for application compartmentalization that can mitigate broader classes of attack. Prototyped as an extension to the open-source 64-bit BERI RISC FPGA softcore processor, FreeBSD operating system, and LLVM compiler, we demonstrate multiple orders-of-magnitude improvement in scalability, simplified programmability, and resulting tangible security benefits as compared to compartmentalization based on pure Memory-Management Unit (MMU) designs. We evaluate incrementally deployable CHERI-based compartmentalization using several real-world UNIX libraries and applications.

We thank our colleagues Ross Anderson, Ruslan Bukin, Gregory Chadwick, Steve Hand, Alexandre Joannou, Chris Kitching, Wojciech Koszek, Bob Laddaga, Patrick Lincoln, Ilias Marinos, A Theodore Markettos, Ed Maste, Andrew W. Moore, Alan Mujumdar, Prashanth Mundkur, Colin Rothwell, Philip Paeps, Jeunese Payne, Hassen Saidi, Howie Shrobe, and Bjoern Zeeb, our anonymous reviewers, and shepherd Frank Piessens, for their feedback and assistance. This work is part of the CTSRD and MRC2 projects sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), under contracts FA8750-10-C-0237 and FA8750-11-C-0249. The views, opinions, and/or findings contained in this paper are those of the authors and should not be interpreted as representing the official views or policies, either expressed or implied, of the Department of Defense or the U.S. Government. We acknowledge the EPSRC REMS Programme Grant [EP/K008528/1], Isaac Newton Trust, UK Higher Education Innovation Fund (HEIF), Thales E-Security, and Google, Inc.

This is the author accepted manuscript. The final version is available at http://dx.doi.org/10.1109/SP.2015.

UCL Discovery

Apollo (Cambridge)

A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services

Author: Burger Doug, Caulfield Adrian M., Chiou Derek, Chung Eric S., Constantinides Kypros, Demme John, Esmaeilzadeh Hadi, Fowers Jeremy, Gopal Gopi Prashanth, Gray Jan, Haselman Michael, Hauck Scott, Heil Stephen, Hormati Amir, Kim Joo-Young, Lanka Sitaram, Larus James, Peterson Eric, Pope Simon, Putnam Andrew, Smith Aaron, Thong Jason, Xiao Phillip Yi
Publication venue: Association for Computing Machinery (ACM)
Publication date: 24/01/2017
Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we designed and built a composable, reconfigurable hardware fabric based on field programmable gate arrays (FPGA). Each server in the fabric contains one FPGA, and all FPGAs within a 48-server rack are interconnected over a low-latency, high-bandwidth network. We describe a medium-scale deployment of this fabric on a bed of 1632 servers, and measure its effectiveness in accelerating the ranking component of the Bing web search engine. We describe the requirements and architecture of the system, detail the critical engineering challenges and solutions needed to make the system robust in the presence of failures, and measure the performance, power, and resilience of the system. Under high load, the large-scale reconfigurable fabric improves the ranking throughput of each server by 95% at a desirable latency distribution or reduces tail latency by 29% at a fixed throughput. In other words, the reconfigurable fabric enables the same throughput using only half the number of servers

Infoscience - École polytechnique fédérale de Lausanne

Compressed kNN: K-Nearest Neighbors with Data Compression

Author: Garcia-Rodriguez Jose, Ruiz Zoila, Salvador Jaime
Publication venue: MDPI AG
Publication date: 01/01/2019
The kNN (k-nearest neighbors) classification algorithm is one of the most widely used non-parametric classification methods; however, its memory consumption grows with the size of the dataset, which makes it impractical to apply to large volumes of data. Variations of this method have been proposed, such as condensed kNN, which divides the training dataset into clusters to be classified; other variations reduce the input dataset before applying the algorithm. This paper presents a variation of the kNN algorithm, of the structure-less NN type, that works with categorical data. Categorical data, due to their nature, can be compressed in order to decrease the memory requirements at classification time. The method adds a preliminary phase that compresses the data and then applies the algorithm to the compressed data. This allows the whole dataset to be kept in memory with a considerable reduction in the amount of memory required. Experiments and tests carried out on known datasets show a reduction in the volume of information stored in memory while maintaining the accuracy of the classification. They also show a slight decrease in processing time because the information is decompressed in real time (on-the-fly) while the algorithm is running.
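As a rough illustration of classifying directly over compactly stored categorical data, the sketch below dictionary-encodes each column into one-byte codes and runs kNN with a Hamming distance over those codes. It is not the paper's compression scheme: the encoding, the distance, the function names, and the assumption of fewer than 255 distinct values per column are all illustrative choices.

```python
from collections import Counter

UNSEEN = 255  # assumed sentinel code; requires fewer than 255 values per column


def encode_columns(rows):
    """Dictionary-encode each categorical column into one-byte integer codes."""
    codebooks = [{} for _ in rows[0]]
    encoded = []
    for row in rows:
        codes = bytearray()
        for col, value in enumerate(row):
            book = codebooks[col]
            # setdefault assigns the next free code the first time a value is seen.
            codes.append(book.setdefault(value, len(book)))
        encoded.append(bytes(codes))
    return encoded, codebooks


def encode_query(row, codebooks):
    """Encode a query row; values never seen in training match no stored code."""
    return bytes(codebooks[c].get(v, UNSEEN) for c, v in enumerate(row))


def hamming(a, b):
    """Number of attribute positions where the two encoded rows differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(encoded, labels, codebooks, query, k=3):
    """Majority vote among the k encoded rows closest to the encoded query."""
    q = encode_query(query, codebooks)
    ranked = sorted(range(len(encoded)), key=lambda i: hamming(encoded[i], q))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rows = [("red", "small"), ("red", "large"), ("blue", "small"), ("blue", "large")]
    labels = ["a", "a", "b", "b"]
    encoded, books = encode_columns(rows)
    print(knn_predict(encoded, labels, books, ("blue", "large"), k=3))
```

Each training row occupies one byte per attribute regardless of how long the original category strings are, and the distance is computed on the codes themselves, so the original values never need to be materialized during classification.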

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

Content-aware compression for big textual data analysis

Author: Dong Dapeng
Publication venue: University College Cork
Publication date: 01/01/2016
A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two-layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
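For readers unfamiliar with the Huffman algorithm that the second contribution builds on, a minimal sketch of the classic algorithm follows. This is the textbook construction, not the thesis's Approximated Huffman Compression or its hybrid search structure; the function names and the bit-string representation are assumptions made for readability.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a {symbol: bitstring} prefix code built from symbol frequencies."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code beneath them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(text, codes):
    """Concatenate the code of every symbol in the text."""
    return "".join(codes[ch] for ch in text)


def decode(bits, codes):
    """Walk the bit string, emitting a symbol whenever a full code is matched."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    text = "abracadabra"
    codes = huffman_codes(text)
    bits = encode(text, codes)
    print(len(bits), "bits instead of", 8 * len(text))
```

Because the result is a prefix code, decoding needs no separators between symbols; frequent symbols receive shorter codes, which is where the size reduction comes from.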

Irish Universities

Cork Open Research Archive

Understanding and Optimizing Flash-based Key-value Systems in Data Centers

Author: Jia Yichen
Publication venue: LSU Digital Commons
Publication date: 09/03/2020
Flash-based key-value systems are widely deployed in today’s data centers to provide high-speed data processing services. These systems deploy flash-friendly data structures, such as slab and Log-Structured Merge (LSM) tree, on flash-based Solid State Drives (SSDs) and provide efficient solutions in caching and storage scenarios. The rapid evolution of data centers brings many challenges and opportunities for future optimization. In this dissertation, we focus on understanding and optimizing flash-based key-value systems from the perspective of workloads, software, and hardware as data centers evolve. We first propose an on-line compression scheme, called SlimCache, that takes into account the unique characteristics of key-value workloads to virtually enlarge the cache space, increase the hit ratio, and improve cache performance. Furthermore, to appropriately configure increasingly complex modern key-value data systems, which can have more than 50 parameters plus additional hardware and system settings, we quantitatively study and compare five multi-objective optimization methods for auto-tuning the performance of an LSM-tree based key-value store in terms of throughput, 99th percentile tail latency, convergence time, real-time system throughput, and the iteration process. Finally, we conduct an in-depth, comprehensive measurement study of flash-optimized key-value stores on recently emerging 3D XPoint SSDs. We reveal several unexpected bottlenecks in the current key-value store design and present three exemplary case studies to showcase the efficacy of removing these bottlenecks with simple methods on 3D XPoint SSDs. Our experimental results show that our proposed solutions significantly outperform traditional methods. Our study also provides system implications for auto-tuning key-value systems on flash-based SSDs and optimizing them on 3D XPoint based SSDs.

Louisiana State University

System Design and Implementation for Hybrid Network Function Virtualization

Author: Zhang Xuzhi
Publication venue: ScholarWorks@UMass Amherst
Publication date: 18/12/2020
With the application of virtualization technology in computer networks, many new research areas and techniques have been explored, such as network function virtualization (NFV). A significant benefit of virtualization is that it reduces the cost of a network system and increases its flexibility. Due to the increasing complexity of the network environment and the constantly growing network scale and bandwidth, it is imperative to aim for higher performance, extensibility, and flexibility in future network systems. In this dissertation, hybrid NFV platforms applying virtualization technology are proposed. We further explore the techniques used to improve the performance, scalability, and resilience of these systems. In the first part of this dissertation, we describe a new heterogeneous hardware-software NFV platform that provides scalability and programmability while supporting significant hardware-level parallelism and reconfiguration. Our computing platform takes advantage of both field-programmable gate arrays (FPGAs) and microprocessors to implement numerous virtual network functions (VNFs) that can be dynamically customized to specific network flow needs. Traffic management and hardware reconfiguration functions are performed by a global coordinator which allows for the rapid sharing of network function states and continuous evaluation of network function needs. With the help of the state-sharing mechanism offered by the coordinator, customer-defined VNF instances can be easily migrated between heterogeneous middleboxes as the network environment changes. A resource allocation algorithm dynamically assesses resource deployments as network flows and conditions are updated. In the second part of this dissertation, we explore a new session-level approach for NFV that implements distributed agents in heterogeneous middleboxes to steer packets belonging to different sessions through session-specific service chains. Our session-level approach supports inter-domain service chaining with both FPGA- and processor-based middleboxes, dynamic reconfiguration of service chains for ongoing sessions, and the application of session-level approaches to UDP-based protocols. To demonstrate our approach, we establish inter-domain service chains for QUIC sessions, and reconfigure the service chains across a range of FPGA- and processor-based middleboxes. We show that our session-level approach can successfully reconfigure service chains for individual QUIC sessions. Compared with software implementations, the distributed agents implemented on FPGAs show better performance in various test scenarios.

ScholarWorks@UMass Amherst

Shortest Route at Dynamic Location with Node Combination-Dijkstra Algorithm

Author: Fitro Achmad, Kusumaningrum Retno, Suryono Suryono
Publication date: 01/10/2018
Abstract: Online transportation has become a basic need of the general public, supporting travel to work, school, or holiday destinations. Public transportation services compete to provide the best service so that consumers feel comfortable using them, and every aspect of the service receives attention, including the search for the shortest route when picking up a customer or delivering to a destination. The Node Combination method can minimize memory usage and, like the Dijkstra algorithm, is more optimal than A* and Ant Colony in shortest-route search, but it cannot store the history of nodes that have been passed. The node combination algorithm is therefore well suited to finding the shortest distance, but not the shortest route. This paper modifies the node combination algorithm to solve the problem of finding the shortest route for a dynamic location obtained from the transport fleet, displaying the nodes that have the shortest distance; the result will be implemented in a geographic information system in the form of a map to make the system easier to use. Keywords: Shortest Path, Dijkstra Algorithm, Node Combination, Dynamic Location
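The distinction the abstract draws between the shortest distance and the shortest route can be seen in a standard Dijkstra implementation, where the route is only recoverable if a predecessor is recorded for each node. The sketch below is plain Dijkstra with a binary heap, not the node combination variant the paper modifies; the graph format (a dict of `{node: {neighbor: weight}}`) and the function name are assumptions.

```python
import heapq


def shortest_route(graph, source, target):
    """Return (distance, list of nodes) for the cheapest path, or (inf, [])."""
    dist = {source: 0}
    prev = {}  # predecessor of each node on its best known path
    heap = [(0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry for an already settled node
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return float("inf"), []
    # Walk the predecessor chain backwards to recover the visited nodes.
    route = [target]
    while route[-1] != source:
        route.append(prev[route[-1]])
    route.reverse()
    return dist[target], route


if __name__ == "__main__":
    roads = {
        "A": {"B": 4, "C": 1},
        "B": {"D": 3},
        "C": {"B": 2, "D": 7},
        "D": {},
    }
    print(shortest_route(roads, "A", "D"))
```

Dropping the `prev` map would still yield the correct distance, which mirrors the limitation described above: without stored node history only the length of the route is known, not the nodes along it.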

Politeknik NSC Surabay Repository