Moving Processing to Data: On the Influence of Processing in Memory on Data Management
Near-Data Processing refers to an architectural hardware and software
paradigm based on the co-location of storage and compute units. Ideally, it
allows application-defined data- or compute-intensive operations to be executed
in situ, i.e., within (or close to) the physical data storage. Near-Data
Processing thus seeks to minimize expensive data movement, improving performance,
scalability, and resource efficiency. Processing-in-Memory is a sub-class of
Near-Data Processing that targets data processing directly within memory (DRAM)
chips. The effective use of Near-Data Processing mandates new architectures,
algorithms, interfaces, and development toolchains.
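To make the in-situ idea concrete, here is a minimal, purely illustrative Python sketch; the NearDataStore class and its methods are hypothetical stand-ins, not an API from the abstract above.

```python
# Hypothetical sketch of the NDP idea: instead of shipping every record to
# the host and filtering there, the host ships the predicate to a compute
# unit co-located with storage and receives only the matches.

class NearDataStore:
    """Toy stand-in for storage with a co-located compute unit."""
    def __init__(self, records):
        self._records = records          # data "at rest" in the device

    def read_all(self):                  # conventional path: bulk data movement
        return list(self._records)

    def filter_in_situ(self, predicate): # NDP path: ship code, not data
        return [r for r in self._records if predicate(r)]

store = NearDataStore(range(1_000_000))

# Conventional: ~1M records cross the storage interface; the host does the work.
hot = [r for r in store.read_all() if r % 997 == 0]

# Near-data: the predicate runs inside the device; only ~1K results move.
hot_ndp = store.filter_in_situ(lambda r: r % 997 == 0)
assert hot == hot_ndp
```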
Polystore++: Accelerated Polystore System for Heterogeneous Workloads
Modern real-time business analytics consist of heterogeneous workloads (e.g.,
database queries, graph processing, and machine learning). These analytics
applications need programming environments that can capture all aspects of the
constituent workloads (including the data models they work on and the movement
of data across processing engines). Polystore systems suit such applications; however,
these systems currently execute on CPUs and the slowdown of Moore's Law means
they cannot meet the performance and efficiency requirements of modern
workloads. We envision Polystore++, an architecture to accelerate existing
polystore systems using hardware accelerators (e.g., FPGAs, CGRAs, and GPUs).
Polystore++ systems can achieve high performance at low power by identifying
and offloading components of a polystore system that are amenable to
acceleration using specialized hardware. Building a Polystore++ system is
challenging and introduces new research problems motivated by the use of
hardware accelerators (e.g., optimizing and mapping query plans across
heterogeneous computing units and exploiting hardware pipelining and
parallelism to improve performance). In this paper, we discuss these challenges
in detail and list possible approaches to address these problems.
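As a toy illustration of the query-plan mapping challenge mentioned above, the sketch below greedily assigns plan operators to devices under a made-up cost model; all operator names, costs, and the flat TRANSFER penalty are invented for illustration and are not from the paper.

```python
# Illustrative sketch (all costs hypothetical): greedily place a polystore
# query plan's operators onto heterogeneous devices, picking the device with
# the lowest estimated cost per operator plus a penalty for crossing devices.

EST_COST = {  # (operator, device) -> made-up relative cost
    ("scan",  "CPU"): 5, ("scan",  "FPGA"): 2, ("scan",  "GPU"): 4,
    ("join",  "CPU"): 8, ("join",  "FPGA"): 3, ("join",  "GPU"): 5,
    ("train", "CPU"): 9, ("train", "FPGA"): 6, ("train", "GPU"): 2,
}
TRANSFER = 1  # made-up penalty for moving data between devices

def place(plan):
    placement, prev = [], None
    for op in plan:
        best = min(("CPU", "FPGA", "GPU"),
                   key=lambda d: EST_COST[(op, d)]
                                 + (TRANSFER if prev and d != prev else 0))
        placement.append((op, best))
        prev = best
    return placement

print(place(["scan", "join", "train"]))
# [('scan', 'FPGA'), ('join', 'FPGA'), ('train', 'GPU')]
```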
Co-KV: A Collaborative Key-Value Store Using Near-Data Processing to Improve Compaction for the LSM-tree
Log-structured merge tree (LSM-tree) based key-value stores are widely
employed in large-scale storage systems. During compaction of the key-value
store, SSTables with overlapping key ranges are merged and sorted for data
queries. This, however, incurs write amplification and thus degrades system
performance, especially under update-intensive workloads. Current optimizations
focus mostly on reducing the compaction overhead on the host, but rarely make
full use of the computation available in the device. To address these issues, we
propose Co-KV, a Collaborative Key-Value store between the host and a near-data
processing (NDP) model-based SSD to improve compaction. Co-KV offers
three benefits: (1) reducing write amplification through a compaction offloading
scheme between host and device; (2) relieving the compaction overhead on the
host by leveraging computation in the SSD based on the NDP model; and (3)
improving the performance of LSM-tree based key-value stores under
update-intensive workloads.
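For readers unfamiliar with compaction, the sketch below shows the k-way merge it performs, which is the work Co-KV offloads to the SSD; this is a generic LSM-style merge in Python, not Co-KV's actual implementation.

```python
import heapq

# Minimal sketch of the work LSM compaction performs: a k-way merge of sorted
# SSTables with overlapping key ranges, keeping only the newest version of
# each key. This merge loop is the computation Co-KV proposes to offload to
# the NDP-capable SSD (the offloading itself is not shown).

def compact(*sstables):
    """Each SSTable is a list of (key, seq, value) sorted by key.
    Higher seq = newer write. Returns one merged, sorted SSTable."""
    merged, last_key = [], None
    # Order by key, then by descending seq so the newest version comes first.
    for key, seq, value in heapq.merge(*sstables, key=lambda e: (e[0], -e[1])):
        if key != last_key:              # first (newest) version wins
            merged.append((key, seq, value))
            last_key = key               # older duplicates are dropped
    return merged

old = [("a", 1, "v1"), ("c", 1, "v1")]
new = [("a", 2, "v2"), ("b", 2, "v2")]
print(compact(old, new))  # [('a', 2, 'v2'), ('b', 2, 'v2'), ('c', 1, 'v1')]
```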
Extensive db_bench experiments show that Co-KV achieves a 2.0x overall
throughput improvement and a write amplification reduction of up to 36.0% over
the state-of-the-art LevelDB. Under YCSB workloads, Co-KV increases
throughput by 1.7x - 2.4x while decreasing write amplification and average
latency by up to 30.0% and 43.0%, respectively.
Application-Driven Near-Data Processing for Similarity Search
Similarity search is key to a variety of applications including
content-based search for images and video, recommendation systems, data
deduplication, natural language processing, computer vision, databases,
computational biology, and computer graphics. At its core, similarity search
manifests as k-nearest neighbors (kNN), a computationally simple primitive
consisting of highly parallel distance calculations and a global top-k sort.
However, kNN is poorly supported by today's architectures because of its high
memory bandwidth requirements.
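A minimal NumPy sketch of the kNN primitive as just described: a batch of parallel distance computations followed by a top-k selection; streaming the full dataset past each query is what drives the bandwidth demand.

```python
import numpy as np

# Brute-force kNN as the abstract characterizes it: n parallel distance
# calculations, then a global top-k sort over the results.

def knn(query, dataset, k):
    """query: (d,) vector; dataset: (n, d) matrix; returns indices of the
    k nearest rows by Euclidean distance."""
    dists = np.linalg.norm(dataset - query, axis=1)   # n parallel distances
    top_k = np.argpartition(dists, k)[:k]             # unordered top-k
    return top_k[np.argsort(dists[top_k])]            # sorted by distance

rng = np.random.default_rng(0)
data = rng.standard_normal((100_000, 128)).astype(np.float32)
print(knn(data[42], data, k=5))  # index 42 comes first (distance 0)
```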
This paper proposes an application-driven near-data processing accelerator
for similarity search: the Similarity Search Associative Memory (SSAM). By
instantiating compute units close to memory, SSAM benefits from the higher
memory bandwidth and density exposed by emerging memory technologies. We
evaluate the SSAM design down to layout on top of the Micron hybrid memory cube
(HMC), and show that SSAM can achieve up to two orders of magnitude
improvement in area-normalized throughput and energy efficiency over multicore
CPUs; we also show that SSAM is faster and more energy-efficient than competing
GPUs and FPGAs. Finally, we show that SSAM is also useful for other data-intensive
tasks like kNN index construction, and can be generalized to function
semantically as a high-capacity content-addressable memory.
Enabling Practical Processing in and near Memory for Data-Intensive Computing
Modern computing systems suffer from the dichotomy between computation on one
side, which is performed only in the processor (and accelerators), and data
storage/movement on the other, which all other parts of the system are
dedicated to. Due to this dichotomy, data moves a lot in order for the system
to perform computation on it. Unfortunately, data movement is extremely
expensive in terms of energy and latency, much more so than computation. As a
result, a large fraction of system energy is spent and performance is lost
solely on moving data in a modern computing system.
In this work, we re-examine the idea of reducing data movement by performing
Processing in Memory (PIM). PIM places computation mechanisms in or near where
the data is stored (i.e., inside the memory chips, in the logic layer of
3D-stacked logic and DRAM, or in the memory controllers), so that data movement
between the computation units and memory is reduced or eliminated. While the
idea of PIM is not new, we examine two new approaches to enabling PIM: 1)
exploiting analog properties of DRAM to perform massively-parallel operations
in memory, and 2) exploiting 3D-stacked memory technology design to provide
high bandwidth to in-memory logic. We conclude by discussing work on solving
key challenges to the practical adoption of PIM.
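To illustrate approach (1), the following NumPy sketch simulates the semantics of Ambit-style in-DRAM bulk bitwise operations (Seshadri et al.), where simultaneously activating three DRAM rows yields a bitwise majority and AND/OR follow by fixing one operand; it models functionality only, not the analog charge-sharing mechanism.

```python
import numpy as np

# Functional simulation of in-DRAM bulk bitwise ops via triple-row activation.

def maj(a, b, c):
    # Charge sharing across three activated rows settles to the majority bit.
    return (a & b) | (b & c) | (a & c)

def dram_and(a, b):
    return maj(a, b, np.zeros_like(a))   # MAJ(A, B, 0) == A AND B

def dram_or(a, b):
    return maj(a, b, ~np.zeros_like(a))  # MAJ(A, B, 1) == A OR B

row_a = np.random.randint(0, 2**32, size=1024, dtype=np.uint32)
row_b = np.random.randint(0, 2**32, size=1024, dtype=np.uint32)
assert np.array_equal(dram_and(row_a, row_b), row_a & row_b)
assert np.array_equal(dram_or(row_a, row_b), row_a | row_b)
```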
Evolutionary Cell Aided Design for Neural Network Architectures
Mathematical theory shows us that multilayer feedforward Artificial Neural
Networks (ANNs) are universal function approximators, capable of approximating
any measurable function to any desired degree of accuracy. In practice,
designing practical and efficient neural network architectures requires
significant effort and expertise. We present a novel software framework called
Evolutionary Cell Aided Design (ECAD) meant to aid in the exploration and design
of efficient Neural Network Architectures (NNAs) for reconfigurable hardware.
Given a general neural network structure and a set of constraints and fitness
functions, the framework will explore both the space of possible NNAs and the
space of possible hardware designs, using evolutionary algorithms, and attempt
to find the fittest co-design solutions according to a predefined set of goals.
We test the framework on an image classification task and use the MNIST data
set of handwritten digits with an Intel Arria 10 GX 1150 device as our target
platform. We design and implement a modular and scalable 2D systolic array with
enhancements for machine learning that can be used by the framework for the
hardware search space. Our results demonstrate the ability to pair neural
network design and hardware development together using an evolutionary
algorithm and removing traditional human-in-the-loop development tasks. By
running various experiments of the fittest solutions for neural network and
hardware searches, we demonstrate the full end-to-end capabilities of the ECAD
framework.
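The following sketch illustrates the general evolutionary co-design loop the abstract describes; the candidate encoding, fitness proxy, and mutation operator are our own stand-ins, not ECAD's actual interfaces.

```python
import random

# Toy evolutionary search over paired (network, hardware) candidates:
# selection keeps the fittest, mutation perturbs survivors.

random.seed(0)

def random_candidate():
    return {"layers": random.randint(2, 8),          # NNA search space
            "width": random.choice([32, 64, 128]),
            "pe_array": random.choice([(8, 8), (16, 16), (32, 32)])}  # HW space

def fitness(c):
    # Stand-in objective: reward model capacity, penalize hardware area.
    accuracy_proxy = c["layers"] * c["width"]
    area_proxy = c["pe_array"][0] * c["pe_array"][1]
    return accuracy_proxy / area_proxy

def mutate(c):
    child = dict(c)
    child["layers"] = max(2, child["layers"] + random.choice([-1, 1]))
    return child

pop = [random_candidate() for _ in range(20)]
for _ in range(50):                                   # generations
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:10]                              # selection
    pop = survivors + [mutate(random.choice(survivors)) for _ in range(10)]

print(max(pop, key=fitness))
```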
Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges
With the emergence of big data applications such as machine learning, speech
recognition, artificial intelligence, and DNA sequencing in recent years, the
computer architecture research community is facing explosive growth in data.
To achieve high efficiency in data-intensive computing, studies of
heterogeneous accelerators that focus on the latest applications have become a
hot topic in the computer architecture domain. At present, heterogeneous
accelerators are implemented mainly on heterogeneous computing units such as
Application-Specific Integrated Circuits (ASICs), Graphics Processing Units
(GPUs), and Field Programmable Gate Arrays (FPGAs). Among these typical
heterogeneous architectures, FPGA-based reconfigurable accelerators have two
merits. First, the FPGA architecture contains a large number of reconfigurable
circuits, which satisfy requirements for high performance and low power
consumption when specific applications are running. Second, FPGA-based
reconfigurable architectures enable rapid prototyping and feature excellent
customizability and reconfigurability. A batch of acceleration works based on
FPGAs or other reconfigurable architectures has recently emerged at top-tier
computer architecture conferences. To better review this recent work on
reconfigurable computing accelerators, this survey takes the latest high-level
research on reconfigurable accelerator architectures and algorithm applications
as its basis. In this survey, we compare hot research issues and domains of
concern, and analyze the advantages, disadvantages, and challenges of
reconfigurable accelerators. Finally, we project the future development of
accelerator architectures, hoping to provide a reference for computer
architecture researchers.
In-RDBMS Hardware Acceleration of Advanced Analytics
The data revolution is fueled by advances in machine learning, databases, and
hardware design. Programmable accelerators are making their way into each of
these areas independently. As such, there is a void of solutions that enable
hardware acceleration at the intersection of these disjoint fields. This paper
sets out to be the initial step towards a unifying solution for in-Database
Acceleration of Advanced Analytics (DAnA). Deploying specialized hardware, such
as FPGAs, for in-database analytics currently requires hand-designing the
hardware and manually routing the data. Instead, DAnA automatically maps a
high-level specification of advanced analytics queries to an FPGA accelerator.
The accelerator implementation is generated for a User Defined Function (UDF),
expressed as a part of an SQL query using a Python-embedded Domain-Specific
Language (DSL). To realize an efficient in-database integration, DAnA
accelerators contain a novel hardware structure, Striders, that directly
interface with the buffer pool of the database. Striders extract, cleanse, and
process the training data tuples that are consumed by a multi-threaded FPGA
engine that executes the analytics algorithm. We integrate DAnA with PostgreSQL
to generate hardware accelerators for a range of real-world and synthetic
datasets running diverse ML algorithms. Results show that DAnA-enhanced
PostgreSQL provides, on average, 8.3x end-to-end speedup for real datasets,
with a maximum of 28.2x. Moreover, DAnA-enhanced PostgreSQL is, on average,
4.0x faster than the multi-threaded Apache MADLib running on Greenplum. DAnA
provides these benefits while hiding the complexity of hardware design from
data scientists and allowing them to express the algorithm in ≈30-60 lines of
Python.
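As a purely hypothetical illustration of that workflow (not DAnA's actual DSL syntax), a UDF might look like a short Python update rule that the system would compile down to its multi-threaded FPGA execution engine; the function and registration names below are invented.

```python
import math

# Hypothetical UDF in the spirit of the abstract: a one-step logistic
# regression update, the kind of iterative analytics loop DAnA targets.

def logistic_regression_udf(features, label, w, lr=0.1):
    """One stochastic-gradient step over a single training tuple."""
    z = sum(wi * xi for wi, xi in zip(w, features))
    pred = 1.0 / (1.0 + math.exp(-z))
    return [wi + lr * (label - pred) * xi for wi, xi in zip(w, features)]

# Imagined registration + invocation from SQL (illustrative only):
# register_udf("train_lr", logistic_regression_udf)
# SELECT train_lr(features, label) FROM patient_records;
```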
An Open-Source Benchmark Suite for Cloud and IoT Microservices
Cloud services have recently started undergoing a major shift from monolithic
applications to graphs of hundreds of loosely-coupled microservices.
Microservices fundamentally change many of the assumptions current cloud systems
are designed with, and present both opportunities and challenges when
optimizing for quality of service (QoS) and utilization. In this paper, we
explore the implications microservices have across the cloud system stack. We
first present DeathStarBench, a novel, open-source benchmark suite built with
microservices that is representative of large end-to-end services, modular and
extensible. DeathStarBench includes a social network, a media service, an
e-commerce site, a banking system, and IoT applications for coordination
control of UAV swarms. We then use DeathStarBench to study the architectural
characteristics of microservices, their implications in networking and
operating systems, their challenges with respect to cluster management, and
their trade-offs in terms of application design and programming frameworks.
Finally, we explore the tail at scale effects of microservices in real
deployments with hundreds of users, and highlight the increased pressure they
put on performance predictability.
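The tail-at-scale pressure can be made concrete with a standard back-of-the-envelope calculation (after Dean and Barroso); the numbers below are illustrative, not results from the paper.

```python
# When a request fans out across many microservices, even rare per-service
# slowness becomes the common case end-to-end.

p_slow = 0.01                     # each service is slow 1% of the time
for n in (1, 10, 100):            # services on the request's critical path
    p_any = 1 - (1 - p_slow) ** n # P(at least one slow service)
    print(f"{n:>3} services -> {p_any:.1%} of requests see a slow hop")

#   1 services -> 1.0% of requests see a slow hop
#  10 services -> 9.6% of requests see a slow hop
# 100 services -> 63.4% of requests see a slow hop
```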
Untangling Blockchain: A Data Processing View of Blockchain Systems
Blockchain technologies have gained massive momentum over the last few years.
Blockchains are distributed ledgers that enable parties who do not fully trust
each other to maintain a set of global states. The parties agree on the
existence, values and histories of the states. As the technology landscape is
expanding rapidly, it is both important and challenging to have a firm grasp of
what the core technologies have to offer, especially with respect to their data
processing capabilities. In this paper, we first survey the state of the art,
focusing on private blockchains (in which parties are authenticated). We
analyze both in-production and research systems in four dimensions: distributed
ledger, cryptography, consensus protocol and smart contract. We then present
BLOCKBENCH, a benchmarking framework for understanding the performance of private
blockchains against data processing workloads. We conduct a comprehensive
evaluation of three major blockchain systems based on BLOCKBENCH, namely
Ethereum, Parity and Hyperledger Fabric. The results demonstrate several
trade-offs in the design space, as well as big performance gaps between
blockchain and database systems. Drawing from design principles of database
systems, we discuss several research directions for bringing blockchain
performance closer to the realm of databases.