Data-intensive Systems on Modern Hardware: Leveraging Near-Data Processing to Counter the Growth of Data
Over the last decades, society has shifted toward using information technology in almost every daily routine, entailing an enormous growth of data collected day by day by Web, IoT, and AI applications.
At the same time, magneto-mechanical HDDs are being replaced by semiconductor storage such as SSDs, equipped with modern Non-Volatile Memories like Flash, which offer significantly lower access latencies and higher levels of parallelism. Likewise, the execution speed of processing units has increased considerably, as today's server architectures comprise up to several hundred independently working CPU cores along with a variety of specialized computing co-processors such as GPUs or FPGAs.
However, the burden of moving the continuously growing data to the best-fitting processing unit is inherent to today's computer architecture, which is based on the data-to-code paradigm. In light of Amdahl's Law, this leads to the conclusion that even with today's powerful processing units, the achievable speedup is limited, since the fraction of parallel work is largely I/O-bound.
Therefore, throughout this cumulative dissertation, we investigate the paradigm shift toward code-to-data, formally known as Near-Data Processing (NDP), which relieves the contention on the I/O bus by offloading processing to intelligent computational storage devices, where the data is originally located.
Firstly, we identify Native Storage Management as the essential foundation for NDP, due to its direct control of physical storage management within the database. Building on this, the storage interface is extended to propagate address-mapping information and to invoke NDP functionality on the storage device. As the address-mapping information can become very large, we introduce Physical Page Pointers as a novel NDP abstraction for self-contained, immutable database objects.
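To make the extended interface more tangible, the following minimal Python sketch shows how a host might ship a self-contained list of physical page pointers together with an NDP invocation. All names (PhysicalPagePointer, NDPRequest, submit_ndp) are hypothetical illustrations, not the dissertation's actual interface.

```python
# Hypothetical sketch: passing the physical page pointers of an immutable
# database object along with an NDP invocation (not the actual interface).
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class PhysicalPagePointer:
    channel: int   # flash channel holding the page
    block: int     # erase block within the channel
    page: int      # page index within the block

@dataclass
class NDPRequest:
    operation: str                                     # e.g. "scan", "aggregate"
    pages: List[PhysicalPagePointer] = field(default_factory=list)

def submit_ndp(request: NDPRequest) -> None:
    """Stand-in for handing the request to the computational storage device."""
    print(f"offloading {request.operation} over {len(request.pages)} pages")

# The host resolves the object's logical pages to physical locations once,
# then ships the self-contained pointer list with the request.
pointers = [PhysicalPagePointer(channel=0, block=17, page=i) for i in range(4)]
submit_ndp(NDPRequest(operation="scan", pages=pointers))
```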
Secondly, the on-device navigation and interpretation of data are elaborated. To this end, we introduce cross-layer Parsers and Accessors as another NDP abstraction that can be executed on the heterogeneous processing capabilities of modern computational storage devices. Here, the compute placement and resource configuration per NDP request are identified as major performance criteria. Our experimental evaluation shows an improvement in execution time of 1.4x to 2.7x compared to traditional systems. Moreover, we propose a framework for the automatic generation of Parsers and Accessors on FPGAs to ease their application in NDP.
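The following Python sketch is a simplified functional model of the Parser/Accessor split: a parser decodes a toy slotted-page layout into records, and an accessor extracts a single attribute. The page format and function names are illustrative assumptions, not the dissertation's actual formats.

```python
# Hypothetical functional model of on-device Parsers and Accessors
# (layout and names are illustrative, not the dissertation's format).
import struct

def parse_slotted_page(page: bytes):
    """Parser: decode a toy slotted page into a list of record payloads."""
    (num_records,) = struct.unpack_from("<H", page, 0)
    records = []
    for slot in range(num_records):
        offset, length = struct.unpack_from("<HH", page, 2 + slot * 4)
        records.append(page[offset:offset + length])
    return records

def access_int_column(record: bytes, byte_offset: int) -> int:
    """Accessor: extract a 4-byte integer attribute from a record."""
    (value,) = struct.unpack_from("<i", record, byte_offset)
    return value

# Build a toy page with two records, each starting with an int attribute.
payloads = [struct.pack("<i", 42) + b"alpha", struct.pack("<i", 7) + b"beta"]
header = struct.pack("<H", len(payloads))
slots, body, cursor = b"", b"", 2 + 4 * len(payloads)
for p in payloads:
    slots += struct.pack("<HH", cursor, len(p))
    body += p
    cursor += len(p)
page = header + slots + body

print([access_int_column(r, 0) for r in parse_slotted_page(page)])  # [42, 7]
```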
Thirdly, we investigate the interplay of NDP and modern workload characteristics like HTAP. To this end, we present different offloading models and focus on an intervention-free execution. By propagating the Shared State, i.e., the latest modifications of the database, to the computational storage device, the device can process data with transactional guarantees. Thus, we extend the design space of HTAP with NDP by providing a solution that optimizes for performance isolation, data freshness, and the reduction of data transfers. In contrast to traditional systems, we observe no significant drop in performance when an OLAP query is invoked, but rather a steady and roughly 30% higher throughput.
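As a rough functional illustration (not the dissertation's actual data structures), the sketch below shows how a device-side scan could merge base data with a shipped Shared State delta under a snapshot timestamp to preserve transactional guarantees; all names are hypothetical.

```python
# Hypothetical sketch of an intervention-free HTAP offload: the host ships a
# small "shared state" (latest committed row versions) with the OLAP request,
# so the device scans its base pages plus the deltas under a snapshot timestamp.

def device_scan(base_rows, shared_state, snapshot_ts):
    """Return the visible value per key as of snapshot_ts."""
    visible = dict(base_rows)  # rows already persisted on the device
    for key, (value, commit_ts) in shared_state.items():
        if commit_ts <= snapshot_ts:      # only transactionally committed data
            visible[key] = value
    return visible

base_rows = {1: 100, 2: 200}                     # on-device state
shared_state = {2: (250, 17), 3: (300, 21)}      # key -> (new value, commit ts)

print(device_scan(base_rows, shared_state, snapshot_ts=18))  # {1: 100, 2: 250}
```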
Lastly, in-situ result-set management and consumption as well as NDP pipelines are proposed to achieve flexibility in processing data on heterogeneous hardware. Because these produce final and intermediate results, we investigate their management and find that on-device materialization comes at low cost while enabling novel consumption modes and reuse semantics. Thereby, we achieve significant performance improvements of up to 400x by reusing once-materialized results multiple times.
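The reuse idea can be illustrated with a small, purely hypothetical sketch in which a device-side store materializes a result on first execution and hands it out to later consumers without recomputation.

```python
# Hypothetical sketch of on-device result-set materialization and reuse:
# a result produced once by an NDP request is kept on the device and handed
# out to later consumers instead of being recomputed (names are illustrative).

class DeviceResultStore:
    def __init__(self):
        self._materialized = {}   # request signature -> materialized result
        self.executions = 0

    def run(self, signature, compute):
        if signature not in self._materialized:
            self.executions += 1                   # execute only on first use
            self._materialized[signature] = compute()
        return self._materialized[signature]       # reuse thereafter

store = DeviceResultStore()
expensive = lambda: sum(range(1_000_000))          # stand-in for an NDP pipeline stage
for _ in range(3):
    result = store.run(("sum", "range-1M"), expensive)
print(result, store.executions)                    # computed once, consumed 3 times
```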
Memory-Centric Computing
Memory-centric computing aims to enable computation capability in and near all places where data is generated and stored. As such, it can greatly reduce the large negative performance and energy impact of data access and data movement by fundamentally avoiding data movement and reducing data access latency and energy. Many recent studies show that memory-centric computing can greatly improve system performance and energy efficiency. Major industrial vendors and startup companies have also recently introduced memory chips with sophisticated computation capabilities.
This talk describes promising ongoing research and development efforts in memory-centric computing. We classify such efforts into two major fundamental categories: 1) processing using memory, which exploits analog operational properties of memory structures to perform massively parallel operations in memory, and 2) processing near memory, which integrates processing capability into memory controllers, the logic layer of 3D-stacked memory technologies, or memory chips to enable high-bandwidth and low-latency memory access for near-memory logic. We show that both types of architectures (and their combination) can enable orders-of-magnitude improvements in the performance and energy consumption of many important workloads, such as graph analytics, databases, machine learning, video processing, climate modeling, and genome analysis. We discuss adoption challenges for the memory-centric computing paradigm and conclude with some research and development opportunities.
Comment: To appear as an invited special session paper at DAC 202
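To illustrate the processing-using-memory category in the simplest possible terms, here is a purely functional Python model of a bulk bitwise operation applied to whole memory rows at once (in the spirit of in-DRAM bulk AND); it captures only the programming-level idea, not any real device's mechanism.

```python
# Functional model (not hardware-accurate) of the processing-using-memory idea:
# an operation such as bulk bitwise AND is applied to entire memory rows at
# once, instead of streaming the rows through a CPU word by word.

ROW_BITS = 8192  # one memory row modeled as a Python integer bit vector

def bulk_and(row_a: int, row_b: int) -> int:
    """All ROW_BITS bit positions are combined in a single logical step."""
    return row_a & row_b

row_a = int("10" * (ROW_BITS // 2), 2)   # pattern 1010...
row_b = int("11" * (ROW_BITS // 2), 2)   # pattern 1111...
result = bulk_and(row_a, row_b)
print(bin(result)[2:10])                  # leading bits of the 8192-bit result
```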
Accelerating Time Series Analysis via Processing using Non-Volatile Memories
Time Series Analysis (TSA) is a critical workload for consumer-facing devices. Accelerating TSA is vital for many domains, as it enables the extraction of valuable information and the prediction of future events. The state-of-the-art algorithm in TSA is the subsequence Dynamic Time Warping (sDTW) algorithm. However, sDTW's computational complexity increases quadratically with the time series' length, resulting in two performance implications. First, the amount of available data parallelism is significantly higher than the small number of processing units offered by commodity systems (e.g., CPUs). Second, sDTW is bottlenecked by memory because it 1) has low arithmetic intensity and 2) incurs a large memory footprint. To tackle these two challenges, we leverage Processing-using-Memory (PuM) by performing in-situ computation where the data resides, using the memory cells. PuM provides a promising solution to alleviate data movement bottlenecks and exposes immense parallelism.
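As a reference for what such an accelerator computes, the following plain-Python sketch implements subsequence DTW: the first row of the dynamic-programming matrix is initialized to zero so a match may start at any position in the long series. This is only a functional model of the algorithm, not the proposed in-memory kernel.

```python
# Plain-Python functional sketch of subsequence DTW (sDTW): the cost of the
# best warping-path alignment of `query` ending at each position of `series`.

def sdtw_last_row(query, series):
    m, n = len(query), len(series)
    prev = [0.0] * (n + 1)                 # row 0: matches may start anywhere
    for i in range(1, m + 1):
        curr = [float("inf")] * (n + 1)
        for j in range(1, n + 1):
            cost = abs(query[i - 1] - series[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[1:]                        # distance of best match ending at j

query = [1.0, 2.0, 3.0]
series = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.1]
dists = sdtw_last_row(query, series)
print(min(dists), dists.index(min(dists)))  # best match ends at index 3
```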
In this work, we present MATSA, the first MRAM-based Accelerator for Time Series Analysis. The key idea is to exploit magneto-resistive memory crossbars to enable energy-efficient and fast time series computation in memory. MATSA provides the following key benefits: 1) it leverages high levels of parallelism in the memory substrate by exploiting column-wise arithmetic operations, and 2) it significantly reduces data movement costs by performing computation using the memory cells. We evaluate three versions of MATSA to match the requirements of different environments (e.g., embedded, desktop, or HPC computing) based on MRAM technology trends. We perform a design space exploration and demonstrate that our HPC version of MATSA can improve performance by 7.35x/6.15x/6.31x and energy efficiency by 11.29x/4.21x/2.65x over server CPU, GPU, and PNM architectures, respectively.
Rise of the Planet of Serverless Computing: A Systematic Review
Serverless computing is an emerging cloud computing paradigm that is being adopted to develop a wide range of software applications. It allows developers to focus on the application logic at the granularity of functions, thereby freeing them from tedious and error-prone infrastructure management. Meanwhile, its unique characteristics pose new challenges to the development and deployment of serverless-based applications. To tackle these challenges, considerable research effort has been devoted. This paper provides a comprehensive literature review to characterize the current research state of serverless computing. Specifically, this paper covers 164 papers on 17 research directions of serverless computing, including performance optimization, programming frameworks, application migration, multi-cloud development, testing and debugging, etc. It also derives research trends, focus areas, and commonly used platforms for serverless computing, as well as promising research opportunities.
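For readers unfamiliar with the function-granularity model, here is a minimal sketch of a serverless handler in the style of common FaaS platforms; the Lambda-like signature is an assumption for illustration, as the review itself is platform-agnostic.

```python
# Minimal sketch of the function-granularity programming model: the developer
# writes only the business logic, while the platform handles provisioning and
# scaling. The handler signature mimics common FaaS platforms (an assumption).
import json

def handler(event, context=None):
    """Business logic only: greet the caller named in the event payload."""
    name = event.get("name", "world")
    return {"statusCode": 200, "body": json.dumps({"message": f"Hello, {name}!"})}

# Local invocation for testing; in production the cloud platform calls handler().
print(handler({"name": "serverless"}))
```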