32 research outputs found

    Charting the design space of query execution using VOILA

    Database architecture, while having been studied for four decades now, has delivered only a few designs with well-understood properties, and most actual systems follow these few. Acquiring more knowledge about the design space is a very time-consuming process that requires manually crafting prototypes, with a low chance of generating material insight. We propose a framework that aims to accelerate…

    Optimizing group-by and aggregation using GPU-CPU co-processing

    While GPU query processing is a well-studied area, real adoption is limited in practice, as GPU execution is typically only significantly faster than CPU execution if the data resides in GPU memory, which limits scalability to small-data scenarios where performance tends to be less critical. Another problem is that not all query code (e.g. UDFs) can realistically run on GPUs. We therefore investigate CPU-GPU co-processing, where both the CPU and GPU are involved in evaluating the query in scenarios where the data does not fit in GPU memory. As we wish to deeply explore opportunities for optimizing execution speed, we narrow our focus to a specific, well-studied OLAP scenario amenable to such co-processing: TPC-H benchmark Query 1. For this query, and at large scale factors, we improve performance significantly over the state of the art for GPU implementations; we present competitive performance of a GPU versus a state-of-the-art multi-core CPU baseline (a novelty for data exceeding GPU memory size); and finally, we show that co-processing provides significant additional speedup over either processor individually. We achieve this performance improvement by utilizing parallelism-friendly compression to alleviate the PCIe transfer bottleneck, query-compilation-like fusion of the processing operations, and a simple yet effective scheduling mechanism. We hope that some of these features can inspire future work on GPU-focused and heterogeneous analytic DBMSes.
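    The scheduling idea behind such co-processing can be illustrated with a minimal, hypothetical sketch (not the paper's implementation): CPU and GPU workers pull fixed-size chunks from a shared queue, so the faster device naturally ends up processing more of the data. All names and the simulated per-chunk costs below are illustrative assumptions.

```python
import queue
import threading

def co_process(num_chunks, device_speeds):
    """Morsel-style co-processing sketch: each device pulls the next
    chunk as soon as it is free, so faster devices do more work.
    `device_speeds` maps a (hypothetical) device name to a relative
    throughput; real code would launch a GPU kernel or run a fused
    CPU loop instead of the busy loop below."""
    work = queue.Queue()
    for chunk_id in range(num_chunks):
        work.put(chunk_id)
    done = {name: [] for name in device_speeds}
    lock = threading.Lock()

    def worker(name, speed):
        while True:
            try:
                chunk = work.get_nowait()
            except queue.Empty:
                return  # no chunks left: this device is finished
            # Simulate processing cost: faster devices spin less.
            for _ in range(1000 // speed):
                pass
            with lock:
                done[name].append(chunk)

    threads = [threading.Thread(target=worker, args=(n, s))
               for n, s in device_speeds.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done

result = co_process(64, {"cpu": 1, "gpu": 4})
# every chunk is processed exactly once, across both devices
```

    The key property of this pull-based design is that no static split of the data is needed: load balancing between heterogeneous devices falls out of the queue discipline.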

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC'10 - May 17-19, 2010, Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

    ReCoSoC is intended to be an annual meeting to expose and discuss gathered expertise as well as state-of-the-art research around SoC-related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow's challenges in the multibillion-transistor era, taking into account emerging techniques and architectures that explore the synergy between flexible on-chip communication and system reconfigurability.

    Designing an adaptive VM that combines vectorized and JIT execution on heterogeneous hardware

    Modern hardware tends to become increasingly heterogeneous, which leads to major challenges for existing systems: (a) To maximize performance on modern hardware, its features must be fully exploited; together with the increasing heterogeneity, this leads to more complex systems. (b) In general, it is non-trivial for a system to determine the most efficient way to execute a program on a specific piece of hardware. Based on (a) and (b), we believe it is necessary to extend code generation in data processing systems. First, to mitigate the increasing complexity, we propose the use of a domain-specific language (DSL) that abstracts specific details away. Our DSL is based on data-parallel operations and control-flow statements, which makes it easy to exploit SIMD (on multiple architectures: CPU, GPU, etc.). We briefly sketch our idea of such a DSL. Second, we propose a virtual machine executing this DSL. We plan to exploit different implementation flavors (adaptivity) and to dynamically compile and optimize hot paths in the program (JIT compilation). We sketch ideas about which paths to compile and how to achieve adaptive execution of different JIT-compiled paths, and elaborate further on our ideas on workload-specific optimizations. Finally, we present our plan for future research and ideas for major publications.
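    The combination of data-parallel DSL operations with both interpreted and compiled execution flavors can be sketched minimally as follows. This is a hypothetical toy, not the authors' DSL: each expression node supports an `interpret` path and a `compile` path that builds a Python closure as a stand-in for real code generation.

```python
# Toy data-parallel DSL node with two execution flavors:
# tree-walking interpretation vs. "compilation" to a closure
# (a hypothetical stand-in for JIT code generation).
class Col:
    """Reference to a named input column."""
    def __init__(self, name):
        self.name = name
    def interpret(self, env):
        return env[self.name]
    def compile(self):
        return lambda env, n=self.name: env[n]

class Add:
    """Element-wise (data-parallel) addition of two sub-expressions."""
    def __init__(self, left, right):
        self.left, self.right = left, right
    def interpret(self, env):
        l = self.left.interpret(env)
        r = self.right.interpret(env)
        return [a + b for a, b in zip(l, r)]
    def compile(self):
        lf, rf = self.left.compile(), self.right.compile()
        # The returned closure no longer walks the tree per element.
        return lambda env: [a + b for a, b in zip(lf(env), rf(env))]

expr = Add(Col("x"), Col("y"))
env = {"x": [1, 2, 3], "y": [10, 20, 30]}
assert expr.interpret(env) == expr.compile()(env) == [11, 22, 33]
```

    An adaptive VM in this style could start on the `interpret` path and switch a hot expression to its `compile` flavor once it proves worth the compilation cost.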

    Advancing Urban Flood Resilience With Smart Water Infrastructure

    Advances in wireless communications and low-power electronics are enabling a new generation of smart water systems that will employ real-time sensing and control to solve our most pressing water challenges. In a future characterized by these systems, networks of sensors will detect and communicate flood events at the neighborhood scale to improve disaster response. Meanwhile, wirelessly-controlled valves and pumps will coordinate reservoir releases to halt combined sewer overflows and restore water quality in urban streams. While these technologies promise to transform the field of water resources engineering, considerable knowledge gaps remain with regard to how smart water systems should be designed and operated. This dissertation presents foundational work towards building the smart water systems of the future, with a particular focus on applications to urban flooding. First, I introduce a first-of-its-kind embedded platform for real-time sensing and control of stormwater systems that will enable emergency managers to detect and respond to urban flood events in real-time. Next, I introduce new methods for hydrologic data assimilation that will enable real-time geolocation of floods and water quality hazards. Finally, I present theoretical contributions to the problem of controller placement in hydraulic networks that will help guide the design of future decentralized flood control systems. Taken together, these contributions pave the way for adaptive stormwater infrastructure that will mitigate the impacts of urban flooding through real-time response.
    PhD, Civil Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/163144/1/mdbartos_1.pd

    TACKLING PERFORMANCE AND SECURITY ISSUES FOR CLOUD STORAGE SYSTEMS

    Building data-intensive applications and emerging computing paradigms (e.g., Machine Learning (ML), Artificial Intelligence (AI), and the Internet of Things (IoT)) in cloud computing environments is becoming the norm, given the many advantages in scalability, reliability, security, and performance. However, under rapid changes in applications, system middleware, and underlying storage devices, service providers face new challenges in delivering performance and security isolation in the context of resources shared among multiple tenants. The gap between the decades-old storage abstraction and modern storage devices keeps widening, calling for software/hardware co-designs to arrive at more effective performance and security protocols. This dissertation rethinks the storage subsystem from the device level to the system level and proposes new designs at different levels to tackle performance and security issues for cloud storage systems. In the first part, we present an event-based SSD (Solid State Drive) simulator that models modern protocols, firmware, and the storage backend in detail. The proposed simulator can capture the nuances of SSD internal states under various I/O workloads, which helps researchers understand the impact of various SSD designs and workload characteristics on end-to-end performance. In the second part, we study the security challenges of shared in-storage computing infrastructures. Many cloud providers offer isolation at multiple levels to secure data and instances; however, security measures in emerging in-storage computing infrastructures have not been studied. We first investigate the attacks that could be conducted by offloaded in-storage programs in a multi-tenant cloud environment. To defend against these attacks, we build a lightweight Trusted Execution Environment, IceClave, to enable security isolation between in-storage programs and internal flash management functions. We show that while enforcing security isolation in the SSD controller with minimal hardware cost, IceClave still keeps the performance benefit of in-storage computing by delivering up to 2.4x better performance than the conventional host-based trusted computing approach. In the third part, we investigate the performance interference problem caused by other tenants' I/O flows. We demonstrate that I/O resource sharing can often lead to performance degradation and instability. The block device abstraction fails to expose SSD parallelism and to pass application requirements to the device. To this end, we propose a software/hardware co-design that enforces performance isolation by bridging this semantic gap. Our design significantly improves QoS (Quality of Service) by reducing throughput penalties and tail latency spikes. Lastly, we explore more effective I/O control to address contention in the storage software stack. We illustrate that the state-of-the-art resource control mechanism, Linux cgroups, is insufficient for controlling I/O resources. Inappropriate cgroup configurations may even hurt the performance of co-located workloads under memory-intensive scenarios. We add kernel support for limiting page cache usage per cgroup and achieving I/O proportionality.
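    The event-based simulation approach mentioned above can be illustrated with a minimal, hypothetical sketch (not the dissertation's simulator): requests are ordered in a priority queue by arrival time, and a single shared channel serializes them, so queueing delay emerges from the event schedule rather than being modeled explicitly. The parameter names and the one-channel model are illustrative assumptions.

```python
import heapq

def simulate(arrivals, channel_latency=2):
    """Minimal event-driven sketch of an SSD servicing I/O requests
    over a single channel. Each request arrives at the given time and
    occupies the channel for `channel_latency` time units; returns
    completion times in arrival order."""
    events = [(arrival, i) for i, arrival in enumerate(arrivals)]
    heapq.heapify(events)            # min-heap ordered by arrival time
    channel_free = 0                 # time at which the channel is idle
    completions = {}
    while events:
        t, i = heapq.heappop(events)
        start = max(t, channel_free)  # wait if the channel is busy
        channel_free = start + channel_latency
        completions[i] = channel_free
    return [completions[i] for i in range(len(arrivals))]

# Two back-to-back requests queue behind each other on the channel:
assert simulate([0, 0]) == [2, 4]
# A late arrival sees an idle channel and only pays service time:
assert simulate([0, 10]) == [2, 12]
```

    A detailed simulator extends this skeleton with multiple channels, per-die queues, and firmware events (garbage collection, mapping-table lookups), but the event-loop core stays the same.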