
    Modern data analytics in the cloud era

    Cloud computing has been the groundbreaking technology of the last decade.
The ease of use of the managed environment, in combination with a nearly infinite amount of resources and a pay-per-use price model, enables fast and cost-efficient project realization for a broad range of users. Cloud computing also changes the way software is designed, deployed, and used. This thesis focuses on database systems deployed in the cloud environment. We identify three major interaction points of the database engine with its environment that show changed requirements compared to traditional on-premise data warehouse solutions. First, software is deployed on elastic resources. Consequently, systems should support elasticity in order to match workload requirements and be cost-effective. We present an elastic scaling mechanism for distributed database engines, combined with a partition manager that provides load balancing while minimizing partition reassignments in the case of elastic scaling. Furthermore, we introduce a buffer pre-heating strategy that mitigates the cold start after scaling and yields an immediate performance benefit from the scaled resources. Second, cloud-based systems are accessible and available from nearly everywhere. Consequently, data is frequently ingested from numerous endpoints, which differs from the bulk loads or ETL pipelines of a traditional data warehouse solution. Many users do not define database constraints, either to avoid transaction aborts due to conflicts or to speed up data ingestion. To mitigate this issue, we introduce the concept of PatchIndexes, which allow the definition of approximate constraints. PatchIndexes maintain exceptions to constraints, make them usable in query optimization and execution, and offer efficient update support. The concept can be applied to arbitrary constraints, and we provide examples of approximate uniqueness and approximate sorting constraints. Moreover, we show how PatchIndexes can be exploited to define advanced constraints like an approximate multi-key partitioning, which offers robust query performance over workloads with different partition key requirements. Third, data-centric workloads have changed over the last decade. Besides traditional SQL workloads for business intelligence, data science workloads are of significant importance nowadays. In these cases, the database system often acts only as a data provider, while the computational effort takes place in dedicated data science or machine learning (ML) environments. As this workflow has several drawbacks, we pursue the goal of pushing advanced analytics towards the database engine and introduce the Grizzly framework as a DataFrame-to-SQL transpiler. Based on this, we identify user-defined functions (UDFs) and ML inference as important tasks that would benefit from deeper engine integration, and we investigate and evaluate approaches for in-database execution of Python UDFs and in-database ML inference.
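
    To make the PatchIndex concept concrete, the following is a minimal, hypothetical Python sketch for an approximate uniqueness constraint; the names and data structures are illustrative assumptions for this summary, not the thesis's actual implementation:

        class UniquePatchIndex:
            """Tracks exceptions ("patches") to an approximate uniqueness constraint."""

            def __init__(self):
                self.first_seen = {}    # value -> row id of its first occurrence
                self.patches = set()    # row ids that violate uniqueness

            def insert(self, row_id, value):
                if value in self.first_seen:
                    self.patches.add(row_id)   # keep the row, record the exception
                else:
                    self.first_seen[value] = row_id

            def split_scan(self, table):
                # A DISTINCT query can treat the clean partition as already unique
                # and deduplicate only the small exception set.
                clean = [v for i, v in enumerate(table) if i not in self.patches]
                exceptions = [table[i] for i in sorted(self.patches)]
                return clean, exceptions

        idx = UniquePatchIndex()
        table = ["a", "b", "a", "c"]
        for i, v in enumerate(table):
            idx.insert(i, v)
        clean, exc = idx.split_scan(table)    # clean == ["a", "b", "c"], exc == ["a"]

    In this toy model, ingestion never aborts on a constraint violation, and a query can exploit the constraint on the clean partition while handling only the small exception set separately, which hints at how such exceptions become usable in query optimization and execution.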

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    Accelerating Halide on an FPGA by using CIRCT and Calyx as an intermediate step to go from high-level, software-centric IRs down to RTL

    Image processing and, more generally, array processing play an essential role in modern life: from applying filters to the images that we upload to social media to running object detection algorithms on self-driving cars. Optimizing these algorithms can be complex and often results in non-portable code. The Halide language provides a simple way to write image and array processing algorithms by separating the algorithm definition (what needs to be executed) from its execution schedule (how it is executed), delivering state-of-the-art performance that exceeds hand-tuned parallel and vectorized code. Due to the inherently parallel nature of these algorithms, FPGAs present an attractive acceleration platform. While previous work has added an RTL code generator to Halide and utilized other heterogeneous computing languages as an intermediate step, those projects are no longer maintained. MLIR is an attractive solution, allowing the generation of code that can target multiple devices, such as parallelized and vectorized CPU code, OpenMP, and CUDA. CIRCT builds on top of MLIR to convert generic MLIR code to register transfer level (RTL) languages by using Calyx, a new intermediate language (IL) for compiling high-level programs into hardware designs. This thesis presents a novel flow that implements an MLIR code generator for Halide to produce RTL code, adding the necessary wrappers to execute that code on Xilinx FPGA devices. Additionally, it implements a Halide runtime using the Xilinx Runtime (XRT), enabling seamless execution of the generated Halide RTL kernels. While this thesis provides only initial support for running Halide kernels, and not all features and optimizations are supported, it details the future work needed to improve the performance of the generated RTL kernels. The proposed flow serves as a foundation for further research and development in hardware acceleration for image and array processing applications using Halide.
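
    As a concrete illustration of the algorithm/schedule separation, here is a minimal sketch using Halide's Python bindings (a toy example; exact binding details can vary between Halide versions):

        import halide as hl

        x, y = hl.Var("x"), hl.Var("y")

        inp = hl.Func("inp")
        inp[x, y] = x + y                  # synthetic input, keeps the example self-contained

        blur = hl.Func("blur")             # the algorithm: what to compute
        blur[x, y] = (inp[x - 1, y] + inp[x, y] + inp[x + 1, y]) / 3

        blur.vectorize(x, 8).parallel(y)   # the schedule: how to compute it
        out = blur.realize([64, 64])       # JIT-compile and execute on the CPU

    The flow proposed in this thesis keeps that separation but lowers such a pipeline through MLIR and Calyx to RTL rather than to CPU code.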

    Development of highly efficient and accurate real-space integration methods for Hartree-Fock and hybrid density functional calculations

    The central focus of molecular electronic structure theory is to find approximate solutions to the electronic Schrödinger equation for molecules, and as such it represents an essential part of any theoretical (in silico) study of chemical processes. However, a steep increase of the computational cost with increasing system size often prevents the application of accurate approximations to the molecules of interest. The main focus of the present work is the efficient evaluation of Fock-exchange contributions, which typically represent the computational bottleneck in Hartree-Fock (HF) and hybrid density functional theory (DFT) calculations. This bottleneck is addressed by means of seminumerical integration, i.e., one electronic coordinate within the 4-center-2-electron integral tensor is represented analytically and the other numerically. In this way, an asymptotically linear-scaling method for computing the exchange matrix (denoted sn-LinK) is developed, enabling fast and accurate ab initio calculations on large molecules comprising hundreds or even thousands of atoms, even in combination with large atomic orbital basis sets. The novel sn-LinK method comprises improvements to the numerical integration grids, a rigorous batch-wise integral screening scheme, optimal utilization of modern, highly parallel compute architectures (e.g., graphics processing units; GPUs), and an efficient combination of single- and double-precision arithmetic. In total, these optimizations enable over two orders of magnitude faster evaluation of Fock-exchange contributions. This greatly improved performance allows previously unfeasible computations, as demonstrated by an ab initio molecular dynamics (AIMD) study of the hydrogen bond strengths within double-stranded DNA. In addition to Fock-exchange, the other two computational bottlenecks in hybrid-DFT applications – the evaluation of the Coulomb potential and the numerical integration of the semilocal exchange-correlation functional – are also addressed. Finally, more efficient methods to evaluate more accurate post-HF/DFT methods, namely the random-phase approximation (RPA) and the second-order approximate coupled cluster (CC2) method, are also put forward. In this way, the highly efficient methods introduced in this thesis cover some of the most substantial computational bottlenecks in electronic-structure theory – the evaluation of the Coulomb and exchange interactions, the integration of the semilocal exchange-correlation functional, and the computation of post-Hartree-Fock correlation energies. Consequently, computational chemistry studies on large molecules (>100 atoms) are accelerated by multiple orders of magnitude, allowing for much more accurate and thorough in-silico studies than ever before.
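
    The structure of seminumerical exchange evaluation can be sketched in a few lines: one electron coordinate is handled by analytic integrals while the other becomes a weighted sum over grid points. The numpy toy below shows only this three-step contraction on random dense data; the array names, toy sizes, and absence of any screening are assumptions of this sketch, not properties of sn-LinK itself:

        import numpy as np

        nbf, ngrid = 10, 200                        # basis functions / grid points (toy sizes)
        rng = np.random.default_rng(0)
        X = rng.standard_normal((nbf, ngrid))       # weighted AO values at grid points
        A = rng.standard_normal((nbf, nbf, ngrid))  # analytic integrals, one set per grid point
        D = rng.standard_normal((nbf, nbf))         # density matrix

        F = D @ X                                   # 1) density contracted onto the grid
        G = np.einsum('nlg,lg->ng', A, F)           # 2) contract with the analytic integrals
        K = X @ G.T                                 # 3) assemble the exchange matrix

    In this dense sketch each step costs on the order of nbf² · ngrid operations instead of nbf⁴; the method's batch-wise integral screening then reduces this further, towards the asymptotically linear scaling described above.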

    Correct Optimized GPU Programs


    Towards Performance Portable Graph Algorithms

    In today's data-driven world, our computational resources have become heterogeneous, making the processing of large-scale graphs in an architecture-agnostic manner crucial. Traditionally, hand-optimized high-performance computing (HPC) solutions have been studied and used to implement highly efficient and scalable graph algorithms. In recent years, several graph processing and management systems have also been proposed. Hand-optimized HPC approaches require high levels of expertise, and graph processing frameworks suffer from limited expressibility and performance. Portability is a major concern for both approaches. The main thesis of this work is that block-based graph algorithms offer a compromise between efficient parallelism and architecture-agnostic algorithm design for a wide class of graph problems. This dissertation seeks to prove this thesis by focusing on three pillars: data/computation partitioning, block-based algorithm design, and performance portability. We first show how to partition the computation and the data to design efficient block-based algorithms for solving graph merging and triangle counting problems. Then, generalizing from our experiences, we propose PGAbB, an algorithmic framework for implementing block-based graph algorithms on shared-memory heterogeneous machines. PGAbB aims to maximally leverage different architectures by implementing task-based execution on top of a block-based programming model. We discuss PGAbB's programming model, algorithmic optimizations for scheduling, and load-balancing strategies for graph problems on real-world and synthetic inputs.
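
    As a flavor of block-based design, triangle counting can be decomposed into independent block tasks using the identity triangles = trace(A³)/6: each block triple contributes a small dense product that a runtime could schedule on a CPU or a GPU. The following is a minimal dense numpy sketch of the decomposition only; PGAbB's actual programming model, data structures, and scheduling differ:

        import numpy as np
        from itertools import product

        def triangles_blocked(A, b):
            """Count triangles as trace(A^3)/6, split into block-triple tasks.

            A: dense symmetric 0/1 adjacency matrix with zero diagonal whose
            side length is a multiple of the block size b.
            """
            nblk = A.shape[0] // b

            def blk(I, J):
                return A[I * b:(I + 1) * b, J * b:(J + 1) * b]

            total = 0
            for I, J, K in product(range(nblk), repeat=3):   # independent tasks
                total += np.trace(blk(I, J) @ blk(J, K) @ blk(K, I))
            return total // 6

        # A 4-clique contains exactly 4 triangles.
        A = np.ones((4, 4), dtype=np.int64) - np.eye(4, dtype=np.int64)
        assert triangles_blocked(A, 2) == 4

    The loop body touches only three b×b blocks at a time, so the same task list can be load-balanced across heterogeneous devices without rewriting the algorithm itself.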

    Enabling multi-threaded execution and improved memory access in fine-grain near-data processing systems

    Advisor: Marco Antonio Zanata Alves. Doctoral thesis, Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defended: Curitiba, 08/07/2022. Includes references. Area of concentration: Computer Science.
Applications that deal with large amounts of data are increasingly popular. However, traditional computation-centric architectures are ill-equipped to handle such applications, as their near-constant data accesses cause much data movement across the system. This leads to inefficient processing, with long execution times and high energy consumption. The issues caused by this disparity are widely known as the memory wall. Starting in the late 1990s, the idea of moving portions of the computation close to the memory, when beneficial, began to be considered. This concept has become known as Near-Data Processing (NDP) and gained more attention in the early 2010s with the advent of Through-Silicon Via (TSV) technology, which enabled straightforward integration of processing logic and data storage in the same chip. 3D-stacked memories, which vertically integrate storage and logic, have since become commercially available, and computer architecture researchers have reacted by proposing many designs that place processing elements on the logic layer typically found in those devices. This thesis proposes the Vector-In-Memory Architecture (VIMA), a 3D-stacked-memory-based NDP architecture that implements processing in the memory by placing Functional Units (FUs) on the logic layer of those devices. Our design uses vector functional units and a cache memory for dedicated storage, and it advances the state of the art by implementing near-data precise exceptions and enabling near-data multi-threading. We simulate the execution of several common data-driven applications on our architecture, and our results show that the proposed design, with only a single processing core and VIMA, outperforms a modern 16-core architecture by at least 2× when dealing with large dataset sizes. Moreover, this speedup is achieved while reducing energy consumption by at least 75% according to our estimates. In comparison to its most closely related state-of-the-art work, VIMA reduces the execution time of data-streaming applications by at least 32%.
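
    The memory-wall arithmetic that motivates near-data designs like this can be sketched directly. All numbers below are illustrative assumptions for a back-of-the-envelope model, not measurements or parameters from the thesis:

        # Streaming kernel: c[i] = a[i] + b[i] over N 32-bit elements.
        N = 2 ** 28                      # number of elements (assumed)
        ELEM = 4                         # bytes per element

        # Compute-centric: a and b must stream to the CPU and c back, so the
        # off-chip bus sees every byte at least once (caches barely help here).
        host_bytes = 3 * N * ELEM

        # Near-data: operands stay inside the 3D stack; only commands and
        # status words cross the bus (negligible in comparison).
        ndp_bytes = 0

        print(f"compute-centric bus traffic: {host_bytes / 2 ** 30:.0f} GiB")
        print("near-data bus traffic:       ~0 GiB (command stream only)")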

    Tools and Algorithms for the Construction and Analysis of Systems

    This open access book constitutes the proceedings of the 28th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2022, which was held during April 2-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 46 full papers and 4 short papers presented in this volume were carefully reviewed and selected from 159 submissions. The proceedings also contain 16 tool papers from the affiliated competition SV-COMP and 1 paper consisting of the competition report. TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The conference aims to bridge the gaps between different communities with this common interest and to support them in their quest to improve the utility, reliability, flexibility, and efficiency of tools and algorithms for building computer-controlled systems.

    GPU Enabled Automated Reasoning


    Programming Languages and Systems

    This open access book constitutes the proceedings of the 31st European Symposium on Programming, ESOP 2022, which was held during April 5-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 21 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.