
    Cooperative scans

    Data mining, information retrieval, and other application areas exhibit a query load with multiple concurrent queries touching a large fraction of a relation. This leads to individual query plans based on a table scan or a large index scan. The implementation of this access path in most database systems is straightforward: the Scan operator issues next-page requests to the buffer manager without concern for the system state. Conversely, the buffer manager is not aware of the work ahead and focuses on keeping the most-recently-used pages in the buffer pool. This paper introduces cooperative scans, a new algorithm based on better sharing of knowledge and responsibility between the Scan operator and the buffer manager, which significantly improves the performance of concurrent scan queries. In this approach, queries share the buffer content, and the buffer manager optimizes the progress of the scans by minimizing the number of disk transfers in light of the total workload ahead. The experimental results are based on a simulation of the various disk-access scheduling policies and an implementation of cooperative scans within PostgreSQL and MonetDB/X100. These real-life experiments show that, with little effort, the performance of existing database systems on concurrent scan queries can be strongly improved.
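    To make the scheduling idea concrete, here is a minimal Python sketch of a cooperative scan scheduler: active scans register the pages they still need, each scan first consumes pages already buffered, and on a miss the scheduler loads the page useful to the most scans. All names and policy details are illustrative assumptions, not the paper's actual algorithm or API.

        # Hypothetical sketch of the cooperative-scan idea, not the
        # paper's implementation: the buffer manager picks pages based
        # on the whole workload instead of serving scans independently.
        class CooperativeScheduler:
            def __init__(self):
                self.cache = set()      # page ids currently in the buffer pool
                self.scans = {}         # scan id -> set of page ids still needed

            def register_scan(self, scan_id, pages):
                self.scans[scan_id] = set(pages)

            def next_page(self, scan_id):
                """Pick the next page for this scan, preferring buffered pages."""
                needed = self.scans[scan_id]
                if not needed:
                    return None         # this scan is finished
                buffered = needed & self.cache
                if buffered:
                    page = buffered.pop()   # reuse without touching disk
                else:
                    # Fetch the page wanted by the largest number of active
                    # scans, minimizing disk transfers for the workload ahead.
                    page = max(needed,
                               key=lambda p: sum(p in s for s in self.scans.values()))
                    self.cache.add(page)    # stands in for the disk read
                needed.discard(page)
                return page

        sched = CooperativeScheduler()
        sched.register_scan("q1", range(4))
        sched.register_scan("q2", range(4))     # q2 overlaps q1 entirely
        first = sched.next_page("q1")
        assert sched.next_page("q2") == first   # q2 reuses the buffered page

    In this toy model, a scan that starts late immediately benefits from pages buffered for earlier scans, which is the sharing effect the abstract describes.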

    Materialized views in distributed key-value stores

    Distributed key-value stores have become the solution of choice for warehousing large volumes of data. However, their architecture is not suitable for real-time analytics. To achieve the required velocity, materialized views can be used to provide summarized data for fast access. The main challenge, then, is the incremental, consistent maintenance of views at large scale. Thus, we introduce our View Maintenance System (VMS) to maintain SQL queries in a data-intensive real-time scenario.
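    The core of incremental maintenance is propagating each base-table write as a delta to the view rather than recomputing it. The following Python toy model illustrates that for a SUM-per-group view; it is a sketch of the general idea only, not the VMS interface described in the abstract.

        # Illustrative sketch: base-table puts are propagated as deltas to a
        # materialized SUM-per-group view, so readers never rescan base data.
        base = {}    # key -> (group, value)
        view = {}    # group -> running sum (the materialized view)

        def put(key, group, value):
            """Apply a base-table write and incrementally maintain the view."""
            old = base.get(key)
            if old is not None:
                old_group, old_value = old
                view[old_group] = view.get(old_group, 0) - old_value  # retract old
            base[key] = (group, value)
            view[group] = view.get(group, 0) + value                  # apply new

        put("k1", "eu", 10)
        put("k2", "us", 5)
        put("k1", "eu", 7)          # update: the view subtracts 10, adds 7
        assert view["eu"] == 7 and view["us"] == 5

    The consistency challenge the abstract names arises when many such deltas from different store nodes must be applied concurrently without losing or double-counting any of them.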

    X-ray CT on the GPU

    Nondestructive testing (NDT) is a collection of analysis techniques used by scientists and technologists to analyze the interior of an object without damaging it. Because the analysis does not damage the object, NDT is an extremely valuable technique used in various industries for troubleshooting and research. CNDE has a long history of working with a variety of industrial sectors, including Aerospace (commercial and military aviation), Defense Systems (ground vehicles and personnel protection), Energy (nuclear, wind, fossil), Infrastructure and Transportation (bridges, roadways, dams, levees), and Petro-Chemical (offshore, processing, fuel transport piping), to provide cost-effective tools and solutions. X-ray tomography is the procedure of using X-rays to generate tomographic slices of the object under study. The object is bombarded with X-rays and the scanned image intensity values are collected on a detector. A significant drawback of X-ray tomography is the amount of data collected: it is generally huge, on the order of gigabytes, so processing the data presents a big challenge. One way to speed up the processing is to run the programs on a cluster; CNDE uses a 64-node Beowulf cluster for image reconstruction. However, with the advent of the GPU (Graphics Processing Unit), we have far more cost- and time-efficient hardware for running the reconstruction algorithm. The GPU fits into a single PC, costs 10 times less than the cluster, and has a longer lifetime. This thesis has two major components. One is the development of new preprocessing and post-processing techniques (including filters, hot-pixel removal, etc.) to improve the quality of the input data; the other is the implementation of these techniques, as well as the reconstruction program, on the GPU using CUDA. Speedup on the GPU is not just a matter of porting the developed algorithms in parallel onto the hardware, as in a cluster. The GPU architecture is complex and involves many different types of memory, each with its own advantages and disadvantages, as well as many other optimization techniques for accessing and processing the data. These new techniques, together with the introduction of the GPU, are a significant addition to the X-ray program at CNDE.
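    As one concrete example of the preprocessing the abstract mentions, hot-pixel removal is commonly done by comparing each detector pixel against its local median and patching strong outliers. The Python/NumPy sketch below shows that idea; the window size and threshold are illustrative choices, not CNDE's actual parameters or the thesis's CUDA code.

        # Sketch of a hot-pixel-removal preprocessing step: pixels that
        # deviate strongly from their local median are treated as detector
        # defects and replaced with that median.
        import numpy as np
        from scipy.ndimage import median_filter

        def remove_hot_pixels(projection, threshold=5.0):
            """Replace outlier pixels in a detector image with the local median."""
            local_median = median_filter(projection, size=3)
            deviation = np.abs(projection - local_median)
            noise = deviation.std()
            hot = deviation > threshold * noise   # flag strong outliers
            cleaned = projection.copy()
            cleaned[hot] = local_median[hot]      # patch them with the median
            return cleaned

        proj = np.random.rand(64, 64).astype(np.float32)
        proj[10, 20] = 50.0                       # inject a hot pixel
        assert remove_hot_pixels(proj)[10, 20] < 50.0

    On a GPU, such a per-pixel filter maps naturally onto one thread per pixel, which is what makes this class of preprocessing a good fit for CUDA.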

    Snapshot: friend or foe of data management - on optimizing transaction processing in database and blockchain systems

    Data management is a complicated task. Due to the wide range of data management tasks, businesses often need a sophisticated data management infrastructure with a plethora of distinct systems to fulfill their requirements. Moreover, since snapshots are an essential ingredient in solving many data management tasks such as checkpointing and recovery, they have been widely exploited in almost all major data management systems that have appeared in recent years. However, snapshots do not always guarantee exceptional performance. In this dissertation, we will see two different faces of the snapshot: one where it has a tremendous positive impact on the performance and usability of the system, and another where incorrect usage of the snapshot can have a significant negative impact on the performance of the system. This dissertation consists of three loosely coupled parts that represent three distinct projects that emerged during this doctoral research. In the first part, we analyze the importance of utilizing snapshots in relational database systems. We identify the bottlenecks in state-of-the-art snapshotting algorithms, propose two lightweight snapshotting techniques, and optimize multi-version concurrency control for handling hybrid workloads effectively. Our snapshotting algorithm is up to 100x faster and reduces the latency of analytical queries by up to 4x in comparison to state-of-the-art techniques. In the second part, we identify the strict snapshotting used by Fabric as a critical bottleneck, replace it with MVCC, and propose additional optimizations that improve the throughput of the permissioned-blockchain system by up to 12x under highly contended workloads. In the last part, we propose ChainifyDB, a platform that transforms an existing database infrastructure into a blockchain infrastructure. ChainifyDB achieves up to 6x higher throughput in comparison to another state-of-the-art permissioned-blockchain system. Furthermore, its external concurrency control protocol outperforms the internal concurrency control protocols of PostgreSQL and MySQL, achieving up to 2.6x higher throughput in a blockchain setup in comparison to a standalone isolated setup. We also utilize snapshots in ChainifyDB to support recovery, which has so far been missing from the permissioned-blockchain world.
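    For readers unfamiliar with the MVCC mechanism the dissertation builds on, the Python sketch below shows the basic snapshot-visibility rule: a reader sees, for each key, the newest version committed at or before its snapshot timestamp. This is a generic illustration of multi-version concurrency control, not the dissertation's actual implementation.

        # Minimal MVCC sketch: writers append timestamped versions, readers
        # see the state as of their snapshot timestamp.
        class VersionedStore:
            def __init__(self):
                self.versions = {}      # key -> list of (commit_ts, value)
                self.clock = 0

            def write(self, key, value):
                self.clock += 1
                self.versions.setdefault(key, []).append((self.clock, value))
                return self.clock

            def snapshot_read(self, key, snapshot_ts):
                """Return the newest value committed at or before snapshot_ts."""
                for commit_ts, value in reversed(self.versions.get(key, [])):
                    if commit_ts <= snapshot_ts:
                        return value
                return None

        store = VersionedStore()
        ts1 = store.write("x", 1)
        store.write("x", 2)
        assert store.snapshot_read("x", ts1) == 1   # old snapshot still sees 1

    Because readers never block writers under this scheme, replacing strict snapshotting with MVCC is what enables the throughput gains under contention that the abstract reports.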

    MxTasks: a novel processing model to support data processing on modern hardware

    The hardware landscape has changed rapidly in recent years. Modern hardware in today's servers is characterized by many CPU cores, multiple sockets, and vast amounts of main memory structured in NUMA hierarchies. To benefit from these highly parallel systems, software has to adapt and actively engage with newly available features. However, the processing models forming the foundation of many performance-oriented applications have remained essentially unchanged. Threads, which serve as the central processing abstraction, can be considered a "black box" that hardly allows any transparency between the application and the system underneath. On the one hand, applications hold knowledge that could assist the system in optimizing execution, such as the data objects they access and their access patterns. On the other hand, the limited opportunities for information exchange force operating systems to make assumptions about applications' intentions when optimizing their execution, e.g., for local data access. Applications, in turn, implement optimizations tailored to specific situations, such as sophisticated synchronization mechanisms and hardware-conscious data structures. This work presents MxTasking, a task-based runtime environment that assists the design of data structures and applications for contemporary hardware. MxTasking rethinks the interfaces between performance-oriented applications and the execution substrate, streamlining the information exchange between both layers. By breaking with patterns of processing models designed for past generations of hardware, MxTasking creates novel opportunities to manage resources in a hardware- and application-conscious way. Accordingly, we question the granularity of "conventional" threads and show that fine-granular MxTasks are a viable abstraction unit for characterizing and optimizing execution in a general way. Using various demonstrators in the context of database management systems, we illustrate the practical benefits and explore how challenges like memory access latencies and error-prone concurrency synchronization can be addressed straightforwardly and effectively.
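    The following Python toy model illustrates the kind of information exchange the abstract describes: tasks carry an annotation naming the data object they will touch, and the runtime routes all tasks on the same object to the same worker, avoiding explicit locking. The routing policy and every name here are assumptions for illustration, not the actual MxTasking interface.

        # Sketch of fine-grained, data-annotated tasks: same object, same
        # worker queue, so access to that object needs no synchronization.
        from collections import deque

        class Task:
            def __init__(self, fn, data_object):
                self.fn = fn
                self.data_object = data_object  # annotation: what the task accesses

        class Runtime:
            def __init__(self, num_workers):
                self.queues = [deque() for _ in range(num_workers)]

            def spawn(self, task):
                # Hash-route by data object for locality and lock-free access.
                worker = hash(task.data_object) % len(self.queues)
                self.queues[worker].append(task)

            def run(self):
                for queue in self.queues:   # sequential stand-in for real workers
                    while queue:
                        queue.popleft().fn()

        rt = Runtime(num_workers=4)
        counter = {"n": 0}
        for _ in range(10):
            rt.spawn(Task(lambda: counter.update(n=counter["n"] + 1),
                          data_object="counter"))
        rt.run()
        assert counter["n"] == 10

    Because each task is small and declares its data up front, the runtime can also prefetch the object before the task runs, which is one way the memory-latency challenge named above can be hidden.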