15 research outputs found

    Cray X1 Evaluation Status Report

    Development of a parallel database environment

    Impact of Communication Protocol on Performance

    Database System Acceleration on FPGAs

    Relational database systems provide various services and applications with an efficient means for storing, processing, and retrieving their data. The performance of these systems has a direct impact on the quality of service of the applications that rely on them. Therefore, it is crucial that database systems are able to adapt and grow in tandem with the demands of these applications, ensuring that their performance scales accordingly. In the past, Moore's law and algorithmic advancements have been sufficient to meet these demands. However, with the slowdown of Moore's law, researchers have begun exploring alternative methods, such as application-specific technologies, to satisfy the more challenging performance requirements. One such technology is field-programmable gate arrays (FPGAs), which provide ideal platforms for developing and running custom architectures for accelerating database systems. The goal of this thesis is to develop a domain-specific architecture that can enhance the performance of in-memory database systems when executing analytical queries. Our research is guided by a combination of academic and industrial requirements that seek to strike a balance between generality and performance. The former ensures that our platform can be used to process a diverse range of workloads, while the latter makes it an attractive solution for high-performance use cases. Throughout this thesis, we present the development of a system-on-chip for database system acceleration that meets our requirements. The resulting architecture, called CbMSMK, is capable of processing the projection, sort, aggregation, and equi-join database operators and can also run some complex TPC-H queries. CbMSMK employs a shared sort-merge pipeline for executing all these operators, which results in an efficient use of FPGA resources. This approach enables the instantiation of multiple acceleration cores on the FPGA, allowing it to serve multiple clients simultaneously. 
CbMSMK can process both arbitrarily deep and wide tables efficiently. The former is achieved through the use of the sort-merge algorithm, which utilizes the FPGA RAM for buffering intermediate sort results. The latter is achieved through the use of KeRRaS, a novel variant of the forward radix sort algorithm introduced in this thesis. KeRRaS allows CbMSMK to process a table a few columns at a time, incrementally generating the final result through multiple iterations. Given that acceleration is a key objective of our work, CbMSMK benefits from many performance optimizations. For instance, multi-way merging is employed to reduce the number of merge passes required for the execution of the sort-merge algorithm, thus improving the performance of all our pipeline-breaking operators. Another example is our in-depth analysis of early aggregation, which led to the development of a novel cache-based algorithm that significantly enhances aggregation performance. Our experiments demonstrate that CbMSMK performs on average 5 times faster than the state-of-the-art CPU-based database management system MonetDB.

    Table of contents:
    I Database Systems & FPGAs
    1 INTRODUCTION 1.1 Databases & the Importance of Performance 1.2 Accelerators & FPGAs 1.3 Requirements 1.4 Outline & Summary of Contributions
    2 BACKGROUND ON DATABASE SYSTEMS 2.1 Databases 2.1.1 Storage Model 2.1.2 Storage Medium 2.2 Database Operators 2.2.1 Projection 2.2.2 Filter 2.2.3 Sort 2.2.4 Aggregation 2.2.5 Join 2.2.6 Operator Classification 2.3 Database Queries 2.4 Impact of Acceleration
    3 BACKGROUND ON FPGAS 3.1 FPGA 3.1.1 Logic Element 3.1.2 Block RAM (BRAM) 3.1.3 Digital Signal Processor (DSP) 3.1.4 IO Element 3.1.5 Programmable Interconnect 3.2 FPGA Design Flow 3.2.1 Specifications 3.2.2 RTL Description 3.2.3 Verification 3.2.4 Synthesis, Mapping, Placement, and Routing 3.2.5 Timing Analysis 3.2.6 Bitstream Generation and FPGA Programming 3.3 Implementation Quality Metrics 3.4 FPGA Cards 3.5 Benefits of Using FPGAs 3.6 Challenges of Using FPGAs
    4 RELATED WORK 4.1 Summary of Related Work 4.2 Platform Type 4.2.1 Accelerator Card 4.2.2 Coprocessor 4.2.3 Smart Storage 4.2.4 Network Processor 4.3 Implementation 4.3.1 Loop-Based Implementation 4.3.2 Sort-Based Implementation 4.3.3 Hash-Based Implementation 4.3.4 Mixed Implementation 4.4 A Note on Quantitative Performance Comparisons
    II Cache-Based Morphing Sort-Merge with KeRRaS (CbMSMK)
    5 OBJECTIVES AND ARCHITECTURE OVERVIEW 5.1 From Requirements to Objectives 5.2 Architecture Overview 5.3 Outline of Part II
    6 COMPARATIVE ANALYSIS OF OPENCL AND RTL FOR SORT-MERGE PRIMITIVES ON FPGAS 6.1 Programming FPGAs 6.2 Related Work 6.3 Architecture 6.3.1 Global Architecture 6.3.2 Sorter Architecture 6.3.3 Merger Architecture 6.3.4 Scalability and Resource Adaptability 6.4 Experiments 6.4.1 OpenCL Sort-Merge Implementation 6.4.2 RTL Sorters 6.4.3 RTL Mergers 6.4.4 Hybrid OpenCL-RTL Sort-Merge Implementation 6.5 Summary & Discussion
    7 RESOURCE-EFFICIENT ACCELERATION OF PIPELINE-BREAKING DATABASE OPERATORS ON FPGAS 7.1 The Case for Resource Efficiency 7.2 Related Work 7.3 Architecture 7.3.1 Sorters 7.3.2 Sort-Network 7.3.3 X:Y Mergers 7.3.4 Merge-Network 7.3.5 Join Materialiser (JoinMat) 7.4 Experiments 7.4.1 Experimental Setup 7.4.2 Implementation Description & Tuning 7.4.3 Sort Benchmarks 7.4.4 Aggregation Benchmarks 7.4.5 Join Benchmarks 7.5 Summary
    8 KERRAS: COLUMN-ORIENTED WIDE TABLE PROCESSING ON FPGAS 8.1 The Scope of Database System Accelerators 8.2 Related Work 8.3 Key-Reduce Radix Sort (KeRRaS) 8.3.1 Time Complexity 8.3.2 Space Complexity (Memory Utilization) 8.3.3 Discussion and Optimizations 8.4 Architecture 8.4.1 MSM 8.4.2 MSMK: Extending MSM with KeRRaS 8.4.3 Payload, Aggregation and Join Processing 8.4.4 Limitations 8.5 Experiments 8.5.1 Experimental Setup 8.5.2 Datasets 8.5.3 MSMK vs. MSM 8.5.4 Payload-Less Benchmarks 8.5.5 Payload-Based Benchmarks 8.5.6 Flexibility 8.6 Summary
    9 A STUDY OF EARLY AGGREGATION IN DATABASE QUERY PROCESSING ON FPGAS 9.1 Early Aggregation 9.2 Background & Related Work 9.2.1 Sort-Based Early Aggregation 9.2.2 Cache-Based Early Aggregation 9.3 Simulations 9.3.1 Datasets 9.3.2 Metrics 9.3.3 Sort-Based Versus Cache-Based Early Aggregation 9.3.4 Comparison of Set-Associative Caches 9.3.5 Comparison of Cache Structures 9.3.6 Comparison of Replacement Policies 9.3.7 Cache Selection Methodology 9.4 Cache System Architecture 9.4.1 Window Aggregator 9.4.2 Compressor & Hasher 9.4.3 Collision Detector 9.4.4 Collision Resolver 9.4.5 Cache 9.5 Experiments 9.5.1 Experimental Setup 9.5.2 Resource Utilization and Parameter Tuning 9.5.3 Datasets 9.5.4 Benchmarks on Synthetic Data 9.5.5 Benchmarks on Real Data 9.6 Summary
    10 THE FULL PICTURE 10.1 System Architecture 10.2 Benchmarks 10.3 Meeting the Objectives
    III Conclusion
    11 SUMMARY AND OUTLOOK ON FUTURE RESEARCH 11.1 Summary 11.2 Future Work
    BIBLIOGRAPHY
    LIST OF FIGURES
    LIST OF TABLES
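
    The abstract's remark that multi-way merging reduces the number of merge passes can be illustrated with a small software sketch. This is not the thesis's FPGA design: the function names `merge_passes` and `multiway_merge_sort` and the `fan_in` parameter are hypothetical, chosen only for illustration. The sketch assumes the input has already been split into sorted runs and merges them `fan_in` at a time until a single run remains.

```python
import heapq
import math


def merge_passes(num_runs: int, fan_in: int) -> int:
    """Count the passes needed to reduce num_runs sorted runs to one run
    when each merge step combines up to fan_in runs (hypothetical helper)."""
    if num_runs < 1 or fan_in < 2:
        raise ValueError("need at least one run and a fan-in of at least 2")
    passes = 0
    while num_runs > 1:
        # Each pass replaces every group of fan_in runs with one merged run.
        num_runs = math.ceil(num_runs / fan_in)
        passes += 1
    return passes


def multiway_merge_sort(runs, fan_in):
    """Merge pre-sorted runs fan_in at a time; return (sorted list, passes)."""
    runs = [list(r) for r in runs]
    passes = 0
    while len(runs) > 1:
        # heapq.merge lazily merges any number of sorted iterables.
        runs = [
            list(heapq.merge(*runs[i:i + fan_in]))
            for i in range(0, len(runs), fan_in)
        ]
        passes += 1
    return (runs[0] if runs else []), passes
```

    Under these assumptions, 64 sorted runs need six passes with a 2-way merger but only two passes with an 8-way merger, which is the effect the abstract attributes to multi-way merging.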

    A dynamic prediction and monitoring framework for distributed applications

    This research builds on an application performance prediction and characterisation environment (known as PACE), whose aim is to characterise the performance-critical elements of both an application and its target execution environment and to deduce from this model a predicted behaviour of the application prior to its execution. Underlying the research presented in this thesis are a number of themes: the tasks involved in the performance characterisation of applications and how these might be semi-automated; the level of abstraction at which these characterisations are performed in order to maintain sufficient predictive accuracy; the automated refinement of these characterisations from runtime performance data; and the extension of both the target programming languages and the class of application at which these techniques are aimed. In this thesis a number of novel extensions to PACE are described. These include: a new transaction-based performance characterisation language that provides a flexible framework for describing broader classes of application; a performance monitoring framework (based on an extension to The Open Group's Application Response Measurement (ARM) standard) for the runtime monitoring of an application's data-dependent components and the automated refinement of performance models; and an adaptation of this performance characterisation for the prediction of Java applications. These contributions are demonstrated through their application to a number of scientific kernels. This thesis also documents how these predictive results can be used in a real-time distributed runtime management environment, and how these techniques can be applied to non-scientific codes, in particular to an IBM request-driven distributed web services demonstrator.

    Flow-Induced Vibrations of In-Line Cylinder Arrangements at Low Reynolds Numbers

    Wake-induced vibration (WIV) is a type of fluid-structure interaction (FSI) that may occur when two or more elastically mounted bodies are arranged one behind the other in a cross flow. The downstream body is affected not only by its own vortex shedding but also by the wake that develops behind the upstream body. Under these two disturbances, the downstream body can develop severe oscillations, with a maximum amplitude as large as A/D = 10 (Paidoussis et al. (2011)). Knowledge of WIV is still so limited that even the recommended practice for riser interference issued by a world-class leader in offshore engineering classification does not yet consistently account for WIV (Det Norske Veritas (2009)). Most investigations consider a tandem pair of cylinders placed in a uniform flow. Very little is known about cases with more than two elastically mounted bodies. In a brief investigation, Étienne et al. (2009) showed numerically that three freely oscillating cylinders arranged in-line in a uniform flow, at a Reynolds number of Re = 200 and a fixed reduced velocity of Ur = 8, can develop significant vibrations. A recent experiment by Oviedo-Tolentino et al. (2013), who studied the oscillation response of ten collinear cylinders with a moderately large mass-damping factor (m*ζ = 0.13) placed in a uniform flow, confirmed that the cylinders behind the second one can develop transverse oscillations that are larger than those of the second cylinder. These more severe oscillations can not only cause material fatigue but also lead to collisions among the cylinders, which poses great challenges for engineering design. Based on these recent findings, it is important to take a closer look at the behavior of multiple elastically mounted bodies arranged in-line in a cross flow. Apart from the larger oscillations observed, many important aspects of the WIV of multiple in-line cylinders remain essentially unknown, including the effects of a low mass ratio and a low mass-damping factor, the maximum oscillation amplitude, the frequency responses, and the effect of the Reynolds number. This thesis aims to numerically explore the wake-induced vibration responses of three circular cylinders with low mass ratio and zero damping arranged in-line at low Reynolds number, in order to advance the fundamental engineering knowledge regarding multiple elastically mounted in-line bodies placed in a cross flow. To reach this research goal, we have identified three specific objectives