
    Bit-Vectorized GPU Implementation of a Stochastic Cellular Automaton Model for Surface Growth

    Stochastic surface growth models aid in studying properties of universality classes such as the Kardar--Parisi--Zhang class. High-precision results obtained from large-scale computational studies can be transferred to many physical systems. Many properties, such as roughening and some two-time functions, can be studied using stochastic cellular automaton (SCA) variants of stochastic models. Here we present a highly efficient SCA implementation of a surface growth model capable of simulating billions of lattice sites on a single GPU. We also provide insight into cases requiring arbitrary random probabilities, which are not accessible through bit-vectorization.
    Comment: INES 2016, Budapest http://www.ines-conf.org/ines-conf/2016index.htm
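
    To illustrate the bit-vectorization trick, here is a minimal NumPy sketch, not the authors' code: one slope variable is packed into each bit of a 64-bit word, so a single bitwise operation updates 64 lattice sites at once. The 1D single-step (ASEP-like) model, the 10 -> 01 update rule, and the acceptance probability p = 1/2 are illustrative assumptions.

        import numpy as np

        rng = np.random.default_rng(1)

        def sca_sweep(s, accept):
            # One parallel SCA sweep on a bit-packed slope configuration.
            # s:      uint64 array; bit j of word k is the slope at site 64*k + j
            # accept: random bit mask; a set bit accepts the update attempt there
            one, s63 = np.uint64(1), np.uint64(63)
            # bit i of `right` holds the slope of site i+1 (carry across words)
            right = (s >> one) | (np.roll(s, -1) << s63)
            movable = s & ~right            # local pattern "10" allows a move
            flips = movable & accept        # stochastic acceptance, in parallel
            flip_next = (flips << one) | (np.roll(flips, 1) >> s63)
            return s ^ flips ^ flip_next    # 10 -> 01 at every accepted site

        s = np.full(1 << 14, 0x5555555555555555, dtype=np.uint64)  # flat surface
        for _ in range(1000):
            # each bit of `mask` is set with probability 1/2; rates k/2^m follow
            # from AND/OR-ing a few such words, but truly arbitrary real-valued
            # probabilities are exactly what bit-vectorization cannot provide
            mask = rng.integers(0, np.iinfo(np.uint64).max, size=s.size,
                                dtype=np.uint64, endpoint=True)
            s = sca_sweep(s, mask)

    Note that two adjacent sites can never both be movable (site i needs slope pattern 10, which contradicts site i+1 needing it too), so the fully parallel update is conflict-free.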

    The shape of the CMB lensing bispectrum

    Lensing of the CMB generates a significant bispectrum, which should be detected by the Planck satellite at the 5-sigma level and is potentially a non-negligible source of bias for f_NL estimators of local non-Gaussianity. We extend current understanding of the lensing bispectrum in several directions: (1) we perform a non-perturbative calculation of the lensing bispectrum which is ~10% more accurate than previous first-order calculations; (2) we demonstrate how to incorporate the signal variance of the lensing bispectrum into estimates of its amplitude, providing a good analytical explanation for previous Monte-Carlo results; and (3) we discover the existence of a significant lensing bispectrum in polarization, due to a previously-unnoticed correlation between the lensing potential and E-polarization as large as 30% at low multipoles. We use this improved understanding of the lensing bispectra to re-evaluate Fisher-matrix predictions, both for Planck and cosmic variance limited data. We confirm that the non-negligible lensing-induced bias for estimation of local non-Gaussianity should be robustly treatable, and will only inflate f_NL error bars by a few percent over predictions where lensing effects are completely ignored (but note that lensing must still be accounted for to obtain unbiased constraints). We also show that the detection significance for the lensing bispectrum itself is ultimately limited to 9 sigma by cosmic variance. The tools that we develop for non-perturbative calculation of the lensing bispectrum are directly relevant to other calculations, and we give an explicit construction of a simple non-perturbative quadratic estimator for the lensing potential and relate its cross-correlation power spectrum to the bispectrum. Our numerical codes are publicly available as part of CAMB and LensPix.
    Comment: 32 pages, 10 figures; minor changes to match JCAP-accepted version. CMB lensing and primordial local bispectrum codes available as part of CAMB (http://camb.info/)
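
    For orientation, the lowest-order flat-sky expression that the paper's non-perturbative calculation refines can be written (in common textbook conventions, which may differ from the paper's in signs and in the use of lensed versus unlensed spectra) as

        b_{\ell_1 \ell_2 \ell_3} \;\simeq\;
            (\boldsymbol{\ell}_1 \cdot \boldsymbol{\ell}_2)\,
            C_{\ell_1}^{T\psi} C_{\ell_2}^{TT} \;+\; 5 \text{ permutations},
        \qquad
        \boldsymbol{\ell}_1 \cdot \boldsymbol{\ell}_2
            = \tfrac{1}{2}\left(\ell_3^2 - \ell_1^2 - \ell_2^2\right),

    where C_\ell^{T\psi} is the temperature-lensing potential cross-spectrum and triangle closure fixes the dot products. Roughly speaking, the ~10% accuracy gain comes from replacing such first-order ingredients with their non-perturbative (lensed) counterparts.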

    Resiliency Mechanisms for In-Memory Column Stores

    The key objective of database systems is to reliably manage data, while high query throughput and low query latency are core requirements. To date, database research activities have mostly concentrated on the latter. However, due to the constant shrinking of transistor feature sizes, integrated circuits are becoming increasingly unreliable, and transient hardware errors in the form of multi-bit flips are becoming increasingly prominent. A 2013 study of a large high-performance cluster with around 8,500 nodes measured a failure rate of 40 FIT per DRAM device. For that system, this means a single- or multi-bit flip occurs every 10 hours, which is unacceptably high for enterprise and HPC scenarios. Causes include cosmic rays, heat, and electrical crosstalk, with the latter being actively exploited through the RowHammer attack. It has been shown that memory cells are more prone to bit flips than logic gates, and several surveys found multi-bit flip events in the main memory modules of today's data centers. Due to the shift towards in-memory data management systems, where all business-related data and query intermediate results are kept solely in fast main memory, such systems are at great risk of delivering corrupt results to their users. Hardware techniques cannot be scaled to compensate for the exponentially increasing error rates. In other domains, there is increasing interest in software-based solutions to this problem, but the proposed methods come with huge runtime and/or storage overheads, which are unacceptable for in-memory data management systems. In this thesis, we investigate how to integrate bit flip detection mechanisms into in-memory data management systems. To achieve this goal, we first build an understanding of bit flip detection techniques and select two error codes, AN codes and XOR checksums, suited to the requirements of in-memory data management systems. The most important requirement is the effectiveness of the codes in detecting bit flips. We meet this goal through AN codes, which exhibit better and adaptable error detection capabilities than those found in today's hardware. The second most important goal is efficiency in terms of coding latency. We meet it by introducing fundamental performance improvements to AN codes, and by vectorizing both chosen codes' operations. We integrate bit flip detection mechanisms into the lowest storage layer and the query processing layer in such a way that the rest of the data management system and the user can stay oblivious of any error detection. This includes both base columns and pointer-heavy index structures such as the ubiquitous B-Tree. Additionally, our approach allows adaptable, on-the-fly bit flip detection during query processing, with only very little impact on query latency. AN coding allows intermediate results to be recoded with virtually no performance penalty. We support our claims with exhaustive runtime and throughput measurements throughout the thesis and with an end-to-end evaluation using the Star Schema Benchmark. To the best of our knowledge, we are the first to present such holistic and fast bit flip detection in a large software infrastructure such as in-memory data management systems.
    Finally, most of the source code fragments used to obtain the results in this thesis are open source and freely available.

    Outline:
    1 INTRODUCTION
      1.1 Contributions of this Thesis
      1.2 Outline
    2 PROBLEM DESCRIPTION AND RELATED WORK
      2.1 Reliable Data Management on Reliable Hardware
      2.2 The Shift Towards Unreliable Hardware
      2.3 Hardware-Based Mitigation of Bit Flips
      2.4 Data Management System Requirements
      2.5 Software-Based Techniques For Handling Bit Flips
        2.5.1 Operating System-Level Techniques
        2.5.2 Compiler-Level Techniques
        2.5.3 Application-Level Techniques
      2.6 Summary and Conclusions
    3 ANALYSIS OF CODING TECHNIQUES
      3.1 Selection of Error Codes
        3.1.1 Hamming Coding
        3.1.2 XOR Checksums
        3.1.3 AN Coding
        3.1.4 Summary and Conclusions
      3.2 Probabilities of Silent Data Corruption
        3.2.1 Probabilities of Hamming Codes
        3.2.2 Probabilities of XOR Checksums
        3.2.3 Probabilities of AN Codes
        3.2.4 Concrete Error Models
        3.2.5 Summary and Conclusions
      3.3 Throughput Considerations
        3.3.1 Test Systems Descriptions
        3.3.2 Vectorizing Hamming Coding
        3.3.3 Vectorizing XOR Checksums
        3.3.4 Vectorizing AN Coding
        3.3.5 Summary and Conclusions
      3.4 Comparison of Error Codes
        3.4.1 Effectiveness
        3.4.2 Efficiency
        3.4.3 Runtime Adaptability
      3.5 Performance Optimizations for AN Coding
        3.5.1 The Modular Multiplicative Inverse
        3.5.2 Faster Softening
        3.5.3 Faster Error Detection
        3.5.4 Comparison to Original AN Coding
        3.5.5 The Multiplicative Inverse Anomaly
      3.6 Summary
    4 BIT FLIP DETECTING STORAGE
      4.1 Column Store Architecture
        4.1.1 Logical Data Types
        4.1.2 Storage Model
        4.1.3 Data Representation
        4.1.4 Data Layout
        4.1.5 Tree Index Structures
        4.1.6 Summary
      4.2 Hardened Data Storage
        4.2.1 Hardened Physical Data Types
        4.2.2 Hardened Lightweight Compression
        4.2.3 Hardened Data Layout
        4.2.4 UDI Operations
        4.2.5 Summary and Conclusions
      4.3 Hardened Tree Index Structures
        4.3.1 B-Tree Verification Techniques
        4.3.2 Justification For Further Techniques
        4.3.3 The Error Detecting B-Tree
      4.4 Summary
    5 BIT FLIP DETECTING QUERY PROCESSING
      5.1 Column Store Query Processing
      5.2 Bit Flip Detection Opportunities
        5.2.1 Early Onetime Detection
        5.2.2 Late Onetime Detection
        5.2.3 Continuous Detection
        5.2.4 Miscellaneous Processing Aspects
        5.2.5 Summary and Conclusions
      5.3 Hardened Intermediate Results
        5.3.1 Materialization of Hardened Intermediates
        5.3.2 Hardened Bitmaps
      5.4 Summary
    6 END-TO-END EVALUATION
      6.1 Prototype Implementation
        6.1.1 AHEAD Architecture
        6.1.2 Diversity of Physical Operators
        6.1.3 One Concrete Operator Realization
        6.1.4 Summary and Conclusions
      6.2 Performance of Individual Operators
        6.2.1 Selection on One Predicate
        6.2.2 Selection on Two Predicates
        6.2.3 Join Operators
        6.2.4 Grouping and Aggregation
        6.2.5 Delta Operator
        6.2.6 Summary and Conclusions
      6.3 Star Schema Benchmark Queries
        6.3.1 Query Runtimes
        6.3.2 Improvements Through Vectorization
        6.3.3 Storage Overhead
        6.3.4 Summary and Conclusions
      6.4 Error Detecting B-Tree
        6.4.1 Single Key Lookup
        6.4.2 Key Value-Pair Insertion
      6.5 Summary
    7 SUMMARY AND CONCLUSIONS
      7.1 Future Work
    A APPENDIX
      A.1 List of Golden As
      A.2 More on Hamming Coding
        A.2.1 Code examples
        A.2.2 Vectorization
    BIBLIOGRAPHY
    LIST OF FIGURES
    LIST OF TABLES
    LIST OF LISTINGS
    LIST OF ACRONYMS
    LIST OF SYMBOLS
    LIST OF DEFINITIONS
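
    As a rough illustration of the AN-coding building block described in the abstract above (a minimal Python sketch, not the thesis's vectorized implementation; the constant A = 641 is arbitrary, whereas the thesis selects "golden As" with proven detection properties):

        # AN coding in a nutshell: a payload v is stored as c = A * v, so
        # every valid codeword is a multiple of A; a bit flip is detected
        # unless it happens to map c onto another multiple of A.
        A = 641       # illustrative odd constant (a real system would pick
                      # a "golden A" with strong multi-bit detection)
        WORD = 2**64  # codewords live in 64-bit machine words

        def an_encode(v: int) -> int:
            # v must be small enough that A * v fits the word
            return (A * v) % WORD

        def an_check(c: int) -> bool:
            return c % A == 0          # residue test flags invalid codewords

        def an_decode(c: int) -> int:
            return c // A

        # Faster decoding: A is odd, hence invertible modulo 2^64, so the
        # division can be replaced by a single multiplication with A^-1.
        A_INV = pow(A, -1, WORD)

        def an_decode_fast(c: int) -> int:
            return (c * A_INV) % WORD  # equals v for every valid codeword

        v = 123456
        c = an_encode(v)
        assert an_check(c) and an_decode_fast(c) == v
        assert not an_check(c ^ (1 << 17))  # any single bit flip is caught,
                                            # since 2^k is never a multiple
                                            # of an odd A > 1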

    ExoCross: a general program for generating spectra from molecular line lists

    ExoCross is a Fortran code for generating spectra (emission, absorption) and thermodynamic properties (partition function, specific heat, etc.) from molecular line lists. Input is taken in several formats, including the ExoMol and HITRAN formats. ExoCross is efficiently parallelized and also shows a high degree of vectorization. It can work with several line profiles, such as Doppler, Lorentzian, and Voigt, and supports several broadening schemes. Voigt profiles are handled by several methods allowing fast and accurate simulations, two of which are new. ExoCross is also capable of working with the recently proposed method of super-lines. It supports calculations of lifetimes, cooling functions, specific heats, and other properties. ExoCross can be used to convert between different formats, such as HITRAN, ExoMol, and Phoenix. It is capable of simulating non-LTE spectra using a simple two-temperature approach. Different electronic, vibronic, or vibrational bands can be simulated separately using an efficient filtering scheme based on the quantum numbers.
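
    As a pointer to the line-profile machinery such codes implement, the Voigt profile (the convolution of a Doppler Gaussian with a pressure-broadening Lorentzian) can be evaluated through the Faddeeva function. A minimal SciPy sketch follows; it is not ExoCross's Fortran and ignores the fast approximate methods the paper introduces:

        import numpy as np
        from scipy.special import wofz

        def voigt(nu, nu0, sigma, gamma):
            # V(nu) = Re[w(z)] / (sigma * sqrt(2*pi)) with
            # z = ((nu - nu0) + i*gamma) / (sigma * sqrt(2));
            # sigma is the Gaussian (Doppler) standard deviation,
            # gamma the Lorentzian (pressure) half width at half maximum
            z = ((nu - nu0) + 1j * gamma) / (sigma * np.sqrt(2.0))
            return np.real(wofz(z)) / (sigma * np.sqrt(2.0 * np.pi))

        # the profile is normalized to unit area (Lorentzian tails decay
        # slowly, hence the wide grid); SciPy >= 1.4 also ships the
        # equivalent scipy.special.voigt_profile(x, sigma, gamma)
        nu = np.linspace(-2000.0, 2000.0, 400001)
        area = np.trapz(voigt(nu, 0.0, sigma=1.0, gamma=0.5), nu)
        assert abs(area - 1.0) < 1e-3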

    Conjugate gradient solvers on Intel Xeon Phi and NVIDIA GPUs

    Lattice Quantum Chromodynamics simulations typically spend most of the runtime in inversions of the Fermion Matrix. This part is therefore frequently optimized for various HPC architectures. Here we compare the performance of the Intel Xeon Phi to current Kepler-based NVIDIA Tesla GPUs running a conjugate gradient solver. By exposing more parallelism to the accelerator through inverting multiple vectors at the same time, we obtain a performance greater than 300 GFlop/s on both architectures. This more than doubles the performance of the inversions. We also give a short overview of the Knights Corner architecture, discuss some details of the implementation and the effort required to obtain the achieved performance.
    Comment: 7 pages, proceedings, presented at 'GPU Computing in High Energy Physics', September 10-12, 2014, Pisa, Italy
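
    The "invert multiple vectors at once" idea is architecture-independent. Below is a NumPy sketch, not the paper's hand-optimized Xeon Phi/GPU kernels, with a generic symmetric positive-definite operator standing in for the Fermion matrix (an assumption): running the conjugate gradient iterations for all right-hand sides in lockstep lets every operator application touch many vectors, raising arithmetic intensity.

        import numpy as np

        def block_cg(apply_A, B, tol=1e-10, max_iter=1000):
            # Lockstep conjugate gradient for many right-hand sides.
            # Each column of B is an independent RHS; apply_A acts on all
            # columns in one call. (This is not a true block-CG sharing one
            # Krylov space; the per-column iterations merely run in parallel.)
            X = np.zeros_like(B)
            R = B - apply_A(X)
            P = R.copy()
            rs = np.sum(R * R, axis=0)
            for _ in range(max_iter):
                AP = apply_A(P)
                alpha = rs / np.sum(P * AP, axis=0)
                X += alpha * P
                R -= alpha * AP
                rs_new = np.sum(R * R, axis=0)
                if np.all(np.sqrt(rs_new) < tol):
                    break
                P = R + (rs_new / rs) * P
                rs = rs_new
            return X

        # SPD stand-in for the Fermion matrix (illustrative, not a Dirac op)
        rng = np.random.default_rng(0)
        n, k = 1024, 8
        M = rng.standard_normal((n, n))
        A = M @ M.T + n * np.eye(n)
        B = rng.standard_normal((n, k))
        X = block_cg(lambda V: A @ V, B)
        assert np.allclose(A @ X, B, atol=1e-6)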

    Solution of the Skyrme-Hartree-Fock equations in the Cartesian deformed harmonic oscillator basis. (I) The method

    We describe a method of solving the nuclear Skyrme-Hartree-Fock problem by using a deformed Cartesian harmonic oscillator basis. The complete list of expressions required to calculate local densities, total energy, and self-consistent fields is presented, and an implementation of the self-consistent symmetries is discussed. Formulas to calculate matrix elements in the Cartesian harmonic oscillator basis are derived for the nuclear and Coulomb interactions.
    Comment: 26 LaTeX pages, submitted to Computer Physics Communications
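
    For reference, the deformed Cartesian basis in question is built from standard 1D harmonic oscillator eigenfunctions with direction-dependent oscillator lengths (textbook conventions, which may differ from the paper's notation):

        \phi_{n}(x; b) = \left(2^{n} n!\, b \sqrt{\pi}\right)^{-1/2}
            H_{n}\!\left(x/b\right)\, e^{-x^{2}/2b^{2}},
        \qquad
        \Psi_{n_x n_y n_z}(\mathbf{r}) =
            \phi_{n_x}(x; b_x)\, \phi_{n_y}(y; b_y)\, \phi_{n_z}(z; b_z),

    where the H_n are Hermite polynomials and the deformation enters through unequal oscillator lengths b_x, b_y, b_z. The practical appeal of this basis is that matrix elements of local fields factorize into products of 1D integrals along each Cartesian direction.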

    VaR-implied tail-correlation matrices [Version October 2013]

    Empirical evidence suggests that asset returns correlate more strongly in bear markets than conventional correlation estimates imply. We propose a method for determining complete tail-correlation matrices based on Value-at-Risk (VaR) estimates. We demonstrate how to obtain more efficient tail-correlation estimates by use of overidentification strategies and how to guarantee positive semidefiniteness, a property required for valid risk aggregation and Markowitz-type portfolio optimization. An empirical application to a 30-asset universe illustrates the practical applicability and relevance of the approach in portfolio management.
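
    A minimal sketch of the two basic mechanics involved, not the paper's estimator: the equal-weight pair portfolio, the square-root VaR aggregation rule, and the eigenvalue-clipping PSD repair below are all simplifying assumptions.

        import numpy as np

        def implied_rho(var_p, var_1, var_2, w1=0.5, w2=0.5):
            # Back out a pairwise tail correlation from the VaR of a
            # two-asset portfolio and its components via
            #   VaR_p^2 = (w1*VaR_1)^2 + (w2*VaR_2)^2
            #             + 2*w1*w2*rho*VaR_1*VaR_2,
            # exact for elliptical returns, an approximation otherwise.
            rho = (var_p**2 - (w1 * var_1)**2 - (w2 * var_2)**2) \
                  / (2.0 * w1 * w2 * var_1 * var_2)
            return float(np.clip(rho, -1.0, 1.0))

        def nearest_psd_correlation(C):
            # Crude PSD repair: clip negative eigenvalues and rescale to a
            # unit diagonal. (Higham's nearest-correlation-matrix algorithm
            # is the more careful alternative.)
            w, V = np.linalg.eigh((C + C.T) / 2.0)
            C_psd = V @ np.diag(np.maximum(w, 1e-12)) @ V.T
            d = np.sqrt(np.diag(C_psd))
            return C_psd / np.outer(d, d)

    Filling a matrix pairwise from such implied correlations does not by itself guarantee positive semidefiniteness, which is why a repair step (or the paper's overidentification strategy) is needed before risk aggregation or portfolio optimization.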