    A Massively Parallel Algorithm for the Approximate Calculation of Inverse p-th Roots of Large Sparse Matrices

    We present the submatrix method, a highly parallelizable method for the approximate calculation of inverse p-th roots of large sparse symmetric matrices, which are required in different scientific applications. We follow the idea of Approximate Computing, allowing imprecision in the final result in order to exploit the sparsity of the input matrix and to allow massively parallel execution. For an n x n matrix, the proposed algorithm allows the calculations to be distributed over n nodes with little communication overhead. The approximate result matrix exhibits the same sparsity pattern as the input matrix, allowing for efficient reuse of allocated data structures. We evaluate the algorithm with respect to the error that it introduces into calculated results, as well as its performance and scalability. We demonstrate that the error is relatively limited for well-conditioned matrices and that results are still valuable for error-resilient applications like preconditioning, even for ill-conditioned matrices. We discuss the execution time and scaling of the algorithm on a theoretical level and present a distributed implementation of the algorithm using MPI and OpenMP. We demonstrate the scalability of this implementation by running it on a high-performance compute cluster comprising 1024 CPU cores, showing a speedup of 665x compared to single-threaded execution.
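
    The idea summarized above can be illustrated with a small serial sketch. It assumes one common reading of the submatrix method: for each column, the rows with nonzero entries select a dense principal submatrix, its inverse p-th root is computed with a dense solver, and only the column belonging to the original index is copied back. The function names, the dense NumPy storage, and the eigendecomposition-based root are illustrative assumptions, not the authors' MPI/OpenMP implementation.

```python
import numpy as np


def inv_pth_root_dense(m, p):
    """Inverse p-th root of a small symmetric positive definite matrix."""
    # Assumption: m is symmetric positive definite, so eigh applies
    # and all eigenvalues are positive.
    w, v = np.linalg.eigh(m)
    return (v * w ** (-1.0 / p)) @ v.T


def submatrix_inv_pth_root(a, p):
    """Approximate a^(-1/p) column by column (illustrative sketch).

    a is stored densely here for brevity; a real implementation would
    keep it in a sparse format. The diagonal of a must be nonzero.
    """
    n = a.shape[0]
    result = np.zeros((n, n), dtype=float)
    for i in range(n):  # each iteration is independent of the others
        idx = np.flatnonzero(a[:, i])          # rows with nonzeros in column i
        sub = a[np.ix_(idx, idx)]              # dense principal submatrix
        sub_root = inv_pth_root_dense(sub, p)  # small dense problem
        local = int(np.searchsorted(idx, i))   # position of i inside idx
        result[idx, i] = sub_root[:, local]    # keep only column i
    return result
```

    Because every loop iteration touches only its own submatrix and writes only its own column, the iterations could in principle be handed to separate workers, which is the property the abstract highlights. The result is nonzero only where the input is nonzero.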

    The clock paradox in a static homogeneous gravitational field

    The gedanken experiment of the clock paradox is solved exactly using the general relativistic equations for a static homogeneous gravitational field. We demonstrate that the general and special relativistic clock paradox solutions are identical and in particular that they are identical for finite acceleration. Practical expressions are obtained for proper time and coordinate time by using the destination distance as the key observable parameter. This solution provides a formal demonstration of the identity between the special and general relativistic clock paradox with finite acceleration and where proper time is assumed to be the same in both formalisms. By solving the equations of motion for a freely falling clock in a static homogeneous field, elapsed times are calculated for realistic journeys to the stars. Comment: Revision: posted with the caption included with the figure.
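
    To make "proper time and coordinate time as functions of destination distance" concrete, here is a minimal sketch using the textbook special-relativistic formulas for constant proper acceleration. These are standard results and not necessarily the exact expressions derived in the paper. The journey profile (accelerate for the first half, decelerate for the second), the 1 g acceleration, and the 4.37 light-year distance are illustrative assumptions.

```python
import math

C = 299_792_458.0          # speed of light in m/s
YEAR = 365.25 * 86_400.0   # Julian year in seconds
LIGHT_YEAR = C * YEAR      # metres


def one_way_times(distance_m, accel):
    """Return (proper_time_s, coordinate_time_s) for a one-way trip.

    Assumed profile: constant proper acceleration `accel` from rest over
    the first half of the distance, then equal deceleration to rest.
    """
    half = distance_m / 2.0
    # Traveller's proper time for one half: (c/a) * arcosh(1 + a*d/c^2)
    proper_half = (C / accel) * math.acosh(1.0 + accel * half / C**2)
    # Coordinate time for one half: sqrt((d/c)^2 + 2*d/a)
    coord_half = math.sqrt((half / C) ** 2 + 2.0 * half / accel)
    return 2.0 * proper_half, 2.0 * coord_half


tau, t = one_way_times(4.37 * LIGHT_YEAR, 9.81)
tau_years = tau / YEAR
t_years = t / YEAR
```

    Under these assumptions the traveller ages roughly 3.6 years while about 6 years pass in the rest frame of the departure point.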

    Extremely high He isotope ratios in MORB-source mantle from the proto-Iceland plume

    The high ³He/⁴He ratio of volcanic rocks thought to be derived from mantle plumes is taken as evidence for the existence of a mantle reservoir that has remained largely undegassed since the Earth's accretion. The helium isotope composition of this reservoir places constraints on the origin of volatiles within the Earth and on the evolution and structure of the Earth's mantle. Here we show that olivine phenocrysts in picritic basalts presumably derived from the proto-Iceland plume at Baffin Island, Canada, have the highest magmatic ³He/⁴He ratios yet recorded. A strong correlation between ³He/⁴He and ⁸⁷Sr/⁸⁶Sr, ¹⁴³Nd/¹⁴⁴Nd and trace element ratios demonstrates that the ³He-rich end-member is present in basalts that are derived from large-volume melts of depleted upper-mantle rocks. This reservoir is consistent with the recharging of depleted upper-mantle rocks by small volumes of primordial volatile-rich lower-mantle material at a thermal boundary layer between convectively isolated reservoirs. The highest ³He/⁴He basalts from Hawaii and Iceland plot on the observed mixing trend. This indicates that a ³He-recharged depleted mantle (HRDM) reservoir may be the principal source of high ³He/⁴He in mantle plumes, and may explain why the helium concentration of the 'plume' component in ocean island basalts is lower than that predicted for a two-layer, steady-state model of mantle structure.

    Towards a grapho-phonologically parsed corpus of medieval Scots: Database design and technical solutions

    This paper presents a newly constructed corpus of sound-to-spelling mappings in medieval Scots, which stems from the work of the From Inglis to Scots (FITS) project. We have developed a systematic approach to the relationships between individual spellings and proposed sound values, and recorded these mutual links in a relational database. In this paper, we introduce the theoretical underpinnings of sound-to-spelling and spelling-to-sound mappings, and show how a Scots root morpheme undergoes grapho-phonological parsing, the analytical procedure that is employed to break down spelling sequences into sound units. We explain the data collection and annotation for the FITS Corpus (Alcorn et al., forthcoming), drawing attention to the extensive metadata which accompany each analysed unit of spelling and sound. The database records grammatical and lexical information about the root, the positional arrangement of segments within the root, labels for the nuclei, vowels and consonants, the morphological context, and extra-linguistic detail of the text a given root was taken from (date, place and text type). With this wealth of information, the FITS Corpus is capable of answering complex queries about the sound and spelling systems of medieval Scots. We also suggest how our methodology can be transferred to other non-standardised spelling systems.

    Towards electronic structure-based ab-initio molecular dynamics simulations with hundreds of millions of atoms

    We push the boundaries of electronic structure-based ab-initio molecular dynamics (AIMD) beyond 100 million atoms. This scale is otherwise barely reachable with classical force-field methods or novel neural network and machine learning potentials. We achieve this breakthrough by combining innovations in linear-scaling AIMD, efficient and approximate sparse linear algebra, low and mixed-precision floating-point computation on GPUs, and a compensation scheme for the errors introduced by numerical approximations. The core of our work is the non-orthogonalized local submatrix method (NOLSM), which scales very favorably to massively parallel computing systems and translates large sparse matrix operations into highly parallel, dense matrix operations that are ideally suited to hardware accelerators. We demonstrate that the NOLSM method, which is at the center of each AIMD step, is able to achieve a sustained performance of 324 PFLOP/s in mixed FP16/FP32 precision, corresponding to an efficiency of 67.7% when running on 1536 NVIDIA A100 GPUs.
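
    The quoted performance figures can be cross-checked with a short calculation. The sketch below assumes that "efficiency" means sustained throughput divided by the aggregate theoretical peak of all GPUs; the abstract does not define the term, and the function name is hypothetical.

```python
def implied_peak_per_device(sustained_flops, efficiency, devices):
    """Per-device peak throughput implied by a sustained rate.

    Assumption: efficiency = sustained / (devices * per-device peak).
    """
    return sustained_flops / efficiency / devices


# Figures quoted in the abstract: 324 PFLOP/s, 67.7 %, 1536 GPUs.
peak = implied_peak_per_device(324e15, 0.677, 1536)
peak_tflops = peak / 1e12
```

    Under that assumption the implied per-GPU peak is a little under 312 TFLOP/s, which is in line with the dense FP16 tensor-core peak NVIDIA lists for the A100, so the three numbers in the abstract appear mutually consistent.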