407,005 research outputs found

    The Challenge of Time-Predictability in Modern Many-Core Architectures

    Recent technological advancements and market trends are driving a convergence of the High-Performance Computing (HPC) and Embedded Computing (EC) domains. Many recent HPC applications require huge amounts of information to be processed within a bounded amount of time, while EC systems are increasingly concerned with providing higher performance in real time. The convergence of these two domains towards systems requiring both high performance and predictable timing behavior challenges the capabilities of current hardware architectures. Fortunately, the advent of next-generation many-core embedded platforms offers a chance to meet this converging need for predictability and high performance, allowing HPC and EC applications to be executed on efficient and powerful heterogeneous architectures that integrate general-purpose processors with many-core computing fabrics. However, addressing this mixed set of requirements is not without its own challenges, and it is now of paramount importance to develop new techniques to exploit the massively parallel computation capabilities of many-core platforms in a predictable way.

    Accelerating the Rate of Astronomical Discovery with GPU-Powered Clusters

    In recent years, the Graphics Processing Unit (GPU) has emerged as a low-cost alternative for high performance computing, enabling impressive speed-ups for a range of scientific computing applications. Early adopters in astronomy are already benefiting from adapting their codes to take advantage of the GPU's massively parallel processing paradigm. I give an introduction to, and overview of, the use of GPUs in astronomy to date, highlighting the adoption and application trends from the first ~100 GPU-related publications in astronomy. I discuss the opportunities and challenges of utilising GPU computing clusters, such as the new Australian GPU supercomputer, gSTAR, for accelerating the rate of astronomical discovery. Comment: To appear in the proceedings of ADASS XXI, ed. P.Ballester and D.Egret, ASP Conf. Se

    Enhanced mobile computing using cloud resources

    Summary in English. Includes bibliographical references. The purpose of this research is to investigate, review and analyse the use of cloud resources for the enhancement of mobile computing. Mobile cloud computing refers to a distributed computing relationship between a resource-constrained mobile device and a remote high-capacity cloud resource. Investigation of prevailing trends has shown that this will be a key technology in the development of future mobile computing systems. This research presents a theoretical analysis framework for mobile cloud computing. This analysis framework is a structured consolidation of the salient considerations identified in recent scientific literature and commercial endeavours. The use of this framework in the analysis of various mobile application domains has elucidated several significant benefits of mobile cloud computing including increases in system performance and efficiency. Based on recent scientific literature and commercial endeavours, various implementation approaches for mobile cloud computing have been identified, categorized and analysed according to their architectural characteristics. This has resulted in a set of advantages and disadvantages for each category of system architecture. Overall, through the development and application of the new analysis framework, this work provides a consolidated review and structured critical analysis of the current research and developments in the field of mobile cloud computing.

    On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters

    The predominance of Kohn-Sham density functional theory (KS-DFT) for the theoretical treatment of large experimentally relevant systems in molecular chemistry and materials science relies primarily on the existence of efficient software implementations which are capable of leveraging the latest advances in modern high performance computing (HPC). With recent trends in HPC leading towards an increasing reliance on heterogeneous accelerator-based architectures such as graphics processing units (GPUs), existing code bases must embrace these architectural advances to maintain the high levels of performance which have come to be expected for these methods. In this work, we propose a three-level parallelism scheme for the distributed numerical integration of the exchange-correlation (XC) potential in the Gaussian basis set discretization of the Kohn-Sham equations on large computing clusters consisting of multiple GPUs per compute node. In addition, we propose and demonstrate the efficacy of the use of batched kernels, including batched level-3 BLAS operations, in achieving high levels of performance on the GPU. We demonstrate the performance and scalability of the implementation of the proposed method in the NWChemEx software package by comparing to the existing scalable CPU XC integration in NWChem. Comment: 26 pages, 9 figure

    IBAS: An Infinite Buffer Abstraction for Streaming

    Recent technological trends have resulted in the creation, manipulation and storage of vast amounts of information. Effectively dealing with these data volumes requires scalable data computing capable of functioning at high performance and making use of large distributed computing systems. One of the most prominent frameworks for tackling this problem is MapReduce, which abstracts the process of parallelizing and processing a computation across a large dataset. Current open-source implementations of MapReduce, however, are lacking in key aspects: performance, efficiency, and size (or ‘bloat’). To overcome this challenge, new abstractions are necessary in order to provide the lightweight scalability and speed required. In particular, this research examines algorithms for a new infinite-stream abstraction called IBAS, which is no longer limited by the size of the virtual memory supported by the CPU or operating system. Besides having unlimited size, this architecture eliminates the kernel overhead needed for page remapping, which should make it faster than previous streaming abstractions under many streaming conditions