The Challenge of Time-Predictability in Modern Many-Core Architectures
Recent technological advancements and market trends are driving an interesting convergence of the High-Performance Computing (HPC) and Embedded Computing (EC) domains. Many recent HPC applications require huge amounts of information to be processed within a bounded amount of time, while EC systems are increasingly concerned with providing higher performance in real time. The convergence of these two domains towards systems requiring both high performance and predictable timing behavior challenges the capabilities of current hardware architectures. Fortunately, the advent of next-generation many-core embedded platforms can meet this converging need for predictability and high performance, allowing HPC and EC applications to be executed on efficient and powerful heterogeneous architectures that integrate general-purpose processors with many-core computing fabrics. However, addressing this mixed set of requirements is not without its own challenges, and it is now of paramount importance to develop new techniques to exploit the massively parallel computation capabilities of many-core platforms in a predictable way.
Accelerating the Rate of Astronomical Discovery with GPU-Powered Clusters
In recent years, the Graphics Processing Unit (GPU) has emerged as a low-cost alternative for high-performance computing, enabling impressive speed-ups for a range of scientific computing applications. Early adopters in astronomy are already benefiting from adapting their codes to take advantage of the GPU's massively parallel processing paradigm. I give an introduction to, and overview of, the use of GPUs in astronomy to date, highlighting the adoption and application trends from the first ~100 GPU-related publications in astronomy. I discuss the opportunities and challenges of utilising GPU computing clusters, such as the new Australian GPU supercomputer, gSTAR, for accelerating the rate of astronomical discovery.
Comment: To appear in the proceedings of ADASS XXI, ed. P. Ballester and D. Egret, ASP Conf. Ser.
Enhanced mobile computing using cloud resources
The purpose of this research is to investigate, review and analyse the use of cloud resources for the enhancement of mobile computing. Mobile cloud computing refers to a distributed computing relationship between a resource-constrained mobile device and a remote high-capacity cloud resource. Investigation of prevailing trends has shown that this will be a key technology in the development of future mobile computing systems. This research presents a theoretical analysis framework for mobile cloud computing. This analysis framework is a structured consolidation of the salient considerations identified in recent scientific literature and commercial endeavours. The use of this framework in the analysis of various mobile application domains has elucidated several significant benefits of mobile cloud computing, including increases in system performance and efficiency. Based on recent scientific literature and commercial endeavours, various implementation approaches for mobile cloud computing have been identified, categorized and analysed according to their architectural characteristics. This has resulted in a set of advantages and disadvantages for each category of system architecture. Overall, through the development and application of the new analysis framework, this work provides a consolidated review and structured critical analysis of the current research and developments in the field of mobile cloud computing.
On the Efficient Evaluation of the Exchange Correlation Potential on Graphics Processing Unit Clusters
The predominance of Kohn-Sham density functional theory (KS-DFT) for the theoretical treatment of large experimentally relevant systems in molecular chemistry and materials science relies primarily on the existence of efficient software implementations which are capable of leveraging the latest advances in modern high-performance computing (HPC). With recent trends in HPC leading toward an increasing reliance on heterogeneous accelerator-based architectures such as graphics processing units (GPUs), existing code bases must embrace these architectural advances to maintain the high levels of performance which have come to be expected for these methods. In this work, we propose a three-level parallelism scheme for the distributed numerical integration of the exchange-correlation (XC) potential in the Gaussian basis set discretization of the Kohn-Sham equations on large computing clusters consisting of multiple GPUs per compute node. In addition, we propose and demonstrate the efficacy of the use of batched kernels, including batched level-3 BLAS operations, in achieving high levels of performance on the GPU. We demonstrate the performance and scalability of the implementation of the proposed method in the NWChemEx software package by comparing to the existing scalable CPU XC integration in NWChem.
Comment: 26 pages, 9 figures
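The batched level-3 BLAS pattern the abstract refers to can be illustrated with a minimal NumPy sketch: many small matrix products are expressed as one call over a stacked batch, rather than one GEMM launch per matrix. This is only an illustration of the pattern on the CPU, not the NWChemEx implementation; all array names and sizes here are made up for the example.

```python
import numpy as np

# A stack of `batch` small matrix multiplies, as arises when
# integrating over many small grid batches. Sizes are illustrative.
rng = np.random.default_rng(0)
batch, m, k, n = 64, 8, 8, 8
A = rng.standard_normal((batch, m, k))
B = rng.standard_normal((batch, k, n))

# np.matmul broadcasts over the leading batch axis, so a single
# vectorized call stands in for `batch` separate GEMMs.
C_batched = np.matmul(A, B)

# Equivalent loop of individual GEMMs, for comparison.
C_loop = np.stack([A[i] @ B[i] for i in range(batch)])

assert np.allclose(C_batched, C_loop)
```

On a GPU the same idea maps onto batched GEMM kernels, where amortizing launch overhead across the batch is what recovers high utilization for many small matrices.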
IBAS: An Infinite Buffer Abstraction for Streaming
Recent technological trends have resulted in the creation, manipulation and storage of vast amounts of information. Effectively dealing with these data volumes requires scalable data computing capable of functioning at high performance and making use of large distributed computing systems. One of the most prominent frameworks for tackling this problem is MapReduce, which abstracts the process of parallelizing and processing a computation across a large dataset. Current open-source implementations of MapReduce, however, are lacking in key aspects: performance, efficiency, and size (or ‘bloat’). To overcome this challenge, new abstractions are necessary in order to provide the lightweight scalability and speed required. In particular, this research examines algorithms for a new infinite-stream abstraction called IBAS, which is no longer limited by the size of the virtual memory supported by the CPU or operating system. Besides having unlimited size, this architecture eliminates the kernel overhead needed for page remapping, which should make it faster than previous streaming abstractions under many streaming conditions
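The MapReduce model the abstract builds on can be sketched in a few lines: a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. This is a toy single-process sketch of the programming model only, not IBAS or any open-source MapReduce implementation; the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Map: emit a (word, 1) pair for every word in the document.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the values for each key (here, word counts).
    return {key: sum(values) for key, values in groups.items()}

docs = ["to be or not to be", "be here now"]
mapped = chain.from_iterable(map_phase(d) for d in docs)
counts = reduce_phase(shuffle(mapped))
# counts["be"] == 3, counts["to"] == 2
```

In a real framework the map and reduce calls run in parallel across a cluster and the shuffle moves data between machines; streaming abstractions like the one the abstract proposes target exactly that data-movement layer.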