25,857 research outputs found
Enhancing Energy Production with Exascale HPC Methods
High Performance Computing (HPC) resources have become the key actor for achieving more ambitious challenges in many disciplines. In this step beyond, an explosion on the available parallelism and the use of special purpose
processors are crucial. With such a goal, the HPC4E project applies new exascale HPC techniques to energy industry simulations, customizing them if necessary, and going beyond the state-of-the-art in the required HPC exascale
simulations for different energy sources. In this paper, a general overview of these methods is presented as well as some specific preliminary results.The research leading to these results has received funding from the European Union's Horizon 2020 Programme (2014-2020) under the HPC4E Project (www.hpc4e.eu), grant agreement n° 689772, the Spanish Ministry of
Economy and Competitiveness under the CODEC2 project (TIN2015-63562-R), and
from the Brazilian Ministry of Science, Technology and Innovation through Rede
Nacional de Pesquisa (RNP). Computer time on Endeavour cluster is provided by the
Intel Corporation, which enabled us to obtain the presented experimental results in
uncertainty quantification in seismic imagingPostprint (author's final draft
Foggy clouds and cloudy fogs: a real need for coordinated management of fog-to-cloud computing systems
The recent advances in cloud services technology are fueling a plethora of information technology innovation, including networking, storage, and computing. Today, various flavors have evolved of IoT, cloud computing, and so-called fog computing, a concept referring to capabilities of edge devices and users' clients to compute, store, and exchange data among each other and with the cloud. Although the rapid pace of this evolution was not easily foreseeable, today each piece of it facilitates and enables the deployment of what we commonly refer to as a smart scenario, including smart cities, smart transportation, and smart homes. As most current cloud, fog, and network services run simultaneously in each scenario, we observe that we are at the dawn of what may be the next big step in the cloud computing and networking evolution, whereby services might be executed at the network edge, both in parallel and in a coordinated fashion, as well as supported by the unstoppable technology evolution. As edge devices become richer in functionality and smarter, embedding capacities such as storage or processing, as well as new functionalities, such as decision making, data collection, forwarding, and sharing, a real need is emerging for coordinated management of fog-to-cloud (F2C) computing systems. This article introduces a layered F2C architecture, its benefits and strengths, as well as the arising open and research challenges, making the case for the real need for their coordinated management. Our architecture, the illustrative use case presented, and a comparative performance analysis, albeit conceptual, all clearly show the way forward toward a new IoT scenario with a set of existing and unforeseen services provided on highly distributed and dynamic compute, storage, and networking resources, bringing together heterogeneous and commodity edge devices, emerging fogs, as well as conventional clouds.Peer ReviewedPostprint (author's final draft
Highly parallel computation
Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines and current research focuses on which architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed
Mixing multi-core CPUs and GPUs for scientific simulation software
Recent technological and economic developments have led to widespread availability of
multi-core CPUs and specialist accelerator processors such as graphical processing units
(GPUs). The accelerated computational performance possible from these devices can be very
high for some applications paradigms. Software languages and systems such as NVIDIA's
CUDA and Khronos consortium's open compute language (OpenCL) support a number of
individual parallel application programming paradigms. To scale up the performance of some
complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and
very many core GPUs for data parallelism is necessary. We describe our use of hybrid applica-
tions using threading approaches and multi-core CPUs to control independent GPU devices.
We present speed-up data and discuss multi-threading software issues for the applications
level programmer and o er some suggested areas for language development and integration
between coarse-grained and ne-grained multi-thread systems. We discuss results from three
common simulation algorithmic areas including: partial di erential equations; graph cluster
metric calculations and random number generation. We report on programming experiences
and selected performance for these algorithms on: single and multiple GPUs; multi-core CPUs;
a CellBE; and using OpenCL. We discuss programmer usability issues and the outlook and
trends in multi-core programming for scienti c applications developers
Design of testbed and emulation tools
The research summarized was concerned with the design of testbed and emulation tools suitable to assist in projecting, with reasonable accuracy, the expected performance of highly concurrent computing systems on large, complete applications. Such testbed and emulation tools are intended for the eventual use of those exploring new concurrent system architectures and organizations, either as users or as designers of such systems. While a range of alternatives was considered, a software based set of hierarchical tools was chosen to provide maximum flexibility, to ease in moving to new computers as technology improves and to take advantage of the inherent reliability and availability of commercially available computing systems
Scalability of broadcast performance in wireless network-on-chip
Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of communication flows. To assess the feasibility of this approach, the network performance of WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC.Peer ReviewedPostprint (published version
An adaptive hierarchical domain decomposition method for parallel contact dynamics simulations of granular materials
A fully parallel version of the contact dynamics (CD) method is presented in
this paper. For large enough systems, 100% efficiency has been demonstrated for
up to 256 processors using a hierarchical domain decomposition with dynamic
load balancing. The iterative scheme to calculate the contact forces is left
domain-wise sequential, with data exchange after each iteration step, which
ensures its stability. The number of additional iterations required for
convergence by the partially parallel updates at the domain boundaries becomes
negligible with increasing number of particles, which allows for an effective
parallelization. Compared to the sequential implementation, we found no
influence of the parallelization on simulation results.Comment: 19 pages, 15 figures, published in Journal of Computational Physics
(2011
- …