HSTREAM: A directive-based language extension for heterogeneous stream computing
Big data streaming applications require the use of heterogeneous parallel
computing systems, which may comprise multiple multi-core CPUs and many-core
accelerating devices such as NVIDIA GPUs and Intel Xeon Phis. Programming such
systems requires advanced knowledge of several hardware architectures and
device-specific programming models, including OpenMP and CUDA. In this paper,
we present HSTREAM, a compiler directive-based language extension to support
programming stream computing applications for heterogeneous parallel computing
systems. The HSTREAM source-to-source compiler aims to increase programming
productivity by enabling programmers to annotate the parallel regions for
heterogeneous execution and by generating target-specific code. The HSTREAM
runtime automatically distributes the workload across CPUs and accelerating
devices. We demonstrate the usefulness of the HSTREAM language extension with
various applications from the STREAM benchmark. Experimental evaluation results
show that HSTREAM keeps the same programming simplicity as OpenMP, and that the
generated code can deliver performance beyond what CPU-only and GPU-only
executions can deliver.
Comment: Preprint, 21st IEEE International Conference on Computational Science
and Engineering (CSE 2018)
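To make the idea of distributing a stream kernel across CPUs and accelerators more concrete, here is a minimal Python sketch, assuming a static split proportional to hypothetical per-device throughput weights. It applies the STREAM Triad kernel (a[i] = b[i] + q * c[i]) to contiguous chunks, with threads standing in for devices. The names `split_workload`, `triad_chunk`, `stream_triad`, and the weights are illustrative assumptions, not HSTREAM's directive syntax or its actual runtime scheduling policy.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def split_workload(n, weights):
    """Split indices [0, n) into contiguous chunks whose sizes are
    proportional to each device's (hypothetical) relative throughput."""
    total = sum(weights)
    cuts = [round(n * c / total) for c in accumulate(weights)]
    cuts[-1] = n  # guard against floating-point drift in the last boundary
    starts = [0] + cuts[:-1]
    return list(zip(starts, cuts))


def triad_chunk(a, b, c, q, lo, hi):
    """STREAM Triad on one chunk: a[i] = b[i] + q * c[i]."""
    for i in range(lo, hi):
        a[i] = b[i] + q * c[i]


def stream_triad(b, c, q, weights):
    """Run Triad with one worker per 'device'; threads only model dispatch."""
    a = [0.0] * len(b)
    chunks = split_workload(len(b), weights)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        futures = [pool.submit(triad_chunk, a, b, c, q, lo, hi) for lo, hi in chunks]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return a
```

For example, weights of `[3, 1]` over 100 elements assign indices 0 to 74 to the first (faster) device and 75 to 99 to the second. A real heterogeneous runtime would also have to account for host-device data transfers, which this sketch ignores.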
Using Cognitive Computing for Learning Parallel Programming: An IBM Watson Solution
While modern parallel computing systems provide high-performance resources,
utilizing them to the fullest extent requires advanced programming expertise.
Programming for parallel computing systems is much more difficult than
programming for sequential systems. OpenMP is an extension of the C, C++, and
Fortran programming languages that enables programmers to express parallelism
using compiler directives. While OpenMP eases parallel programming by reducing
the lines of code that the programmer needs to write, deciding how and when to
use these compiler directives is up to the programmer. Novice programmers may
make mistakes that lead to performance degradation or unexpected program
behavior. Cognitive computing has shown impressive results in various domains,
such as health and marketing. In this paper, we describe the use of the IBM
Watson cognitive system for educating novice parallel programmers. Using the
IBM Watson dialogue service, we have developed a solution that assists
programmers in avoiding common OpenMP mistakes. To evaluate our approach, we
conducted a survey with a number of novice parallel programmers at Linnaeus
University and obtained encouraging results with respect to the usefulness of
our approach.
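One common mistake of the kind such an assistant targets is a data race on an accumulator inside `#pragma omp parallel for` when the `reduction` clause is missing. The following Python sketch flags that pattern with a deliberately simple, line-based heuristic. It is an illustrative assumption, not the Watson-based solution described in the abstract: the function names are hypothetical, it only checks `+=`, `-=`, and `*=` on scalar variables in the loop body lines, and it will miss or misreport cases that need a real C parser (for example, variables declared inside the loop body or one-line loops).

```python
import re

# '#pragma omp parallel for' plus any trailing clauses (captured).
PRAGMA = re.compile(r"#\s*pragma\s+omp\s+parallel\s+for\b(.*)")
# Variables listed in reduction(op: var, var, ...) clauses.
REDUCTION = re.compile(r"reduction\s*\(\s*[^:]+:\s*([^)]*)\)")
# Scalar compound assignments such as 'sum += x' (array elements are skipped).
ACCUM = re.compile(r"\b([A-Za-z_]\w*)\s*(?:\+=|-=|\*=)")


def loop_lines(lines, start):
    """Return the lines of the for-loop starting at lines[start], using
    brace counting (a braceless loop is assumed to have a one-line body)."""
    depth = 0
    body = []
    for j in range(start, len(lines)):
        body.append(lines[j])
        depth += lines[j].count("{") - lines[j].count("}")
        if depth == 0 and ("{" in "".join(body) or j > start):
            break
    return body


def missing_reductions(source):
    """Return (pragma_line_number, variable) pairs for accumulators that are
    updated in a parallel loop but not listed in a reduction clause."""
    lines = source.splitlines()
    warnings = []
    for i, line in enumerate(lines):
        match = PRAGMA.search(line)
        if not match or i + 1 >= len(lines):
            continue
        reduced = {
            var.strip()
            for group in REDUCTION.findall(match.group(1))
            for var in group.split(",")
        }
        # Skip the 'for (...)' header itself; inspect only the body lines.
        for stmt in loop_lines(lines, i + 1)[1:]:
            for var in ACCUM.findall(stmt):
                if var not in reduced:
                    warnings.append((i + 1, var))
    return warnings
```

Adding `reduction(+: sum)` to the pragma silences the warning for `sum`, which mirrors the usual fix for this race.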
PAWS: A performance evaluation tool for parallel computing systems
This paper describes PAWS (Parallel Assessment Window System), a set of tools that provides an interactive, user-friendly environment for analyzing existing, prototype, and conceptual machine architectures running a common application. PAWS consists of an application characterization tool, an architecture characterization tool, a performance assessment tool, and an interactive graphical display tool. The application characterization tool evaluates the level and degree of an application's parallelism. The architecture characterization tool allows users to create, store, and retrieve descriptions of machines in a database. This approach lets users evaluate conceptual machines before building any hardware. The performance assessment tool generates profile plots through the interactive graphical display tool. It shows both the ideal parallelism inherent in the machine-independent dataflow graph and
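The ideal parallelism of a machine-independent dataflow graph can be summarized as a parallelism profile: with unlimited processors and unit-latency operations, every operation runs at the earliest level its dependences allow, and the profile counts operations per level. The Python sketch below computes such a profile under exactly those assumptions. The function name and the input format (a map from each operation to its predecessors) are hypothetical and are not PAWS's interface.

```python
from collections import defaultdict, deque


def parallelism_profile(deps):
    """deps maps each operation to the operations it depends on.
    Returns the number of operations at each level of an ASAP schedule
    with unit latency and unlimited processors."""
    nodes = set(deps)
    for preds in deps.values():
        nodes.update(preds)
    indegree = {n: 0 for n in nodes}
    succs = defaultdict(list)
    for node, preds in deps.items():
        for p in preds:
            indegree[node] += 1
            succs[p].append(node)
    # Source operations (no predecessors) run at level 1.
    level = {n: 1 for n in nodes if indegree[n] == 0}
    queue = deque(level)
    visited = 0
    while queue:  # Kahn's topological traversal
        n = queue.popleft()
        visited += 1
        for s in succs[n]:
            level[s] = max(level.get(s, 0), level[n] + 1)
            indegree[s] -= 1
            if indegree[s] == 0:
                queue.append(s)
    if visited != len(nodes):
        raise ValueError("dependence graph contains a cycle")
    profile = [0] * max(level.values(), default=0)
    for lv in level.values():
        profile[lv - 1] += 1
    return profile
```

For (a + b) * (c - d), with the four operand loads as source operations, the profile is [4, 2, 1]; the average parallelism, total operations divided by critical-path length, is 7/3.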
Novel kinetic consistent 3D MHD algorithm for high performance parallel computing systems
The impressive progress of kinetic consistent schemes in solving gas
dynamics problems, together with the development of effective parallel algorithms
for modern high-performance parallel computing systems, has led to the development
of advanced methods for solving magnetohydrodynamics problems in plasma physics.
The novel feature of the method is the formulation of a complex Boltzmann-like
distribution function for the kinetic method that incorporates the electromagnetic
interaction term. The numerical method is based on explicit schemes because of the
logical simplicity and high efficiency of the algorithm and its easy adaptation to
modern high-performance parallel computing systems.
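Explicit schemes suit parallel machines because each new value depends only on old values in a small neighbourhood, so a grid can be split into subdomains that exchange one layer of boundary ("halo") values per step. As a minimal sketch of that property, assuming the simplest case of 1D linear advection u_t + a u_x = 0 with a > 0 on a periodic grid, the following Python code implements the first-order explicit upwind scheme. It is a stand-in chosen for illustration, not the kinetic consistent MHD scheme itself, and the function names are hypothetical.

```python
def upwind_step(u, nu):
    """One explicit first-order upwind step for u_t + a*u_x = 0 with a > 0
    on a periodic grid; nu = a*dt/dx is the Courant number."""
    if not 0.0 <= nu <= 1.0:
        raise ValueError("explicit upwind scheme is stable only for 0 <= nu <= 1 (CFL)")
    # u[i - 1] with i = 0 wraps to u[-1], giving periodic boundaries.
    return [u[i] - nu * (u[i] - u[i - 1]) for i in range(len(u))]


def upwind_step_chunk(chunk, left_halo, nu):
    """Same update for one subdomain; left_halo is the last value of the
    neighbouring subdomain from the previous time step."""
    prev = [left_halo] + chunk[:-1]
    return [c - nu * (c - p) for c, p in zip(chunk, prev)]
```

Because `upwind_step_chunk` needs only one halo value from its left neighbour, concatenating the chunk results reproduces the full-grid step exactly, which is what makes this kind of update easy to distribute across processors.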
Algorithms for solving inverse geophysical problems on parallel computing systems
For solving inverse gravimetry problems, efficient and stable parallel algorithms based on iterative gradient methods are proposed. For solving systems of linear algebraic equations with block-tridiagonal matrices arising in geoelectrics problems, a parallel matrix sweep algorithm, a square root method, and a conjugate gradient method with a preconditioner are proposed. The algorithms are implemented on a parallel computing system of the Institute of Mathematics and Mechanics (PCS-IMM), on NVIDIA graphics processors, and on an Intel multi-core CPU using some new computing technologies. The parallel algorithms are incorporated into a remote computation system entitled "Specialized Web-Portal for Solving Geophysical Problems on Multiprocessor Computers." Some problems with "quasi-model" and real data are solved. © 2013 Pleiades Publishing, Ltd.
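The idea behind the matrix sweep algorithm is easiest to see in its scalar form, the Thomas algorithm for tridiagonal systems: a forward sweep eliminates the sub-diagonal, and a backward sweep recovers the unknowns. The Python sketch below is that sequential scalar version, assuming a well-conditioned (for example, diagonally dominant) system; in the block-tridiagonal case, the coefficients become matrices and each division becomes a small dense solve. It does not reproduce the parallel decomposition proposed in the paper, and the function name is hypothetical.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the sweep (Thomas) algorithm.

    lower[i] multiplies x[i-1] (lower[0] is unused) and upper[i]
    multiplies x[i+1] (upper[-1] is unused). No pivoting is performed.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Backward sweep: back-substitute from the last unknown.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

Both sweeps are O(n), but each step depends on the previous one, which is why parallel variants such as the one in the paper reorganize the elimination across processors.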
Voltage, throughput, power, reliability, and multicore scaling
This article studies the interplay between the performance, energy, and reliability (PER) of parallel computing systems. It describes methods that support meaningful cross-platform analysis of this interplay. These methods underlie the PER software tool, which helps designers analyze, compare, and explore these properties.
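A common starting point for reasoning about the voltage, throughput, and power trade-off is the first-order CMOS dynamic-power relation P = αCV²f (activity factor α, switched capacitance C, supply voltage V, clock frequency f). The Python sketch below uses it to compare one core against two slower, lower-voltage cores delivering the same nominal throughput. The voltage, frequency, and capacitance values are hypothetical, the model ignores leakage power and reliability effects, and it assumes perfect parallel scaling; it is not the model used by the PER tool.

```python
def dynamic_power(alpha, capacitance, voltage, frequency):
    """First-order CMOS dynamic power P = alpha * C * V**2 * f, in watts
    when C is in farads, V in volts, and f in hertz."""
    return alpha * capacitance * voltage ** 2 * frequency


def multicore_power(cores, alpha, capacitance, voltage, frequency):
    """Total dynamic power of identical cores (leakage ignored)."""
    return cores * dynamic_power(alpha, capacitance, voltage, frequency)


# Hypothetical operating points with equal nominal throughput
# (cores * frequency), assuming perfect parallel scaling.
ALPHA, CAP = 0.2, 1e-9
one_fast = multicore_power(1, ALPHA, CAP, voltage=1.0, frequency=2e9)
two_slow = multicore_power(2, ALPHA, CAP, voltage=0.8, frequency=1e9)
ratio = two_slow / one_fast  # equals (0.8 / 1.0) ** 2 in this model
```

In this model the two-core configuration draws about 64% of the single-core power, because power scales with V² while throughput is held constant; real chips deviate from this through leakage and imperfect parallel scaling.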
Software-based fault-tolerant routing algorithm in multidimensional networks
Massively parallel computing systems are being built with hundreds or thousands of components such as nodes, links, memories, and connectors. The failure of a component in such systems not only reduces the computational power but also alters the network's topology. The software-based fault-tolerant routing algorithm is a popular way to achieve fault tolerance in networks. It was initially proposed only for two-dimensional networks (Suh et al., 2000). Since higher-dimensional networks are widely employed in many contemporary massively parallel systems, this paper proposes an approach to extend this routing scheme to these indispensable higher-dimensional networks. Deadlock and livelock freedom and the performance of the presented algorithm have been investigated for networks with different dimensionality and various fault regions. Furthermore, performance results are presented through simulation experiments.
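Dimension-order routing, which corrects one coordinate at a time, extends naturally from 2D meshes to any number of dimensions. The Python sketch below pairs it with a software-style detour in the spirit of the scheme described in the abstract: if the dimension-order path meets a faulty node, the message is delivered to a healthy intermediate node and re-injected toward its destination. The brute-force choice of the intermediate node, the function names, and the mesh representation are hypothetical, and the sketch makes no deadlock- or livelock-freedom guarantees, which the paper analyzes separately.

```python
from itertools import product


def dor_path(src, dst):
    """Dimension-order route in an n-D mesh: fix dimension 0, then 1, ..."""
    cur = list(src)
    path = [tuple(cur)]
    for dim in range(len(src)):
        step = 1 if dst[dim] > cur[dim] else -1
        while cur[dim] != dst[dim]:
            cur[dim] += step
            path.append(tuple(cur))
    return path


def route_with_detour(src, dst, faulty, shape):
    """Return a fault-free path using at most one intermediate node.

    faulty is a set of coordinate tuples; shape gives the mesh size in
    each dimension. Returns None if no single detour avoids all faults.
    """
    direct = dor_path(src, dst)
    if faulty.isdisjoint(direct):
        return direct
    best = None
    for mid in product(*(range(k) for k in shape)):
        if mid in faulty:
            continue
        first, second = dor_path(src, mid), dor_path(mid, dst)
        if faulty.isdisjoint(first) and faulty.isdisjoint(second):
            candidate = first + second[1:]  # mid appears only once
            if best is None or len(candidate) < len(best):
                best = candidate
    return best
```

In a 3 x 3 mesh with node (1, 0) faulty, the route from (0, 0) to (2, 2) becomes (0, 0), (0, 1), (1, 1), (2, 1), (2, 2), re-entering dimension-order routing at the intermediate node (0, 1).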
The feasibility study of running HPC workloads on computational clouds
High-performance computing (HPC) applications require high-end computing systems, but not all scientists have access to such powerful systems. Cloud computing provides an opportunity to run these applications without investing in high-end parallel computing systems. The performance of HPC applications can be analyzed on private as well as public clouds, and the performance of a workload on the cloud can be measured using benchmarking tools such as the NAS Parallel Benchmarks and Rally. Running HPC workloads on a physical setup requires many parallel computing systems, whereas a cloud computing environment provides this capacity without the need to invest in physical machines. We aim to analyze how well the cloud performs when running HPC workloads. We will obtain detailed performance measurements of these applications on a private cloud and identify the pros and cons of running HPC workloads in a cloud environment.
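Comparing cloud and physical runs of the same benchmark usually comes down to a few derived metrics: speedup S = T1/Tp, parallel efficiency E = S/p, and the relative slowdown of the cloud run against a bare-metal reference. The Python sketch below computes them and includes a simple wall-clock timer. The function names are hypothetical, and the numbers in the usage note are invented for illustration, not results from this study.

```python
import time


def best_wall_time(fn, repeats=3):
    """Minimum wall-clock time of fn() over several runs, in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)


def speedup(t_serial, t_parallel):
    """S = T1 / Tp."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processes):
    """E = S / p; 1.0 means perfect scaling."""
    return speedup(t_serial, t_parallel) / processes


def cloud_slowdown(t_cloud, t_bare_metal):
    """Fractional extra run time of the cloud run (0.25 means 25% slower)."""
    return t_cloud / t_bare_metal - 1.0
```

With hypothetical timings of 120 s serially and 20 s on 8 processes, `efficiency(120.0, 20.0, 8)` is 0.75; a cloud run taking 25 s where bare metal took 20 s gives `cloud_slowdown(25.0, 20.0) == 0.25`. Taking the minimum over repeats reduces the influence of noisy neighbours, which matters more on shared cloud hardware.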