
    HSTREAM: A directive-based language extension for heterogeneous stream computing

    Big data streaming applications require the utilization of heterogeneous parallel computing systems, which may comprise multiple multi-core CPUs and many-core accelerating devices such as NVIDIA GPUs and Intel Xeon Phis. Programming such systems requires advanced knowledge of several hardware architectures and device-specific programming models, including OpenMP and CUDA. In this paper, we present HSTREAM, a compiler directive-based language extension to support programming stream computing applications for heterogeneous parallel computing systems. The HSTREAM source-to-source compiler aims to increase programming productivity by enabling programmers to annotate parallel regions for heterogeneous execution and to generate target-specific code. The HSTREAM runtime automatically distributes the workload across CPUs and accelerating devices. We demonstrate the usefulness of the HSTREAM language extension with various applications from the STREAM benchmark. Experimental evaluation results show that HSTREAM retains the programming simplicity of OpenMP, and the generated code can deliver performance beyond what CPU-only and GPU-only executions can deliver. Comment: Preprint, 21st IEEE International Conference on Computational Science and Engineering (CSE 2018).
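    As a point of reference for the directive-based style HSTREAM builds on, the sketch below annotates the STREAM triad kernel with a plain OpenMP directive; HSTREAM's own directives and clauses for heterogeneous work distribution are defined in the paper and are not reproduced here.

        // STREAM triad kernel annotated with a standard OpenMP directive.
        // HSTREAM extends this annotation style with its own directives for
        // heterogeneous CPU/GPU work distribution (see the paper).
        void triad(double *a, const double *b, const double *c,
                   double scalar, int n) {
            #pragma omp parallel for
            for (int i = 0; i < n; ++i)
                a[i] = b[i] + scalar * c[i];
        }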

    Using Cognitive Computing for Learning Parallel Programming: An IBM Watson Solution

    While modern parallel computing systems provide high-performance resources, utilizing them to the fullest extent requires advanced programming expertise. Programming for parallel computing systems is much more difficult than programming for sequential systems. OpenMP is an extension of the C++ programming language that enables programmers to express parallelism using compiler directives. While OpenMP alleviates parallel programming by reducing the lines of code that the programmer needs to write, deciding how and when to use these compiler directives is up to the programmer. Novice programmers may make mistakes that lead to performance degradation or unexpected program behavior. Cognitive computing has shown impressive results in various domains, such as health or marketing. In this paper, we describe the use of the IBM Watson cognitive system for the education of novice parallel programmers. Using the dialogue service of IBM Watson, we have developed a solution that assists the programmer in avoiding common OpenMP mistakes. To evaluate our approach, we conducted a survey with a number of novice parallel programmers at Linnaeus University and obtained encouraging results with respect to the usefulness of our approach.
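    A minimal sketch of the kind of mistake such an assistant might flag (the specific set of mistakes covered by the Watson dialogue is described in the paper): accumulating into a shared variable without a reduction clause causes a data race, and the corrected loop uses OpenMP's reduction.

        // A common novice OpenMP mistake: all threads update the shared
        // variable 'sum', creating a data race and a wrong result.
        double sum_buggy(const double *x, int n) {
            double sum = 0.0;
            #pragma omp parallel for          // race on 'sum'
            for (int i = 0; i < n; ++i)
                sum += x[i];
            return sum;
        }

        // Corrected version: a reduction clause gives each thread a private
        // partial sum that OpenMP combines at the end of the loop.
        double sum_fixed(const double *x, int n) {
            double sum = 0.0;
            #pragma omp parallel for reduction(+:sum)
            for (int i = 0; i < n; ++i)
                sum += x[i];
            return sum;
        }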

    PAWS: A performance evaluation tool for parallel computing systems

    A description is given of PAWS (parallel assessment window system), a set of tools that provides an interactive, user-friendly environment for analyzing existing, prototype, and conceptual machine architectures running a common application. PAWS consists of an application characterization tool, an architecture characterization tool, a performance assessment tool, and an interactive graphical display tool. The application characterization tool provides a facility for evaluating the level and degree of an application's parallelism. The architecture characterization tool allows users to create, store, and retrieve descriptions of machines in a database. This approach permits users to evaluate conceptual machines before building any hardware. The performance assessment tool generates profile plots through the interactive graphical display tool. It shows both the ideal parallelism inherent in the machine-independent dataflow graph and
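    As a generic illustration of an ideal-parallelism profile (not PAWS's actual implementation), the sketch below assigns ASAP levels to the operations of a dataflow graph, assuming unit-time operations and unbounded resources, and counts how many operations could execute at each step.

        #include <algorithm>
        #include <queue>
        #include <vector>

        // Generic sketch: derive an "ideal parallelism" profile from a dataflow
        // graph. succ[u] lists the operations consuming the result of operation u.
        std::vector<int> idealProfile(const std::vector<std::vector<int>> &succ) {
            const int n = static_cast<int>(succ.size());
            std::vector<int> indeg(n, 0), level(n, 0);
            for (int u = 0; u < n; ++u)
                for (int v : succ[u]) ++indeg[v];

            std::queue<int> ready;                      // ops whose inputs are all available
            for (int u = 0; u < n; ++u)
                if (indeg[u] == 0) ready.push(u);

            int maxLevel = 0;
            while (!ready.empty()) {                    // topological sweep assigning ASAP levels
                int u = ready.front(); ready.pop();
                maxLevel = std::max(maxLevel, level[u]);
                for (int v : succ[u]) {
                    level[v] = std::max(level[v], level[u] + 1);
                    if (--indeg[v] == 0) ready.push(v);
                }
            }

            std::vector<int> profile(maxLevel + 1, 0);  // profile[t] = ops executable at step t
            for (int u = 0; u < n; ++u) ++profile[level[u]];
            return profile;
        }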

    Novel kinetic consistent 3D MHD algorithm for high-performance parallel computing systems

    The impressive progress of kinetically consistent schemes for solving gas dynamics problems, together with the development of effective parallel algorithms for modern high-performance parallel computing systems, has led to advanced methods for solving magnetohydrodynamics problems in plasma physics. The novel feature of the method is the formulation of a complex Boltzmann-like distribution function for the kinetic method that incorporates the electromagnetic interaction term. The numerical method is based on explicit schemes, chosen for the logical simplicity and high efficiency of the algorithm and its easy adaptation to modern high-performance parallel computing systems.
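    As a generic illustration only (the paper's specific complex distribution function is not reproduced here), a kinetic equation of BGK relaxation type with an electromagnetic interaction term can be written as

        \frac{\partial f}{\partial t}
          + \mathbf{v}\cdot\nabla_{\mathbf{x}} f
          + \frac{q}{m}\,\bigl(\mathbf{E} + \mathbf{v}\times\mathbf{B}\bigr)\cdot\nabla_{\mathbf{v}} f
          = \frac{f^{\mathrm{eq}} - f}{\tau},

    where f is the distribution function, f^eq the local equilibrium, \tau the relaxation time, and E and B the electric and magnetic fields; macroscopic MHD quantities are recovered as velocity moments of f.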

    Algorithms for solving inverse geophysical problems on parallel computing systems

    For solving inverse gravimetry problems, efficient, stable parallel algorithms based on iterative gradient methods are proposed. For solving systems of linear algebraic equations with block-tridiagonal matrices arising in geoelectrics problems, a parallel matrix sweep algorithm, a square root method, and a conjugate gradient method with a preconditioner are proposed. The algorithms are implemented numerically on the parallel computing system of the Institute of Mathematics and Mechanics (PCS-IMM), NVIDIA graphics processors, and an Intel multi-core CPU using new computing technologies. The parallel algorithms are incorporated into a system for remote computations entitled "Specialized Web-Portal for Solving Geophysical Problems on Multiprocessor Computers." Several problems with "quasi-model" and real data are solved. © 2013 Pleiades Publishing, Ltd.
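    As a generic illustration of one of the method families named above (not the paper's block-tridiagonal, parallelized implementation), the sketch below shows a conjugate gradient solver with a simple Jacobi (diagonal) preconditioner.

        #include <cmath>
        #include <functional>
        #include <vector>

        using Vec = std::vector<double>;

        // Generic preconditioned conjugate gradient sketch with a Jacobi
        // preconditioner; A(x) returns the matrix-vector product, diagA holds diag(A).
        Vec pcg(const std::function<Vec(const Vec &)> &A,
                const Vec &b, const Vec &diagA,
                int maxIter = 1000, double tol = 1e-10) {
            const size_t n = b.size();
            auto dot = [](const Vec &u, const Vec &v) {
                double s = 0.0;
                for (size_t i = 0; i < u.size(); ++i) s += u[i] * v[i];
                return s;
            };
            Vec x(n, 0.0), r = b, z(n), p(n);
            for (size_t i = 0; i < n; ++i) z[i] = r[i] / diagA[i];   // z = M^{-1} r
            p = z;
            double rz = dot(r, z);
            for (int k = 0; k < maxIter && std::sqrt(dot(r, r)) > tol; ++k) {
                Vec Ap = A(p);
                const double alpha = rz / dot(p, Ap);
                for (size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
                for (size_t i = 0; i < n; ++i) z[i] = r[i] / diagA[i];
                const double rzNew = dot(r, z);
                for (size_t i = 0; i < n; ++i) p[i] = z[i] + (rzNew / rz) * p[i];
                rz = rzNew;
            }
            return x;
        }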

    Voltage, throughput, power, reliability, and multicore scaling

    This article studies the interplay between the performance, energy, and reliability (PER) of parallel computing systems. It describes methods supporting the meaningful cross-platform analysis of this interplay. These methods lead to the PER software tool, which helps designers analyze, compare, and explore these properties.

    Software-based fault-tolerant routing algorithm in multidimensional networks

    Massively parallel computing systems are being built with hundreds or thousands of components such as nodes, links, memories, and connectors. The failure of a component in such systems not only reduces the computational power but also alters the network's topology. Software-based fault-tolerant routing is a popular approach to achieving fault-tolerance capability in networks. This algorithm was initially proposed only for two-dimensional networks (Suh et al., 2000). Since higher-dimensional networks have been widely employed in many contemporary massively parallel systems, this paper proposes an approach to extend this routing scheme to these indispensable higher-dimensional networks. Deadlock and livelock freedom, as well as the performance of the presented algorithm, have been investigated for networks with different dimensionality and various fault regions. Furthermore, performance results are presented through simulation experiments.
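    As a generic illustration of the deterministic baseline that such schemes build on (the fault-region handling of Suh et al. (2000) is not reproduced here), the sketch below computes the next hop of dimension-order routing in an n-dimensional mesh; software-based fault tolerance typically falls back to rerouting only when a faulty node or link is encountered.

        #include <utility>
        #include <vector>

        // Dimension-order routing in an n-dimensional mesh: resolve the address
        // offset one dimension at a time. Returns {dimension, step (+1 or -1)}
        // for the next hop, or {-1, 0} when the packet has reached its destination.
        std::pair<int, int> nextHop(const std::vector<int> &cur,
                                    const std::vector<int> &dst) {
            for (size_t d = 0; d < cur.size(); ++d) {    // lowest unresolved dimension first
                if (cur[d] != dst[d])
                    return {static_cast<int>(d), dst[d] > cur[d] ? +1 : -1};
            }
            return {-1, 0};
        }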

    The Feasibility Study of Running HPC Workloads on Computational Clouds

    High-performance computing (HPC) applications require high-end computing systems, but not all scientists have access to such powerful systems. Cloud computing provides an opportunity to run these applications on the cloud without the need to invest in high-end parallel computing systems. We analyze the performance of HPC applications on private as well as public clouds. The performance of a workload on the cloud can be measured using different benchmarking tools, such as the NAS Parallel Benchmarks and Rally. The workloads of HPC applications would require many parallel computing systems in a physical setup, but this capacity is available in a cloud computing environment without the need to invest in physical machines. We aim to analyze the ability of the cloud to perform well when running HPC workloads. We obtain detailed performance measurements when running these applications on a private cloud and identify the pros and cons of running HPC workloads in a cloud environment.