10,328 research outputs found

    Parallel local search for solving Constraint Problems on the Cell Broadband Engine (Preliminary Results)

    We explore the use of the Cell Broadband Engine (Cell/BE for short) for combinatorial optimization applications: we present a parallel version of a constraint-based local search algorithm implemented on a multiprocessor BladeCenter machine with twin Cell/BE processors (a total of 16 SPUs per blade). This algorithm was chosen because it fits the Cell/BE architecture very well: it requires neither shared memory nor communication between processors, while retaining a compact memory footprint. We study the performance on several large optimization benchmarks and show that the parallel version achieves mostly linear, and sometimes super-linear, speedups. This is possible because the parallel implementation explores different parts of the search space simultaneously and can therefore converge faster towards the best sub-space, and thus towards a solution. Besides the speedups, the resulting times exhibit a much smaller variance, which benefits applications where a timely reply is critical.
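
    The scheme described above is an independent multi-walk parallelization: each processing unit runs its own local search from a different starting point, with no shared state, and the best result is kept. The sketch below illustrates that idea in Python with a deliberately simplified cost function; the magic-square-style cost, the move loop without restarts, and the use of 16 worker processes in place of the 16 SPUs are illustrative assumptions, not the authors' implementation.

```python
import random
from multiprocessing import Pool

def local_search(seed, n=10, max_iters=100_000):
    """One independent walk: accept swaps that do not increase the cost."""
    rng = random.Random(seed)
    values = list(range(1, n * n + 1))
    rng.shuffle(values)
    magic = n * (n * n + 1) // 2          # target sum of every row

    def cost(v):
        # Hypothetical, simplified cost: total deviation of the row sums from
        # the magic constant (a real solver would also check columns/diagonals).
        return sum(abs(sum(v[r * n:(r + 1) * n]) - magic) for r in range(n))

    best = cost(values)
    for _ in range(max_iters):
        if best == 0:
            break
        i, j = rng.randrange(n * n), rng.randrange(n * n)
        values[i], values[j] = values[j], values[i]
        c = cost(values)
        if c <= best:
            best = c
        else:
            values[i], values[j] = values[j], values[i]   # undo worsening move
    return best

if __name__ == "__main__":
    # 16 independent walkers mirror the 16 SPUs of the twin-Cell blade:
    # no shared memory, no communication; simply keep the best result.
    with Pool(16) as pool:
        costs = pool.map(local_search, range(16))
    print("best cost found:", min(costs))
```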

    High-Precision Numerical Simulations of Rotating Black Holes Accelerated by CUDA

    Hardware accelerators (such as Nvidia's CUDA GPUs) hold tremendous promise for computational science, because they can deliver large gains in performance at relatively low cost. In this work, we focus on the use of Nvidia's Tesla GPU for high-precision (double, quadruple and octal precision) numerical simulations in the area of black hole physics -- more specifically, solving a partial differential equation using finite differencing. We describe our approach in detail and present the final performance results, compared with both a single-core desktop processor and the Cell BE. We obtain mixed results: order-of-magnitude gains in overall performance in some cases and negligible gains in others.

    Comment: 6 pages, 1 figure, 1 table. Accepted for publication in the International Conference on High Performance Computing Systems (HPCS 2010).
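
    As a rough illustration of the kind of kernel such a simulation spends its time in, the sketch below advances a 1D wave equation by explicit finite differencing in extended (roughly quadruple) precision using mpmath. The equation, grid size, and Courant factor are assumptions chosen for illustration; this is not the authors' Kerr-perturbation code.

```python
from mpmath import mp, mpf

mp.dps = 34   # roughly quadruple precision (34 significant decimal digits)

def step(u_prev, u_curr, courant_sq):
    """One explicit time step of u_tt = c^2 u_xx on a fixed 1D grid."""
    n = len(u_curr)
    u_next = [mpf(0)] * n                 # boundary values stay pinned at zero
    for i in range(1, n - 1):
        lap = u_curr[i - 1] - 2 * u_curr[i] + u_curr[i + 1]
        u_next[i] = 2 * u_curr[i] - u_prev[i] + courant_sq * lap
    return u_next

# Evolve a narrow pulse placed at the centre of the grid.
n = 101
u_old = [mpf(0)] * n
u_old[n // 2] = mpf(1)
u_now = list(u_old)
for _ in range(100):
    u_old, u_now = u_now, step(u_old, u_now, mpf("0.25"))
print(u_now[n // 2])
```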

    The CBE Hardware Accelerator for Numerical Relativity: A Simple Approach

    Hardware accelerators (such as the Cell Broadband Engine) have recently received a significant amount of attention from the computational science community because they can provide substantial gains in the overall performance of many numerical simulations at a low cost. However, such accelerators usually employ a rather unfamiliar and specialized programming model that often requires advanced knowledge of their hardware design. In this article, we demonstrate an alternate and simpler approach to managing the main complexities of programming the Cell processor, called software caching. We apply this technique to a numerical relativity application -- a time-domain, finite-difference Kerr black hole perturbation evolver -- and present the performance results. We obtain gains in the overall performance of generic simulations that are close to the theoretical maximum attainable through our parallelization approach.

    Comment: 5 pages, 2 figures. Accepted for publication in the International Journal of Modeling, Simulation, and Scientific Computing (IJMSSC).
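
    Software caching, in this context, means that the program itself keeps a small set of recently used blocks of main memory in the accelerator's local store and fetches a whole block on a miss, so the numerical kernel can use ordinary indexed reads. The sketch below models that idea in plain Python; the block size, the naive eviction policy, and the list standing in for main memory are illustrative assumptions, not the article's Cell-specific code.

```python
class SoftwareCache:
    def __init__(self, main_memory, block_size=64, max_blocks=8):
        self.mem = main_memory            # stands in for main memory
        self.block_size = block_size
        self.max_blocks = max_blocks
        self.blocks = {}                  # block index -> values held locally
        self.misses = 0

    def read(self, i):
        b = i // self.block_size
        if b not in self.blocks:          # miss: "DMA" the whole block in
            self.misses += 1
            if len(self.blocks) >= self.max_blocks:
                self.blocks.pop(next(iter(self.blocks)))   # evict oldest block
            start = b * self.block_size
            self.blocks[b] = self.mem[start:start + self.block_size]
        return self.blocks[b][i % self.block_size]

# Stencil-style sweep through the cache: sequential access makes most reads hits.
main = list(range(1_000))
cache = SoftwareCache(main)
total = sum(cache.read(i - 1) + cache.read(i + 1) for i in range(1, len(main) - 1))
print(total, "misses:", cache.misses)
```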

    A Pure Java Parallel Flow Solver

    In this paper an overview is given of the "Have Java" project to attain a pure Java parallel Navier-Stokes flow solver (JParNSS) based on the thread concept and remote method invocation (RMI). The goal of this project is to produce an industrial flow solver running on an arbitrary sequential or parallel architecture, utilizing the Internet, capable of handling the most complex 3D geometries as well as flow physics, and also linking to codes in other areas such as aeroelasticity. Since Java is completely object-oriented, the code has been written in an object-oriented programming (OOP) style. The code also includes a graphical user interface (GUI) as well as an interactive steering package for the parallel architecture. The Java OOP approach provides profoundly improved software productivity, robustness, and security as well as reusability and maintainability. OOP allows code construction similar to the aerodynamic design process because objects can be software coded and integrated, reflecting actual design procedures. In addition, Java is the programming language of the Internet, and thus Java objects on disparate machines or even separate networks can be connected. We explain the motivation for the design of JParNSS along with the capabilities that set it apart from other solvers. In the first two sections we present a discussion of the Java language as the programming tool for aerospace applications. In section three the objectives of the Have Java project are presented. In the next section the layer structures of JParNSS are discussed, with emphasis on the parallelization and client-server (RMI) layers. JParNSS, like its predecessor ParNSS (ANSI-C), is based on the multiblock idea and allows for arbitrarily complex topologies. Grids are accepted in GridPro property settings; grids of any size or block number can be read directly by JParNSS without further modification, requiring no additional preparation time for the solver input. In the last section, computational results are presented, with emphasis on multiprocessor Pentium and Sun parallel systems running the Solaris operating system (OS).
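
    The multiblock idea referred to above is that the grid is partitioned into blocks, each block is advanced independently by its own worker, and neighbouring blocks exchange a layer of halo cells between iterations. The following sketch shows that structure in Python (rather than Java) for a 1D Jacobi relaxation; the block count, the one-cell halo width, and the thread-per-block workers are assumptions for illustration, not JParNSS code.

```python
import threading

def make_blocks(n_cells, n_blocks):
    size = n_cells // n_blocks
    return [[0.0] * (size + 2) for _ in range(n_blocks)]   # +2 halo cells per block

def relax(block):
    """One Jacobi sweep over the interior cells of a block."""
    old = block[:]
    for i in range(1, len(block) - 1):
        block[i] = 0.5 * (old[i - 1] + old[i + 1])

def exchange_halos(blocks):
    """Copy each block's outermost interior cell into its neighbour's halo."""
    for left, right in zip(blocks, blocks[1:]):
        left[-1] = right[1]
        right[0] = left[-2]

blocks = make_blocks(n_cells=64, n_blocks=4)
blocks[0][0] = 100.0                     # fixed hot boundary in the leftmost halo
for _ in range(200):
    # One worker thread per block, then a halo exchange between neighbours.
    workers = [threading.Thread(target=relax, args=(b,)) for b in blocks]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    exchange_halos(blocks)
print(round(blocks[1][1], 4))            # heat has diffused into the second block
```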

    PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation

    High-performance computing has recently seen a surge of interest in heterogeneous systems, with an emphasis on modern Graphics Processing Units (GPUs). These devices offer tremendous potential for performance and efficiency in important large-scale applications of computational science. However, exploiting this potential can be challenging, as one must adapt to the specialized and rapidly evolving computing environment currently exhibited by GPUs. One way of addressing this challenge is to embrace better techniques and to develop tools tailored to their needs. This article presents one simple technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL, two open-source toolkits that support this technique. In introducing PyCUDA and PyOpenCL, this article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems. The concept of RTCG is simple and easily implemented using existing, robust infrastructure. Nonetheless it is powerful enough to support (and encourage) the creation of custom application-specific tools by its users. The premise of the paper is illustrated by a wide range of examples where the technique has been applied with considerable success.

    Comment: Submitted to Parallel Computing, Elsevier.
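
    The core of run-time code generation is that kernel source text is ordinary data to the scripting layer until the moment it is compiled, so the host program can specialize it freely. The example below uses PyCUDA's actual API (SourceModule, get_function, driver.In/Out) to build and launch a kernel whose body is chosen at run time; the particular saxpy-style expression and launch configuration are illustrative assumptions. It requires a CUDA-capable GPU with PyCUDA installed.

```python
import numpy as np
import pycuda.autoinit                 # create a context on the default device
import pycuda.driver as drv
from pycuda.compiler import SourceModule

def make_elementwise_kernel(op):
    """Generate, compile, and return a kernel whose body is chosen at run time."""
    source = """
    __global__ void apply(float *y, const float *x, float a)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        y[i] = %(op)s;
    }
    """ % {"op": op}
    return SourceModule(source).get_function("apply")

x = np.linspace(0, 1, 1024).astype(np.float32)
y = np.empty_like(x)

# The expression is ordinary Python data until the moment it is compiled.
kernel = make_elementwise_kernel("a * x[i] + 1.0f")
kernel(drv.Out(y), drv.In(x), np.float32(3.0), block=(256, 1, 1), grid=(4, 1))
print(y[:4])   # expect 3*x + 1
```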