Parallel local search for solving Constraint Problems on the Cell Broadband Engine (Preliminary Results)
We explore the use of the Cell Broadband Engine (Cell/BE for short) for
combinatorial optimization applications: we present a parallel version of a
constraint-based local search algorithm that has been implemented on a
multiprocessor BladeCenter machine with twin Cell/BE processors (total of 16
SPUs per blade). This algorithm was chosen because it fits very well the
Cell/BE architecture and requires neither shared memory nor communication
between processors, while retaining a compact memory footprint. We study the
performance on several large optimization benchmarks and show that the parallel
implementation achieves mostly linear speedups, and sometimes even super-linear
ones. This is possible because the parallel implementation can explore
different parts of the search space simultaneously and therefore converge
faster towards the best sub-space, and thus towards a solution. Besides
yielding speedups, the resulting run times exhibit a much smaller variance,
which benefits applications where a timely reply is critical.
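The communication-free scheme the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy all-zeros objective and the Python thread pool are assumptions standing in for the real constraint benchmarks and the per-SPU search threads.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def local_search(seed, n_vars=20, max_steps=2000):
    """One independent local-search run over a toy objective:
    drive the number of 1-bits (the 'conflicts') to zero."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_vars)]
    cost = sum(x)                      # remaining conflicts
    for _ in range(max_steps):
        if cost == 0:                  # solution found
            break
        i = rng.randrange(n_vars)
        if x[i] == 1:                  # greedy move: always improves this cost
            x[i] = 0
            cost -= 1
    return cost, x

def parallel_local_search(n_workers=4):
    """Run several searches in parallel with no shared state,
    then keep the best result."""
    with ThreadPoolExecutor(n_workers) as pool:
        results = list(pool.map(local_search, range(n_workers)))
    return min(results, key=lambda r: r[0])
```

Because each worker starts from a different random point, the fastest run to reach a good sub-space bounds the whole computation, which is one way the super-linear speedups mentioned above can arise.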
High-Precision Numerical Simulations of Rotating Black Holes Accelerated by CUDA
Hardware accelerators (such as Nvidia's CUDA GPUs) have tremendous promise
for computational science, because they can deliver large gains in performance
at relatively low cost. In this work, we focus on the use of Nvidia's Tesla GPU
for high-precision (double, quadruple and octal precision) numerical
simulations in the area of black hole physics -- more specifically, solving a
partial-differential-equation using finite-differencing. We describe our
approach in detail and present the final performance results as compared with a
single-core desktop processor and also the Cell BE. We obtain mixed results --
order-of-magnitude gains in overall performance in some cases and negligible
gains in others.

Comment: 6 pages, 1 figure, 1 table. Accepted for publication in the
International Conference on High Performance Computing Systems (HPCS 2010).
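The core numerical pattern the abstract names, evolving a partial differential equation by finite differencing, can be sketched in a few lines. This is an illustrative assumption, not the paper's actual black-hole perturbation solver: here the 1-D wave equation u_tt = c^2 u_xx is advanced with an explicit leapfrog scheme in ordinary double precision.

```python
import math

def step(u_prev, u_curr, r2):
    """One leapfrog time step; endpoints held fixed at zero."""
    n = len(u_curr)
    u_next = [0.0] * n
    for i in range(1, n - 1):
        u_next[i] = (2.0 * u_curr[i] - u_prev[i]
                     + r2 * (u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]))
    return u_next

def evolve(n=101, steps=200, c=1.0):
    """Evolve a standing wave on [0, 1] with zero initial velocity."""
    dx = 1.0 / (n - 1)
    dt = 0.5 * dx / c                       # CFL number 0.5: stable
    r2 = (c * dt / dx) ** 2
    u_prev = [math.sin(math.pi * i * dx) for i in range(n)]
    u_curr = u_prev[:]                      # first-order start
    for _ in range(steps):
        u_prev, u_curr = u_curr, step(u_prev, u_curr, r2)
    return u_curr
```

Replacing the `float` arithmetic in the inner loop with a higher-precision type is what makes each step expensive, which is precisely why offloading such stencils to an accelerator pays off in the quadruple- and octal-precision regimes the paper studies.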
The CBE Hardware Accelerator for Numerical Relativity: A Simple Approach
Hardware accelerators (such as the Cell Broadband Engine) have recently
received a significant amount of attention from the computational science
community because they can provide significant gains in the overall performance
of many numerical simulations at a low cost. However, such accelerators usually
employ a rather unfamiliar and specialized programming model that often
requires advanced knowledge of their hardware design. In this article, we
demonstrate an alternative, simpler approach, called software caching, to
managing the main complexities of programming the Cell processor.
We apply this technique to a numerical relativity application: a time-domain,
finite-difference Kerr black hole perturbation evolver, and present the
performance results. We obtain gains in the overall performance of generic
simulations that are close to the theoretical maximum that can be obtained
through our parallelization approach.

Comment: 5 pages, 2 figures. Accepted for publication in the International
Journal of Modeling, Simulation, and Scientific Computing (IJMSSC).
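The general technique the abstract names, a software cache, can be sketched as follows. The line size, slot count, and direct-mapped placement here are illustrative choices, not the paper's implementation; on the Cell, `main_memory` would be system RAM reached via explicit DMA and the cached lines would live in an SPU's local store.

```python
LINE_SIZE = 16          # elements per cache line
NUM_LINES = 8           # lines kept in "local store"

class SoftwareCache:
    def __init__(self, main_memory):
        self.mem = main_memory
        self.tags = [None] * NUM_LINES     # which line each slot holds
        self.lines = [None] * NUM_LINES    # cached copies of the data
        self.hits = self.misses = 0

    def read(self, addr):
        line, offset = divmod(addr, LINE_SIZE)
        slot = line % NUM_LINES            # direct-mapped placement
        if self.tags[slot] == line:
            self.hits += 1
        else:                              # miss: "DMA" the whole line in
            self.misses += 1
            base = line * LINE_SIZE
            self.lines[slot] = self.mem[base:base + LINE_SIZE]
            self.tags[slot] = line
        return self.lines[slot][offset]
```

The point of the technique is that application code calls `read(addr)` as if memory were uniform, while the cache hides the explicit transfers: a sequential sweep over the array pays one miss per line and hits on every other access.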
A Pure Java Parallel Flow Solver
In this paper an overview is given of the "Have Java" project to attain a pure Java parallel Navier-Stokes flow solver (JParNSS) based on the thread concept and remote method invocation (RMI). The goal of this project is to produce an industrial flow solver running on an arbitrary sequential or parallel architecture, utilizing the Internet, capable of handling the most complex 3D geometries as well as flow physics, and also of linking to codes in other areas such as aeroelasticity.
Since Java is completely object-oriented, the code has been written in an object-oriented programming (OOP) style. The code also includes a graphical user interface (GUI) as well as an interactive steering package for the parallel architecture. The Java OOP approach provides profoundly improved software productivity, robustness, and security, as well as reusability and maintainability. OOP allows code construction similar to the aerodynamic design process, because objects can be software coded and integrated, reflecting actual design procedures. In addition, Java is the programming language of the Internet, and thus Java objects on disparate machines or even separate networks can be connected.
We explain the motivation for the design of JParNSS along with the capabilities that set it apart from other solvers. In the first two sections we present a discussion of the Java language as a programming tool for aerospace applications. In section three the objectives of the Have Java project are presented. In the next section the layer structure of JParNSS is discussed, with emphasis on the parallelization and client-server (RMI) layers. JParNSS, like its predecessor ParNSS (ANSI C), is based on the multiblock idea and allows for arbitrarily complex topologies. Grids are accepted in the GridPro format; grids of any size or block number can be read directly by JParNSS without further modification, requiring no additional preparation time for the solver input. In the last section, computational results are presented, with emphasis on multiprocessor Pentium and Sun parallel systems running the Solaris operating system (OS).
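The multiblock idea described above can be sketched in miniature. This is a hedged illustration in Python rather than Java, with threads standing in for JParNSS's RMI workers, and a 1-D Jacobi relaxation standing in for the flow solver: each grid block is relaxed independently in parallel, and neighbouring blocks exchange ghost-cell values after each sweep.

```python
import threading

def relax(block):
    """One Jacobi smoothing pass over a 1-D block; first and last
    entries are ghost/boundary cells and are not updated."""
    old = block[:]
    for i in range(1, len(block) - 1):
        block[i] = 0.5 * (old[i - 1] + old[i + 1])

def solve_multiblock(blocks, sweeps=200):
    """Relax each block in its own thread, then exchange ghost cells."""
    for _ in range(sweeps):
        threads = [threading.Thread(target=relax, args=(b,)) for b in blocks]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        # exchange ghost cells between neighbouring blocks
        for left, right in zip(blocks, blocks[1:]):
            left[-1] = right[1]
            right[0] = left[-2]
    return blocks
```

With the outer boundaries held at 0 and 1, the two blocks jointly converge to the single linear profile a monolithic solver would produce, which is the essential property a multiblock decomposition must preserve.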
PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation
High-performance computing has recently seen a surge of interest in
heterogeneous systems, with an emphasis on modern Graphics Processing Units
(GPUs). These devices offer tremendous potential for performance and efficiency
in important large-scale applications of computational science. However,
exploiting this potential can be challenging, as one must adapt to the
specialized and rapidly evolving computing environment currently exhibited by
GPUs. One way of addressing this challenge is to embrace better programming
techniques and to develop tools tailored to this environment. This article presents one simple
technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL,
two open-source toolkits that support this technique.
In introducing PyCUDA and PyOpenCL, this article proposes the combination of
a dynamic, high-level scripting language with the massive performance of a GPU
as a compelling two-tiered computing platform, potentially offering significant
performance and productivity advantages over conventional single-tier, static
systems. The concept of RTCG is simple and easily implemented using existing,
robust infrastructure. Nonetheless it is powerful enough to support (and
encourage) the creation of custom application-specific tools by its users. The
premise of the paper is illustrated by a wide range of examples where the
technique has been applied with considerable success.

Comment: Submitted to Parallel Computing, Elsevier.
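The essence of run-time code generation can be shown in plain Python. This is a deliberately simplified sketch: a source template is specialized with a constant known only at run time and then compiled with `exec`/`compile`, where PyCUDA would instead pass the generated CUDA C source to `pycuda.compiler.SourceModule` for on-the-fly kernel compilation.

```python
# Source template; {a} is filled in at run time so the generated code
# carries the scalar as a baked-in constant rather than a parameter.
KERNEL_TEMPLATE = """
def saxpy(x, y):
    return [{a} * xi + yi for xi, yi in zip(x, y)]
"""

def generate_saxpy(a):
    """Generate, compile, and return a saxpy specialized for this 'a'."""
    source = KERNEL_TEMPLATE.format(a=a)
    namespace = {}
    exec(compile(source, "<rtcg>", "exec"), namespace)
    return namespace["saxpy"]

saxpy2 = generate_saxpy(2.0)
```

On a GPU the same move lets the compiler constant-fold loop bounds, unroll factors, and data-type choices into the kernel, which is where the performance advantage of the two-tiered scripting-plus-GPU platform comes from.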