757 research outputs found

    64-bit architectures and compute clusters for high performance simulations

    Get PDF
    Simulation of large, complex systems remains one of the most demanding applications of high performance computer systems, both in terms of raw compute performance and efficient memory management. The recent availability of 64-bit architectures has opened up the possibility of commodity computers addressing more than the 4-gigabyte memory limit previously enforced by 32-bit addressing. We report on performance measurements we have made on two 64-bit architectures and their consequences for some high performance simulations. We discuss the performance of our codes for simulations of artificial life models; computational physics models of point particles on lattices; and models with interacting clusters of particles. We have summarised pertinent features of these codes into benchmark kernels, which we discuss in the context of well-known benchmark kernels of the 32-bit era. We report on how these findings were useful in designing 64-bit compute clusters for high-performance simulations.
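
    As a quick illustration of the addressing arithmetic behind the 4-gigabyte limit mentioned above, the following Python sketch (illustrative only, not from the paper) computes the ceiling imposed by 32-bit pointers and reports the pointer width of the interpreter actually running it:

        # The 4 GiB ceiling referred to above comes from 32-bit pointers
        # addressing at most 2**32 bytes; 64-bit pointers raise the
        # theoretical limit to 2**64 bytes.
        import struct
        import sys

        max_bytes_32 = 2 ** 32
        print(f"32-bit address space: {max_bytes_32 / 2**30:.0f} GiB")  # 4 GiB

        max_bytes_64 = 2 ** 64
        print(f"64-bit address space: {max_bytes_64 / 2**60:.0f} EiB")  # 16 EiB

        # Pointer width of the interpreter running this script:
        print(f"pointer width: {struct.calcsize('P') * 8} bits")
        print(f"sys.maxsize   = {sys.maxsize}")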

    GPU Accelerated Approach to Numerical Linear Algebra and Matrix Analysis with CFD Applications

    Get PDF
    A GPU accelerated approach to numerical linear algebra and matrix analysis with CFD applications is presented. The work's objectives are to (1) develop stable and efficient algorithms utilizing multiple NVIDIA GPUs with CUDA to accelerate common matrix computations, (2) optimize these algorithms through CPU/GPU memory allocation, GPU kernel development, CPU/GPU communication, data transfer and bandwidth control, and (3) develop parallel CFD applications for Navier-Stokes and lattice Boltzmann analysis methods. Special consideration is given to performing the linear algebra algorithms on particular matrix types (banded, dense, diagonal, sparse, symmetric and triangular). Benchmarks are performed for all analyses, with baseline CPU times determined to find speed-up factors and measure the computational capability of the GPU accelerated algorithms. The GPU implementations and optimization techniques used in this work are measured against preexisting work and test matrices available in the NIST Matrix Market. The CFD analysis strengthens the assessment of this work by providing a direct engineering application that benefits from the matrix optimization techniques and accelerated algorithms. Overall, this work develops optimizations for selected linear algebra and matrix computations on modern GPU architectures with CUDA, applied directly to mathematical and engineering applications through CFD analysis.
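
    The benchmarking pattern described above (a baseline CPU time divided by the GPU time, including the cost of CPU/GPU transfers) can be sketched in a few lines. The dissertation works in CUDA C on NVIDIA GPUs; the sketch below substitutes CuPy purely as a compact stand-in for the GPU side, so it illustrates the measurement pattern rather than the author's implementation:

        # Time a dense matrix-vector product on CPU (NumPy) and GPU (CuPy),
        # then report a speed-up factor, as in the benchmarks described above.
        import time
        import numpy as np
        import cupy as cp

        n = 8192
        a_cpu = np.random.rand(n, n).astype(np.float32)  # dense test matrix
        x_cpu = np.random.rand(n).astype(np.float32)

        t0 = time.perf_counter()
        y_cpu = a_cpu @ x_cpu                            # baseline CPU time
        cpu_time = time.perf_counter() - t0

        t0 = time.perf_counter()
        a_gpu = cp.asarray(a_cpu)                        # include CPU->GPU transfer
        x_gpu = cp.asarray(x_cpu)
        y_gpu = a_gpu @ x_gpu
        cp.cuda.Stream.null.synchronize()                # wait for the kernel
        gpu_time = time.perf_counter() - t0

        err = np.abs(cp.asnumpy(y_gpu) - y_cpu).max()
        print(f"speed-up: {cpu_time / gpu_time:.1f}x, max abs error: {err:.2e}")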

    On the Minimization of Convex Functionals of Probability Distributions Under Band Constraints

    Full text link
    The problem of minimizing convex functionals of probability distributions is solved under the assumption that the density of every distribution is bounded from above and below. A system of necessary and sufficient first-order optimality conditions, as well as a bound on the optimality gap of feasible candidate solutions, is derived. Based on these results, two numerical algorithms are proposed that iteratively solve the system of optimality conditions on a grid of discrete points. Both algorithms use a block coordinate descent strategy and terminate once the optimality gap falls below the desired tolerance. While the first algorithm is conceptually simpler and more efficient, it is not guaranteed to converge for objective functions that are not strictly convex. This shortcoming is overcome in the second algorithm, which uses an additional outer proximal iteration and is proven to converge under mild assumptions. Two examples are given to demonstrate the theoretical usefulness of the optimality conditions as well as the high efficiency and accuracy of the proposed numerical algorithms.

    Comment: 13 pages, 5 figures, 2 tables; published in the IEEE Transactions on Signal Processing. In previous versions, the example in Section VI.B contained some mistakes and inaccuracies, which have been fixed in this version.
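
    In assumed notation (the paper's own symbols may differ), the problem the abstract describes can be written as the minimization of a convex integral functional of a density p over a band:

        % phi convex; p^- and p^+ are the lower and upper density bounds
        \begin{aligned}
          \min_{p}\quad & \int_{\mathcal{X}} \phi\bigl(p(x)\bigr)\,\mathrm{d}x \\
          \text{s.t.}\quad & p^{-}(x) \le p(x) \le p^{+}(x) \quad \forall x \in \mathcal{X}, \\
          & \int_{\mathcal{X}} p(x)\,\mathrm{d}x = 1,
        \end{aligned}

    with the block coordinate descent algorithms operating on the values of p at a grid of discrete points.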

    Accelerating Pattern Recognition Algorithms On Parallel Computing Architectures

    Get PDF
    The move to more parallel computing architectures places more responsibility on the programmer to achieve greater performance. The programmer must now have a greater understanding of the underlying architecture and the inherent algorithmic parallelism. Using parallel computing architectures to exploit algorithmic parallelism can be a complex task. This dissertation demonstrates various techniques for doing so. Specifically, three pattern recognition (PR) approaches are examined for acceleration across multiple parallel computing architectures, namely field programmable gate arrays (FPGAs) and general-purpose graphics processing units (GPGPUs). Phase-only filter correlation for fingerprint identification was studied as the first PR approach. This approach's sensitivity to angular rotations, scaling, and missing data was surveyed. Additionally, a novel FPGA implementation of this algorithm was created using fixed point computations, deep pipelining, and four computation phases. Communication and computation were overlapped to efficiently process large fingerprint galleries. The FPGA implementation showed approximately a 47 times speedup over a central processing unit (CPU) implementation with negligible impact on precision. For the second PR approach, a spiking neural network (SNN) algorithm for a character recognition application was examined. A novel FPGA implementation of the approach was developed, incorporating a scalable modular SNN processing element (PE) to efficiently perform neural computations. The modular SNN PE incorporated streaming memory, fixed point computation, and deep pipelining. This design showed speedups of approximately 3.3 and 8.5 times over CPU implementations for neural networks of size 624 and 9,264, respectively. Results indicate that the PE design could scale easily to larger networks. Finally, for the third PR approach, cellular simultaneous recurrent networks (CSRNs) were investigated for GPGPU acceleration. In particular, the applications of maze traversal and face recognition were studied. Novel GPGPU implementations were developed employing varying quantities of task-level, data-level, and instruction-level parallelism to achieve efficient runtime performance. Furthermore, the performance of the face recognition application was examined across a heterogeneous cluster of multi-core and GPGPU architectures. A combination of multi-core processors and GPGPUs achieved roughly a 996 times speedup over a single-core CPU implementation. From examining these PR approaches for acceleration, this dissertation presents useful techniques and insight applicable to other algorithms to improve performance when designing a parallel implementation.
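
    Phase-only filter correlation, the first PR approach above, admits a compact sketch (in NumPy on a CPU rather than on an FPGA, with hypothetical test data): the cross-power spectrum of two images is normalized to unit magnitude so only phase is kept, and a sharp peak in its inverse FFT marks the relative displacement of a match.

        # Phase-only correlation of two 2-D arrays: the peak of the POC
        # surface marks the circular shift relating the inputs.
        import numpy as np

        def phase_only_correlation(f, g):
            F = np.fft.fft2(f)
            G = np.fft.fft2(g)
            cross = F * np.conj(G)                 # cross-power spectrum
            cross /= np.abs(cross) + 1e-12         # keep phase, drop magnitude
            return np.real(np.fft.ifft2(cross))    # correlation surface

        rng = np.random.default_rng(0)
        img = rng.random((128, 128))
        shifted = np.roll(img, shift=(5, 9), axis=(0, 1))  # known displacement

        poc = phase_only_correlation(shifted, img)
        peak = np.unravel_index(np.argmax(poc), poc.shape)
        print(f"peak at {peak}, height {poc[peak]:.3f}")   # expect (5, 9), ~1.0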

    A Study on Efficient Application Mapping onto Parallel Computing Accelerators

    Get PDF
    Nagasaki University doctoral dissertation. Degree certificate number: Haku (Kō) Kō No. 3. Degree conferred: March 20, 2014 (Heisei 26). Nagasaki University (長崎大学), course-based doctorate.

    Research and Education in Computational Science and Engineering

    Get PDF
    Over the past two decades the field of computational science and engineering (CSE) has penetrated both basic and applied research in academia, industry, and laboratories to advance discovery, optimize systems, support decision-makers, and educate the scientific and engineering workforce. Informed by centuries of theory and experiment, CSE performs computational experiments to answer questions that neither theory nor experiment alone is equipped to answer. CSE provides scientists and engineers of all persuasions with algorithmic inventions and software systems that transcend disciplines and scales. Carried on a wave of digital technology, CSE brings the power of parallelism to bear on troves of data. Mathematics-based advanced computing has become a prevalent means of discovery and innovation in essentially all areas of science, engineering, technology, and society, and the CSE community is at the core of this transformation. However, a combination of disruptive developments, including the architectural complexity of extreme-scale computing, the data revolution that engulfs the planet, and the specialization required to follow the applications to new frontiers, is redefining the scope and reach of the CSE endeavor. This report describes the rapid expansion of CSE and the challenges to sustaining its bold advances. The report also presents strategies and directions for CSE research and education for the next decade.

    Comment: Major revision, to appear in SIAM Review.