Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics
Graphics Processing Units (GPUs) are having a transformational effect on
numerical lattice quantum chromodynamics (LQCD) calculations of importance in
nuclear and particle physics. The QUDA library provides a package of mixed
precision sparse matrix linear solvers for LQCD applications, supporting single
GPUs based on NVIDIA's Compute Unified Device Architecture (CUDA). This
library, interfaced to the QDP++/Chroma framework for LQCD calculations, is
currently in production use on the "9g" cluster at the Jefferson Laboratory,
enabling unprecedented price/performance for a range of problems in LQCD.
Nevertheless, memory constraints on current GPU devices limit the problem sizes
that can be tackled. In this contribution we describe the parallelization of
the QUDA library onto multiple GPUs using MPI, including strategies for the
overlapping of communication and computation. We report on both weak and strong
scaling for up to 32 GPUs interconnected by InfiniBand, on which we sustain in
excess of 4 Tflops.
Comment: 11 pages, 7 figures, to appear in the Proceedings of Supercomputing 2010 (submitted April 12, 2010).
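The abstract's key technique is the overlapping of communication and computation: boundary ("halo") data is exchanged between GPUs while the interior lattice sites, which need no remote data, are updated. Below is a minimal Python sketch of that pattern, using a thread to stand in for the asynchronous MPI exchange and a toy nearest-neighbour stencil in place of the Dirac operator; the function names are hypothetical illustrations, not QUDA's API.

```python
import threading

def apply_stencil(field, sites):
    # Toy stand-in for the LQCD stencil (Dirac operator): nearest-neighbour
    # average with periodic boundaries.
    n = len(field)
    return {s: 0.5 * (field[(s - 1) % n] + field[(s + 1) % n]) for s in sites}

def overlapped_step(field):
    # The pattern described in the abstract: launch the (simulated) halo
    # exchange, update interior sites while it is in flight, then update
    # the boundary sites that need the exchanged data.
    n = len(field)
    halo_ready = threading.Event()

    def exchange_halos():
        # In the real library this would be an asynchronous device-host copy
        # plus an MPI send/receive of the boundary faces; here it is a no-op.
        halo_ready.set()

    t = threading.Thread(target=exchange_halos)
    t.start()
    result = apply_stencil(field, range(1, n - 1))   # overlaps with comms
    halo_ready.wait()                                # exchange must finish
    result.update(apply_stencil(field, [0, n - 1]))  # needs halo data
    t.join()
    return [result[s] for s in range(n)]
```

The payoff of this structure is that the interior update hides the communication latency entirely whenever the interior work takes longer than the exchange.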
Studying the effect of parallelization on the performance of Andromeda Search Engine: A search engine for peptides
Proteins are fundamental building blocks of the human body, and analysis of their structure and function reveals important biological information. A key technique for protein evaluation is mass spectrometry. The data generated by a mass spectrometer are analyzed to detect patterns in proteins, using a wide variety of operations: visualization, spectral deconvolution, peak alignment, normalization, pattern recognition, and significance testing. A number of software packages analyze the huge volume of data a mass spectrometer generates. One example is MaxQuant, which processes high-resolution mass spectrometric data and integrates a search engine called Andromeda for peptide identification.

One major drawback of the Andromeda search engine is its execution time: identification of peptides involves a number of complex operations and intensive data processing. This research work therefore focuses on parallelization as a way to improve the performance of the Andromeda search engine. This is done by partitioning the data and distributing it across multiple cores and nodes, and by executing multiple tasks concurrently on those cores and nodes.

A number of bioinformatics applications have already been parallelized with significant improvements in execution time over their serial versions. In this work, Task Parallel Library (TPL) and Common Language Runtime (CLR) constructs are used to parallelize the application. The aim is to apply these techniques to the Andromeda search engine and improve its execution time by leveraging multi-core architectures.
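The parallelization strategy the abstract describes, partitioning the data and running independent tasks concurrently, can be sketched briefly. The original work uses .NET's Task Parallel Library; the sketch below uses Python's `concurrent.futures` as an analogue, with a toy shared-peak-count score standing in for Andromeda's probabilistic scoring. All names and data here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def score(spectrum, fragments):
    # Shared-peak count: a crude stand-in for Andromeda's probabilistic
    # peptide-spectrum match score.
    return len(spectrum & fragments)

def best_match(spectrum, database):
    # Pick the candidate peptide whose theoretical fragments best
    # explain the observed peaks.
    return max(database, key=lambda pep: score(spectrum, database[pep]))

def identify_all(spectra, database, workers=4):
    # Data partitioning: each spectrum is an independent search task, so
    # the workload distributes trivially across cores (Parallel.ForEach
    # plays this role in a TPL implementation).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: best_match(s, database), spectra))
```

Because each spectrum's search is independent, the expected speed-up from this partitioning scales with the number of cores until the database lookup becomes memory-bound.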
Parallel processing and expert systems
Whether it be monitoring the thermal subsystem of Space Station Freedom or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not guarantee that real-time demands are met for large expert systems; speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. This survey covers the state-of-the-art research in progress on parallel execution of expert systems, divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of the parallelism of expert systems. Results to date indicate that the parallelism achieved for these systems is small. To obtain greater speed-ups, data parallelism and application parallelism must be exploited.
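The data parallelism the survey calls for is most naturally applied in the match phase of a production system, since every rule can be tested against the same working memory independently. A minimal Python sketch under that assumption follows; the rule representation (condition/action pairs over a set of facts) is a hypothetical illustration, not any particular expert-system shell.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_match(rules, working_memory, workers=4):
    # Data-parallel match phase: each rule is a (condition, action) pair,
    # and all conditions are evaluated against the same working memory
    # concurrently. Actions of rules that fire propose new facts.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        fired = list(pool.map(
            lambda rule: rule[1](working_memory) if rule[0](working_memory) else set(),
            rules,
        ))
    new_facts = set()
    for facts in fired:
        new_facts |= facts
    # Return only facts not already known, as a conflict set for the
    # (sequential) act phase.
    return new_facts - working_memory
```

The sequential act phase after the parallel match is what typically caps the achievable speed-up, which is consistent with the small parallelism the surveyed measurements report.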