A Domain Decomposition Strategy for Alignment of Multiple Biological Sequences on Multiprocessor Platforms
Multiple Sequence Alignment (MSA) of biological sequences is a fundamental
problem in computational biology because of its critical significance in
wide-ranging applications, including haplotype reconstruction, sequence
homology, phylogenetic analysis, and prediction of evolutionary origins. The
MSA problem is considered NP-hard, and known heuristics for the problem do not
scale well with an increasing number of sequences. On the other hand, with the
advent of a new breed of fast sequencing techniques, it is now possible to
generate thousands of sequences very quickly. For rapid sequence analysis, it
is therefore desirable to develop fast MSA algorithms that scale well with
increasing dataset size. In this paper, we present a novel domain
decomposition based technique to solve the MSA problem on multiprocessing
platforms. In addition to yielding better alignment quality, the technique
gives an enormous advantage in execution time and memory requirements. The
proposed strategy allows the time complexity of any known heuristic of
O(N)^x complexity to be decreased by a factor of O(1/p)^x, where N is the
number of sequences, x depends on the underlying heuristic approach, and p is
the number of processing nodes. In particular, we propose a highly scalable
algorithm, Sample-Align-D, for aligning biological sequences using the MUSCLE
system as the underlying heuristic. The proposed algorithm has been
implemented on a cluster of workstations using the MPI library. Experimental
results for different problem sizes are analyzed in terms of quality of
alignment, execution time, and speed-up.
Comment: 36 pages, 17 figures. Accepted manuscript in Journal of Parallel and
Distributed Computing (JPDC)
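The complexity claim in this abstract can be made concrete with a small arithmetic sketch. It assumes an idealized even split of N sequences over p nodes and ignores sampling, communication, and merging overheads, none of which the abstract quantifies. The function names and the example values (N = 2000, p = 16, x = 2) are hypothetical and are not taken from the paper.

```python
def sequential_cost(n_sequences, x):
    """Work units for a heuristic whose cost grows as N^x."""
    return n_sequences ** x


def per_node_cost(n_sequences, n_nodes, x):
    """Idealized work per node when N sequences are split evenly over p nodes.

    Assumption: each node aligns only its own N/p sequences and all
    overheads (sampling, communication, merging) are ignored.
    """
    return (n_sequences / n_nodes) ** x


if __name__ == "__main__":
    n, p, x = 2000, 16, 2  # hypothetical example values
    seq = sequential_cost(n, x)
    par = per_node_cost(n, p, x)
    # The ratio equals p^x, i.e. the per-node cost shrinks by (1/p)^x.
    print(seq, par, seq / par)
```

Under these assumptions the per-node cost is (N/p)^x = N^x * (1/p)^x, so for x = 2 and p = 16 the idealized reduction is a factor of 256.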
Parallel Homologous Search With Hirschberg Algorithm: A Hybrid MPI-Pthreads Solution.
In this paper, we apply two different parallel programming models, the message passing model using Message Passing Interface (MPI) and the multithreaded model using Pthreads, to protein sequence homologous search. The protein sequence homologous search uses the Hirschberg algorithm for the pair-wise
sequence alignment
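For readers unfamiliar with the Hirschberg algorithm named in this abstract, the following is a minimal sequential sketch of linear-space global alignment. It is not the paper's MPI/Pthreads implementation. The scoring values (match = 1, mismatch = -1, gap = -1), the function names, and the use of plain strings are all assumptions made for illustration.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Last row of the Needleman-Wunsch score matrix in O(len(b)) memory."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev


def hirschberg(a, b, match=1, mismatch=-1, gap=-1):
    """Return an optimal global alignment of a and b as two gapped strings."""
    if not a:
        return "-" * len(b), b
    if not b:
        return a, "-" * len(a)
    if len(a) == 1:
        # Base case: place the single character of a against the best column.
        k = b.find(a)
        sub = match if k >= 0 else mismatch
        if sub < 2 * gap:
            # Cheaper to leave the character unpaired.
            return a + "-" * len(b), "-" + b
        k = max(k, 0)
        return "-" * k + a + "-" * (len(b) - k - 1), b
    mid = len(a) // 2
    # Forward scores for the top half, backward scores for the bottom half.
    left = nw_score(a[:mid], b, match, mismatch, gap)
    right = nw_score(a[mid:][::-1], b[::-1], match, mismatch, gap)
    # Split b where the combined score of both halves is maximal.
    split = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    a1, b1 = hirschberg(a[:mid], b[:split], match, mismatch, gap)
    a2, b2 = hirschberg(a[mid:], b[split:], match, mismatch, gap)
    return a1 + a2, b1 + b2


def alignment_score(x, y, match=1, mismatch=-1, gap=-1):
    """Score of an alignment given as two equal-length gapped strings."""
    total = 0
    for cx, cy in zip(x, y):
        if cx == "-" or cy == "-":
            total += gap
        else:
            total += match if cx == cy else mismatch
    return total


if __name__ == "__main__":
    top, bottom = hirschberg("AGTACGCA", "TATGC")
    print(top)
    print(bottom)
    print(alignment_score(top, bottom))
```

The two `nw_score` calls in each recursive step are independent of each other, as are the two recursive calls, which is one reason the algorithm lends itself to parallel execution; how the paper actually divides the work between MPI processes and threads is not stated in the abstract.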
Parallel Architectures for Planetary Exploration Requirements (PAPER)
The Parallel Architectures for Planetary Exploration Requirements (PAPER) project is essentially research oriented towards technology insertion issues for NASA's unmanned planetary probes. It was initiated to complement and augment the long-term efforts for space exploration with particular reference to NASA/LaRC's (NASA Langley Research Center) research needs for planetary exploration missions of the mid and late 1990s. The requirements for space missions as given in the somewhat dated Advanced Information Processing Systems (AIPS) requirements document are contrasted with the new requirements from JPL/Caltech involving sensor data capture and scene analysis. It is shown that more stringent requirements have arisen as a result of technological advancements. Two possible architectures, the AIPS Proof of Concept (POC) configuration and the MAX Fault-tolerant dataflow multiprocessor, were evaluated. The main observation was that the AIPS design is biased towards fault tolerance and may not be an ideal architecture for planetary and deep space probes due to high cost and complexity. The MAX concept appears to be a promising candidate, except that more detailed information is required. The feasibility of adding neural computation capability to this architecture needs to be studied. Key impact issues for architectural design of computing systems meant for planetary missions were also identified.
Modeling Algorithm Performance on Highly-threaded Many-core Architectures
The rapid growth of data processing required in various arenas of computation over the past decades necessitates extensive use of parallel computing engines. Among those, highly-threaded many-core machines, such as GPUs, have become increasingly popular for accelerating a diverse range of data-intensive applications. They feature a large number of hardware threads with low-overhead context switches to hide the memory access latencies and therefore provide high computational throughput. However, understanding and harnessing such machines poses great challenges for algorithm designers and performance tuners due to the complex interaction of threads and hierarchical memory subsystems of these machines. The achieved performance jointly depends on the parallelism exploited by the algorithm, the effectiveness of latency hiding, and the utilization of multiprocessors (occupancy). Contemporary work tries to model the performance of GPUs from various aspects with different emphasis and granularity. However, no model considers all of these factors together.
This dissertation presents an analytical framework that jointly addresses parallelism, latency-hiding, and occupancy for both theoretical and empirical performance analysis of algorithms on highly-threaded many-core machines so that it can guide both algorithm design and performance tuning. In particular, this framework not only helps to explore and reduce the runtime configuration space for tuning kernel execution on GPUs, but also reflects performance bottlenecks and predicts how the runtime will trend as the problem and other parameters scale. The framework consists of a pair of analytical models, one focusing on higher-level asymptotic algorithm performance on GPUs and the other emphasizing lower-level details about scheduling and runtime configuration. Based on the two models, we have conducted extensive analysis of a large set of algorithms. The analysis provides interesting results and explains previously unexplained data. In addition, the two models are further bridged and combined as a consistent framework. The framework provides an end-to-end methodology for algorithm design, evaluation, comparison, implementation, and fairly accurate prediction of real runtime on GPUs.
To demonstrate the viability of our methods, the models are validated through data from implementations of a variety of classic algorithms, including hashing, Bloom filters, all-pairs shortest path, matrix multiplication, FFT, merge sort, list ranking, string matching via suffix tree/array, etc. We evaluate the models' performance across a wide spectrum of parameters, data values, and machines. The results indicate that the models can be effectively used for algorithm performance analysis and runtime prediction on highly-threaded many-core machines.
On Longest Repeat Queries Using GPU
Repeat finding in strings has important applications in subfields such as
computational biology. The challenge of finding the longest repeats covering
particular string positions was recently proposed and solved by İleri et
al., using a total of the optimal O(n) time and space, where n is the
string size. However, their solution can only find the leftmost longest
repeat for each string position. It is also not known how to
parallelize their solution. In this paper, we propose a new solution for
longest repeat finding which, although theoretically suboptimal in time,
is conceptually simpler, runs faster, and uses less memory in practice
than the optimal solution. Further, our solution can find all longest
repeats for every string position, while still maintaining a faster processing
speed and lower memory usage. Moreover, our solution is
parallelizable in the shared memory architecture (SMA), enabling it to
take advantage of modern multi-processor computing platforms such as
general-purpose graphics processing units (GPUs). We have implemented both the
sequential and parallel versions of our solution. Experiments with both
biological and non-biological data show that our sequential and parallel
solutions are faster than the optimal solution by a factor of 2 to 3.5 and
6 to 14, respectively, and use less memory.
Comment: 14 pages
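To illustrate the query this abstract describes, here is a deliberately naive brute-force sketch that reports all longest repeats covering each string position. It is not the paper's algorithm or the optimal solution of İleri et al., and its cost is polynomial of high degree. It assumes that a repeat is any substring occurring at least twice in the string, with overlapping occurrences allowed; the function name and the half-open (start, end) output format are hypothetical.

```python
from collections import Counter


def longest_repeats(s):
    """For each position k, list all longest repeated substrings covering k.

    Each repeat is reported as a half-open index pair (start, end), so the
    substring is s[start:end] and start <= k < end. Positions covered by no
    repeat get an empty list. Brute force, for illustration only.
    """
    n = len(s)
    # Occurrence count of every substring (overlapping occurrences included).
    counts = Counter(s[i:j] for i in range(n) for j in range(i + 1, n + 1))
    result = []
    for k in range(n):
        best_len, best = 0, []
        for i in range(k + 1):
            for j in range(k + 1, n + 1):
                if counts[s[i:j]] >= 2:
                    length = j - i
                    if length > best_len:
                        best_len, best = length, [(i, j)]
                    elif length == best_len:
                        best.append((i, j))
        result.append(best)
    return result


if __name__ == "__main__":
    for k, repeats in enumerate(longest_repeats("abab")):
        print(k, repeats)
```

On the string "aaa", the middle position is covered by two longest repeats, (0, 2) and (1, 3), which shows the difference between reporting only the leftmost repeat and reporting all of them.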
Simulation of P systems with active membranes on CUDA
P systems, or Membrane Systems, provide a high-level computational modelling framework that
combines the structure and dynamic aspects of biological systems in a relevant and understandable way.
They are inherently parallel and non-deterministic computing devices. In this article, we discuss the
motivation, design principles, and key aspects of the implementation of a simulator for the class of recognizer P
systems with active membranes running on a graphics processing unit (GPU). We compare our parallel simulator for GPUs to the
simulator developed for a single central processing unit (CPU), showing that GPUs are better suited than
CPUs to simulate P systems due to their highly parallel nature.
Ministerio de Educación y Ciencia TIN2006-13425
Junta de Andalucía P08–TIC-0420