The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
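To make the two motifs named in this abstract concrete, the following minimal sketch counts substrings of length k (k-mers) in a set of reads, once with a hash table and once by sorting. The choice of k-mer counting as the example, the function names, and the toy reads are all assumptions made for illustration; the abstract does not describe this code.

```python
from collections import Counter
from itertools import groupby


def kmers(read, k):
    """Yield every length-k substring of a read (illustrative helper)."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_by_hashing(reads, k):
    """Hashing motif: each k-mer triggers an update to a shared hash table."""
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            counts[kmer] += 1
    return dict(counts)


def count_by_sorting(reads, k):
    """Sorting motif: sort all k-mers, then count runs of equal neighbours."""
    ordered = sorted(kmer for read in reads for kmer in kmers(read, k))
    return {kmer: sum(1 for _ in run) for kmer, run in groupby(ordered)}


# Toy input, invented for this example.
reads = ["ACGTAC", "GTACG"]
hashed = count_by_hashing(reads, 3)
by_sort = count_by_sorting(reads, 3)
```

Both functions return the same counts. In a distributed setting the hashing variant corresponds to the irregular, asynchronous updates to shared data structures that the abstract mentions, while the sorting variant trades those updates for a bulk reordering step.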
Turbo NOC: a framework for the design of Network On Chip based turbo decoder architectures
This work proposes a general framework for the design and simulation of
network on chip based turbo decoder architectures. Several parameters in the
design space are investigated, namely the network topology, the parallelism
degree, the rate at which messages are sent by processing nodes over the
network and the routing strategy. The main results of this analysis are: i) the
most suited topologies to achieve high throughput with a limited complexity
overhead are generalized de-Bruijn and generalized Kautz topologies; ii)
depending on the throughput requirements different parallelism degrees, message
injection rates and routing algorithms can be used to minimize the network area
overhead. Comment: submitted to IEEE Trans. on Circuits and Systems I (submission date
27 May 2009)
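The two topology families singled out in this abstract can be sketched from their standard textbook definitions: in a generalized de Bruijn graph with n nodes and degree d, node i links to (d*i + r) mod n for r = 0..d-1, and in a generalized Kautz graph node i links to (-d*i - r) mod n for r = 1..d. The code below is an illustration under those definitions only; the function names are hypothetical, and nothing here reproduces the routing or simulation framework of the paper.

```python
from collections import deque


def generalized_de_bruijn(n, d):
    """Adjacency lists: node i -> (d*i + r) mod n, for r = 0..d-1."""
    return {i: [(d * i + r) % n for r in range(d)] for i in range(n)}


def generalized_kautz(n, d):
    """Adjacency lists: node i -> (-d*i - r) mod n, for r = 1..d."""
    return {i: [(-d * i - r) % n for r in range(1, d + 1)] for i in range(n)}


def diameter(adj):
    """Longest shortest path over all ordered node pairs, found by BFS."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(adj):
            raise ValueError("graph is not strongly connected")
        worst = max(worst, max(dist.values()))
    return worst


# Small instances chosen for illustration: 8 and 6 processing nodes, degree 2.
db = generalized_de_bruijn(8, 2)
kz = generalized_kautz(6, 2)
```

For these small instances, `diameter(db)` is 3 and `diameter(kz)` is 2, which illustrates why such graphs are attractive when many nodes must be reached in few hops with a fixed node degree.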
Introduction to a system for implementing neural net connections on SIMD architectures
Neural networks have attracted much interest recently, and using parallel architectures to simulate neural networks is a natural and necessary application. The SIMD model of parallel computation is chosen because systems of this type can be built with large numbers of processing elements. However, such systems are not naturally suited to generalized communication. A method is proposed that allows neural network connections to be implemented on massively parallel SIMD architectures. The key to this system is an algorithm that permits the formation of arbitrary connections between the neurons. One feature is the ability to add new connections quickly. The system also has error recovery ability and is robust over a variety of network topologies. Simulations of the general connection system, and its implementation on the Connection Machine, indicate that the time and space requirements are proportional to the product of the average number of connections per neuron and the diameter of the interconnection network.
An Enhanced Multiway Sorting Network Based on n-Sorters
Merging-based sorting networks are an important family of sorting networks.
Most merge sorting networks are based on 2-way or multi-way merging algorithms
using 2-sorters as basic building blocks. An alternative is to use n-sorters,
instead of 2-sorters, as the basic building blocks so as to greatly reduce the
number of sorters as well as the latency. Based on a modified Leighton's
columnsort algorithm, an n-way merging algorithm, referred to as SS-Mk, that
uses n-sorters as basic building blocks was proposed. In this work, we first
propose a new multiway merging algorithm with n-sorters as basic building
blocks that merges n sorted lists of m values each in 1 + ceil(m/2) stages (n
<= m). Based on our merging algorithm, we also propose a sorting algorithm,
which requires O(N log^2 N) basic sorters to sort N inputs. While the asymptotic
complexity (in terms of the required number of sorters) of our sorting
algorithm is the same as the SS-Mk, for wide ranges of N, our algorithm
requires fewer sorters than the SS-Mk. Finally, we consider a binary sorting
network, where the basic sorter is implemented in threshold logic and scales
linearly with the number of inputs, and compare the complexity in terms of the
required number of gates. For wide ranges of N, our algorithm requires fewer
gates than the SS-Mk. Comment: 13 pages, 14 figures
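The stage count quoted in this abstract, 1 + ceil(m/2) for merging n sorted lists of m values each with n <= m, is easy to tabulate. The sketch below evaluates only that formula and models an n-sorter as a block that sorts the values on a chosen set of wires; the function names are hypothetical, and the code does not implement the merging algorithm itself, which the abstract does not spell out.

```python
import math


def merge_stages(n, m):
    """Stage count stated in the abstract: 1 + ceil(m / 2), valid for n <= m."""
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    if n > m:
        raise ValueError("the stated bound assumes n <= m")
    return 1 + math.ceil(m / 2)


def apply_n_sorter(values, wires):
    """Model one n-sorter: sort the values on the given wires in place.

    `wires` lists wire indices in ascending order; the smallest value
    ends up on the first listed wire. Illustrative model only.
    """
    ordered = sorted(values[w] for w in wires)
    for wire, value in zip(wires, ordered):
        values[wire] = value
    return values


# Stage counts for a few list lengths with n = 3 lists (values chosen for illustration).
table = {m: merge_stages(3, m) for m in (3, 4, 8, 9)}

# One 3-sorter acting on wires 0, 2 and 4 of a six-wire network.
state = apply_n_sorter([9, 1, 4, 7, 2, 5], [0, 2, 4])
```

Note that the stage count depends only on m, not on n, so merging more lists of the same length does not add stages under the stated bound.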