The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
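
As a rough illustration of the hashing motif named above, the following is a minimal sketch, not the authors' method. It assumes k-mer counting as the task, which the abstract does not mention. The function names (`kmers`, `owner`, `count_kmers_partitioned`, `merge`) and the use of CRC32 as the partitioning hash are hypothetical choices made for this example. Each k-mer is routed by its hash to one partition, standing in for the process that would own that slice of a distributed hash table.

```python
from collections import Counter
import zlib


def kmers(read, k):
    """Yield every length-k substring of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def owner(kmer, num_parts):
    """Map a k-mer to a partition using a deterministic hash (CRC32, an assumption)."""
    return zlib.crc32(kmer.encode("ascii")) % num_parts


def count_kmers_partitioned(reads, k, num_parts):
    """Count k-mers, keeping one Counter per hash partition.

    In a real parallel run each partition would live on a different
    process; here they are plain in-memory Counters.
    """
    parts = [Counter() for _ in range(num_parts)]
    for read in reads:
        for kmer in kmers(read, k):
            parts[owner(kmer, num_parts)][kmer] += 1
    return parts


def merge(parts):
    """Combine per-partition counts into one Counter (for inspection only)."""
    total = Counter()
    for part in parts:
        total.update(part)
    return total


if __name__ == "__main__":
    reads = ["ACGTAC", "CGTACG"]
    parts = count_kmers_partitioned(reads, k=3, num_parts=4)
    print(merge(parts))
```

Because the hash decides ownership, every occurrence of a given k-mer lands in the same partition, so counts never need to be reconciled across partitions.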
Asynchronous iterative computations with Web information retrieval structures: The PageRank case
There are several ideas being used today for Web information retrieval, and
specifically in Web search engines. The PageRank algorithm is one of those that
introduce a content-neutral ranking function over Web pages. This ranking is
applied to the set of pages returned by the Google search engine in response to
posting a search query. PageRank is based in part on two simple common-sense
concepts: (i) A page is important if many important pages include links to it.
(ii) A page containing many links has reduced impact on the importance of the
pages it links to. In this paper we focus on asynchronous iterative schemes to
compute PageRank over large sets of Web pages. The elimination of the
synchronizing phases is expected to be advantageous on heterogeneous platforms.
The motivation for a possible move to such large scale distributed platforms
lies in the size of matrices representing Web structure. In orders of
magnitude: pages with nonzero elements and bytes
just to store a small percentage of the Web (the already crawled); distributed
memory machines are necessary for such computations. The present research is
part of our general objective, to explore the potential of asynchronous
computational models as an underlying framework for very large scale
computations over the Grid. The area of ``internet algorithmics'' appears to
offer many occasions for computations of unprecedented dimensionality that would
be good candidates for this framework.
Comment: 8 pages, to appear in the ParCo2005 Conference Proceedings
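
The asynchronous idea described in this abstract can be sketched in a few lines. This is a single-process simulation under stated assumptions, not the paper's implementation. The function name `async_pagerank`, the damping factor 0.85, the `refresh_prob` parameter, and the round-robin split of pages into blocks are all hypothetical. It also assumes every page has at least one outgoing link. Each simulated worker updates only its own block of pages and reads the other blocks from a local snapshot that may be stale, so no global synchronization step is needed.

```python
import random


def async_pagerank(links, damping=0.85, sweeps=2000,
                   refresh_prob=0.5, num_workers=2, seed=0):
    """Simulate asynchronous PageRank iterations in one process.

    links maps each page to the list of pages it links to.
    Assumption: no page has an empty link list (no dangling pages).
    """
    rng = random.Random(seed)
    pages = sorted(links)
    n = len(pages)

    # Build the reverse adjacency: which pages link to p.
    incoming = {p: [] for p in pages}
    for src, outs in links.items():
        for dst in outs:
            incoming[dst].append(src)
    outdeg = {p: len(links[p]) for p in pages}

    shared = {p: 1.0 / n for p in pages}             # latest published ranks
    blocks = [pages[w::num_workers] for w in range(num_workers)]
    views = [dict(shared) for _ in range(num_workers)]  # per-worker snapshots

    for _ in range(sweeps):
        w = rng.randrange(num_workers)    # workers run in no fixed order
        if rng.random() < refresh_prob:
            views[w] = dict(shared)       # occasionally pull fresh values
        view = views[w]
        for p in blocks[w]:
            # Link contribution shrinks with the linking page's out-degree.
            rank = (1.0 - damping) / n + damping * sum(
                view[q] / outdeg[q] for q in incoming[p])
            view[p] = rank
            shared[p] = rank              # publish without waiting for others
    return shared


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    print(async_pagerank(graph))
```

Dividing each contribution by the out-degree of the linking page reflects concept (ii) above, and summing over incoming links reflects concept (i). The stale snapshots stand in for the communication delays between nodes of a distributed platform.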