
    Distribution-independent hierarchical N-body methods

    The N-body problem is to simulate the motion of N particles under the influence of mutual force fields based on an inverse-square law. The problem has applications in several domains, including astrophysics, molecular dynamics, fluid dynamics, radiosity methods in computer graphics, and numerical complex analysis. Research efforts have focused on reducing the O(N^2) time per iteration required by the naive algorithm of computing each pairwise interaction. Widely respected among these are the Barnes-Hut and Greengard methods. Greengard claims his algorithm reduces the complexity to O(N) time per iteration. Throughout this thesis, we concentrate on rigorous, distribution-independent, worst-case analysis of the N-body methods. We show that Greengard's algorithm is not O(N), as claimed. Both the Barnes-Hut and Greengard methods depend on the same data structure, which we show is distribution-dependent. For the distribution that results in the smallest running time, we show that Greengard's algorithm is Ω(N log^2 N) in two dimensions and Ω(N log^4 N) in three dimensions. Both algorithms are unbounded for arbitrary distributions. We have designed a hierarchical data structure whose size depends entirely upon the number of particles and is independent of the distribution of the particles. We show that both Greengard's and Barnes-Hut's algorithms can be used in conjunction with this data structure to reduce their complexity. Apart from reducing the complexity of the Barnes-Hut algorithm, the data structure also permits more accurate error estimation. We present two- and three-dimensional algorithms for creating the data structure. The multipole method designed using this data structure has a complexity of O(N log N) in two dimensions and O(N log^2 N) in three dimensions.
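    To make the hierarchical idea concrete, the following is a minimal 2-D Barnes-Hut sketch in Python (an illustration only, using the standard opening-angle criterion; the class and parameter names are ours, and this is not the thesis's distribution-independent data structure). Note how the quadtree depth, and therefore the cost, depends on how the particles cluster, which is the distribution dependence analyzed above.

        # Minimal 2-D Barnes-Hut sketch (illustrative; names are ours, and this is
        # not the thesis's distribution-independent structure).
        import math, random

        class Cell:
            def __init__(self, cx, cy, half):
                self.cx, self.cy, self.half = cx, cy, half  # square cell: center, half-width
                self.points = []                            # (x, y, mass) stored at leaves
                self.kids = []                              # 4 children once subdivided
                self.mass = 0.0
                self.comx = self.comy = 0.0                 # center of mass

            def insert(self, p, leaf_cap=1):
                if not self.kids and len(self.points) < leaf_cap:
                    self.points.append(p)
                    return
                if not self.kids:                           # subdivide and push points down
                    h = self.half / 2.0
                    self.kids = [Cell(self.cx + sx * h, self.cy + sy * h, h)
                                 for sx in (-1, 1) for sy in (-1, 1)]
                    old, self.points = self.points, []
                    for q in old:
                        self._child(q).insert(q, leaf_cap)
                self._child(p).insert(p, leaf_cap)

            def _child(self, p):
                # kids were built in order (-1,-1), (-1,+1), (+1,-1), (+1,+1)
                return self.kids[(2 if p[0] >= self.cx else 0) + (1 if p[1] >= self.cy else 0)]

            def summarize(self):                            # bottom-up centers of mass
                for k in self.kids:
                    k.summarize()
                items = self.points or [(k.comx, k.comy, k.mass) for k in self.kids]
                self.mass = sum(m for _, _, m in items)
                if self.mass > 0.0:
                    self.comx = sum(x * m for x, _, m in items) / self.mass
                    self.comy = sum(y * m for _, y, m in items) / self.mass

        def accel(cell, x, y, theta=0.5, eps=1e-12):
            """Approximate inverse-square acceleration at (x, y): a cell is treated as
            a single pseudo-particle whenever its opening angle is below theta."""
            ax = ay = 0.0
            stack = [cell]
            while stack:
                c = stack.pop()
                if c.mass == 0.0:
                    continue
                dx, dy = c.comx - x, c.comy - y
                d = math.sqrt(dx * dx + dy * dy) + eps
                if not c.kids or (2.0 * c.half) / d < theta:
                    ax += c.mass * dx / d ** 3              # far (or leaf): use center of mass
                    ay += c.mass * dy / d ** 3
                else:
                    stack.extend(c.kids)                    # too close: open the cell
            return ax, ay

        # Toy usage: 100 unit-mass bodies in the square [-1, 1]^2.
        random.seed(0)
        root = Cell(0.0, 0.0, 1.0)
        for _ in range(100):
            root.insert((random.uniform(-1, 1), random.uniform(-1, 1), 1.0))
        root.summarize()
        print(accel(root, 0.2, -0.3))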

    Greengard's N-body Algorithm is not order N

    Greengard's N-body algorithm claims to compute the pairwise interactions in a system of N particles in O(N) time for a fixed precision. In this paper, we show that the choice of precision is not independent of N and has a lower bound of log N. We use this result to show that Greengard's algorithm is not O(N).
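    A back-of-the-envelope illustration of the claim, under a crude error model of our own (not the paper's proof): if a truncated, well-separated expansion with p terms leaves a relative error of roughly 2^-p per contribution, and a potential aggregates about N comparable contributions whose individual effects must stay resolvable, then p has to grow roughly like log N.

        # Crude numeric illustration of the precision/N coupling; the function name
        # and the error model are our assumptions.
        import math

        def terms_needed(n_bodies, target_rel_err=1e-6, decay=0.5):
            # solve n_bodies * decay**p <= target_rel_err for p
            return math.ceil(math.log(n_bodies / target_rel_err, 1.0 / decay))

        for n in (10**3, 10**6, 10**9):
            print(n, terms_needed(n))   # grows by about 10 terms per factor of 1000 in N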

    A Parallel Algorithm for Exact Bayesian Structure Discovery in Bayesian Networks

    Exact Bayesian structure discovery in Bayesian networks requires exponential time and space. Using dynamic programming (DP), the fastest known sequential algorithm computes the exact posterior probabilities of structural features in O(2(d+1)n·2^n) time and space, if the number of nodes (variables) in the Bayesian network is n and the in-degree (the number of parents) per node is bounded by a constant d. Here we present a parallel algorithm capable of computing the exact posterior probabilities for all n(n-1) edges with optimal parallel space efficiency and nearly optimal parallel time efficiency. That is, if p = 2^k processors are used, the run-time reduces to O(5(d+1)n·2^{n-k} + k(n-k)^d) and the space usage becomes O(n·2^{n-k}) per processor. Our algorithm is based on the observation that the subproblems in the sequential DP algorithm constitute an n-D hypercube. We carefully coordinate the computation of correlated DP procedures so that large amounts of data exchange are avoided. Further, we develop parallel techniques for two variants of the well-known zeta transform, which have applications outside the context of Bayesian networks. We demonstrate the capability of our algorithm on datasets with up to 33 variables and its scalability on up to 2048 processors. We apply our algorithm to a biological data set for discovering the yeast pheromone response pathways.
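    For reference, the sequential primitive being parallelized, the zeta transform over the subset lattice, can be sketched as follows (standard textbook version with our own variable names; the paper's contribution lies in distributing variants of this DP across the processor hypercube).

        # Sequential fast zeta transform over subsets of an n-element ground set:
        # zeta(f)[S] = sum of f[T] over all T ⊆ S.  The textbook in-place DP runs
        # in O(n * 2^n) time.
        def zeta_transform(f, n):
            g = list(f)                      # f is indexed by subset bitmasks 0 .. 2^n - 1
            for i in range(n):               # fold in one ground-set element at a time
                bit = 1 << i
                for s in range(1 << n):
                    if s & bit:              # subsets containing element i also absorb
                        g[s] += g[s ^ bit]   # their counterpart without element i
            return g

        # Example: f = indicator of singleton sets, so zeta(f)[S] = |S|.
        n = 3
        f = [1 if bin(s).count("1") == 1 else 0 for s in range(1 << n)]
        print(zeta_transform(f, n))          # [0, 1, 1, 2, 1, 2, 2, 3]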

    Feasibility of Flow Decomposition with Subpath Constraints in Linear Time


    Validating Paired-End Read Alignments in Sequence Graphs

    Graph-based non-linear reference structures such as variation graphs and colored de Bruijn graphs enable incorporation of full genomic diversity within a population. However, transitioning from a simple string-based reference to graphs requires addressing many computational challenges, one of which concerns accurately mapping sequencing read sets to graphs. Paired-end Illumina sequencing is a commonly used sequencing platform in genomics, where the paired-end distance constraints allow disambiguation of repeats. Many recent works have explored provably good index-based and alignment-based strategies for mapping individual reads to graphs. However, validating distance constraints efficiently over graphs is not trivial, and existing sequence-to-graph mappers rely on heuristics. We introduce a mathematical formulation of the problem and provide a new algorithm to solve it exactly. We take advantage of the high sparsity of reference graphs and use sparse matrix-matrix multiplication (SpGEMM) to build an index that can be queried efficiently by a mapping algorithm for validating the distance constraints. The effectiveness of the algorithm is demonstrated using real reference graphs, including a human MHC variation graph and a pan-genome de Bruijn graph built using genomes of 20 B. anthracis strains. While the one-time indexing can take from a few minutes to a few hours using our algorithm, answering a million distance queries takes less than a second.
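    The indexing idea can be illustrated with a small sketch of our own (a simplification, not the paper's index, which also encodes distance ranges and graph-specific details): repeated sparse matrix-matrix products of the graph's adjacency matrix reveal which vertex pairs are joined by a path of bounded length, which is the kind of question a paired-end distance constraint asks. The function name below is hypothetical.

        import numpy as np
        from scipy import sparse

        def bounded_reachability(adj, max_steps):
            """adj: CSR adjacency matrix (0/1 entries) of the reference graph.
            Returns R with R[u, v] != 0 iff v is reachable from u in 1..max_steps
            edges, computed with SpGEMM."""
            step = adj.copy()
            reach = adj.copy()
            for _ in range(max_steps - 1):
                step = step @ adj              # paths using exactly one more edge
                step.data[:] = 1               # keep entries as 0/1 flags
                reach = reach + step
                reach.data[:] = 1
            return reach

        # Toy graph 0 -> 1 -> 2 -> 3: node 2 is within 2 steps of node 0, node 3 is not.
        rows, cols = [0, 1, 2], [1, 2, 3]
        A = sparse.csr_matrix((np.ones(3), (rows, cols)), shape=(4, 4))
        R = bounded_reachability(A, max_steps=2)
        print(R[0, 2] != 0, R[0, 3] != 0)      # True False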

    Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

    Dimensionality reduction refers to a set of mathematical techniques used to reduce the complexity of the original high-dimensional data while preserving its selected properties. Improvements in simulation strategies and experimental data collection methods are resulting in a deluge of heterogeneous and high-dimensional data, which often makes dimensionality reduction the only viable way to gain qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify the key components underlying spectral dimensionality reduction techniques and propose their efficient parallel implementation. We show that the resulting framework can be used to process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate the applicability of our framework, we perform dimensionality reduction of 75,000 images representing morphology evolution during the manufacturing of organic solar cells, in order to identify how processing parameters affect morphology evolution.
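    A single-node sketch of a spectral pipeline may help fix ideas (our own illustration using Laplacian eigenmaps; the framework itself distributes kernels of this kind, such as neighborhood-graph construction and the eigensolve, across thousands of cores, and the function name below is ours).

        import numpy as np
        from scipy.linalg import eigh

        def laplacian_eigenmaps(X, k=10, out_dim=2):
            """X: (n_points, n_dims) array -> (n_points, out_dim) embedding from the
            smallest non-trivial eigenvectors of the unnormalized graph Laplacian of
            a symmetrized kNN graph (assumes the graph comes out connected)."""
            n = X.shape[0]
            sq = (X * X).sum(1)
            d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # pairwise squared distances
            nbrs = np.argsort(d2, axis=1)[:, 1:k + 1]        # column 0 is the point itself
            W = np.zeros((n, n))
            for i in range(n):
                W[i, nbrs[i]] = 1.0
            W = np.maximum(W, W.T)                           # symmetrize the kNN graph
            L = np.diag(W.sum(1)) - W                        # unnormalized Laplacian
            vals, vecs = eigh(L)                             # dense solve; fine for a demo
            return vecs[:, 1:out_dim + 1]                    # drop the constant eigenvector

        # Toy usage: 500 points in 50 dimensions lying near a 2-D subspace.
        rng = np.random.default_rng(0)
        Y = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 50))
        print(laplacian_eigenmaps(Y, k=8).shape)             # (500, 2)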

    PCAP: A Whole-Genome Assembly Program

    We describe a whole-genome assembly program named PCAP for processing tens of millions of reads. The PCAP program has several features to address efficiency and accuracy issues in assembly. Multiple processors are used to perform the most time-consuming computations in assembly. A more sensitive method is used to avoid missing overlaps caused by sequencing errors. Repetitive regions of reads are detected on the basis of many overlaps with other reads, instead of many shorter word matches with other reads. Contaminated end regions of reads are identified and removed. Generation of a consensus sequence for a contig is based on an alignment of the reads in the contig, in which both base quality values and coverage information are used to determine every consensus base. The PCAP program was tested on a mouse whole-genome data set of 30 million reads and a human Chromosome 20 data set of 1.7 million reads. The program is freely available for academic use.
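    As a toy illustration of quality- and coverage-aware consensus calling in the spirit described above (not PCAP's exact scoring rule; the function name and thresholds are our assumptions), one alignment column can be resolved as follows.

        from collections import defaultdict

        def call_column(bases_with_quals, min_coverage=2):
            """bases_with_quals: list of (base, phred_quality) pairs from the reads
            covering one contig position.  Returns the consensus base, or 'N' when
            coverage is too low to make a call."""
            if len(bases_with_quals) < min_coverage:
                return "N"
            score = defaultdict(int)
            for base, qual in bases_with_quals:
                score[base] += qual              # quality-weighted vote
            return max(score, key=score.get)

        column = [("A", 40), ("A", 12), ("C", 30), ("A", 9)]
        print(call_column(column))               # 'A' (summed quality 61 vs 30 for 'C')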