A Parallel Algorithm for Exact Bayesian Structure Discovery in Bayesian Networks

Author: Jin Tian
Olga Nikolova
Sage Bionetworks
Srinivas Aluru
Yetian Chen
Publication date: 13/08/2016

Exact Bayesian structure discovery in Bayesian networks requires exponential time and space. Using dynamic programming (DP), the fastest known sequential algorithm computes the exact posterior probabilities of structural features in $O(2(d+1)n2^n)$ time and space, if the number of nodes (variables) in the Bayesian network is $n$ and the in-degree (the number of parents) per node is bounded by a constant $d$. Here we present a parallel algorithm capable of computing the exact posterior probabilities for all $n(n-1)$ edges with optimal parallel space efficiency and nearly optimal parallel time efficiency. That is, if $p=2^k$ processors are used, the run-time reduces to $O(5(d+1)n2^{n-k}+k(n-k)^d)$ and the space usage becomes $O(n2^{n-k})$ per processor. Our algorithm is based on the observation that the subproblems in the sequential DP algorithm constitute an $n$-$D$ hypercube. We carefully coordinate the computation of correlated DP procedures so that a large amount of data exchange is suppressed. Further, we develop parallel techniques for two variants of the well-known zeta transform, which have applications outside the context of Bayesian networks. We demonstrate the capability of our algorithm on datasets with up to 33 variables and its scalability on up to 2048 processors. We apply our algorithm to a biological data set for discovering the yeast pheromone response pathways.

Comment: 32 pages, 12 figures
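For readers unfamiliar with the zeta transform named in the abstract above, the following is a minimal sequential sketch of the standard subset-sum form over bitmask-indexed subsets. It is not the paper's parallel algorithm, nor either of the two variants the paper parallelizes; the function name `zeta_transform` and the list-based representation are assumptions made for illustration only.

```python
def zeta_transform(f, n):
    """Return g where g[S] = sum of f[T] over all subsets T of S.

    Subsets of an n-element ground set are encoded as bitmasks 0 .. 2**n - 1.
    Illustrative sequential version; runs in O(n * 2**n) additions.
    """
    if len(f) != 1 << n:
        raise ValueError("f must have exactly 2**n entries")
    g = list(f)  # work on a copy so the input is not modified
    for i in range(n):
        bit = 1 << i
        for mask in range(1 << n):
            if mask & bit:
                # Fold in the value of the same subset without element i.
                g[mask] += g[mask ^ bit]
    return g


if __name__ == "__main__":
    # For n = 2: g[0b11] = f[0] + f[1] + f[2] + f[3]
    print(zeta_transform([1, 2, 3, 4], 2))
```

Each pass of the outer loop handles one element of the ground set, which is also why the $2^n$ subproblems can be pictured as the corners of an $n$-dimensional hypercube: two subsets are neighbours when they differ in exactly one element.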

arXiv.org e-Print Archive

CiteSeerX

Computing Runs on a General Alphabet

Author: Kosolobov Dmitry
Publication date: 22/11/2015

We describe a RAM algorithm computing all runs (maximal repetitions) of a given string of length $n$ over a general ordered alphabet in $O(n\log^{2/3} n)$ time and linear space. Our algorithm outperforms all known solutions working in $\Theta(n\log\sigma)$ time provided $\sigma = n^{\Omega(1)}$, where $\sigma$ is the alphabet size. We conjecture that there exists a linear time RAM algorithm finding all runs.

Comment: 4 pages, 2 figures
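To make the notion of a run concrete, here is a naive quadratic-time sketch that lists all runs of a string. It is emphatically not the $O(n\log^{2/3} n)$ algorithm from the abstract; the function name `naive_runs` and the `(start, end, period)` output convention (0-based, inclusive end) are assumptions chosen for illustration.

```python
def naive_runs(s):
    """List all runs (maximal repetitions) of s as (start, end, period).

    A run is a substring s[start..end] whose smallest period p satisfies
    end - start + 1 >= 2 * p and which cannot be extended left or right
    while keeping period p. Naive O(n^2) scan, for illustration only.
    """
    n = len(s)
    seen = set()  # intervals already reported with a smaller period
    runs = []
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if s[k] != s[k + p]:
                k += 1
                continue
            start = k
            # Extend the stretch of positions where s[k] == s[k + p].
            while k + p < n and s[k] == s[k + p]:
                k += 1
            end = k - 1 + p  # last index of the p-periodic substring
            # At least p matching positions means length >= 2 * p.
            if k - start >= p and (start, end) not in seen:
                seen.add((start, end))
                runs.append((start, end, p))
    return sorted(runs)


if __name__ == "__main__":
    print(naive_runs("mississippi"))
```

Periods are tried in increasing order, so an interval such as "aaaa" is reported once with its smallest period 1 and skipped when it reappears with period 2.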

arXiv.org e-Print Archive

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

Engineering Parallel String Sorting

Author: Bingmann Timo
Eberle Andreas
Sanders Peter
Publication date: 09/03/2014

We discuss how string sorting algorithms can be parallelized on modern multi-core shared memory machines. As a synthesis of the best sequential string sorting algorithms and successful parallel sorting algorithms for atomic objects, we first propose string sample sort. The algorithm makes effective use of the memory hierarchy, uses additional word level parallelism, and largely avoids branch mispredictions. Then we focus on NUMA architectures, and develop parallel multiway LCP-merge and -mergesort to reduce the number of random memory accesses to remote nodes. Additionally, we parallelize variants of multikey quicksort and radix sort that are also useful in certain situations. Comprehensive experiments on five current multi-core platforms are then reported and discussed. The experiments show that our implementations scale very well on real-world inputs and modern machines.

Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115
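As background for the multikey quicksort mentioned in the abstract above, the following is a small sequential sketch of the classic three-way, character-by-character partitioning idea. It is not the authors' parallel implementation and ignores the cache and NUMA concerns the paper addresses; the function name `multikey_quicksort` and the use of -1 as an end-of-string marker are illustrative assumptions.

```python
def multikey_quicksort(strings, depth=0):
    """Sort strings by three-way partitioning on the character at `depth`.

    Invariant: all strings passed in share their first `depth` characters.
    Sequential, list-based sketch; returns a new sorted list.
    """
    if len(strings) <= 1:
        return list(strings)

    def char_at(s):
        # -1 stands for "string has ended" and sorts before every character.
        return ord(s[depth]) if depth < len(s) else -1

    pivot = char_at(strings[len(strings) // 2])
    less = [s for s in strings if char_at(s) < pivot]
    equal = [s for s in strings if char_at(s) == pivot]
    greater = [s for s in strings if char_at(s) > pivot]

    result = multikey_quicksort(less, depth)
    if pivot == -1:
        # These strings all ended at `depth`, so they are identical.
        result.extend(equal)
    else:
        # Same character here: continue with the next character.
        result.extend(multikey_quicksort(equal, depth + 1))
    result.extend(multikey_quicksort(greater, depth))
    return result


if __name__ == "__main__":
    print(multikey_quicksort(["banana", "band", "ban", "apple", "app"]))
```

Only the "equal" bucket advances to the next character, so a shared prefix is inspected once per partitioning level rather than once per full string comparison.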

arXiv.org e-Print Archive

CiteSeerX

KITopen

Optimal-Time Text Indexing in BWT-runs Bounded Space

Author: Gagie Travis
Navarro Gonzalo
Prezza Nicola
Publication date: 11/07/2017

Indexing highly repetitive texts (such as genomic databases, software repositories and versioned text collections) has become an important problem since the turn of the millennium. A relevant compressibility measure for repetitive texts is $r$, the number of runs in their Burrows-Wheeler Transform (BWT). One of the earliest indexes for repetitive collections, the Run-Length FM-index, used $O(r)$ space and was able to efficiently count the number of occurrences of a pattern of length $m$ in the text (in loglogarithmic time per pattern symbol, with current techniques). However, it was unable to locate the positions of those occurrences efficiently within a space bounded in terms of $r$. Since then, a number of other indexes with space bounded by other measures of repetitiveness (the number of phrases in the Lempel-Ziv parse, the size of the smallest grammar generating the text, the size of the smallest automaton recognizing the text factors) have been proposed for efficiently locating, but not directly counting, the occurrences of a pattern. In this paper we close this long-standing problem, showing how to extend the Run-Length FM-index so that it can locate the