Reconstruction of Integers from Pairwise Distances

Author: Hassibi Babak
Jaganathan Kishore
Publication date: 11/12/2012

Given a set of integers, one can easily construct the set of their pairwise distances. We consider the inverse problem: given a set of pairwise distances, find the integer set that realizes it. This problem arises in many fields of engineering and applied physics, and has confounded researchers for over 60 years. It is one of the few fundamental problems that are neither known to be NP-hard nor known to be solvable by polynomial-time algorithms. Whether unique recovery is possible also remains an open question. In many practical applications where this problem occurs, the integer set is naturally sparse (i.e., the integers are sufficiently spaced), a property which has not been explored. In this work, we exploit the sparse nature of the integer set and develop a polynomial-time algorithm which provably recovers the set of integers (up to linear shift and reversal) from the set of their pairwise distances with arbitrarily high probability if the sparsity is $O(n^{1/2-\epsilon})$. Numerical simulations verify the effectiveness of the proposed algorithm. Comment: 14 pages, 4 figures, submitted to ICASSP 201
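The forward direction described in this abstract, and the reason recovery is only possible up to shift and reversal, can be illustrated with a minimal sketch. The function name `pairwise_distances` and the sample integers are illustrative assumptions, not taken from the paper; the sketch also keeps multiplicities (a multiset), which is an assumption beyond the abstract's wording.

```python
from collections import Counter
from itertools import combinations


def pairwise_distances(points):
    """Return the multiset of distances |a - b| over all unordered pairs.

    Hypothetical helper for illustration; the keys of the result form the
    plain set of distances.
    """
    return Counter(abs(a - b) for a, b in combinations(points, 2))


original = [0, 1, 4, 6]                               # illustrative integer set
shifted = [x + 10 for x in original]                  # linear shift
mirrored = [max(original) - x for x in original]      # reversal

# Shifting or reversing the integers leaves the distances unchanged,
# so the distances alone cannot distinguish these three sets.
assert pairwise_distances(original) == pairwise_distances(shifted)
assert pairwise_distances(original) == pairwise_distances(mirrored)

print(sorted(pairwise_distances(original).elements()))  # [1, 2, 3, 4, 5, 6]
```

This only shows the easy direction and the inherent ambiguity; it does not attempt the recovery algorithm the abstract refers to.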

arXiv.org e-Print Archive

Crossref

Caltech Authors

Hidden breakpoints in genome alignments

Author: A. Rambaut
A.C.E. Darling
A.E. Darling
A.L. Delcher
C.D. Greenman
D. Medini
E. Tannier
G. Fudenberg
M. Blanchette
M. Nowacki
M.A. Umbarger
S. De
S. Schwartz
S.V. Angiuoli
V. Kolmogorov
Publication date: 01/01/2012

During the course of evolution, an organism's genome can undergo changes that affect the large-scale structure of the genome. These changes include gene gain, loss, duplication, chromosome fusion, fission, and rearrangement. When gene gain and loss occur in addition to other types of rearrangement, breakpoints of rearrangement can exist that are only detectable by comparison of three or more genomes. An arbitrarily large number of these "hidden" breakpoints can exist among genomes that exhibit no rearrangements in pairwise comparisons. We present an extension of the multichromosomal breakpoint median problem to genomes that have undergone gene gain and loss. We then demonstrate that the median distance among three genomes can be used to calculate a lower bound on the number of hidden breakpoints present. We provide an implementation of this calculation including the median distance, along with some practical improvements on the time complexity of the underlying algorithm. We apply our approach to measure the abundance of hidden breakpoints in simulated data sets under a wide range of evolutionary scenarios. We demonstrate that in simulations the hidden breakpoint counts depend strongly on relative rates of inversion and gene gain/loss. Finally, we apply current multiple genome aligners to the simulated genomes, and show that all aligners introduce a high degree of error in hidden breakpoint counts, and that this error grows with evolutionary distance in the simulation. Our results suggest that hidden breakpoint error may be pervasive in genome alignments. Comment: 13 pages, 4 figures

arXiv.org e-Print Archive

Crossref

OPUS - University of Technology Sydney

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

On pairwise distances and median score of three genomes under DCJ

Author: A Bergeron
A Caprara
A Goeffon
AW Xu
E Tannier
MA Alekseyev
Max A Alekseyev
R Lenne
S Yancopoulos
Sergey Aganezov
V Rajan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/10/2012

In comparative genomics, the rearrangement distance between two genomes (equal to the minimal number of genome rearrangements required to transform them into a single genome) is often used for measuring their evolutionary remoteness. Generalization of this measure to three genomes is known as the median score (while a resulting genome is called a median genome). In contrast to the rearrangement distance between two genomes, which can be computed in linear time, computing the median score for three genomes is NP-hard. This inspires a quest for simpler and faster approximations for the median score, the most natural of which appears to be the halved sum of pairwise distances, which in fact represents a lower bound for the median score. In this work, we study the relationship and interplay of pairwise distances between three genomes and their median score under the model of Double-Cut-and-Join (DCJ) rearrangements. Most remarkably, we show that while a rearrangement may change the sum of pairwise distances by at most 2 (and thus change the lower bound by at most 1), even the most "powerful" rearrangements in this respect that increase the lower bound by 1 (by moving one genome farther away from each of the other two genomes), which we call strong, do not necessarily affect the median score. This observation implies that the two measures are not as well-correlated as one's intuition may suggest. We further prove that the median score attains the lower bound exactly on the triples of genomes that can be obtained from a single genome with strong rearrangements. While the sum of pairwise distances with the factor 2/3 represents an upper bound for the median score, its tightness remains unclear. Nonetheless, we show that the difference of the median score and its lower bound is not bounded by a constant. Comment: Proceedings of the 10th Annual RECOMB Satellite Workshop on Comparative Genomics (RECOMB-CG), 2012. (to appear)
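The two bounds stated in this abstract (half the sum of pairwise distances from below, two thirds of the sum from above) amount to simple arithmetic, sketched here. The function name `median_score_bounds` and the sample distances 4, 5, 7 are illustrative assumptions; computing the DCJ distances themselves is outside this sketch.

```python
from fractions import Fraction


def median_score_bounds(d_ab, d_ac, d_bc):
    """Return (lower, upper) bounds on the median score of three genomes.

    Hypothetical helper: the inputs are the three pairwise distances,
    assumed to be already computed. Exact fractions avoid rounding.
    """
    total = d_ab + d_ac + d_bc
    lower = Fraction(total, 2)       # halved sum of pairwise distances
    upper = Fraction(2 * total, 3)   # sum of pairwise distances times 2/3
    return lower, upper


# Illustrative distances, not taken from the paper.
lower, upper = median_score_bounds(4, 5, 7)
print(lower, upper)  # 8 32/3
```

With a pairwise sum of 16, any median score in this illustration would lie between 8 and 32/3, so the interval alone leaves it undetermined among 8, 9, and 10.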

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Reconstructing pedigrees: some identifiability questions for a recombination-mutation model

Author: BD McKay
BD Thatte
Bhalchandra D. Thatte
C Semple
H Whitney
J Pearl
JBS Haldane
JT Chang
K Lange
KA Zareckiĭ
L Lovász
M Steel
O Bininda-Emonds
SM Ulam
T Petrie
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/09/2011

Pedigrees are directed acyclic graphs that represent ancestral relationships between individuals in a population. Based on a schematic recombination process, we describe two simple Markov models for sequences evolving on pedigrees: Model R (recombinations without mutations) and Model RM (recombinations with mutations). For these models, we ask an identifiability question: is it possible to construct a pedigree from the joint probability distribution of extant sequences? We present partial identifiability results for general pedigrees: we show that when the crossover probabilities are sufficiently small, certain spanning subgraph sequences can be counted from the joint distribution of extant sequences. We demonstrate how pedigrees that earlier seemed difficult to distinguish are distinguished by counting their spanning subgraph sequences. Comment: 40 pages, 9 figures

arXiv.org e-Print Archive

Crossref

Lossless Representation of Graphs using Distributions

Author: Boutin Mireille
Kemper Gregor
Publication date: 01/01/2007

We consider complete graphs with edge weights and/or node weights taking values in some set. In the first part of this paper, we show that a large number of graphs are completely determined, up to isomorphism, by the distribution of their sub-triangles. In the second part, we propose graph representations in terms of one-dimensional distributions (e.g., distribution of the node weights, sum of adjacent weights, etc.). For the case when the weights of the graph are real-valued vectors, we show that all graphs, except for a set of measure zero, are uniquely determined, up to isomorphism, from these distributions. The motivating application for this paper is the problem of browsing through large sets of graphs. Comment: 19 pages
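One plausible reading of the "distribution of sub-triangles" for an edge-weighted complete graph is the multiset of sorted edge-weight triples, one per triple of nodes. The sketch below follows that reading; the function name `triangle_distribution`, the dictionary layout, and the sample weights are assumptions made for illustration, and the paper's exact definition may differ.

```python
from collections import Counter
from itertools import combinations


def triangle_distribution(weights, nodes):
    """Count the sorted edge-weight triples over all node triples.

    `weights` maps frozenset({u, v}) to the weight of edge u-v
    (hypothetical layout chosen for this sketch).
    """
    dist = Counter()
    for u, v, w in combinations(nodes, 3):
        triple = sorted((weights[frozenset((u, v))],
                         weights[frozenset((u, w))],
                         weights[frozenset((v, w))]))
        dist[tuple(triple)] += 1
    return dist


nodes = [0, 1, 2, 3]
# Illustrative weights on the six edges of the complete graph K4.
weights = {frozenset(e): wt
           for e, wt in zip(combinations(nodes, 2), [1, 2, 3, 4, 5, 6])}

# Relabel the nodes; an isomorphic graph must give the same distribution.
perm = {0: 2, 1: 0, 2: 3, 3: 1}
relabeled = {frozenset(perm[x] for x in e): wt for e, wt in weights.items()}

assert triangle_distribution(weights, nodes) == triangle_distribution(relabeled, nodes)
```

The assertion checks only that the distribution is invariant under relabeling; the abstract's claim concerns the converse, that for many graphs the distribution determines the graph up to isomorphism.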

arXiv.org e-Print Archive

CiteSeerX