Reconstruction of Integers from Pairwise Distances
Given a set of integers, one can easily construct the set of their pairwise
distances. We consider the inverse problem: given a set of pairwise distances,
find the integer set which realizes the pairwise distance set. This problem
arises in many fields of engineering and applied physics, and has
confounded researchers for over 60 years. It is one of the few fundamental
problems that are neither known to be NP-hard nor solvable by polynomial-time
algorithms. Whether unique recovery is possible also remains an open question.
In many practical applications where this problem occurs, the integer set is
naturally sparse (i.e., the integers are sufficiently spaced), a property which
has not been explored. In this work, we exploit the sparse nature of the
integer set and develop a polynomial-time algorithm which provably recovers the
set of integers (up to linear shift and reversal) from the set of their
pairwise distances with arbitrarily high probability if the sparsity is
O(n^{1/2-\eps}). Numerical simulations verify the effectiveness of the
proposed algorithm.
Comment: 14 pages, 4 figures, submitted to ICASSP 201
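
The forward direction described in this abstract, building the pairwise distances from a set of integers, can be illustrated with a minimal Python sketch. It is not the paper's recovery algorithm. The function name `pairwise_distances`, the sample integers, and the choice to keep repeated distances as a multiset are all assumptions made for illustration. The sketch also shows why recovery is only possible up to shift and reversal: both operations leave the distances unchanged.

```python
from collections import Counter
from itertools import combinations


def pairwise_distances(points):
    """Return the multiset of |a - b| over all unordered pairs of points."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


# Hypothetical example set (not taken from the paper).
original = [0, 3, 7, 12]
shifted = [x + 5 for x in original]                    # linear shift
mirrored = [max(original) - x for x in original]       # reversal

# All three sets produce exactly the same distances, so the inverse
# problem cannot tell them apart.
same = (
    pairwise_distances(original)
    == pairwise_distances(shifted)
    == pairwise_distances(mirrored)
)
print(sorted(pairwise_distances(original).elements()), same)
```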
Hidden breakpoints in genome alignments
During the course of evolution, an organism's genome can undergo changes that
affect the large-scale structure of the genome. These changes include gene
gain, loss, duplication, chromosome fusion, fission, and rearrangement. When
gene gain and loss occur in addition to other types of rearrangement,
breakpoints of rearrangement can exist that are only detectable by comparison
of three or more genomes. An arbitrarily large number of these "hidden"
breakpoints can exist among genomes that exhibit no rearrangements in pairwise
comparisons.
We present an extension of the multichromosomal breakpoint median problem to
genomes that have undergone gene gain and loss. We then demonstrate that the
median distance among three genomes can be used to calculate a lower bound on
the number of hidden breakpoints present. We provide an implementation of this
calculation including the median distance, along with some practical
improvements on the time complexity of the underlying algorithm.
We apply our approach to measure the abundance of hidden breakpoints in
simulated data sets under a wide range of evolutionary scenarios. We
demonstrate that in simulations the hidden breakpoint counts depend strongly on
relative rates of inversion and gene gain/loss. Finally, we apply current
multiple genome aligners to the simulated genomes, and show that all aligners
introduce a high degree of error in hidden breakpoint counts, and that this
error grows with evolutionary distance in the simulation. Our results suggest
that hidden breakpoint error may be pervasive in genome alignments.
Comment: 13 pages, 4 figures
On pairwise distances and median score of three genomes under DCJ
In comparative genomics, the rearrangement distance between two genomes
(equal to the minimal number of genome rearrangements required to transform them
into a single genome) is often used for measuring their evolutionary
remoteness. The generalization of this measure to three genomes is known as the
median score (while a resulting genome is called a median genome). In contrast to
the rearrangement distance between two genomes which can be computed in linear
time, computing the median score for three genomes is NP-hard. This inspires a
quest for simpler and faster approximations for the median score, the most
natural of which appears to be the halved sum of pairwise distances which in
fact represents a lower bound for the median score.
In this work, we study the relationship and interplay of pairwise distances
between three genomes and their median score under the model of
Double-Cut-and-Join (DCJ) rearrangements. Most remarkably, we show that while a
rearrangement may change the sum of pairwise distances by at most 2 (and thus
change the lower bound by at most 1), even the most "powerful" rearrangements
in this respect that increase the lower bound by 1 (by moving one genome
farther away from each of the other two genomes), which we call strong, do not
necessarily affect the median score. This observation implies that the two
measures are not as well-correlated as one's intuition may suggest.
We further prove that the median score attains the lower bound exactly on the
triples of genomes that can be obtained from a single genome with strong
rearrangements. While the sum of pairwise distances with the factor 2/3
represents an upper bound for the median score, its tightness remains unclear.
Nonetheless, we show that the difference of the median score and its lower
bound is not bounded by a constant.
Comment: Proceedings of the 10-th Annual RECOMB Satellite Workshop on
Comparative Genomics (RECOMB-CG), 2012. (to appear)
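
The lower bound (halved sum) and the upper bound (factor 2/3) mentioned in this abstract both follow from the triangle inequality. The short derivation below is a sketch under assumed notation, none of which appears in the abstract: $d(\cdot,\cdot)$ is the DCJ distance, $S = d(A,B) + d(B,C) + d(A,C)$, and $\mathrm{ms}(A,B,C) = \min_M \bigl[d(A,M) + d(B,M) + d(C,M)\bigr]$.

```latex
% Lower bound: apply the triangle inequality through any candidate median M.
\begin{align*}
d(A,B) &\le d(A,M) + d(M,B), \\
d(B,C) &\le d(B,M) + d(M,C), \\
d(A,C) &\le d(A,M) + d(M,C).
\end{align*}
% Summing the three inequalities gives, for every M,
\[
S \le 2\bigl[d(A,M) + d(B,M) + d(C,M)\bigr]
\quad\Longrightarrow\quad
\mathrm{ms}(A,B,C) \ge \tfrac{1}{2} S .
\]
% Upper bound: use one of the three genomes itself as the median.
% The minimum of the three candidate scores is at most their average.
\[
\mathrm{ms}(A,B,C)
\le \min\bigl\{ d(A,B) + d(A,C),\; d(A,B) + d(B,C),\; d(A,C) + d(B,C) \bigr\}
\le \tfrac{1}{3} \cdot 2S = \tfrac{2}{3} S .
\]
```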
Reconstructing pedigrees: some identifiability questions for a recombination-mutation model
Pedigrees are directed acyclic graphs that represent ancestral relationships
between individuals in a population. Based on a schematic recombination
process, we describe two simple Markov models for sequences evolving on
pedigrees - Model R (recombinations without mutations) and Model RM
(recombinations with mutations). For these models, we ask an identifiability
question: is it possible to construct a pedigree from the joint probability
distribution of extant sequences? We present partial identifiability results
for general pedigrees: we show that when the crossover probabilities are
sufficiently small, certain spanning subgraph sequences can be counted from the
joint distribution of extant sequences. We demonstrate how pedigrees that
earlier seemed difficult to distinguish are distinguished by counting their
spanning subgraph sequences.
Comment: 40 pages, 9 figures
Lossless Representation of Graphs using Distributions
We consider complete graphs with edge weights and/or node weights taking
values in some set. In the first part of this paper, we show that a large
number of graphs are completely determined, up to isomorphism, by the
distribution of their sub-triangles. In the second part, we propose graph
representations in terms of one-dimensional distributions (e.g., distribution
of the node weights, sum of adjacent weights, etc.). For the case when the
weights of the graph are real-valued vectors, we show that all graphs, except
for a set of measure zero, are uniquely determined, up to isomorphism, from
these distributions. The motivating application for this paper is the problem
of browsing through large sets of graphs.
Comment: 19 pages
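
As a rough illustration of what a "distribution of sub-triangles" could look like for an edge-weighted complete graph, the Python sketch below counts the sorted edge-weight triples over all triangles. This reading of the term is an assumption, as are the names `triangle_distribution` and `relabel` and the sample weights; the paper's exact definition may differ. The sketch only checks that the distribution is unchanged by relabeling the nodes, which any isomorphism invariant must satisfy. It does not demonstrate the paper's uniqueness result.

```python
from collections import Counter
from itertools import combinations


def triangle_distribution(weights, n):
    """Count sorted edge-weight triples over all triangles of the complete graph K_n.

    `weights` maps frozenset({i, j}) to the weight of edge {i, j}.
    """
    dist = Counter()
    for i, j, k in combinations(range(n), 3):
        triple = tuple(sorted((
            weights[frozenset((i, j))],
            weights[frozenset((j, k))],
            weights[frozenset((i, k))],
        )))
        dist[triple] += 1
    return dist


def relabel(weights, perm):
    """Apply the node permutation `perm` (a list: old label -> new label)."""
    return {frozenset(perm[v] for v in edge): w for edge, w in weights.items()}


# Hypothetical weights on K_4 (not taken from the paper).
n = 4
weights = {
    frozenset((0, 1)): 1, frozenset((0, 2)): 2, frozenset((0, 3)): 3,
    frozenset((1, 2)): 4, frozenset((1, 3)): 5, frozenset((2, 3)): 6,
}
permuted = relabel(weights, [2, 0, 3, 1])

# Relabeling the nodes leaves the triangle distribution unchanged.
invariant = triangle_distribution(weights, n) == triangle_distribution(permuted, n)
print(invariant)
```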