The distribution of cycles in breakpoint graphs of signed permutations
Breakpoint graphs are ubiquitous structures in the field of genome
rearrangements. Their cycle decomposition has proved useful in computing and
bounding many measures of (dis)similarity between genomes, and studying the
distribution of those cycles is therefore critical to gaining insight into the
distributions of the genomic distances that rely on it. We extend here the work
initiated by Doignon and Labarre, who enumerated unsigned permutations whose
breakpoint graph contains a given number of cycles, to signed permutations, and prove
explicit formulas for computing the expected value and the variance of the
corresponding distributions, both in the unsigned case and in the signed case.
We also compare these distributions to those of several well-studied distances,
emphasising the cases where approximations obtained in this way stand out.
Finally, we show how our results can be used to derive simpler proofs of other
previously known results.
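The cycle count these distributions are built on can be checked by brute force for small n. The sketch below assumes the standard Hannenhalli-Pevzner construction of the breakpoint graph of a signed permutation: each element +x becomes the pair (2x-1, 2x), each -x becomes (2x, 2x-1), and the sequence is framed by 0 and 2n+1. Black edges join positions 2i and 2i+1, gray edges join values 2i and 2i+1. It enumerates all 2^n n! signed permutations instead of using the paper's closed-form formulas, which are not reproduced here; the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product


def extended_image(signed_perm):
    """Unsigned image 0, ..., 2n+1 of a signed permutation of 1..n.

    +x becomes (2x-1, 2x) and -x becomes (2x, 2x-1); 0 and 2n+1 frame the sequence.
    """
    n = len(signed_perm)
    seq = [0]
    for x in signed_perm:
        a = abs(x)
        seq += [2 * a - 1, 2 * a] if x > 0 else [2 * a, 2 * a - 1]
    seq.append(2 * n + 1)
    return seq


def breakpoint_graph_cycles(signed_perm):
    """Number of alternating (black/gray) cycles in the breakpoint graph."""
    seq = extended_image(signed_perm)
    position = {v: i for i, v in enumerate(seq)}
    visited = [False] * len(seq)
    cycles = 0
    for start in range(len(seq)):
        if visited[start]:
            continue
        cycles += 1
        v = start
        while not visited[v]:
            visited[v] = True
            w = seq[position[v] ^ 1]  # black edge: positions 2i and 2i+1
            visited[w] = True
            v = w ^ 1                 # gray edge: values 2i and 2i+1
    return cycles


def cycle_distribution(n):
    """Brute-force count of signed permutations of size n by number of cycles."""
    counts = Counter()
    for perm in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            counts[breakpoint_graph_cycles([s * p for s, p in zip(signs, perm)])] += 1
    return counts


def mean_cycles(n):
    """Exact expected number of cycles under the uniform distribution, by enumeration."""
    counts = cycle_distribution(n)
    return Fraction(sum(c * k for c, k in counts.items()), sum(counts.values()))
```

For instance, `breakpoint_graph_cycles([-2, -1])` returns 2, so the classical lower bound n + 1 - c on the reversal distance gives 1, matching the single reversal that sorts this permutation. Such enumerations are only a way to sanity-check closed-form expressions on small n.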
A Distance-Based Test of Association Between Paired Heterogeneous Genomic Data
Due to rapid technological advances, a wide range of different measurements
can be obtained from a given biological sample, including single nucleotide
polymorphisms, copy number variation, gene expression levels, DNA methylation
and proteomic profiles. Each of these distinct measurements provides the means
to characterize a certain aspect of biological diversity, and a fundamental
problem of broad interest concerns the discovery of shared patterns of
variation across different data types. Such data types are heterogeneous in the
sense that they represent measurements taken at very different scales or
described by very different data structures. We propose a distance-based
statistical test, the generalized RV (GRV) test, to assess whether there is a
common and non-random pattern of variability between paired biological
measurements obtained from the same random sample. The measurements enter the
test through distance measures which can be chosen to capture particular
aspects of the data. An approximate null distribution is proposed to compute
p-values in closed form, without the need to perform costly Monte Carlo
permutation procedures. Compared to the classical Mantel test for association
between distance matrices, the GRV test has been found to be more powerful in a
number of simulation settings. We also report on an application of the GRV test
to detect biological pathways in which genetic variability is associated with
variation in gene expression levels in ovarian cancer samples, and present
results obtained from two independent cohorts.
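The core idea of comparing two data types only through their distance matrices can be sketched with an RV-type coefficient computed on Gower double-centered distance matrices. This is an assumption about the construction, not a transcription of the paper's GRV statistic. In particular, the closed-form approximate null distribution is not reproduced: the permutation loop below is precisely the costly Monte Carlo procedure the authors set out to avoid, included only as a reference point. All function names are hypothetical.

```python
import math
import random


def gower_centered(dist):
    """Double-center A = -D**2 / 2 (Gower's transformation) into a Gram-like matrix."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in a]
    col = [sum(a[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[a[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def frob_inner(g, h):
    """Frobenius inner product, equal to tr(G H) for symmetric matrices."""
    n = len(g)
    return sum(g[i][j] * h[i][j] for i in range(n) for j in range(n))


def rv_from_distances(d1, d2):
    """RV coefficient tr(G1 G2) / sqrt(tr(G1^2) tr(G2^2)) on the centered matrices."""
    g1, g2 = gower_centered(d1), gower_centered(d2)
    denom = math.sqrt(frob_inner(g1, g1) * frob_inner(g2, g2))
    return frob_inner(g1, g2) / denom if denom > 0 else 0.0


def permutation_pvalue(d1, d2, n_perm=999, seed=0):
    """Monte Carlo p-value: relabel the samples of d2 and recompute the coefficient."""
    rng = random.Random(seed)
    observed = rv_from_distances(d1, d2)
    n = len(d2)
    exceed = 0
    for _ in range(n_perm):
        p = list(range(n))
        rng.shuffle(p)
        permuted = [[d2[p[i]][p[j]] for j in range(n)] for i in range(n)]
        if rv_from_distances(d1, permuted) >= observed - 1e-12:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

With Euclidean distances the double-centered matrices are positive semidefinite, so the coefficient lies in [0, 1], and rescaling either distance matrix leaves it unchanged. Any other distance can be plugged in, which is what lets each data type enter through a measure suited to it.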
Permutation-invariant distance between atomic configurations
We present a permutation-invariant distance between atomic configurations,
defined through a functional representation of atomic positions. This distance
makes it possible to compare different atomic environments with an arbitrary
number of particles directly, without going through a space of reduced dimensionality
(i.e. fingerprints) as an intermediate step. Moreover, this distance is
naturally invariant under permutations of atoms, avoiding the time-consuming
minimization over atom assignments required by other common criteria (such as the Root Mean
Square Distance). Finally, invariance under global rotations is accounted
for by a minimization procedure in the space of rotations solved by Monte Carlo
simulated annealing. A formal framework is also introduced, showing that the
distance we propose satisfies the properties of a metric on the space of atomic
configurations. Two applications are presented. The first
consists of evaluating the faithfulness of some fingerprints (or descriptors), i.e.
their capacity to represent the structural information of a configuration. The
second concerns structural analysis, where our distance proves to
be efficient at discriminating different local structures and even at classifying
them by their degree of similarity.
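A minimal sketch of the idea, under assumptions of our own: each atom is smeared into an isotropic 3D Gaussian of width `sigma`, so a configuration becomes a function, and the L2 distance between two such functions has a closed form that does not depend on atom ordering. Rotations are handled by a basic Metropolis simulated-annealing search over rotation matrices, after centering both configurations on their centroids. The Gaussian representation, cooling schedule, step size and all function names are illustrative choices, not necessarily those of the paper.

```python
import math
import random


def overlap(xs, ys, sigma):
    """L2 inner product of sum_i g(r - x_i) and sum_j g(r - y_j) for 3D Gaussians g.

    Each pair contributes (pi*sigma^2)^(3/2) * exp(-|x_i - y_j|^2 / (4 sigma^2)).
    """
    total = 0.0
    for a in xs:
        for b in ys:
            d2 = sum((p - q) ** 2 for p, q in zip(a, b))
            total += math.exp(-d2 / (4 * sigma ** 2))
    return (math.pi * sigma ** 2) ** 1.5 * total


def functional_distance(xs, ys, sigma=0.5):
    """L2 distance between the smeared densities; independent of atom ordering."""
    sq = overlap(xs, xs, sigma) + overlap(ys, ys, sigma) - 2 * overlap(xs, ys, sigma)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


def centered(points):
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    return [tuple(p[k] - c[k] for k in range(3)) for p in points]


def axis_angle_matrix(axis, theta):
    """Rotation matrix from a unit axis and an angle (Rodrigues' formula)."""
    kx, ky, kz = axis
    c, s = math.cos(theta), math.sin(theta)
    t = 1 - c
    return [
        [c + t * kx * kx, t * kx * ky - s * kz, t * kx * kz + s * ky],
        [t * kx * ky + s * kz, c + t * ky * ky, t * ky * kz - s * kx],
        [t * kx * kz - s * ky, t * ky * kz + s * kx, c + t * kz * kz],
    ]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotate(points, r):
    return [tuple(sum(r[i][k] * p[k] for k in range(3)) for i in range(3)) for p in points]


def rotation_invariant_distance(xs, ys, sigma=0.5, steps=2000, t0=0.05, step=0.3, seed=0):
    """Minimize functional_distance over rotations of ys by simulated annealing."""
    rng = random.Random(seed)
    xs, ys = centered(xs), centered(ys)
    r = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    current = best = functional_distance(xs, ys, sigma)
    for k in range(steps):
        temperature = t0 * (1 - k / steps) + 1e-12  # linear cooling
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        axis = [x / norm for x in v]  # random axis, uniform on the sphere
        proposal = matmul(axis_angle_matrix(axis, rng.uniform(-step, step)), r)
        d = functional_distance(xs, rotate(ys, proposal), sigma)
        # Metropolis rule: always accept improvements, sometimes accept worse moves
        if d < current or rng.random() < math.exp(-(d - current) / temperature):
            r, current = proposal, d
            best = min(best, current)
    return best
```

Permutation invariance comes for free because the double sum in `overlap` is symmetric in atom order; only the rotation search needs an optimizer. Being stochastic, the annealing returns an upper bound on the true minimum over rotations.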
A Study of Metrics of Distance and Correlation Between Ranked Lists for Compositionality Detection
Compositionality in language refers to the extent to which the meaning of a phrase
can be decomposed into the meaning of its constituents and the way these
constituents are combined. Based on the premise that substitution by synonyms
is meaning-preserving, compositionality can be approximated as the semantic
similarity between a phrase and a version of that phrase where words have been
replaced by their synonyms. Different ways of representing such phrases exist
(e.g., vectors [1] or language models [2]), and the choice of representation
affects the measurement of semantic similarity.
We propose a new compositionality detection method that represents phrases as
ranked lists of term weights. Our method approximates the semantic similarity
between two ranked list representations using a range of well-known distance
and correlation metrics. In contrast to most state-of-the-art approaches in
compositionality detection, our method is completely unsupervised. Experiments
with a publicly available dataset of 1048 human-annotated phrases show that,
compared to strong supervised baselines, our approach provides superior
measurement of compositionality using any of the distance and correlation
metrics considered.
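The ranked-list comparison can be sketched as follows. The choice of Kendall's tau and top-k Jaccard overlap as example metrics, and the averaging over synonym-substituted variants, are assumptions for illustration; how the paper builds its term weights and exactly which metrics it uses are not reproduced here, and the function names are hypothetical.

```python
from itertools import combinations


def ranks(ranked_terms):
    """Map each term to its position in a list ordered by decreasing weight."""
    return {term: i for i, term in enumerate(ranked_terms)}


def kendall_tau(list_a, list_b):
    """Kendall's tau-a over the terms shared by both ranked lists (in [-1, 1])."""
    ra, rb = ranks(list_a), ranks(list_b)
    shared = [t for t in list_a if t in rb]
    if len(shared) < 2:
        return 0.0
    score = 0
    for s, t in combinations(shared, 2):
        # concordant pair: both lists order s and t the same way
        score += 1 if (ra[s] - ra[t]) * (rb[s] - rb[t]) > 0 else -1
    return score / (len(shared) * (len(shared) - 1) // 2)


def jaccard_at_k(list_a, list_b, k):
    """Set overlap of the top-k terms, ignoring their order."""
    a, b = set(list_a[:k]), set(list_b[:k])
    return len(a & b) / len(a | b) if a | b else 1.0


def compositionality(phrase_list, substituted_lists, metric):
    """Average similarity between a phrase and its synonym-substituted variants."""
    scores = [metric(phrase_list, other) for other in substituted_lists]
    return sum(scores) / len(scores)
```

Identical lists score 1 and fully reversed lists score -1 under `kendall_tau`, so a phrase whose substituted variants keep roughly the same term ranking is scored as compositional. Restricting tau to shared terms ignores terms that appear in only one list, which is one reason to pair it with a set-based measure such as `jaccard_at_k`.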
- …