The distribution of cycles in breakpoint graphs of signed permutations

Author: Anthony Labarre
Bafna
Björner
Bóna
Christie
Diestel
Doignon
Elias
Fertin
Goodman
Graham
Grusea
Hanlon
Hannenhalli
Kwak
Labarre
Labarre
Li
Macdonald
Simona Grusea
Sury
Székely
Wielandt
Wilf
Publication venue: 'Elsevier BV'
Publication date: 01/01/2012

Breakpoint graphs are ubiquitous structures in the field of genome rearrangements. Their cycle decomposition has proved useful in computing and bounding many measures of (dis)similarity between genomes, and studying the distribution of those cycles is therefore critical to gaining insight on the distributions of the genomic distances that rely on it. We extend here the work initiated by Doignon and Labarre, who enumerated unsigned permutations whose breakpoint graph contains k cycles, to signed permutations, and prove explicit formulas for computing the expected value and the variance of the corresponding distributions, both in the unsigned case and in the signed case. We also compare these distributions to those of several well-studied distances, emphasising the cases where approximations obtained in this way stand out. Finally, we show how our results can be used to derive simpler proofs of other previously known results.
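
As a rough illustration of the unsigned case, the sketch below counts alternating cycles by brute force. It assumes the common construction for unsigned permutations (extend the permutation with a leading 0, draw a black arc from each element to its left neighbour, cyclically, and a gray arc from each value v to v + 1 modulo n + 1); the abstract does not spell out the definition, so this choice, and the function names `count_cycles`, `cycle_distribution` and `mean_and_variance`, are assumptions made for illustration only. It enumerates all permutations, so it is only practical for small n and is not a substitute for the explicit formulas the paper proves.

```python
from collections import Counter
from itertools import permutations


def count_cycles(perm):
    """Count alternating cycles for a permutation of 1..n (assumed construction)."""
    n = len(perm)
    ext = [0] + list(perm)                    # extended permutation, pi_0 = 0
    pos = {v: i for i, v in enumerate(ext)}   # position of each value

    def step(v):
        # Black arc: v -> its cyclic left neighbour; gray arc: w -> w + 1 (mod n + 1).
        left = ext[(pos[v] - 1) % (n + 1)]
        return (left + 1) % (n + 1)

    seen, cycles = set(), 0
    for start in range(n + 1):
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            v = step(v)
    return cycles


def cycle_distribution(n):
    """Map each cycle count k to the number of permutations of 1..n that have it."""
    return Counter(count_cycles(p) for p in permutations(range(1, n + 1)))


def mean_and_variance(n):
    """Empirical mean and variance of the cycle count over all n! permutations."""
    dist = cycle_distribution(n)
    total = sum(dist.values())
    mean = sum(k * c for k, c in dist.items()) / total
    second = sum(k * k * c for k, c in dist.items()) / total
    return mean, second - mean ** 2
```

Under this construction the identity permutation on n elements yields n + 1 cycles, and the cycle count always has the same parity as n + 1.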

arXiv.org e-Print Archive

CiteSeerX

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

HAL Descartes

HAL-INSA Toulouse

Hal-Diderot

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

A Distance-Based Test of Association Between Paired Heterogeneous Genomic Data

Author: Curry Edward
Minas Christopher
Montana Giovanni
Publication date: 27/03/2013

Due to rapid technological advances, a wide range of different measurements can be obtained from a given biological sample including single nucleotide polymorphisms, copy number variation, gene expression levels, DNA methylation and proteomic profiles. Each of these distinct measurements provides the means to characterize a certain aspect of biological diversity, and a fundamental problem of broad interest concerns the discovery of shared patterns of variation across different data types. Such data types are heterogeneous in the sense that they represent measurements taken at very different scales or described by very different data structures. We propose a distance-based statistical test, the generalized RV (GRV) test, to assess whether there is a common and non-random pattern of variability between paired biological measurements obtained from the same random sample. The measurements enter the test through distance measures which can be chosen to capture particular aspects of the data. An approximate null distribution is proposed to compute p-values in closed-form and without the need to perform costly Monte Carlo permutation procedures. Compared to the classical Mantel test for association between distance matrices, the GRV test has been found to be more powerful in a number of simulation settings. We also report on an application of the GRV test to detect biological pathways in which genetic variability is associated with variation in gene expression levels in ovarian cancer samples, and present results obtained from two independent cohorts.
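
For context, the classical Mantel test that the abstract uses as a baseline can be sketched as follows. This is a minimal pure-Python version of the Monte Carlo permutation procedure, not the GRV test and not its closed-form null approximation; the function names, the one-sided p-value and the default of 999 permutations are illustrative assumptions.

```python
import random
from math import sqrt


def upper_triangle(dist, order):
    """Off-diagonal upper-triangle entries of dist after relabelling samples by order."""
    n = len(order)
    return [dist[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel_test(dist1, dist2, n_perm=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(dist1)))
    x = upper_triangle(dist1, identity)
    observed = pearson(x, upper_triangle(dist2, identity))
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # permute rows and columns of dist2 together
        if pearson(x, upper_triangle(dist2, order)) >= observed:
            hits += 1
    # The observed labelling counts as one of the n_perm + 1 arrangements.
    return observed, (hits + 1) / (n_perm + 1)
```

The cost of the loop grows with both the number of permutations and the square of the sample size, which is the expense a closed-form null distribution is meant to avoid.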

arXiv.org e-Print Archive

Crossref

King's Research Portal

Permutation-invariant distance between atomic configurations

Author: Gabriel Stoltz
Grégoire Ferré
Jean-Bernard Maillet
Rose M. E.
Spall J. C.
Publication venue: 'AIP Publishing'
Publication date: 10/07/2015

We present a permutation-invariant distance between atomic configurations, defined through a functional representation of atomic positions. This distance makes it possible to compare directly different atomic environments with an arbitrary number of particles, without going through a space of reduced dimensionality (i.e. fingerprints) as an intermediate step. Moreover, this distance is naturally invariant under permutations of atoms, avoiding the associated time-consuming minimization required by other common criteria (like the Root Mean Square Distance). Finally, the invariance under global rotations is accounted for by a minimization procedure in the space of rotations solved by Monte Carlo simulated annealing. A formal framework is also introduced, showing that the distance we propose satisfies the properties of a metric on the space of atomic configurations. Two examples of applications are proposed. The first one consists in evaluating the faithfulness of some fingerprints (or descriptors), i.e. their capacity to represent the structural information of a configuration. The second application concerns structural analysis, where our distance proves to be efficient in discriminating different local structures and even classifying their degree of similarity.
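
To make the idea of a functional representation concrete, the sketch below replaces each atom by an isotropic Gaussian and takes the L2 distance between the resulting densities, which has a closed form in terms of pairwise overlaps. The abstract does not say which functional representation the authors use, so the Gaussian smearing, the width `sigma` and both function names are assumptions; the rotation minimization described in the abstract is not included.

```python
from math import exp, sqrt


def gaussian_overlap(conf_a, conf_b, sigma):
    """Sum of pairwise overlaps between Gaussians centred on the atoms.

    The overlap of two Gaussians of width sigma centred at a and b is
    proportional to exp(-|a - b|^2 / (4 sigma^2)); the constant prefactor
    is dropped because it only rescales the distance.
    """
    total = 0.0
    for a in conf_a:
        for b in conf_b:
            sq = sum((p - q) ** 2 for p, q in zip(a, b))
            total += exp(-sq / (4.0 * sigma ** 2))
    return total


def density_distance(conf_a, conf_b, sigma=0.5):
    """L2 distance (up to a constant factor) between the two smeared densities."""
    sq = (gaussian_overlap(conf_a, conf_a, sigma)
          + gaussian_overlap(conf_b, conf_b, sigma)
          - 2.0 * gaussian_overlap(conf_a, conf_b, sigma))
    return sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors
```

Because every term is a sum over all atom pairs, relabelling the atoms leaves the value unchanged, and the two configurations need not contain the same number of atoms.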

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL-CEA

HAL-Ecole des Ponts ParisTech

A Study of Metrics of Distance and Correlation Between Ranked Lists for Compositionality Detection

Author: Hansen Niels Dalum
Lioma Christina
Publication date: 10/03/2017

Compositionality in language refers to how much the meaning of some phrase can be decomposed into the meaning of its constituents and the way these constituents are combined. Based on the premise that substitution by synonyms is meaning-preserving, compositionality can be approximated as the semantic similarity between a phrase and a version of that phrase where words have been replaced by their synonyms. Different ways of representing such phrases exist (e.g., vectors [1] or language models [2]), and the choice of representation affects the measurement of semantic similarity. We propose a new compositionality detection method that represents phrases as ranked lists of term weights. Our method approximates the semantic similarity between two ranked list representations using a range of well-known distance and correlation metrics. In contrast to most state-of-the-art approaches in compositionality detection, our method is completely unsupervised. Experiments with a publicly available dataset of 1048 human-annotated phrases show that, compared to strong supervised baselines, our approach provides superior measurement of compositionality using any of the distance and correlation metrics considered.
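
Two standard ways of comparing ranked lists are sketched below: the Kendall tau distance (number of item pairs ordered differently) and Spearman's rank correlation. The abstract does not list which metrics were evaluated, so these two are illustrative choices, and the sketch assumes both lists rank exactly the same items without ties.

```python
from itertools import combinations


def rank_positions(ranked):
    """Map each item to its 0-based position in the ranked list."""
    return {item: i for i, item in enumerate(ranked)}


def kendall_tau_distance(list_a, list_b):
    """Number of item pairs whose relative order differs between the two lists."""
    pos_a, pos_b = rank_positions(list_a), rank_positions(list_b)
    return sum(
        1
        for x, y in combinations(list_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def spearman_rho(list_a, list_b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    pos_a, pos_b = rank_positions(list_a), rank_positions(list_b)
    n = len(list_a)
    d2 = sum((pos_a[item] - pos_b[item]) ** 2 for item in list_a)
    return 1 - 6 * d2 / (n * (n * n - 1))
```

For example, swapping two adjacent terms in a four-term ranking gives a Kendall tau distance of 1, while a fully reversed ranking gives a Spearman correlation of -1.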

arXiv.org e-Print Archive

Copenhagen University Research Information System