80,329 research outputs found

    Coordinate noun phrase disambiguation in a generative parsing model

    In this paper we present methods for improving the disambiguation of noun phrase (NP) coordination within the framework of a lexicalised history-based parsing model. As well as reducing noise in the data, we look at modelling two main sources of information for disambiguation: symmetry in conjunct structure, and the dependency between conjunct lexical heads. Our changes to the baseline model result in an increase in NP coordination dependency f-score from 69.9% to 73.8%, which represents a relative reduction in f-score error of 13%.
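
The 13% figure follows from the two f-scores quoted in the abstract if "f-score error" is read as 100% minus the f-score. The short check below makes that arithmetic explicit; the function name is hypothetical and the definition of error is an assumption, not something the abstract spells out.

```python
def relative_error_reduction(baseline_f, improved_f):
    """Relative reduction in f-score error, with both scores given in percent.

    Assumes error is defined as 100 - f-score.
    """
    baseline_error = 100.0 - baseline_f   # 30.1 for the baseline model
    improved_error = 100.0 - improved_f   # 26.2 for the improved model
    return (baseline_error - improved_error) / baseline_error


reduction = relative_error_reduction(69.9, 73.8)
print(f"{reduction:.1%}")  # prints 13.0%
```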

    Topological network alignment uncovers biological function and phylogeny

    Sequence comparison and alignment have had an enormous impact on our understanding of evolution, biology, and disease. Comparison and alignment of biological networks will likely have a similar impact. Existing network alignments use information external to the networks, such as sequence, because no good algorithm for purely topological alignment has yet been devised. In this paper, we present a novel algorithm based solely on network topology that can be used to align any two networks. We apply it to biological networks to produce by far the most complete topological alignments of biological networks to date. We demonstrate that both species phylogeny and detailed biological function of individual proteins can be extracted from our alignments. Topology-based alignments have the potential to provide a completely new, independent source of phylogenetic information. Our alignment of the protein-protein interaction networks of two very different species--yeast and human--indicates that even distant species share a surprising amount of network topology with each other, suggesting broad similarities in internal cellular wiring across all life on Earth. Comment: Algorithm explained in more detail. Additional analysis added.

    Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies

    Existing sequence alignment algorithms use heuristic scoring schemes which cannot be used as objective distance metrics. Therefore one relies on measures like the p- or log-det distances, or makes explicit, and often simplistic, assumptions about sequence evolution. Information theory provides an alternative, in the form of mutual information (MI) which is, in principle, an objective and model-independent similarity measure. MI can be estimated by concatenating and zipping sequences, thereby yielding the "normalized compression distance". So far this has produced promising results, but with uncontrolled errors. We describe a simple approach to get robust estimates of MI from global pairwise alignments. Using standard alignment algorithms, this gives for animal mitochondrial DNA estimates that are strikingly close to estimates obtained from the alignment-free methods mentioned above. Our main result uses algorithmic (Kolmogorov) information theory, but we show that similar results can also be obtained from Shannon theory. Because it is not additive, normalized compression distance is not an optimal metric for phylogenetics, but we propose a simple modification that overcomes the issue of additivity. We test several versions of our MI-based distance measures on a large number of randomly chosen quartets and demonstrate that they all perform better than traditional measures like the Kimura or log-det (resp. paralinear) distances. Even a simplified version based on single-letter Shannon entropies, which can be easily incorporated in existing software packages, gave superior results throughout the entire animal kingdom. But we see the main virtue of our approach in a more general way. For example, it can also help to judge the relative merits of different alignment algorithms, by estimating the significance of specific alignments. Comment: 19 pages + 16 pages of supplementary material.
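
The "concatenating and zipping" idea mentioned in this abstract can be illustrated with the usual textbook form of the normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed length. The sketch below is only an illustration: it uses `zlib` as a stand-in compressor and synthetic toy sequences, both of which are assumptions, and it does not implement the alignment-based MI estimate or the additivity fix the abstract proposes.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for C(x))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy data (hypothetical): a random DNA-like sequence, a lightly mutated
# copy of it, and an independent random sequence of the same length.
rng = random.Random(0)
original = [rng.choice("ACGT") for _ in range(2000)]
mutated = [rng.choice("ACGT") if rng.random() < 0.02 else base for base in original]
unrelated = [rng.choice("ACGT") for _ in range(2000)]

seq_a = "".join(original).encode()
seq_b = "".join(mutated).encode()
seq_c = "".join(unrelated).encode()

print("related:  ", ncd(seq_a, seq_b))
print("unrelated:", ncd(seq_a, seq_c))
```

On this toy data the related pair should come out clearly closer than the unrelated pair, since the compressor can reuse long matches from the first sequence when encoding the second.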

    A new distribution-based test of self-similarity

    In studying the scale invariance of an empirical time series a twofold problem arises: it is necessary to test the series for self-similarity and, once it has passed such a test, the goal becomes to estimate the parameter H0 of self-similarity. The estimation is therefore correct only if the sequence is truly self-similar, but in general this is just assumed and not tested in advance. In this paper we suggest a solution for this problem. Given the process {X(t)}, we propose a new test based on the diameter d of the space of the rescaled probability distribution functions of X(t). Two necessary conditions are deduced which help to discriminate self-similar processes, and a closed formula is provided for the diameter of the fractional Brownian motion (fBm). Furthermore, by properly choosing the distance function, we reduce the measure of self-similarity to the Smirnov statistic when the one-dimensional distributions of X(t) are considered. This permits the application of the well-known two-sided test due to Kolmogorov and Smirnov in order to evaluate the statistical significance of the diameter d, even in the case of strongly dependent sequences. As a consequence, our approach both tests the series for self-similarity and provides an estimate of the self-similarity parameter.
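
As a rough illustration of the idea in this abstract, the sketch below rescales the increments of a series at several lags by k^H and takes the largest pairwise Kolmogorov-Smirnov distance between the resulting empirical distributions as a "diameter". This is a simplified reading under stated assumptions, not the paper's estimator: the function names, the choice of lags, the use of non-overlapping increments, and the simulated random walk (for which H = 0.5) are all hypothetical.

```python
import random


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: sup |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past the current value x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def rescaled_increments(path, lag, h):
    """Non-overlapping increments at the given lag, divided by lag**h."""
    return [(path[t + lag] - path[t]) / lag**h
            for t in range(0, len(path) - lag, lag)]


def diameter(path, lags, h):
    """Largest pairwise KS distance among the rescaled increment samples."""
    samples = [rescaled_increments(path, k, h) for k in lags]
    return max(ks_distance(samples[p], samples[q])
               for p in range(len(samples))
               for q in range(p + 1, len(samples)))


# Simulated Gaussian random walk (assumption: stands in for the observed series).
rng = random.Random(1)
path = [0.0]
for _ in range(8000):
    path.append(path[-1] + rng.gauss(0.0, 1.0))

lags = [1, 2, 4, 8]
print("diameter at H=0.5:", diameter(path, lags, 0.5))
print("diameter at H=0.9:", diameter(path, lags, 0.9))
```

For the random walk, the diameter should be small at H = 0.5 and noticeably larger at a mismatched exponent such as 0.9, which is the behaviour a scan over candidate H values would exploit.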

    Nonlocal Myriad Filters for Cauchy Noise Removal

    The contribution of this paper is two-fold. First, we introduce a generalized myriad filter, which is a method to compute the joint maximum likelihood estimator of the location and the scale parameter of the Cauchy distribution. Estimating only the location parameter is known as the myriad filter. We propose an efficient algorithm to compute the generalized myriad filter and prove its convergence. Special cases of this algorithm result in the classical myriad filtering and, respectively, an algorithm for estimating only the scale parameter. Based on an asymptotic analysis, we develop a second, even faster generalized myriad filtering technique. Second, we use our new approaches within a nonlocal, fully unsupervised method to denoise images corrupted by Cauchy noise. Special attention is paid to the determination of similar patches in noisy images. Numerical examples demonstrate the excellent performance of our algorithms, which moreover have the advantage of being robust with respect to the parameter choice.
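
For orientation, the classical (location-only) myriad filter mentioned in this abstract minimizes the sum of log(gamma^2 + (x_i - a)^2) over a, with the scale gamma held fixed. The sketch below uses a standard weighted-mean fixed-point iteration for that problem. It is not the generalized algorithm the paper proposes; the function name, the median starting point, the stopping rule, and the synthetic data are assumptions made for illustration.

```python
import math
import random


def myriad_location(samples, gamma, iterations=200, tol=1e-10):
    """Estimate the Cauchy location with gamma fixed (classical myriad).

    Fixed-point update: a <- sum(w_i * x_i) / sum(w_i),
    with weights w_i = 1 / (gamma^2 + (x_i - a)^2).
    """
    a = sorted(samples)[len(samples) // 2]  # start from the sample median
    for _ in range(iterations):
        weights = [1.0 / (gamma**2 + (x - a) ** 2) for x in samples]
        new_a = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
        if abs(new_a - a) < tol:
            return new_a
        a = new_a
    return a


# Synthetic Cauchy samples via the inverse CDF (hypothetical test data).
rng = random.Random(0)
true_location, gamma = 3.0, 1.0
samples = [true_location + gamma * math.tan(math.pi * (rng.random() - 0.5))
           for _ in range(2000)]

estimate = myriad_location(samples, gamma)
print("estimated location:", estimate)
```

Because the weights shrink quadratically with distance from the current estimate, extreme outliers contribute very little, which is why this kind of estimator suits heavy-tailed Cauchy noise where the sample mean is unreliable.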

    Exploiting source similarity for SMT using context-informed features

    In this paper, we introduce context-informed features in a log-linear phrase-based SMT framework; these features enable us to exploit source similarity in addition to the target similarity modeled by the language model. We present a memory-based classification framework that enables the estimation of these features while avoiding sparseness problems. We evaluate the performance of our approach on Italian-to-English and Chinese-to-English translation tasks using a state-of-the-art phrase-based SMT system, and report significant improvements for both BLEU and NIST scores when adding the context-informed features.