Coordinate noun phrase disambiguation in a generative parsing model
In this paper we present methods for improving the disambiguation of noun phrase (NP) coordination within the framework of a lexicalised history-based parsing model. As
well as reducing noise in the data, we look at modelling two main sources of information for disambiguation: symmetry in conjunct structure, and the dependency between conjunct lexical heads. Our changes to the baseline model result in an increase in NP coordination dependency f-score from 69.9% to
73.8%, which represents a relative reduction in f-score error of 13%.
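The 13% figure can be checked with a short calculation, treating the f-score error as 100 minus the f-score. This is a minimal sketch; the function name `relative_error_reduction` is illustrative and does not come from the paper.

```python
def relative_error_reduction(baseline_f: float, improved_f: float) -> float:
    """Fraction of the baseline f-score error (100 - f) removed by the improved model."""
    baseline_error = 100.0 - baseline_f
    improved_error = 100.0 - improved_f
    return (baseline_error - improved_error) / baseline_error


# F-scores quoted in the abstract: 69.9% (baseline) and 73.8% (improved).
reduction = relative_error_reduction(69.9, 73.8)
print(f"{reduction:.1%}")  # prints 13.0%
```

The error drops from 30.1 to 26.2 points, and 3.9 / 30.1 is roughly 0.13.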
Topological network alignment uncovers biological function and phylogeny
Sequence comparison and alignment has had an enormous impact on our
understanding of evolution, biology, and disease. Comparison and alignment of
biological networks will likely have a similar impact. Existing network
alignments use information external to the networks, such as sequence, because
no good algorithm for purely topological alignment has yet been devised. In
this paper, we present a novel algorithm based solely on network topology, that
can be used to align any two networks. We apply it to biological networks to
produce by far the most complete topological alignments of biological networks
to date. We demonstrate that both species phylogeny and detailed biological
function of individual proteins can be extracted from our alignments.
Topology-based alignments have the potential to provide a completely new,
independent source of phylogenetic information. Our alignment of the
protein-protein interaction networks of two very different species--yeast and
human--indicates that even distant species share a surprising amount of network
topology with each other, suggesting broad similarities in internal cellular
wiring across all life on Earth.
Comment: Algorithm explained in more detail. Additional analysis added.
Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies
Existing sequence alignment algorithms use heuristic scoring schemes which
cannot be used as objective distance metrics. Therefore one relies on measures
like the p- or log-det distances, or makes explicit, and often simplistic,
assumptions about sequence evolution. Information theory provides an
alternative, in the form of mutual information (MI) which is, in principle, an
objective and model-independent similarity measure. MI can be estimated by
concatenating and zipping sequences, yielding thereby the "normalized
compression distance". So far this has produced promising results, but with
uncontrolled errors. We describe a simple approach to get robust estimates of
MI from global pairwise alignments. Using standard alignment algorithms, this
gives for animal mitochondrial DNA estimates that are strikingly close to
estimates obtained from the alignment-free methods mentioned above. Our main
result uses algorithmic (Kolmogorov) information theory, but we show that
similar results can also be obtained from Shannon theory. Because it is not
additive, normalized compression distance is not an optimal metric
for phylogenetics, but we propose a simple modification that overcomes the
issue of additivity. We test several versions of our MI-based distance measures
on a large number of randomly chosen quartets and demonstrate that they all
perform better than traditional measures like the Kimura or log-det (resp.
paralinear) distances. Even a simplified version based on single-letter Shannon
entropies, which can be easily incorporated in existing software packages, gave
superior results throughout the entire animal kingdom. But we see the main
virtue of our approach in a more general way. For example, it can also help to
judge the relative merits of different alignment algorithms, by estimating the
significance of specific alignments.
Comment: 19 pages + 16 pages of supplementary material
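The alignment-free estimate mentioned in this abstract ("concatenating and zipping sequences") can be sketched as follows. This illustrates only the normalized compression distance in its commonly used form, not the alignment-based MI estimator the paper proposes. The choice of `zlib` as compressor and the toy sequences are assumptions made for illustration.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a stand-in for any compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy data (hypothetical): a random sequence, a lightly mutated copy, and an unrelated one.
rng = random.Random(0)
seq_a = "".join(rng.choice("ACGT") for _ in range(2000))
seq_c = "".join(rng.choice("ACGT") for _ in range(2000))

# Substitute every 50th base of seq_a to obtain a close relative.
mutated = list(seq_a)
for i in range(0, len(mutated), 50):
    mutated[i] = "A" if mutated[i] != "A" else "C"
seq_b = "".join(mutated)

print(ncd(seq_a.encode(), seq_b.encode()))  # related pair: smaller distance
print(ncd(seq_a.encode(), seq_c.encode()))  # unrelated pair: distance near 1
```

A general-purpose compressor such as zlib only approximates Kolmogorov complexity, which is one plausible source of the "uncontrolled errors" the abstract refers to.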
A new distribution-based test of self-similarity
In studying the scale invariance of an empirical time series a twofold problem arises: it is necessary to test the series for self-similarity and, once such a test is passed, the goal becomes to estimate the parameter H0 of self-similarity. The estimation is therefore correct only if the sequence is truly self-similar, but in general this is just assumed and not tested in advance. In this paper we suggest a solution for this problem. Given the process {X(t)}, we propose a new test based on the diameter d of the space of the rescaled probability distribution functions of X(t). Two necessary conditions are deduced which help to discriminate self-similar processes, and a closed formula is provided for the diameter of the fractional Brownian motion (fBm). Furthermore, by properly choosing the distance function, we reduce the measure of self-similarity to the Smirnov statistic when the one-dimensional distributions of X(t) are considered. This permits the application of the well-known two-sided test due to Kolmogorov and Smirnov in order to evaluate the statistical significance of the diameter d, even in the case of strongly dependent sequences. As a consequence, our approach both tests the series for self-similarity and provides an estimate of the self-similarity parameter.
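The idea of comparing rescaled distributions with a Kolmogorov-Smirnov distance can be illustrated on a discrete Brownian motion, for which X(at) has the same distribution as a^H X(t) with H = 0.5. This is only a sketch of that scaling relation and of the two-sample KS statistic. It is not the diameter-based test of the paper, and all function names and parameters here are hypothetical.

```python
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: sup |F_a(x) - F_b(x)| over all x."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every observation <= x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def walk_endpoints(n_steps, n_paths, rng):
    """Endpoints X(n_steps) of independent Gaussian random walks."""
    return [sum(rng.gauss(0.0, 1.0) for _ in range(n_steps)) for _ in range(n_paths)]


def rescaled_ks(x_t, x_at, a, hurst):
    """KS distance between X(t) and a**(-hurst) * X(a*t)."""
    rescaled = [a ** (-hurst) * x for x in x_at]
    return ks_statistic(x_t, rescaled)


rng = random.Random(0)
x_16 = walk_endpoints(16, 2000, rng)  # samples of X(t) at t = 16
x_64 = walk_endpoints(64, 2000, rng)  # samples of X(a*t) with a = 4

d_true = rescaled_ks(x_16, x_64, a=4.0, hurst=0.5)   # correct exponent
d_wrong = rescaled_ks(x_16, x_64, a=4.0, hurst=0.9)  # mis-specified exponent
print(d_true, d_wrong)
```

With the correct exponent the rescaled samples are nearly indistinguishable, so the distance stays small. A wrong exponent leaves a visible gap between the two empirical distributions.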
Nonlocal Myriad Filters for Cauchy Noise Removal
The contribution of this paper is two-fold. First, we introduce a generalized
myriad filter, which is a method to compute the joint maximum likelihood
estimator of the location and the scale parameter of the Cauchy distribution.
Estimating only the location parameter is known as myriad filter. We propose an
efficient algorithm to compute the generalized myriad filter and prove its
convergence. Special cases of this algorithm result in the classical myriad
filtering and, respectively, an algorithm for estimating only the scale parameter.
Based on an asymptotic analysis, we develop a second, even faster generalized
myriad filtering technique.
Second, we use our new approaches within a nonlocal, fully unsupervised
method to denoise images corrupted by Cauchy noise. Special attention is paid
to the determination of similar patches in noisy images. Numerical examples
demonstrate the excellent performance of our algorithms, which moreover have the
advantage of being robust with respect to the parameter choice.
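For orientation, the classical myriad filter (location only, scale held fixed) can be sketched with a simple fixed-point iteration on the Cauchy likelihood equation. This is an illustrative stand-in and not the generalized algorithm proposed in the paper. The function name `myriad`, the starting point, and the stopping rule are assumptions.

```python
import statistics


def myriad(samples, gamma, tol=1e-10, max_iter=1000):
    """Location estimate minimizing sum(log(gamma**2 + (x - a)**2)) for a fixed scale gamma > 0.

    Setting the derivative to zero shows that the minimizer is a weighted mean
    with weights 1 / (gamma**2 + (x - a)**2), which is iterated to a fixed point.
    """
    a = statistics.median(samples)  # robust starting point (an assumption)
    for _ in range(max_iter):
        weights = [1.0 / (gamma ** 2 + (x - a) ** 2) for x in samples]
        a_next = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
        if abs(a_next - a) < tol:
            return a_next
        a = a_next
    return a


# Hypothetical data: six values near zero and one heavy-tailed outlier.
data = [0.1, -0.2, 0.05, 0.0, 0.15, -0.1, 1000.0]
print(myriad(data, gamma=1.0))  # stays close to 0
print(statistics.mean(data))    # dragged far away by the outlier
```

The outlier receives a weight of about 1e-6 and so barely affects the estimate, which is why myriad-type filters suit impulsive, Cauchy-like noise.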
Exploiting source similarity for SMT using context-informed features
In this paper, we introduce context-informed features in a log-linear phrase-based SMT framework; these features enable us to exploit source similarity in addition to the target similarity modeled by the language model. We present a memory-based classification framework that enables the estimation of these features while avoiding sparseness problems. We evaluate the performance of our approach on Italian-to-English and Chinese-to-English translation tasks using a state-of-the-art phrase-based SMT system, and report significant improvements for both BLEU and NIST scores when adding the context-informed features.