
    Using TERp to augment the system combination for SMT

    TER-Plus (TERp) is an extended TER evaluation metric incorporating morphology, synonymy and paraphrases. There are three new edit operations in TERp: Stem Matches, Synonym Matches and Phrase Substitutions (Paraphrases). In this paper, we propose a TERp-based augmented system combination in terms of the backbone selection and consensus decoding network. Combining the new properties of the TERp, we also propose a two-pass decoding strategy for the lattice-based phrase-level confusion network (CN) to generate the final result. The experiments conducted on the NIST2008 Chinese-to-English test set show that our TERp-based augmented system combination framework achieves significant improvements in terms of BLEU and TERp scores compared to the state-of-the-art word-level system combination framework and a TER-based combination strategy.

    Index-free Heat Kernel Coefficients

    Using index-free notation, we present the diagonal values of the first five heat kernel coefficients associated with a general Laplace-type operator on a compact Riemannian space without boundary. The fifth coefficient appears here for the first time. For a flat space with a gauge connection, the sixth coefficient is given too. Also provided are the leading terms for any coefficient, both in ascending and descending powers of the Yang-Mills and Riemann curvatures, to the same order as required for the fourth coefficient. These results are obtained by directly solving the relevant recursion relations, working in Fock-Schwinger gauge and Riemann normal coordinates. Our procedure is thus noncovariant, but we show that for any coefficient the 'gauged' respectively 'curved' version is found from the corresponding 'non-gauged' respectively 'flat' coefficient by making some simple covariant substitutions. These substitutions being understood, the coefficients retain their 'flat' form and size. In this sense the fifth and sixth coefficients have only 26 and 75 terms respectively, allowing us to write them down. Using index-free notation also clarifies the general structure of the heat kernel coefficients. In particular, in flat space we find that from the fifth coefficient onward, certain scalars are absent. This may be relevant for the anomalies of quantum field theories in ten or more dimensions. Comment: 38 pages, LaTeX

    Sequence alignment, mutual information, and dissimilarity measures for constructing phylogenies

    Existing sequence alignment algorithms use heuristic scoring schemes which cannot be used as objective distance metrics. Therefore one relies on measures like the p- or log-det distances, or makes explicit, and often simplistic, assumptions about sequence evolution. Information theory provides an alternative, in the form of mutual information (MI) which is, in principle, an objective and model-independent similarity measure. MI can be estimated by concatenating and zipping sequences, thereby yielding the "normalized compression distance". So far this has produced promising results, but with uncontrolled errors. We describe a simple approach to get robust estimates of MI from global pairwise alignments. Using standard alignment algorithms, this gives for animal mitochondrial DNA estimates that are strikingly close to estimates obtained from the alignment-free methods mentioned above. Our main result uses algorithmic (Kolmogorov) information theory, but we show that similar results can also be obtained from Shannon theory. Because it is not additive, normalized compression distance is not an optimal metric for phylogenetics, but we propose a simple modification that overcomes the issue of additivity. We test several versions of our MI-based distance measures on a large number of randomly chosen quartets and demonstrate that they all perform better than traditional measures like the Kimura or log-det (resp. paralinear) distances. Even a simplified version based on single-letter Shannon entropies, which can be easily incorporated in existing software packages, gave superior results throughout the entire animal kingdom. But we see the main virtue of our approach in a more general way. For example, it can also help to judge the relative merits of different alignment algorithms, by estimating the significance of specific alignments. Comment: 19 pages + 16 pages of supplementary material
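
The "concatenating and zipping" estimate mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the commonly quoted form NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), with C taken to be the zlib-compressed length in bytes. The function names, the choice of zlib, and the random test sequences are illustrative assumptions, not the authors' implementation, and the sketch does not include the alignment-based estimate or the additivity fix that the paper proposes.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a stand-in for C(.))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Assumed form: (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical toy data: two independent random DNA-like sequences.
rng = random.Random(0)
seq_a = "".join(rng.choice("ACGT") for _ in range(2000)).encode()
seq_b = "".join(rng.choice("ACGT") for _ in range(2000)).encode()

print("NCD(a, a):", ncd(seq_a, seq_a))  # identical sequences: small distance
print("NCD(a, b):", ncd(seq_a, seq_b))  # unrelated sequences: larger distance
```

A general-purpose compressor such as zlib only approximates Kolmogorov complexity, which is one way to read the "uncontrolled errors" the abstract attributes to this kind of estimate.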
