
    Multiple sequence alignment in historical linguistics. A sound class based approach

    In this paper, a new method for multiple sequence alignment in historical linguistics is presented. The algorithm is based on the traditional framework of progressive multiple sequence alignment (cf. Durbin et al. 2002: 143-149), whose shortcomings are addressed by (1) a sound-class representation of phonetic sequences (cf. Dolgopolsky 1986, Turchin et al. 2010) accompanied by specific scoring functions, (2) the modification of gap scores based on prosodic context, and (3) a new method for the detection of swapped sites in already aligned sequences. The algorithm is implemented as part of the LingPy library (http://lingulist.de/lingpy), a suite of open-source Python modules for various tasks in quantitative historical linguistics. The method was tested on a benchmark dataset of 152 manually edited multiple alignments covering data for 192 Bulgarian dialects (Prokić et al. 2009). The results show that the new method yields alignments which differ from the gold standard in only 5% of all sequences.
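    The sound-class idea can be illustrated with a short, self-contained sketch. This is not LingPy's implementation: the class table, scores, and function names below are assumptions for illustration, and plain Needleman-Wunsch stands in for the progressive, prosody-aware procedure described in the paper.

```python
# Minimal sketch (assumed class table and scores): map segments to coarse
# Dolgopolsky-style sound classes, then align the class strings.

SOUND_CLASSES = {  # toy subset of a Dolgopolsky-style mapping (assumed)
    'p': 'P', 'b': 'P', 'f': 'P', 'v': 'P',
    't': 'T', 'd': 'T',
    'k': 'K', 'g': 'K', 'x': 'K',
    's': 'S', 'z': 'S',
    'm': 'M', 'n': 'N',
    'r': 'R', 'l': 'R',
    'w': 'W', 'j': 'J',
    'a': 'V', 'e': 'V', 'i': 'V', 'o': 'V', 'u': 'V',
}

def to_classes(segments):
    """Convert a segmented word into its sound-class string."""
    return [SOUND_CLASSES.get(seg, '0') for seg in segments]

def align(a, b, match=1, mismatch=-1, gap=-1):
    """Plain Needleman-Wunsch alignment of two sound-class strings."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            S[i][j] = max(diag, S[i - 1][j] + gap, S[i][j - 1] + gap)
    # traceback
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return out_a[::-1], out_b[::-1]

if __name__ == '__main__':
    w1, w2 = list('valdemar'), list('wladimir')
    a, b = align(to_classes(w1), to_classes(w2))
    print(' '.join(a))
    print(' '.join(b))
```

    Merging phonetically similar sounds into one class makes cognate segments line up even when the surface phonetics of the two toy words differ.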

    What does Attention in Neural Machine Translation Pay Attention to?

    Attention in neural machine translation makes it possible to encode relevant parts of the source sentence at each translation step. As a result, attention is often regarded as an alignment model as well. However, no prior work specifically studies attention and analyses what attention models actually learn, so the question of how attention is similar to or different from traditional alignment remains open. In this paper, we provide a detailed analysis of attention and compare it to traditional alignment. We answer the question of whether attention only models translational equivalence or captures more information. We show that attention differs from alignment in some cases and captures useful information beyond alignments. Comment: To appear in IJCNLP 2017
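    One hedged way to make the attention-versus-alignment comparison concrete (a sketch under assumptions, not the paper's evaluation code) is to collapse each target step's attention distribution to its argmax source position and score the induced hard links against a reference alignment with an alignment-error-rate-style measure:

```python
# Toy attention matrix and reference links are made up for illustration.
import numpy as np

# attention[t, s]: weight on source token s when producing target token t
attention = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
])

# reference alignment links as (target_index, source_index) pairs
reference = {(0, 0), (1, 1), (2, 1)}

# hard alignment induced from attention by taking the argmax per target step
induced = {(t, int(np.argmax(row))) for t, row in enumerate(attention)}

matches = induced & reference
precision = len(matches) / len(induced)
recall = len(matches) / len(reference)
aer = 1.0 - 2 * len(matches) / (len(induced) + len(reference))

print(f"precision={precision:.2f} recall={recall:.2f} AER={aer:.2f}")
```

    With the toy matrix above, the induced links agree with the reference on two of the three target positions; this kind of partial overlap between attention and alignment is what such an analysis quantifies at scale.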

    Robust Subgraph Generation Improves Abstract Meaning Representation Parsing

    The Abstract Meaning Representation (AMR) is a representation for open-domain rich semantics, with potential use in fields like event extraction and machine translation. Node generation, typically done using a simple dictionary lookup, is currently an important limiting factor in AMR parsing. We propose a small set of actions that derive AMR subgraphs by transformations on spans of text, which allows for more robust learning of this stage. Our set of construction actions generalizes better than the previous approach and can be learned with a simple classifier. We improve on the previous state-of-the-art result for AMR parsing, boosting end-to-end performance by 3 F1 on both the LDC2013E117 and LDC2014T12 datasets. Comment: To appear in ACL 2015
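    A minimal sketch of the span-to-subgraph idea is given below. The action names, lexicon, and selection rule are assumptions for illustration, not the paper's actual action set; a trivial rule stands in for the learned action classifier.

```python
# Each text span is handled by one of a small set of actions that emit an
# AMR node or fragment (toy lexicon and actions, assumed for illustration).

DICT = {"boy": "boy", "wants": "want-01", "go": "go-02"}  # toy concept lexicon
SKIP = {"the", "to", "a"}                                  # dropped function words

def act_none(span):
    """NONE: span contributes no node (function words)."""
    return None

def act_dict(span):
    """DICT: look the span up in a phrase-to-concept table."""
    return DICT[span.lower()]

def act_name(span):
    """NAME: wrap a capitalized out-of-vocabulary span as a named-entity fragment."""
    return f'(p / person :name (n / name :op1 "{span}"))'

def act_lemma(span):
    """LEMMA: fall back to the lowercased span as a concept."""
    return span.lower()

def choose_action(span):
    """Stand-in for the learned classifier over construction actions."""
    low = span.lower()
    if low in SKIP:
        return act_none
    if low in DICT:
        return act_dict
    if span[0].isupper():
        return act_name
    return act_lemma

for span in ["The", "boy", "wants", "to", "go", "Maria"]:
    print(f"{span:6s} -> {choose_action(span)(span)}")
```

    The point of a construction-action formulation of this kind is that unseen spans still fall into a small number of generalizable cases (lexicon hit, named entity, lemma fallback) rather than requiring an exhaustive dictionary.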