
    Human Associations Help to Detect Conventionalized Multiword Expressions

    In this paper we show that human evidence about the conventionalization of a phrase can be obtained by asking native speakers for the associations they have to the phrase and to its component words. We show that if the component words of a phrase have each other as frequent associations, the phrase can be considered conventionalized. Another type of conventionalized phrase can be revealed using two factors: low entropy of the phrase's associations and low intersection between the associations of the component words and those of the phrase. The association experiments were performed for Russian.
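The two criteria above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' procedure. The abstract says only "frequent", "low entropy" and "low intersection", so the top-k cutoff `k`, the thresholds `max_entropy` and `max_overlap`, the `overlap` measure, and all association counts below are hypothetical choices made for this example.

```python
import math
from collections import Counter


def entropy(assocs: Counter) -> float:
    """Shannon entropy (bits) of the distribution of association responses."""
    total = sum(assocs.values())
    return -sum((n / total) * math.log2(n / total) for n in assocs.values() if n > 0)


def top(assocs: Counter, k: int) -> set:
    """The k most frequent association responses."""
    return {word for word, _ in assocs.most_common(k)}


def mutual_association(w1, w2, assocs1, assocs2, k=5):
    """True if each component word is among the other's k most frequent associations."""
    return w2 in top(assocs1, k) and w1 in top(assocs2, k)


def overlap(phrase_assocs, word_assocs):
    """Hypothetical intersection measure: share of distinct phrase responses
    that were also given for at least one component word."""
    if not phrase_assocs:
        return 0.0
    word_vocab = set().union(*word_assocs)
    return len(set(phrase_assocs) & word_vocab) / len(phrase_assocs)


def looks_conventionalized(w1, w2, assocs1, assocs2, phrase_assocs,
                           k=5, max_entropy=2.0, max_overlap=0.3):
    # Criterion 1: the component words evoke each other.
    if mutual_association(w1, w2, assocs1, assocs2, k):
        return True
    # Criterion 2: focused phrase associations that differ from the words' own.
    # The thresholds are illustrative assumptions, not values from the paper.
    return (entropy(phrase_assocs) <= max_entropy
            and overlap(phrase_assocs, [assocs1, assocs2]) <= max_overlap)


# Invented toy responses (number of informants giving each association).
black = Counter({"white": 15, "dark": 5, "night": 3, "color": 2})
box = Counter({"cardboard": 8, "gift": 6, "square": 4, "package": 2})
red = Counter({"color": 10, "blood": 6, "green": 3})
black_box = Counter({"airplane": 14, "recorder": 6, "crash": 2})
red_box = Counter({"gift": 3, "red": 2, "square": 2, "color": 2,
                   "cardboard": 1, "package": 1, "paint": 1})

print(looks_conventionalized("black", "box", black, box, black_box))  # True
print(looks_conventionalized("red", "box", red, box, red_box))        # False
```

In the toy data, "black box" passes through the second criterion. Its associations are concentrated (about 1.24 bits) and share nothing with the responses to "black" or "box". "Red box" fails both criteria.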

    Mixed up with machine Translation: Multi-word Units Disambiguation Challenge

    With the rapid evolution of the Internet, translation has become part of the daily life of ordinary users, not only of professional translators. Machine translation has evolved alongside different types of computer-assisted translation tools. Qualitative progress has been made in the field of machine translation, but not all problems have been solved. The time is ripe for developing more sophisticated evaluation tools that measure performance on specific linguistic phenomena. One problem in particular, the poor analysis and translation of multi-word units, is an area where investment in linguistic knowledge systems would benefit machine translation. This paper addresses the difficulties that multi-word units present to machine translation by comparing translations produced by systems that adopt different approaches to machine translation. It proposes a solution for improving the translation quality of multi-word units through a methodology that combines Lexicon Grammar resources with OpenLogos lexical resources and semantico-syntactic rules. Finally, it discusses what an ideal evaluation tool might look like: one that correctly evaluates the performance of machine translation engines with regard to multi-word units and thus contributes to improving translation quality.

    Mapping the constructicon with SYMPAThy. Italian word combinations between fixedness and productivity

    This work introduces SYMPAThy, a data representation model in which the combinatorial properties of a lexical item are described by merging surface and deeper linguistic information. The proposed approach is then evaluated by comparing, for a sample list of verbal idioms, a set of SYMPAThy-based fixedness indexes against the corresponding speaker-elicited indexes in the descriptive norms collected by Tabossi et al. (2011).
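The abstract does not say which statistic is used to compare corpus-based fixedness indexes with speaker-elicited ones. A rank correlation is one common choice for this kind of evaluation. The sketch below assumes Spearman's rho and computes it from scratch, with tied values sharing averaged ranks. The idiom scores are invented and are not taken from SYMPAThy or the Tabossi et al. norms.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical fixedness scores for four idioms: model-derived vs. speaker-elicited.
model_fixedness = [0.91, 0.40, 0.75, 0.10]
speaker_fixedness = [6.2, 3.1, 5.8, 3.1]
print(round(spearman(model_fixedness, speaker_fixedness), 3))  # 0.949
```

A rank-based statistic is a reasonable default here because the two kinds of index sit on different scales. Only their orderings of the idioms need to agree.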

    Multiword expression processing: A survey

    Multiword expressions (MWEs) are a class of linguistic forms spanning conventional word boundaries that are both idiosyncratic and pervasive across languages. Linguistic processing pipelines that depend on a clear distinction between words and phrases have to be rethought to accommodate MWEs. The handling of MWEs is crucial for NLP applications, where it raises a number of challenges. The emergence of solutions in the absence of guiding principles motivates this survey. Its aim is not only to provide a focused review of MWE processing, but also to clarify the nature of the interactions between MWE processing and downstream applications. We propose a conceptual framework within which challenges and research contributions can be positioned. It offers a shared understanding of what is meant by "MWE processing," distinguishing the subtasks of MWE discovery and identification. It also elucidates the interactions between MWE processing and two use cases: parsing and machine translation. Many of the approaches in the literature can be differentiated according to how MWE processing is timed with respect to these use cases. We discuss how such orchestration choices affect the scope of MWE-aware systems. For each of the two MWE processing subtasks and each of the two use cases, we conclude with open issues and research perspectives.
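The distinction between the two subtasks can be made concrete with a toy example. Discovery builds a list of candidate MWE types from a corpus. Identification marks MWE tokens in running text, given a lexicon. The sketch below substitutes two deliberately simple techniques that are not taken from the survey: ranking adjacent word pairs by pointwise mutual information (PMI) for discovery, and greedy longest-match lookup for identification. The corpus, lexicon and `min_count` cutoff are hypothetical.

```python
import math
from collections import Counter


def discover(tokens, min_count=2):
    """Discovery sketch: rank adjacent word pairs (MWE types) by PMI."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), c in bigrams.items():
        if c < min_count:  # skip rare pairs, whose PMI is unreliable
            continue
        p_xy = c / (n - 1)
        p_x, p_y = unigrams[w1] / n, unigrams[w2] / n
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def identify(tokens, lexicon):
    """Identification sketch: greedy longest-match tagging of known MWEs.
    Returns (start, end) token spans, end exclusive."""
    if not lexicon:
        return []
    max_len = max(len(entry) for entry in lexicon)
    spans, i = [], 0
    while i < len(tokens):
        # Try the longest possible entry first, down to two-word entries.
        for length in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + length]) in lexicon:
                spans.append((i, i + length))
                i += length
                break
        else:
            i += 1  # no MWE starts here
    return spans


corpus = "we bought a hot dog then ate the hot dog quickly".split()
print([(pair, round(s, 2)) for pair, s in discover(corpus)])  # [(('hot', 'dog'), 2.6)]

lexicon = {("hot", "dog"), ("hot", "dog", "stand")}
print(identify("a hot dog stand sold a hot dog".split(), lexicon))  # [(1, 4), (6, 8)]
```

Real systems also have to handle discontiguous and inflected MWEs, such as "break the ice" appearing as "broke the ice". Both functions above ignore that.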

    Multiword expressions at length and in depth

    The annual workshop on multiword expressions has taken place since 2001 in conjunction with major computational linguistics conferences and attracts an ever-growing community working on a variety of languages, linguistic phenomena and related computational processing issues. MWE 2017 took place in Valencia, Spain, and presented a vibrant panorama of current research on the computational treatment of multiword expressions, featuring many high-quality submissions. Furthermore, MWE 2017 included the first shared task on multilingual identification of verbal multiword expressions. Through extensive community work, the shared task produced important multilingual resources and mobilised several research groups in computational linguistics worldwide. This book contains extended versions of selected papers from the workshop. The authors worked hard to include detailed explanations, broader and deeper analyses, and exciting new results, all of which were thoroughly reviewed by an internationally renowned committee. We hope that this joint effort will provide a meaningful and useful snapshot of the multilingual state of the art in multiword expression modelling and processing, and will serve as a point of reference for future work.

    The role of constituents in multiword expressions

    Multiword expressions (MWEs), such as noun compounds (e.g. nickname in English and Ohrwurm in German), complex verbs (e.g. give up in English and aufgeben in German) and idioms (e.g. break the ice in English and das Eis brechen in German), may be interpreted literally but often undergo meaning shifts with respect to their constituents. Theoretical, psycholinguistic and computational linguistic research remain puzzled by three questions: when and how MWEs receive literal vs. meaning-shifted interpretations, what the constituents contribute to the degree of semantic transparency (i.e., meaning compositionality) of the MWE, and how literal vs. meaning-shifted MWEs are processed and computed. This edited volume presents an interdisciplinary selection of seven papers on recent findings across linguistic, psycholinguistic, corpus-based and computational research. The papers discuss the interaction between constituent properties and MWE meanings, and how MWE constituents contribute to the processing and representation of MWEs. The collection is based on a workshop at the 2017 annual conference of the German Linguistic Society (DGfS), held at Saarland University in Saarbrücken, Germany.
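Computational work often estimates semantic transparency by comparing a distributional vector for the whole MWE with a composition of its constituents' vectors. The sketch below uses one such approach from distributional semantics: the cosine between the MWE vector and the sum of its constituent vectors. It is not presented as the method of any paper in this volume. The three-dimensional vectors are invented for illustration, standing in for an idiomatic and a literal occurrence of "break the ice".

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def compositionality(mwe_vec, constituent_vecs):
    """Cosine between the MWE's own vector and the additive composition of its
    constituents. Higher values suggest a more transparent (literal) MWE."""
    composed = [sum(dims) for dims in zip(*constituent_vecs)]
    return cosine(mwe_vec, composed)


# Hypothetical toy vectors, invented for this example only.
break_vec = [0.2, 0.8, 0.1]
ice_vec = [0.9, 0.1, 0.0]
idiomatic_use = [0.1, 0.2, 0.9]  # "ease social tension" reading
literal_use = [1.0, 0.8, 0.1]    # "crack frozen water" reading

print(compositionality(idiomatic_use, [break_vec, ice_vec]))  # lower score
print(compositionality(literal_use, [break_vec, ice_vec]))    # close to 1
```

Additive composition is only one option. Weighted sums, element-wise products and learned composition functions are also used, and each can rank the same MWEs differently.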