422 research outputs found

    Extending the adverbial coverage of a NLP oriented resource for French

    This paper presents work on extending the adverbial entries of LGLex, an NLP-oriented syntactic resource for French. Adverbs were extracted from the Lexicon-Grammar tables of both simple adverbs ending in -ment '-ly' (Molinier and Levrier, 2000) and compound adverbs (Gross, 1986; 1990). This work relies on the exploitation of fine-grained linguistic information provided in existing resources. Various features encoded in both LG tables have not yet been exploited. They describe the relations of deleting, permuting, intensifying and paraphrasing that associate, on the one hand, the simple and compound adverbs and, on the other hand, different types of compound adverbs. The resulting syntactic resource is manually evaluated and freely available under the LGPL-LR license. Comment: Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP'11), Chiang Mai, Thailand (2011)

    Contractions: to align or not to align, that is the question

    This paper presents a detailed analysis of the alignment of Portuguese contractions, based on a previously aligned bilingual corpus. The alignment task was performed manually on a subset of the English-Portuguese CLUE4Translation Alignment Collection. The initial parallel corpus was pre-processed and a decision was made as to whether each contraction should be maintained or decomposed in the alignment. Decomposition was required in cases where the two concatenated words, i.e., the preposition and the determiner or pronoun, belong to two separate translation alignment pairs (PT - [no seio de] [a União Europeia] EN - [within] [the European Union]). Most contractions required decomposition in contexts where they are positioned at the end of a multiword unit. On the other hand, contractions tend to be maintained when they occur at the beginning or in the middle of the multiword unit, i.e., in the frozen part of the multiword (PT - [no que diz respeito a] EN - [with regard to] or PT - [além disso] EN - [in addition]). A correct alignment of multiwords and phrasal units containing contractions is instrumental for machine translation, paraphrasing, and variety adaptation.

    Unsupervised Paraphrasing of Multiword Expressions

    We propose an unsupervised approach to paraphrasing multiword expressions (MWEs) in context. Our model employs only monolingual corpus data and pre-trained language models (without fine-tuning), and does not make use of any external resources such as dictionaries. We evaluate our method on the SemEval 2022 idiomatic semantic text similarity task, and show that it outperforms all unsupervised systems and rivals supervised systems. Comment: 13 pages; accepted for Findings of ACL 202

    Discovering multiword expressions

    In this paper, we provide an overview of research on multiword expressions (MWEs) from a natural language processing perspective. We examine methods developed for modelling MWEs that capture some of their linguistic properties, discussing their use for MWE discovery and for idiomaticity detection. We concentrate on their collocational and contextual preferences, along with their fixedness in terms of canonical forms and their lack of word-for-word translatability. We also discuss a sample of the MWE resources that have been used in intrinsic evaluation setups for these methods.

    An Achilles’ Heel? Helping Interpreting Students Gain Greater Awareness of Literal and Idiomatic English

    This research paper reports on a study involving the use of literal and non-literal or idiomatic language in a multilingual interpreter classroom. Previous research has shown that interpreters are not always able to identify and correctly interpret idiomatic language. This study first examined student interpreters’ perceptions of the importance of idiomatic language, then assessed their ability to identify phrases that were literal, idiomatic or both. Lastly, it looked at student interpreters’ ability to correctly identify and explain idioms in short phrases and dialogues. Findings showed that, after this exercise, students' awareness of the difference between literal and non-literal language increased; however, their ability to correctly identify it did not. Furthermore, their previous focus on 'specialized terminology' led them to believe that language other than this was hardly worth learning. The article concludes with recommendations for incorporating the findings of this research into interpreter education.

    Machine translation of non-contiguous multiword units

    Non-adjacent linguistic phenomena such as non-contiguous multiwords and other phrasal units containing insertions, i.e., words that are not part of the unit, are difficult to process and remain a problem for NLP applications. Non-contiguous multiword units are common across languages and constitute some of the most important challenges to high-quality machine translation. This paper presents an empirical analysis of non-contiguous multiwords, and highlights our use of the Logos Model and the Semtab function to deploy semantic knowledge to align non-contiguous multiword units with the goal of translating these units with high fidelity. The phrase-level manual alignments illustrated in the paper were produced with the CLUE-Aligner, a Cross-Language Unit Elicitation alignment tool.