Annotation of multiword expressions in French

Author: Esperança-Rodier Emmanuelle
Iborra Manolo
Reverdy Justine
Tutin Agnès
Publication venue: HAL CCSD
Publication date: 29/06/2015

This paper presents an annotation experiment for MWEs in French. The corpus covers several genres (news, novel, scientific report, film subtitles), and the rich annotation scheme includes several kinds of MWEs, from collocations to routines and full phrasemes. The annotation is performed semi-automatically with finite-state transducers. The inter-annotator agreement score shows that the annotation is quite consistent, but the difficulty of the task depends heavily on the textual genre: literary texts are harder to annotate than scientific reports. In addition, two categories are difficult to differentiate: collocations and full phrasemes.

Hal - Université Grenoble Alpes
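The abstract above reports an inter-annotator agreement score without naming the metric. As a minimal sketch, assuming two annotators who each assign one category label per candidate expression, agreement could be measured with Cohen's kappa. The function name, the label names, and the sample data below are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: one categorical label per item and per annotator.
    The paper's actual agreement metric is not stated in the abstract.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' marginal label rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six candidate expressions.
annotator_a = ["collocation", "phraseme", "collocation", "none", "none", "phraseme"]
annotator_b = ["collocation", "collocation", "collocation", "none", "none", "phraseme"]
kappa = cohen_kappa(annotator_a, annotator_b)
print(round(kappa, 2))  # 0.75
```

Computing the score separately per genre would show the kind of genre effect the abstract describes, and the single disagreement in the sample data mirrors the reported confusion between collocations and full phrasemes.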

A Computational Lexicon and Representational Model for Arabic Multiword Expressions

Author: Alghamdi Ayman Ahmad O.
Publication venue: University of Leeds
Publication date: 01/10/2018

The phenomenon of multiword expressions (MWEs) is increasingly recognised as a serious and challenging issue that has attracted the attention of researchers in various language-related disciplines. Research in these many areas has emphasised the primary role of MWEs in the process of analysing and understanding language, particularly in the computational treatment of natural languages. Ignoring MWE knowledge in any NLP system reduces the possibility of achieving high precision outputs. However, despite the enormous wealth of MWE research and language resources available for English and some other languages, research on Arabic MWEs (AMWEs) still faces multiple challenges, particularly in key computational tasks such as extraction, identification, evaluation, language resource building, and lexical representations. This research aims to remedy this deficiency by extending knowledge of AMWEs and making noteworthy contributions to the existing literature in three related research areas on the way towards building a computational lexicon of AMWEs. First, this study develops a general understanding of AMWEs by establishing a detailed conceptual framework that includes a description of an adopted AMWE concept and its distinctive properties at multiple linguistic levels. Second, for AMWE extraction and discovery tasks, the study employs a hybrid approach that combines knowledge-based and data-driven computational methods for discovering multiple types of AMWEs. Third, this thesis presents a representative system for AMWEs which consists of multilayer encoding of extensive linguistic descriptions. This project also paves the way for further in-depth AMWE-aware studies in NLP and linguistics to gain new insights into this complicated phenomenon in standard Arabic.
The implications of this research are related to the vital role of the AMWE lexicon, as a new lexical resource, in the improvement of various ANLP tasks and the potential opportunities this lexicon provides for linguists to analyse and explore AMWE phenomena.

Multiword expressions at length and in depth

Publication venue: Language Science Press
Publication date: 01/04/2020

The annual workshop on multiword expressions has taken place since 2001 in conjunction with major computational linguistics conferences and attracts the attention of an ever-growing community working on a variety of languages, linguistic phenomena and related computational processing issues. MWE 2017 took place in Valencia, Spain, and represented a vibrant panorama of the current research landscape on the computational treatment of multiword expressions, featuring many high-quality submissions. Furthermore, MWE 2017 included the first shared task on multilingual identification of verbal multiword expressions. The shared task, with extended communal work, has developed important multilingual resources and mobilised several research groups in computational linguistics worldwide. This book contains extended versions of selected papers from the workshop. Authors worked hard to include detailed explanations, broader and deeper analyses, and new exciting results, which were thoroughly reviewed by an internationally renowned committee. We hope that this distinctly joint effort will provide a meaningful and useful snapshot of the multilingual state of the art in multiword expressions modelling and processing, and will be a point of reference for future work.

Discriminative lexical semantic segmentation with gaps: running the MWE gamut

Author: Danchik Emily
Dyer Chris
Schneider Nathan
Smith Noah A.
Publication date: 01/04/2014

We present a novel representation, evaluation measure, and supervised models for the task of identifying the multiword expressions (MWEs) in a sentence, resulting in a lexical semantic segmentation. Our approach generalizes a standard chunking representation to encode MWEs containing gaps, thereby enabling efficient sequence tagging algorithms for feature-rich discriminative models. Experiments on a new dataset of English web text offer the first linguistically-driven evaluation of MWE identification with truly heterogeneous expression types. Our statistical sequence model greatly outperforms a lookup-based segmentation procedure, achieving nearly 60% F1 for MWE identification.

CiteSeerX
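The abstract above describes generalizing a standard chunking representation so that it can encode MWEs containing gaps. The sketch below illustrates one way such an encoding could be decoded. It assumes a six-tag convention chosen for illustration: uppercase `O`/`B`/`I` for top-level tokens, and lowercase `o`/`b`/`i` for tokens sitting inside the gap of an enclosing expression. The abstract does not specify the tag inventory, and the function name and example sentence are hypothetical.

```python
def decode_gappy_tags(tokens, tags):
    """Recover MWE token groups from a gap-aware chunk tagging.

    Assumed convention (not taken from the abstract):
      O = outside any MWE, B = begins an MWE, I = continues it,
      o / b / i = the same roles for tokens inside a gap of an outer MWE.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    groups = []
    outer = None  # token indices of the MWE currently open at top level
    inner = None  # token indices of an MWE nested inside the outer gap

    def flush(group):
        # Keep only groups with at least two tokens.
        if group is not None and len(group) >= 2:
            groups.append(group)

    for idx, tag in enumerate(tags):
        if tag in ("O", "B"):
            flush(inner)
            flush(outer)
            inner = None
            outer = [idx] if tag == "B" else None
        elif tag == "I":
            if outer is None:
                raise ValueError(f"'I' at position {idx} has no open MWE")
            flush(inner)  # the gap has ended
            inner = None
            outer.append(idx)
        elif tag in ("o", "b"):
            if outer is None:
                raise ValueError(f"'{tag}' at position {idx} is not inside a gap")
            flush(inner)
            inner = [idx] if tag == "b" else None
        elif tag == "i":
            if inner is None:
                raise ValueError(f"'i' at position {idx} has no open nested MWE")
            inner.append(idx)
        else:
            raise ValueError(f"unknown tag {tag!r}")
    flush(inner)
    flush(outer)
    groups.sort(key=lambda g: g[0])
    return [[tokens[i] for i in g] for g in groups]


# Hypothetical sentence: "took ... down" is gappy, "a lot of" sits in the gap.
tokens = ["She", "took", "a", "lot", "of", "notes", "down"]
tags = ["O", "B", "b", "i", "i", "o", "I"]
mwes = decode_gappy_tags(tokens, tags)
print(mwes)  # [['took', 'down'], ['a', 'lot', 'of']]
```

Because every token still receives exactly one tag, a standard sequence tagger can predict this encoding directly, which is the efficiency argument the abstract makes.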

Can machines sense irony? : exploring automatic irony detection on social media

Author: Van Hee Cynthia
Publication venue: Ghent University. Faculty of Arts and Philosophy
Publication date: 01/01/2017

ISO-DR-core plugs into ISO-dialogue acts for a cross-linguistic taxonomy of discourse markers

Author: Damova Mariana
Silvano Maria da Purificação
Publication date: 01/01/2023

The present paper proposes an interoperable taxonomy to represent the meaning of discourse markers based on ISO DR-core (ISO 24617-8) but with a plug-in to ISO-dialogue acts (ISO 24617-2). The proposed taxonomy encompasses two dimensions: the semantic, with values regarding the discourse relations signalled by discourse markers, and the pragmatic, with values concerning the communicative function realized by discourse markers. We present a proof of concept for this two-dimensional taxonomy in a multilingual parallel dataset in three languages, English, European Portuguese and Bulgarian, comprising 165 textual segments with multiword discourse markers obtained from publicly available TED Talk transcripts. We show that the two-dimensional taxonomy can successfully annotate the meaning of discourse markers cross-linguistically and discuss linguistic evidence where an extension of the proposed taxonomy can be relevant.