Search CORE

85 research outputs found

Towards Comprehensive Computational Representations of Arabic Multiword Expressions

Author: Abdul Conteh (2667454)
Alessandro Lamorte (3761290)
Enrico Boero (3761296)
Marco Foletti (3761299)
Paola Crida (3761293)
Paolo Narcisi (3761287)
Publication venue: Springer
Publication date: 01/10/2016
Field of study

A successful computational treatment of multiword expressions (MWEs) in natural languages leads to a robust NLP system which considers the long-standing problem of language ambiguity caused primarily by this complex linguistic phenomenon. The first step in addressing this challenge is building an extensive reliable MWEs language resource LR with comprehensive computational representations across all linguistic levels. This forms the cornerstone in understanding the heterogeneous linguistic behaviour of MWEs in their various manifestations. This paper presents a detailed framework for computational representations of Arabic MWEs (ArMWEs) across all linguistic levels based on the state-of-the-art lexical mark-up framework (LMF) with the necessary modifications to suit the distinctive properties of Modern Standard Arabic (MSA). This work forms part of a larger project that aims to develop a comprehensive computational lexicon of ArMWEs for NLP and language pedagogy LP (JOMAL project)

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

White Rose Research Online

FigShare

Towards Comprehensive Computational Representations of Arabic Multiword Expressions

Author: G Francopoulo
G Francopoulo
IA Sag
J Odijk
K Bar
L Wanner
M Butt
M Palmer
MA Attia
T Arts
T Tanabe
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

Crossref

White Rose Research Online

A Computational Lexicon and Representational Model for Arabic Multiword Expressions

Author: Alghamdi Ayman Ahmad O.
Publication venue: University of Leeds
Publication date: 01/10/2018
Field of study

The phenomenon of multiword expressions (MWEs) is increasingly recognised as a serious and challenging issue that has attracted the attention of researchers in various language-related disciplines. Research in these many areas has emphasised the primary role of MWEs in the process of analysing and understanding language, particularly in the computational treatment of natural languages. Ignoring MWE knowledge in any NLP system reduces the possibility of achieving high precision outputs. However, despite the enormous wealth of MWE research and language resources available for English and some other languages, research on Arabic MWEs (AMWEs) still faces multiple challenges, particularly in key computational tasks such as extraction, identification, evaluation, language resource building, and lexical representations. This research aims to remedy this deficiency by extending knowledge of AMWEs and making noteworthy contributions to the existing literature in three related research areas on the way towards building a computational lexicon of AMWEs. First, this study develops a general understanding of AMWEs by establishing a detailed conceptual framework that includes a description of an adopted AMWE concept and its distinctive properties at multiple linguistic levels. Second, in the use of AMWE extraction and discovery tasks, the study employs a hybrid approach that combines knowledge-based and data-driven computational methods for discovering multiple types of AMWEs. Third, this thesis presents a representative system for AMWEs which consists of multilayer encoding of extensive linguistic descriptions. This project also paves the way for further in-depth AMWE-aware studies in NLP and linguistics to gain new insights into this complicated phenomenon in standard Arabic. The implications of this research are related to the vital role of the AMWE lexicon, as a new lexical resource, in the improvement of various ANLP tasks and the potential opportunities this lexicon provides for linguists to analyse and explore AMWE phenomena

White Rose E-theses Online

Multiword expressions at length and in depth

Author
Publication venue: Language Science Press
Publication date: 01/04/2020
Field of study

The annual workshop on multiword expressions takes place since 2001 in conjunction with major computational linguistics conferences and attracts the attention of an ever-growing community working on a variety of languages, linguistic phenomena and related computational processing issues. MWE 2017 took place in Valencia, Spain, and represented a vibrant panorama of the current research landscape on the computational treatment of multiword expressions, featuring many high-quality submissions. Furthermore, MWE 2017 included the first shared task on multilingual identification of verbal multiword expressions. The shared task, with extended communal work, has developed important multilingual resources and mobilised several research groups in computational linguistics worldwide. This book contains extended versions of selected papers from the workshop. Authors worked hard to include detailed explanations, broader and deeper analyses, and new exciting results, which were thoroughly reviewed by an internationally renowned committee. We hope that this distinctly joint effort will provide a meaningful and useful snapshot of the multilingual state of the art in multiword expressions modelling and processing, and will be a point point of reference for future work

Directory of Open Access Books (DOAB)

Extended papers from the MWE 2017 workshop

Author
Publication venue
Publication date: 01/01/2018
Field of study

Institutional Repository of the Freie Universität Berlin

The automatic processing of multiword expressions in Irish

Author: Walsh Abigail
Publication venue: Dublin City University. ADAPT
Publication date: 01/03/2023
Field of study

It is well-documented that Multiword Expressions (MWEs) pose a unique challenge to a variety of NLP tasks such as machine translation, parsing, information retrieval, and more. For low-resource languages such as Irish, these challenges can be exacerbated by the scarcity of data, and a lack of research in this topic. In order to improve handling of MWEs in various NLP tasks for Irish, this thesis will address both the lack of resources specifically targeting MWEs in Irish, and examine how these resources can be applied to said NLP tasks. We report on the creation and analysis of a number of lexical resources as part of this PhD research. Ilfhocail, a lexicon of Irish MWEs, is created through extract- ing MWEs from other lexical resources such as dictionaries. A corpus annotated with verbal MWEs in Irish is created for the inclusion of Irish in the PARSEME Shared Task 1.2. Additionally, MWEs were tagged in a bilingual EN-GA corpus for inclusion in experiments in machine translation. For the purposes of annotation, a categorisation scheme for nine categories of MWEs in Irish is created, based on combining linguistic analysis on these types of constructions and cross-lingual frameworks for defining MWEs. A case study in applying MWEs to NLP tasks is undertaken, with the exploration of incorporating MWE information while training Neural Machine Translation systems. Finally, the topic of automatic identification of Irish MWEs is explored, documenting the training of a system capable of automatically identifying Irish MWEs from a variety of categories, and the challenges associated with developing such a system. This research contributes towards a greater understanding of Irish MWEs and their applications in NLP, and provides a foundation for future work in exploring other methods for the automatic discovery and identification of Irish MWEs, and further developing the MWE resources described above

DCU Online Research Access Service

Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018 : 10-12 December 2018, Torino

Author: Alessandro Mazzei
Elena Cabrio
Fabio Tamburini
Publication venue: 'OpenEdition'
Publication date: 01/01/2018
Field of study

On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-‐it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted into its prestigious main lecture hall “Cavallerizza Reale”. The CLiC-‐it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges

Directory of Open Access Books (DOAB)

Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme

Author: Peter Spyns Jan Odijk
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2020
Field of study

Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologie

Directory of Open Access Books (DOAB)

Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018

Author: Abramova Ekaterina
Adorni Giovanni
Agrawal Ruchit
Aina Laura
Albanese Teresa
Albanesi Davide
Alzetta Chiara
Amore Matteo
Antonelli Oronzo
Aprosio Alessio Palmero
Balaraman Vevake
Basile Pierpaolo
Basile Valerio
Basili Roberto
Bassignana Elisa
Bellandi Andrea
Bentivogli Luisa
Bernardi Raffaella
Bertoldi Nicola
Bondielli Alessandro
Bos Johan
Bosco Cristina
Bottini Roberto
Brunato Dominique
Brunato⋄ Dominique
Buono Maria Pia di
Busso Lucia
Büchler Marco
Cabrio Elena
Caruso Valeria
Caselli Tommaso
Cecchini Flavio
Celli Fabio
Cervone Alessandra
Chesi Cristiano
Chingacham Anupama
Chiriatti Giulia
Cimino Andrea
Cocciu• Eleonora
Colla Davide
Comandini Gloria
Cordeiro Silvio Ricardo
Crepaldi Davide
Croce Danilo
Curtoni Paolo
Cutugno Francesco
dell’Oglio Pietro
Dell’Orletta Felice
Dell’Orletta⋄ Felice
De Felice Irene
De Martino Maria
Dini Luca
Di Iorio Angelo
Di Nunzio Giorgio Maria
Draetta Lia
Ducceschi Luca
Elia Annibale
Falavigna Daniele
Federico Marcello
Feltracco Anna
Fernández Raquel
Ferro Michele
Fieromonte Martina
Franzini Greta
Gagliardi Gloria
Gala Valentina Della
Gambi Enrico
Ghezzi Ilaria
Giovannetti Emiliano
Gobbi Jacopo
Gretter Roberto
Guarasci Raffaele
Guerini Marco
Gurevych Iryna
Günther Fritz
Herzog Leonardo
Jezek Elisabetta
Koceva Forsina
Lai Mirko
Laudanna Alessandro
Lenci Alessandro
Lepri Bruno
Liano Annarita
Limpens Freddy
Louvan Samuel
Lyding Verena
Magnini Bernardo
Magnolini Simone
Mairano Paolo
Mambrini Francesco
Mana Dario
Mancuso Azzurra
Marchi Simone
Marelli Marco
Marini Costanza
Mazzei Alessandro
McGregor Stephen
Melnikova Elena
Menini Stefano
Mensa Enrico
Merenda Flavio
Mollo Eleonora
Montemagni Simonetta
Montemagni⋄ Simonetta
Monti Johanna
Moretti Giovanni
Moritz Maria
Nadalini Andrea
Negri Matteo
Nicolas Lionel
Nissim Malvina
Novielli Nicole
Okinina Nadezda
Pannitto Ludovica
Paperno Denis
Passalacqua Samuele
Passaro Lucia C.
Passarotti Marco
Patti Viviana
Pecchioli Alessandra
Pellegrini Matteo
Petrolito Ruggero
Pettenati Maria Chiara
Piantanida Giovanni
Poggi Isabella
Porporato Aureliano
Quinci Vito
Radicioni Daniele P.
Ramisch Carlos
Rapp Amon
Riccardi Giuseppe
Rossini Daniele
Rotondi Agata
Ruffolo Paolo
Russo Irene
Sagri Maria Teresa
Sangati Federico
Sanguinetti Manuela
Savary Agata
Savy Renata
Simeoni Rossana
Simi Maria
Sorgente Antonio
Speranza Manuela
Sprugnoli Rachele
Stede Manfred
Stepanov Evgeny A.
Stingo Michele
Tamburini Fabio
Tebbifakhr Amirhossein
Tonelli Sara
Torre Ilaria
Tortoreto Giuliano
Totis Pietro
Trotta Daniela
Turchi Marco
Valeriani Martina
Venturi Giulia
Venturi⋄ Giulia
Vezzani Federica
Villata Serena
Vincze Veronika
Zaghi Claudia
Zovato Enrico
Publication venue: 'OpenEdition'
Publication date: 08/04/2019
Field of study

OpenEdition

Idiom treatment experiments in machine translation

Author: Anastasiou Dimitra
Publication venue: Fakultät 4 - Philosophische Fakultät II. Fachrichtung 4.6 - Angewandte Sprachwissenschaft sowie Übersetzen und Dolmetschen
Publication date: 01/01/2010
Field of study

Idiomatic expressions pose a particular challenge for the today\u27;s Machine Translation systems, because their translation mostly does not result literally, but logically. The present dissertation shows, how with the help of a corpus, and morphosyntactic rules, such idiomatic expressions can be recognized and finally correctly translated. The work leads the reader in the first chapter generally to the field of Machine Translation and following that, it focuses on the special field of Example-based Machine Translation. Next, an important part of the doctoral thesis dissertation is devoted to the theory of idiomatic expressions. The practical part of the thesis describes how the hybrid Example-based Machine Translation system METIS-II, with the help of morphosyntactic rules, is able to correctly process certain idiomatic expressions and finally, to translate them. The following chapter deals with the function of the transfer system CAT2 and its handling of the idiomatic expressions. The last part of the thesis includes the evaluation of three commercial systems, namely SYSTRAN, T1 Langenscheidt, and Power Translator Pro, with respect to continuous and discontinuous idiomatic expressions. For this, both small corpora and a part of the extensive corpus Europarl and the Digital Lexicon of the German Language in 20th century were processed, firstly manually and then automatically. The dissertation concludes with results from this evaluation.Idiomatische Redewendungen stellen für heutige maschinelle Übersetzungssysteme eine besondere Herausforderung dar, da ihre Übersetzung nicht wörtlich, sondern stets sinngemäß erfolgen muss. Die vorliegende Dissertation zeigt, wie mit Hilfe eines Korpus sowie morphosyntaktischer Regeln solche idiomatische Redewendungen erkannt und am Ende richtig übersetzt werden können. Die Arbeit führt den Leser im ersten Kapitel allgemein in das Gebiet der Maschinellen Übersetzung ein und vertieft im Anschluss daran das Spezialgebiet der Beispielbasierten Maschinellen Übersetzung. Im Folgenden widmet sich ein wesentlicher Teil der Doktorarbeit der Theorie über idiomatische Redewendungen. Der praktische Teil der Arbeit beschreibt wie das hybride Beispielbasierte Maschinelle Übersetzungssystem METIS-II mit Hilfe von morphosyntaktischen Regeln befähigt wurde, bestimmte idiomatische Redewendungen korrekt zu bearbeiten und am Ende zu übersetzen. Das nachfolgende Kapitel behandelt die Funktion des Transfersystems CAT2 und dessen Umgang mit idiomatischen Wendungen. Der letzte Teil der Arbeit beinhaltet die Evaluation von drei kommerzielle Systemen, nämlich SYSTRAN, T1 Langenscheidt und Power Translator Pro, in Bezug auf deren Umgang mit kontinuierlichen und diskontinuierlichen idiomatischen Redewendungen. Hierzu wurden sowohl kleine Korpora als auch ein Teil des umfangreichen Korpus Europarl und des Digatalen Wörterbuchs der deutschen Sprache des 20. Jh. erst manuell und dann maschinell bearbeitet. Die Dissertation wird mit Folgerungen aus der Evaluation abgeschlossen