
27 research outputs found


Using linguistic data for English and Spanish verb-noun combination identification

Author: Aduriz Itziar
Carroll John
Díaz de Ilarraza Arantza
Iñurrieta Uxoa
Labaka Gorka
Sarasola Kepa
Publication venue: International Committee on Computational Linguistics (ICCL)
Publication date: 13/12/2016

We present a linguistic analysis of a set of English and Spanish verb+noun combinations (VNCs), and a method to use this information to improve VNC identification. Firstly, a sample of frequent VNCs is analysed in depth and tagged along lexico-semantic and morphosyntactic dimensions, obtaining satisfactory inter-annotator agreement scores. Then, a VNC identification experiment is undertaken, where the analysed linguistic data is combined with chunking information and syntactic dependencies. A comparison between the results of the experiment and the results obtained by a basic detection method shows that VNC identification can be greatly improved by using linguistic information, as a large number of additional occurrences are detected with high precision.

Sussex Research Online

Edition 1.2 of the PARSEME Shared Task on Semi-supervised Identification of Verbal Multiword Expressions

Author: Bhatia Archna
Candito Marie
Giouli Voula
Guillaume Bruno
Güngör Tunga
Iñurrieta Uxoa
Lichte Timm
Liebeskind Chaya
Mititelu Verginica
Monti Johanna
Polyu Menghan
Ramisch Carlos
Ramisch Renata
Savary Agata
Stymne Sara
Vaidya Ashwini
Walsh Abigail
Waszczuk Jakub
Xu Hongzhi
Publication venue: HAL CCSD
Publication date: 01/01/2020

We present edition 1.2 of the PARSEME shared task on identification of verbal multiword expressions (VMWEs). Lessons learned from previous editions indicate that VMWEs have low ambiguity, and that the major challenge lies in identifying test instances never seen in the training data. Therefore, this edition focuses on unseen VMWEs. We have split annotated corpora so that the test corpora contain around 300 unseen VMWEs, and we provide non-annotated raw corpora to be used by complementary discovery methods. We released annotated and raw corpora in 14 languages, and this semi-supervised challenge attracted 7 teams who submitted 9 system results. This paper describes the effort of corpus creation, the task design, and the results obtained by the participating systems, especially their performance on unseen expressions.

INRIA a CCSD electronic archive server

PARSEME corpus release 1.3

We present version 1.3 of the PARSEME multilingual corpus annotated with verbal multiword expressions (VMWEs). Since the previous version, new languages have joined the undertaking of creating such a resource, some of the existing corpora have been enriched with new annotated texts, and others have been enhanced in various ways. The PARSEME multilingual corpus now covers 26 languages. All monolingual corpora therein use the Universal Dependencies v.2 tagset. They are (re-)split following the PARSEME v.1.2 standard, which puts the emphasis on unseen VMWEs. With the current iteration, the corpus release process has been detached from shared tasks; instead, a process for continuous improvement and systematic releases has been introduced.

Utrecht University Repository

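Creación y Simulación de Metodologías de Análisis, Clasificación e Integración de Nuevos Requerimientos a Software Propietario [Creation and Simulation of Methodologies for the Analysis, Classification and Integration of New Requirements into Proprietary Software]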

Author: Aduriz Itziar
Antoine Jean-Yves
Barbu Mititelu Verginica
Berk Gozde
Bhatia Archna
Candito Marie
Carlino Carola
Caruso Valeria
Chen Jia
Constant Matthieu
Cordeiro Silvio Ricardo
de Medeiros Caseli Helena
Di Buono Maria Pia
Ehren Rafael
Elyovitch Hevi
Erden Berna
Estarrona Ainara
Foster Jennifer
Fotopoulou Aggeliki
Foufi Vassiliki
Ge Xiaomin
Giouli Voula
Gonzalez Itziar
Guillaume Bruno
Gurrutxaga Antton
Güngör Tunga
Ha-Cohen Kerner Yaakov
Hu Fangyuan
Hu Sha
Ionescu Mihaela
Iñurrieta Uxoa
Jain Kanishka
Jiang Menghan
Li Minli
Lichte Timm
Liebeskind Chaya
Liu Siyuan
Louizou Sevasti
Lynn Teresa
Malka Ruth
Markantonatou Stella
Miranda Isaac
Monti Johanna
Onofrei Mihaela
Palka-Binkiewicz Emilia
Papadelli Stella
Parmentier Yannick
Pascucci Antonio
Pasquer Caroline
Puri Vandana
Qin Zhenzhen
Rademaker Alexandre
Raffone Annalisa
Ramisch Carlos
Ramisch Renata
Ratori Shraddha
Riccio Anna
Rizea Monica-Mihaela
Sangati Federico
Savary Agata
Shukla Vishakha
Speranza Giulia
Srivastava Shubham
Stymne Sara
Sun Ruilong
Uria Larraitz
Urizar Ruben
Vaidya Ashwini
Vale Oto
Villavicencio Aline
Walsh Abigail
Wang Chenweng
Waszczuk Jakub
Wick Pedro Gabriela
Wilkens Rodrigo
Xiao Huangyang
Xu Hongzhi
Yan Peiyi
Yih Tsy
Yirmibeşoğlu Zeynep
Yu Ke
Yu Songping
Zeng Si
Zhang Yongchen
Zhao Yun
Zilio Leonardo
Publication date: 15/06/2016

Prioritising the new requirements to be implemented in proprietary software is fundamental to its maintenance, to preserving its quality, and to observing the company's business rules and standards. Although prioritisation tools based on proven and recognised techniques exist, they require each requirement to be rated beforehand. When a company receives requests from several customers of the same product, the number of factors affecting the company grows; the available tools do not take these aspects into account, which makes the rating task much more complex. This research comprises a survey of the methods for prioritising and selecting new requirements used by companies in the Rosario area, and the definition of a methodology for selecting a new requirement, which involves analysing and evaluating all the implications for the software product and the company while respecting its business rules. The resulting methodology leads to the definition of the processes for building a tool for rating and prioritising new requirements in proprietary software that receives requests from several customers at the same time. With rating instruments that consider all related aspects, the tool will provide current prioritisation techniques and issue reports customised to different perspectives of the company. Track: Software Engineering. Red de Universidades con Carreras en Informática (RedUNCI)

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

Identificación y traducción de Expresiones Multipalabra de tipo verbo+sustantivo: análisis de castellano-euskera [Identification and translation of verb+noun Multiword Expressions: a Spanish-Basque analysis]

Author: Iñurrieta Uxoa
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2020

This is a summary of the PhD thesis written by Uxoa Iñurrieta under the supervision of Dr. Gorka Labaka and Dr. Itziar Aduriz. Full title of the PhD thesis in Basque: Izena+aditza Unitate Fraseologikoak gaztelaniatik euskarara: azterketa eta tratamendu konputazionala. The defense was held in San Sebastian on November 29, 2019. The doctoral committee consisted of Ricardo Etxepare (Centre National de la Recherche Scientifique), Margarita Alonso (Universidad de Coruña) and Miren Azkarate (University of the Basque Country). Funding: the Spanish Ministry of Economy and Competitiveness awarded Uxoa Iñurrieta a predoctoral fellowship (BES-2013-066372) to conduct research within the SKATeR project (TIN2012-38584-C06-02).

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

General and specialised corpora to raise linguistic awareness in a language undergoing the normalisation process: academic writing in Basque [Innovation and digital technologies in Languages for Specific Purposes]

Author: Gonzalez Dios Itziar
Iñurrieta Uxoa
Zabala Igone
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/07/2021

Academic writing is challenging for many university students in any language, but it is especially difficult for students whose language of instruction is still undergoing normalisation and has an unstable academic discourse, such as Basque. This paper explains how corpora can be exploited to raise these students' linguistic awareness. To that end, learning objectives are defined, corpus-based exercises are designed, and the difficulties that students overcome are observed. The paper focuses on students of scientific and technological degrees taking courses on Basque for Academic Purposes, where they are taught how to resolve lexical, grammatical, stylistic and register-related doubts. The final aim of the course is for these students to become aware of the functional development of Basque, so that they contribute to it in their professional careers.

UPCommons. Portal del coneixement obert de la UPC

Literal Occurrences of Multiword Expressions: Rare Birds That Cause a Stir

Author: Cordeiro Silvio
Giouli Voula
Iñurrieta Uxoa
Lichte Timm
Ramisch Carlos
Savary Agata
Publication venue: Walter de Gruyter GmbH
Publication date: 09/04/2019

Multiword expressions (MWEs) can have both idiomatic and literal occurrences. For instance, pulling strings can be understood either as making use of one's influence, or literally. Distinguishing these two cases has been addressed in linguistic and psycholinguistic studies, and is also considered one of the major challenges in MWE processing. We suggest that literal occurrences should be considered in both semantic and syntactic terms, which motivates their study in a treebank. We propose heuristics to automatically pre-identify candidate sentences that might contain literal occurrences of verbal MWEs (VMWEs), and we apply them to existing treebanks in five typologically different languages: Basque, German, Greek, Polish and Portuguese. We also perform a linguistic study of the literal occurrences extracted by the different heuristics. The results suggest that literal occurrences constitute a rare phenomenon. We also identify some properties that may distinguish them from their idiomatic counterparts. This article is a largely extended version of Savary and Cordeiro (2018).

HAL AMU

HAL Descartes

HAL Université de Tours

Hal-Diderot

Multilingual corpus of literal occurrences of multiword expressions

Author: Cordeiro Silvio Ricardo
Giouli Voula
Iñurrieta Uxoa
Lichte Timm
Ramisch Carlos
Savary Agata
Publication venue: PARSEME
Publication date: 01/04/2019

The corpus contains sentences with idiomatic, literal and coincidental occurrences of verbal multiword expressions (VMWEs) in Basque, German, Greek, Polish and Portuguese. The source corpus is the PARSEME multilingual corpus of VMWEs v 1.1 (cf. http://hdl.handle.net/11372/LRT-2842). The sentences with VMWEs were extracted from the source corpus and potential co-occurrences of the same lexemes were automatically extracted from the same corpus. These candidates were then manually annotated by native experts into 6 classes, including literal and coincidental occurrences, as well as various annotation errors. The construction of the corpus is described by the following publication: Agata Savary, Silvio Ricardo Cordeiro, Timm Lichte, Carlos Ramisch, Uxoa Iñurrieta, Voula Giouli (forthcoming) "Literal occurrences of multiword expressions: Rare birds that cause a stir", to appear in Prague Bulletin of Mathematical Linguistics

LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University

Izen+aditz konbinazioen azterketa elebiduna, hizkuntza-aplikazio aurreratuei begira [Bilingual analysis of noun+verb combinations, with a view to advanced language applications]

Author: Díaz de Ilarraza Arantza
Labaka Gorka
Aduriz Itziar
Sarasola Kepa
Iñurrieta Urmeneta Uxoa
Publication venue: Universidade do Minho & Universidade de Vigo
Publication date: 01/12/2014

Taking bilingual dictionaries as a basis, this work studies Basque and Spanish noun+verb combinations. We look at the morphosyntactic and semantic features of the combinations and their equivalents, and place the two languages side by side to examine their differences and similarities. The article shows how complex structures of this kind are and, consequently, how important it is to treat them adequately in Language Processing applications, for example in machine translation. In addition, all the results obtained from the analysis have been made available in a public interface, so that anyone can search the combinations we have worked on.

Directory of Open Access Journals