
198 research outputs found

Pronominal anaphora in Basque: annotation of a real corpus

Author: Aduriz Itziar
Ceberio Klara
Díaz de Ilarraza Sánchez Arantza
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural (SEPLN)
Publication date: 25/02/2019

This paper describes the process followed in the annotation of pronominal anaphora in the Eus3LB corpus of Basque. Our aim is to use this annotation as the basis for later computational treatment of our language. We present the linguistic analysis carried out, the criteria defined for the tagging and some relevant linguistic conclusions about the features of the antecedents needed to link them correctly to their anaphoric elements.

Diposit Digital de la Universitat de Barcelona

Strategies to develop Language Technologies for Less-Resourced Languages based on the case of Basque

Author: Alegria Iñaki
Artola Xabier
Díaz De Ilarraza Arantza
Sarasola Kepa
Publication venue: HAL CCSD
Publication date: 25/11/2011

Over 23 years, the IXA group has developed a basic set of resources, tools and applications for Basque, following an initial strategy that has been adapted to technological changes. We think that our strategy and experience can serve as a reference for other less-resourced languages. According to a six-level classification of the world's languages, we estimate that this strategy may be useful for several hundred languages: those that have developed a written standard but are still beginners in Human Language Technology.

ArtXiker - @HAL

Teknologia garatzeko estrategiak baliabide urriko hizkuntzetarako: euskararen eta Ixa taldearen adibidea

Author: Aduriz Itziar
Alegria Iñaki
Artola Xabier
Díaz De Ilarraza Arantza
Sarasola Kepa
Publication venue: HAL CCSD
Publication date: 01/06/2011

The article begins by presenting several figures that show the situation of the Basque language, and then proposes a classification of the world's languages according to their presence on the Internet and in language technology. The body of the article presents the work done by the Ixa group in the field of automatic processing of Basque, identifying its seven main milestones and describing the strategy that has guided this development. It argues that this strategy can serve as a reference for 190 languages that, according to the proposed classification, have no language technology resources but do have a minimal significant presence on the Internet.

ArtXiker - @HAL

The Corpus of Basque Simplified Texts (CBST)

Author: Aranzabe Urruzola María Jesús
Díaz de Ilarraza Sánchez Arantza
González Dios Itziar
Publication venue: Springer Science and Business Media LLC
Publication date: 01/03/2018

In this paper we present the corpus of Basque simplified texts. This corpus compiles 227 original sentences from the science popularisation domain and two simplified versions of each sentence. The simplified versions have been created following different approaches: the structural approach, by a court translator following easy-to-read guidelines, and the intuitive approach, by a teacher drawing on her experience. The aim of this corpus is to make a comparative analysis of simplified text. To that end, we also present the annotation scheme we have created to annotate the corpus. The annotation scheme is divided into eight macro-operations: delete, merge, split, transformation, insert, reordering, no operation and other. Each macro-operation can be further divided into different operations. We also relate our work and results to other languages. This corpus will be used to corroborate the decisions taken and to improve the design of the automatic text simplification system for Basque.

Itziar Gonzalez-Dios's work was funded by a Ph.D. grant from the Basque Government and a postdoctoral grant for new doctors from the Vice-rectory of Research of the University of the Basque Country (UPV/EHU). We are very grateful to the translator and the teacher who simplified the texts. We also want to thank Dominique Brunato, Felice Dell'Orletta and Giulia Venturi for their help with the Italian annotation scheme and their suggestions when analysing the corpus, and Oier Lopez de Lacalle for his help with the statistical analysis. We also want to express our gratitude to the anonymous reviewers for their comments and suggestions. This research was supported by the Basque Government (IT344-10) and the Spanish Ministry of Economy and Competitiveness, EXTRECM Project (TIN2013-46616-C2-1-R).

Archivo Digital para la Docencia y la Investigación

EXTracción de RElaciones entre Conceptos Médicos en fuentes de información heterogéneas (EXTRECM)

Author: Araujo Serna Lourdes
Díaz de Ilarraza Sánchez Arantza
Gojenola Galletebeitia Koldo
Martínez Unanue Raquel
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2015

This project addresses the extraction of relationships between medical concepts in scientific documents, medical records and general information on the Internet, in several languages, using Natural Language Processing and Information Retrieval techniques and tools. The project aims to show, through two use cases, the benefits of applying this kind of language technology in the health domain.

Project references: TIN2013-46616-C2-1-R, TIN2013-46616-C2-2-R

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas


Using linguistic data for English and Spanish verb-noun combination identification

Author: Aduriz Itziar
Carroll John
Díaz de Ilarraza Arantza
Iñurrieta Uxoa
Labaka Gorka
Sarasola Kepa
Publication venue: International Committee on Computational Linguistics (ICCL)
Publication date: 13/12/2016

We present a linguistic analysis of a set of English and Spanish verb+noun combinations (VNCs), and a method that uses this information to improve VNC identification. First, a sample of frequent VNCs is analysed in depth and tagged along lexico-semantic and morphosyntactic dimensions, obtaining satisfactory inter-annotator agreement scores. Then, a VNC identification experiment is undertaken, in which the analysed linguistic data is combined with chunking information and syntactic dependencies. A comparison between the results of the experiment and those obtained by a basic detection method shows that VNC identification can be greatly improved by using linguistic information, as a large number of additional occurrences are detected with high precision.

Sussex Research Online

Proyecto de transferencia tecnológica Deteami: tecnologías de procesamiento del lenguaje natural para la ayuda en farmacia y en farmacovigilancia

Author: Casillas Rubio Arantza
Díaz de Ilarraza Sánchez Arantza
Gojenola Galletebeitia Koldo
Mendarte Luis
Oronoz Anchordoqui Maite
Peral Javier
Pérez Ramírez Alicia
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2016

The goal of the Deteami project is to develop tools that make clinicians aware of adverse drug reactions stated in the electronic health records of the clinical digital history. The records produced in hospitals are a valuable though nearly unexplored source of information, mainly because they are hard to obtain owing to privacy and confidentiality restrictions. To ease the clinicians' work of reading and analysing health records in search of information about patients' health, in this project we explore the records automatically, identify disorder and drug entities, among others, and infer medical information, in this case adverse drug reactions. A research framework was set up with the Galdakao-Usansolo and Basurto Hospitals of Osakidetza (the Basque Health System). Osakidetza provided both the texts and end-user feedback, as well as specialists who annotated the corpora, and in this way we obtained a gold standard.

This work was partially supported by the Spanish Ministry of Science and Innovation (EXTRECM: TIN2013-46616-C2-1-R, TADEEP: TIN2015-70214-P) and the Basque Government (DETEAMI: Ministry of Health 2014111003, IXA Research Group of type A (2010-2015), ELKAROLA: KK-2015/00098).

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

A Cascaded Syntactic Analyser for Basque

Author: Aduriz Itziar
Aranzabe Maxux
Arriola Jose Maria
Díaz De Ilarraza Arantza
Gojenola Koldo
Oronoz Maite
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2004

This article presents a robust syntactic analyser for Basque and the different modules it contains. Each module is structured in analysis layers, and each layer takes the information provided by the previous layer as its input, producing a gradually deeper syntactic analysis in cascade. The analysis is carried out using the Constraint Grammar (CG) formalism. The article also describes the standardisation of the parsing formats using XML.

Pronominal Anaphora in Basque: computational point of view and the development of a corpus

Author: Aduriz Itziar
Ceberio Klara
Díaz de Ilarraza Sánchez Arantza
Publication venue: Universidad del País Vasco / Euskal Herriko Unibertsitatea
Publication date: 11/11/2019

This paper describes the process of annotating pronominal anaphora in a 54,000-word corpus of Basque. Our aim is to use this annotation as a basis for later computational processing. The linguistic study carried out and the criteria defined for the tagging process are also presented in the paper.

Diposit Digital de la Universitat de Barcelona