
33 research outputs found

EusDisParser: improving an under-resourced discourse parser with cross-lingual data

Author: Braud Chloé
Iruskieta Mikel
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019

Development of discourse parsers to annotate the relational discourse structure of a text is crucial for many downstream tasks. However, most existing work focuses on English and assumes a fairly large dataset. Discourse data have been annotated for Basque, but training a system on these data is challenging because the corpus is very small. In this paper, we create the first RST-based parser for Basque and investigate the use of data in another language to improve the performance of a Basque discourse parser. More precisely, we build a monolingual system using the small set of data available, and we investigate the use of multilingual word embeddings to train a system for Basque using data annotated for another language. We found that our approach of building a system limited to the small set of data available for Basque yielded an improvement over previous approaches that make use of large amounts of data annotated in other languages. At best, we obtain an F1 of 34.78 for the full discourse structure. More data annotation is necessary to improve the results obtained with these techniques. We also describe which relations match the gold standard, in order to understand these results.

Crossref

INRIA a CCSD electronic archive server

EusEduSeg: Un Segmentador Discursivo para el Euskera Basado en Dependencias

Author: Iruskieta Quintian Mikel
Zapirain Benat
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2015

We present EusEduSeg, the first discourse segmenter for Basque, implemented with heuristics based on syntactic dependencies and linguistic rules. Preliminary experiments show F1 values of more than 85% in automatic EDU segmentation on the Basque RST TreeBank.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Euskararen i(ra)kaskuntza-prozesuak: hezkuntza eta hizkuntza teknologiak

Author: Camacho Abel
Iruskieta Quintian Mikel
Publication venue: 'UPV/EHU Press'
Publication date: 01/01/2020

Although digital technology is present in our lives, we can face two different problems coming from opposite poles: the rapid or inadequate transformation of technology in language teaching, and the scarce development of language technologies in under-resourced language communities such as Basque. In this article, we explore how technology can contribute to the learning, teaching and research processes of Basque in a multimodal teaching approach. To this end, we describe the challenges that students and teachers face from a technological point of view, explaining and giving examples of technology that can help in the process of learning Basque. Furthermore, we stress the importance of addressing individual shortcomings, as well as shortcomings of the learning community and of the infrastructures that work on the development of language technology, in the teaching of Basque to adults.

Archivo Digital para la Docencia y la Investigación

Un detector de la unidad central para textos en castellano

Author: Bengoetxea Kortazar Kepa
Iruskieta Quintian Mikel
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2018

In this paper we present the first automatic detector of the Central Unit (CU) for Spanish scientific abstracts based on machine learning techniques. To do so, training and evaluation data were extracted from the RST Spanish Treebank, annotated under Rhetorical Structure Theory (RST). We use a bag-of-words model with Naive Bayes and SVM classifiers to detect the central units of a text. Finally, we evaluate the performance of the classifiers and choose the best one to create an automatic CU detector.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

El potencial de las relaciones retóricas para la discriminación de textos especializados de diferentes dominios en euskera y español

Author: da Cunha Iria
Iruskieta Mikel
Publication venue: 'UNISINOS - Universidade do Vale do Rio Dos Sinos'
Publication date: 03/12/2010

This study presents our research on the potential of rhetorical relations, and of the surface markers that signal them, to discriminate among specialized texts from different domains that share a high level of specialization, in two languages as different as Basque and Spanish. For our analysis, we employ Rhetorical Structure Theory (RST). We compiled a parallel corpus of Spanish-Basque specialized texts that contains two subcorpora, one of medical texts and one of terminological texts. We annotated these texts with RST rhetorical relations and detected the discourse markers that signal them. Finally, we observed that certain types of rhetorical relations and the number of discourse markers used allow us to differentiate among specialized texts of different domains in both Spanish and Basque. Key words: Rhetorical Structure Theory, rhetorical relations, discourse markers, annotation, specialized text, contrastive study, Spanish, Basque.

Unisinos (Universidade do Vale do Rio dos Sinos): SEER Unisinos

Detección de la unidad central en dos géneros y lenguajes diferentes: un estudio preliminar en portugués brasileño y euskera

Author: Antonio Juliano Desiderato
Iruskieta Quintian Mikel
Labaka Intxauspe Gorka
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2016

The aim of this paper is to present the development of a rule-based automatic detector that determines the main idea, or most pertinent discourse unit, in two different languages, Basque and Brazilian Portuguese, and in two distinct genres, scientific abstracts and argumentative answers. The central unit (CU) may be of interest for understanding texts in terms of their relational discourse structure, and it can be applied to Natural Language Processing (NLP) tasks such as automatic summarization, question-answering systems or sentiment analysis. In the argumentative answer genre, identifying the CU is an essential step towards an eventual automatic evaluator for this genre. The theoretical background underlying the paper is Mann and Thompson's (1988) Rhetorical Structure Theory (RST), which starts from discourse segmentation and ends with CU annotation. Results show that CUs in different languages and genres are detected automatically with similar results, although there is room for improvement.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Ipuin-moldaketa herri-hizkerara egokitzeko eta modu esanguratsuan kontatzeko markaketa: ahozko komunikazioa lantzen eta aztertzen Haur Hezkuntzako gelan

Author: Beaskoetxea Udane
Iruskieta Quintian Mikel
Publication venue: 'UPV/EHU Press'
Publication date: 01/01/2019

In this work, we designed an intervention and a study to analyze the direct effects on the development of children's linguistic and communicative skills of a meaningful read-aloud of the story Martin Txiki eta Basajaunak. In particular, we adapted the story to the western (Biscayan) dialect and read it aloud to a class of 4-year-olds in pre-school at the Legarda school in Mungia, in order to analyze the effects on the children's knowledge and use of the dialect. The methodology used in the intervention was based on constructivism. Therefore, we first carried out preliminary activities to assess the children's knowledge of this dialect. Afterwards, the story was read aloud both in batua (unified Basque) and in the western dialect. Finally, follow-up activities were carried out to measure the effect of reading in dialect on the children's oral communicative skills. In general, the children developed communication skills in the western dialect, although the results varied depending on their mother tongue.

Archivo Digital para la Docencia y la Investigación

Universidad del País Vasco / Euskal Herriko Unibertsitatea: Ciencia - Portal de revistas digitales de la UPV/EHU

A Machine Learning based Central Unit Detector for Basque Scientific Texts

Author: Atutxa Salazar Aitziber
Bengoetxea Kortazar Kepa
Iruskieta Quintian Mikel
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2017

This paper presents an automatic detector of the discourse central unit (CU) in scientific abstracts based on machine learning techniques. After segmenting a text into its elementary discourse units, the detection of the central unit is a crucial step towards robustly building discourse trees under Rhetorical Structure Theory (RST). In addition, CU detection may also be useful in automatic summarization, question answering and sentiment analysis tasks. Results show that CU detection using machine learning techniques for Basque scientific abstracts outperforms rule-based techniques, even on a small corpus covering different domains, while still leaving room for improvement. This work was funded in part by the following project: TIN2015-65308-C5-1-R (MINECO/FEDER).

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Potential of rhetorical relations for differentiation among specialized texts from different domains in Basque and Spanish

Author: Iria da Cunha
Mikel Iruskieta
Publication venue: 'UNISINOS - Universidade do Vale do Rio Dos Sinos'
Publication date: 01/12/2010

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Unisinos (Universidade do Vale do Rio dos Sinos): SEER Unisinos

Directory of Open Access Journals