27 research outputs found

    EDBL: a General Lexical Basis for the Automatic Processing of Basque

    EDBL (Euskararen Datu-Base Lexikala) is a general-purpose lexical database used in Basque text-processing tasks. It is a large repository of lexical knowledge (currently around 80,000 entries) that serves as a basis and support for a number of NLP tasks, providing lexical information to several language tools: morphological analysis, spell checking and correction, lemmatization and tagging, syntactic analysis, and so on. It has been designed to be neutral with respect to the different linguistic formalisms, and flexible and open enough to accept new types of information. A browser-based user interface makes it easy for the lexicographer to consult the database, correct and update entries, add new ones, etc. The paper presents the conceptual schema and the main features of the database, along with some problems encountered in its design and implementation in a commercial DBMS. Given the diversity of the lexical entities and the complex relationships among them, three total specializations have been defined under the main class of the hierarchy that represents the conceptual schema. The first divides all the entries in EDBL into standard and non-standard Basque entries. The second divides the units in the database into dictionary entries (classified by part of speech) and other entries (mainly non-independent morphemes and irregularly inflected forms). Finally, a third total specialization distinguishes single-word entries from multiword lexical units; this makes it possible to describe the morphotactics of single-word entries, and the constitution and surface realization schemas of multiword lexical units. A hierarchy of typed feature structures (FS) has been designed to map the entities and relationships in the database conceptual schema. The FSs are coded in TEI-conformant SGML, and Feature Structure Declarations (FSD) have been made for all the types of the hierarchy. Feature structures are used as a delivery format to export the lexical information from the database. The information coded in this way is subsequently used as input by the different language analysis tools.
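
    The three total specializations described in this abstract can be pictured with a minimal sketch. Everything named here (`LexicalEntry`, the enum classes, `to_feature_structure`, the sample entry "etxe") is a hypothetical illustration and not EDBL's actual schema; the plain dictionary merely stands in for the TEI-conformant SGML feature structures the paper uses. Because each specialization is total, every entry must take exactly one value on each axis, which the required fields below enforce.

```python
from dataclasses import dataclass
from enum import Enum


class Standardness(Enum):
    STANDARD = "standard"
    NON_STANDARD = "non-standard"


class EntryKind(Enum):
    DICTIONARY_ENTRY = "dictionary entry"  # classified by part of speech
    OTHER = "other"  # e.g. non-independent morphemes, irregular forms


class WordCount(Enum):
    SINGLE_WORD = "single-word"
    MULTIWORD = "multiword"


@dataclass(frozen=True)
class LexicalEntry:
    """Hypothetical entry: one mandatory value per specialization axis."""
    form: str
    standardness: Standardness
    kind: EntryKind
    word_count: WordCount


def to_feature_structure(entry: LexicalEntry) -> dict:
    """Export an entry as a flat attribute-value map (stand-in for an FS)."""
    return {
        "type": "lexical-entry",
        "form": entry.form,
        "standardness": entry.standardness.value,
        "kind": entry.kind.value,
        "word-count": entry.word_count.value,
    }


# Illustrative entry only; not taken from the database.
example = LexicalEntry(
    form="etxe",
    standardness=Standardness.STANDARD,
    kind=EntryKind.DICTIONARY_ENTRY,
    word_count=WordCount.SINGLE_WORD,
)
fs = to_feature_structure(example)
```

    Modelling the three axes as independent required fields, rather than as one deep class hierarchy, is a simplification chosen for brevity; the paper's own schema may organize them differently.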

    Automatic analysis of causal coherence relations in Basque scientific abstracts (Kausazko koherentzia-erlazioen azterketa automatikoa euskarazko laburpen zientifikoetan)

    Automatically detecting the cause relations in a text may be useful in question answering and event information extraction. The aim of this paper is to study how to detect coherence relations of the cause subgroup (CAUSE, RESULT and PURPOSE). To achieve this aim, we used Rhetorical Structure Theory (RST) and automatically obtained linguistic information from different tools developed by the IXA Group. We used a corpus of 60 scientific abstracts, the Basque RST Treebank (Iruskieta et al., 2013), covering different domains: science, medicine and terminology. A linguist annotated all the signals in that corpus and described the most important problems in this task. To assess the reliability of this annotator, two linguists annotated the signals of the cause subgroup, and all the annotations were compared and evaluated. After that, a superannotator harmonized all the signals of those cause relations. Finally, we show the most important signals for such relations.

    The Elaboration of human anatomy terminology for the Basque language: the contribution of translators, linguists and experts

    In this paper we compare the translation of an atlas of anatomy with the review that was carried out by experts in human anatomy and linguists. The goal is to evaluate the type of contribution that translators, linguists and experts can make to the elaboration of human anatomy terminology in Basque. We analyzed the sequences that showed discrepancies between translation and review with respect to the lexical units and the term formation patterns used. We found that the corrections made by experts and linguists show a clear tendency to replace lexical loanwords and calqued term formation rules with genuine elements and structures. We conclude that the aims of language planning policies, which seek to gradually provide the language with terminological resources that are less dependent on other languages, have not been met by translators, owing to the semantic opacity of anatomical terminology and the transparent morphology of Basque compared with Spanish.

    Women's whirlwind of emotions in childbirth (Emakumeen emozio-zurrunbiloa erditzean)

    Childbirth is an individual and complex process that represents a milestone in every woman's life. The birth episode involves a relationship between physiological and psychological processes, influenced by social, environmental, organisational and political contexts that may affect the well-being of the woman and the child and the relationship between the mother and her partner. Current health paradigms encourage users and health professionals to express their views and reflections on the positive and negative aspects of their experiences. This paper therefore analyses the experiences, emotions and feelings that mothers perceive as positive or negative in hospital birth. The results are essential for providing woman-centred care in childbirth in an individualized, personalized, holistic and continuous way, respecting the values, choices and culture of the woman and the wishes of the woman and her partner.

    Discourse unit and rhetorical relations: a study about discourse units in the annotation of a corpus in Basque

    This article describes a study of the features of discourse structure annotation, according to Rhetorical Structure Theory, at the inter-sentential and intra-sentential levels. The annotated corpus is composed of medical texts written in Basque and extracted from the medical journal 'Gaceta Médica de Bilbao'. Our ultimate goal is to establish a general methodology for discourse-level corpus annotation. The difficulties encountered both in identifying the discourse units and in establishing the relations are analysed at each level, based on the agreements and disagreements observed in the texts annotated by two annotators. The results suggest that segmentation into discourse units is more complex at the intra-sentential level, while the assignment of rhetorical relations is more difficult at the inter-sentential level. We also note that some relations occur more frequently at the intra-sentential level and others at the inter-sentential level. However, there are relations that can appear at either level. This study lays the foundations for the automatic annotation process that the authors intend to carry out shortly. This work was carried out within the framework of the following projects: IXA Group, consolidated group 2007-2012 (IT-397-07) [Basque Government]; KNOW2 (TIN2009-14715-C04-01) [MICCIN], Hibrido Sint (TIN2010-20218) [MICCIN], and GARATERM2 (US10/01) [Basque Government].

    Extraction of lexical-semantic relations from derived words using definition patterns (Extracción de relaciones léxico-semánticas a partir de palabras derivadas usando patrones de definición)

    This work is part of a project on extracting lexical-semantic relations through the analysis of a monolingual dictionary, and focuses on the relations between derived words and their roots. The study covers three suffixes with a single interpretation and three others with more than one interpretation. The main contribution of this article is a method for disambiguating the interpretations of suffixes based on the definition that the dictionary gives for the derived word. In other words, each interpretation of a suffix is linked to one or more definition patterns. The method achieves high precision in choosing the correct interpretation of the relation. With a view to integrating the extracted information into EuroWordNet, we explored a new method for automatically disambiguating the senses of both the root and the derived word, with good results. By contrast, the representation of the extracted relations in EuroWordNet remains an open question. This work was made possible by a predoctoral grant awarded by the Basque Government (BFI98.217,AE). It also forms part of the projects Hiztegia2002 (FEDER 2FD1997-1503) and HERMES (MCYT TIC2000-0335).
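
    The idea of tying each suffix interpretation to one or more definition patterns can be illustrated with a small sketch. The suffix, the interpretation labels (`AGENT`, `INSTRUMENT`), the regular expressions and the function name `interpret` are all hypothetical and use English definitions for readability; they are not the patterns, language or implementation of the paper.

```python
import re
from typing import Optional

# Hypothetical pattern table: each suffix maps to (interpretation, pattern)
# pairs; a pattern is matched against the start of the dictionary definition.
PATTERNS = {
    "-er": [
        ("AGENT", re.compile(r"^(a )?person who\b")),
        ("INSTRUMENT", re.compile(r"^(a )?(device|tool|machine) (used )?(for|to)\b")),
    ],
}


def interpret(suffix: str, definition: str) -> Optional[str]:
    """Return the first interpretation whose pattern matches the definition."""
    text = definition.strip().lower()
    for label, pattern in PATTERNS.get(suffix, []):
        if pattern.search(text):
            return label
    return None  # no pattern matched: the interpretation stays ambiguous


agent = interpret("-er", "A person who teaches")
instrument = interpret("-er", "Device used to mix food")
unknown = interpret("-er", "Something unrelated")
```

    Returning `None` when nothing matches keeps unresolved cases explicit instead of guessing, which seems in the spirit of a precision-oriented method; how the original work handles unmatched definitions is not stated in the abstract.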