43 research outputs found

Strategies to develop Language Technologies for Less-Resourced Languages based on the case of Basque

Author: Alegria Iñaki
Artola Xabier
Díaz De Ilarraza Arantza
Sarasola Kepa
Publication venue: HAL CCSD
Publication date: 25/11/2011

Over 23 years, the IXA group has developed a basic set of resources, tools and applications for Basque, following an initial strategy that has been adapted to technological changes. We think that our strategy and experience can serve as a reference for other less-resourced languages. According to a six-level classification of the world's languages, we estimate that this strategy may be useful for several hundred languages: those that have developed a written standard but are still beginners in Human Language Technology.

ArtXiker - @HAL

A spelling corrector for Basque based on morphology

Author: Aduriz Itziar
Alegria Iñaki
Artola Xabier
Ezeiza Nerea
Sarasola K.
Urkia Miriam
Publication venue: 'Oxford University Press (OUP)'
Publication date: 22/04/2021

This paper describes the components used to build the commercial Xuxen spelling checker/corrector for Basque. Because Basque is a highly inflected and agglutinative language, the spelling checker/corrector was conceived as a by-product of a general-purpose morphological analyser/generator. It performs morphological decomposition to detect misspellings and, to correct them, uses a new strategy that combines an additional two-level morphological subsystem for orthographic errors with the recognition of correct morphemes inside the word-form while generating proposals for typographical errors. Because Basque was standardized late, Xuxen is also intended as a useful tool for standardizing present-day written Basque.

Diposit Digital de la Universitat de Barcelona

Teknologia garatzeko estrategiak baliabide urriko hizkuntzetarako: euskararen eta Ixa taldearen adibidea

Author: Aduriz Itziar
Alegria Iñaki
Artola Xabier
Díaz De Ilarraza Arantza
Sarasola Kepa
Publication venue: HAL CCSD
Publication date: 01/06/2011

The article begins with several figures that show the situation of the Basque language, and then proposes a classification of the world's languages according to their presence on the Internet and in language technology. The body of the article presents the work done by the Ixa group on the automatic processing of Basque, identifying its seven main milestones and describing the strategy that has guided this development. It argues that this strategy can serve as a reference for 190 languages that, according to the proposed classification, have no language technology resources yet but do have a minimal significant presence on the Internet.

ArtXiker - @HAL

EDBL: a General Lexical Basis for the Automatic Processing of Basque

Author: Aldezabal Izaskun
Ansa Olatz
Arrieta Bertol
Artola Xabier
Ezeiza Aitzol
Hernandez G.
Lersundi Mikel
Publication venue: IRCS Workshop on linguistic databases.
Publication date: 22/06/2006

EDBL (Euskararen Datu-Base Lexikala) is a general-purpose lexical database used in Basque text-processing tasks. It is a large repository of lexical knowledge (currently around 80,000 entries) that acts as a basis and support for a number of different NLP tasks, providing lexical information to several language tools: morphological analysis, spell checking and correction, lemmatization and tagging, syntactic analysis, and so on. It has been designed to be neutral with respect to the different linguistic formalisms, and flexible and open enough to accept new types of information. A browser-based user interface makes it easy for the lexicographer to consult the database, correct and update entries, add new ones, and so on. The paper presents the conceptual schema and the main features of the database, along with some problems encountered in its design and implementation in a commercial DBMS. Given the diversity of the lexical entities and the complex relationships among them, three total specializations have been defined under the main class of the hierarchy that represents the conceptual schema. The first divides all the entries in EDBL into standard and non-standard Basque entries. The second divides the units in the database into dictionary entries (classified by part of speech) and other entries (mainly non-independent morphemes and irregularly inflected forms). Finally, a third total specialization separates single-word entries from multiword lexical units; this permits us to describe the morphotactics of single-word entries, and the constitution and surface realization schemas of multiword lexical units. A hierarchy of typed feature structures (FS) has been designed to map the entities and relationships in the database conceptual schema. The FSs are coded in TEI-conformant SGML, and Feature Structure Declarations (FSD) have been made for all the types of the hierarchy. Feature structures are used as a delivery format to export the lexical information from the database. The information coded in this way is subsequently used as input by the different language analysis tools.
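
The three total specializations described in this abstract can be pictured with a small sketch. It is only an illustration under assumptions: the names `LexicalEntry`, `Standardness`, `EntryKind`, `WordStructure` and `as_feature_structure` are hypothetical and do not come from EDBL, which according to the abstract lives in a commercial DBMS and is exported as TEI-conformant SGML rather than Python dictionaries. The example forms are likewise chosen for illustration, not taken from the database.

```python
from dataclasses import dataclass
from enum import Enum


# Each enum models one "total specialization": every entry must fall
# into exactly one of its values. All names here are hypothetical.
class Standardness(Enum):
    STANDARD = "standard"
    NON_STANDARD = "non-standard"


class EntryKind(Enum):
    DICTIONARY_ENTRY = "dictionary entry"
    OTHER = "other"  # e.g. non-independent morphemes, irregular forms


class WordStructure(Enum):
    SINGLE_WORD = "single-word"
    MULTIWORD = "multiword"


@dataclass(frozen=True)
class LexicalEntry:
    # No field has a default, so an entry cannot be created without
    # choosing a value in each of the three specializations.
    form: str
    standardness: Standardness
    kind: EntryKind
    structure: WordStructure

    def as_feature_structure(self) -> dict:
        """Flatten the entry into an attribute-value mapping, loosely
        in the spirit of a feature structure used as an export format."""
        return {
            "form": self.form,
            "standardness": self.standardness.value,
            "kind": self.kind.value,
            "structure": self.structure.value,
        }


# Illustrative entries only.
house = LexicalEntry("etxe", Standardness.STANDARD,
                     EntryKind.DICTIONARY_ENTRY, WordStructure.SINGLE_WORD)
however = LexicalEntry("hala ere", Standardness.STANDARD,
                       EntryKind.DICTIONARY_ENTRY, WordStructure.MULTIWORD)

print(house.as_feature_structure())
print(however.as_feature_structure())
```

Keeping the three classifications as independent required fields mirrors the idea that they cut across one another: a single entry is classified along all three axes at once.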

ArtXiker - @HAL

HAL Descartes

Hal-Diderot

KAF: Kyoto Annotation Framework

Author: Agirre Bengoa Eneko
Artola Zubillaga Xabier
Bosma Wauter
Díaz de Ilarraza Sánchez Arantza
Rigau Claramunt Germán
Soroa Echave Aitor
Publication date: 01/01/2009

This document presents the current draft of KAF, the Kyoto Annotation Framework, to be used within the KYOTO project. KAF aims to provide a reference format for the representation of semantic annotations.

Archivo Digital para la Docencia y la Investigación

Teknologia garatzeko estrategiak baliabide urriko hizkuntzetarako: euskararen eta ixa taldearen adibidea

Author: Aduriz Itziar
Alegria Iñaki
Artola Xabier
Díaz de Ilarraza Sánchez Arantza
Sarasola Kepa
Publication venue: Universidade do Minho, Universidade de Vigo
Publication date: 15/11/2017

Diposit Digital de la Universitat de Barcelona

Database Models and Data Formats

Author: Agirre Eneko
Aliprandi Carlo
Artola Xabier
Bosma Wauter
Diaz De Ilarraza Arantza
Marchetti Andrea
Monachini Monica
Neri Federico
Rigau German
Ronzano Francesco
Soria Claudia
Soroa Aitor
Tesconi Maurizio
Vossen Piek

The deliverable describes the data structures and XML formats that have been investigated and defined for representing the linguistic and semantic resources underlying the KYOTO system.

PUblication MAnagement

Diseño y construcción de un sistema inteligente de ayuda diccionarial

Author: Artola Zubillaga Xabier
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/1995

Summary of the doctoral thesis presented at the Faculty of Informatics in San Sebastián, University of the Basque Country, in May 1993.

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Sorkuntza zentroa eta kafe antzokia Tolosan

Author: Artola Amonarriz Xabier
Publication date: 09/07/2019

Architecture project, Tolosa

Archivo Digital para la Docencia y la Investigación