    Building the Gold Standard for the surface syntax of Basque

    In this paper we present the construction of SF-EPEC, a syntactically annotated 300,000-word corpus that aims to be a Gold Standard for the surface syntactic processing of Basque. First, the tagset designed for this purpose is described; since Basque is an agglutinative language, complex syntactic tags were sometimes needed. We also account for the different phases in the construction of SF-EPEC.
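    As a rough illustration of what such surface-syntax annotations can look like, the sketch below encodes one short Basque sentence with Constraint-Grammar-style function tags (@SUBJ, @OBJ, @-FMAINV, @+FAUXV). The tag names, the example sentence and the data structure are illustrative assumptions, not the actual SF-EPEC tagset or file format.

    # Minimal sketch (not the SF-EPEC format): one way to represent a surface-syntax
    # annotation in which an agglutinative word form carries morphological features
    # together with its syntactic function tag. Tags and data are illustrative only.
    from dataclasses import dataclass, field

    @dataclass
    class Token:
        form: str                                       # word form as it appears in text
        lemma: str                                      # citation form
        morph: list[str] = field(default_factory=list)  # morphological features
        function: str = ""                              # surface syntactic function tag

    # "Gizonak etxea erosi du" ("The man has bought the house")
    sentence = [
        Token("Gizonak", "gizon", ["NOUN", "ERG", "SG"], "@SUBJ"),
        Token("etxea",   "etxe",  ["NOUN", "ABS", "SG"], "@OBJ"),
        Token("erosi",   "erosi", ["VERB", "PART"],      "@-FMAINV"),
        Token("du",      "*edun", ["AUX"],               "@+FAUXV"),
    ]

    for tok in sentence:
        print(f"{tok.form:10} {tok.lemma:8} {'+'.join(tok.morph):14} {tok.function}")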

    Funtzio sintaktikoen gold estandarra eskuz etiketatzeko gidalerroak [Guidelines for manually annotating the gold standard of syntactic functions]

    In this report we present the tags we use when annotating the gold standard of syntactic functions and the decisions taken during its annotation. The gold standard is a necessary resource for evaluating the rule-based surface syntactic parser (the one based on the Constraint Grammar formalism), and, moreover, it can be useful for developing and evaluating statistical parsers. The tags presented here follow the Constraint Grammar (CG) formalism (Karlsson et al., 1995). In fact, recent experiments show that good results have been obtained when parsing with CG (Karlsson et al., 1995; Samuelsson and Voutilainen, 1997; Tapanainen and Järvinen, 1997; Bick, 2000). Since annotating texts is very labour-intensive, Voutilainen (2012) proposes semi-automatic annotation methodologies; along those lines, and in order to lighten the annotation work, we have also developed semi-automatic resources (Arriola et al., 2013), but the remaining ambiguity (25%) will be resolved by manual annotation. This report defines the guidelines for carrying out that manual work: besides listing the abbreviations used in the analyses, it specifies the guidelines for assigning the function tags.
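    As a rough illustration of constraint-style disambiguation of the kind described above (not the actual parser, ruleset or data), the sketch below keeps only the function readings that a token's case could plausibly license; whatever the rule cannot resolve is left ambiguous, mirroring the residual ambiguity that is annotated by hand.

    # Illustrative sketch in the spirit of Constraint Grammar: each token starts
    # with several candidate function tags and a contextual constraint discards
    # readings. The case-to-function table, tags and data are invented for the
    # example; they are not the rules or tags used for the Basque gold standard.
    ALLOWED = {              # hypothetical case -> plausible surface functions
        "ERG": {"@SUBJ"},
        "ABS": {"@SUBJ", "@OBJ"},
        "DAT": {"@IOBJ"},
    }

    def constrain_by_case(tokens):
        """Discard function readings that the token's case cannot license."""
        for tok in tokens:
            allowed = ALLOWED.get(tok["case"], tok["functions"])
            kept = tok["functions"] & allowed
            if kept:                 # never delete the last surviving reading
                tok["functions"] = kept
        return tokens

    sentence = [
        {"form": "gizonak",  "case": "ERG", "functions": {"@SUBJ", "@OBJ"}},
        {"form": "etxea",    "case": "ABS", "functions": {"@SUBJ", "@OBJ"}},
        {"form": "lagunari", "case": "DAT", "functions": {"@IOBJ", "@OBJ"}},
    ]

    for tok in constrain_by_case(sentence):
        print(tok["form"], sorted(tok["functions"]))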

    SemEval-2015 Task 4: TimeLine: Cross-Document Event Ordering

    This paper describes the outcomes of the TimeLine task (Cross-Document Event Ordering), which was organised within the Time and Space track of SemEval-2015. Given a set of documents and a set of target entities, the task consisted of building a timeline for each entity by detecting, anchoring in time, and ordering the events involving that entity. The TimeLine task goes a step further than previous evaluation challenges by requiring participant systems to perform both event coreference and temporal relation extraction across documents. Four teams submitted the output of their systems to the four proposed subtracks, for a total of 13 runs, the best of which obtained an F1-score of 7.85 in the main track (timeline creation from raw text).
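    As a minimal sketch of the target output (assuming a simplified event representation and invented data, and omitting the cross-document event coreference step the task also requires), a timeline for one entity can be produced by anchoring each event to a date and sorting:

    # Toy timeline construction: events mentioning a target entity are anchored
    # to a time expression and ordered chronologically. Event records are invented
    # for illustration and do not come from the SemEval-2015 data.
    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class Event:
        doc_id: str        # source document
        mention: str       # event trigger as it appears in the text
        anchor: date       # time the event is anchored to

    def build_timeline(events: list[Event]) -> list[tuple[int, Event]]:
        """Order the entity's events by their time anchor and number the positions."""
        ordered = sorted(events, key=lambda e: e.anchor)
        return list(enumerate(ordered, start=1))

    events = [
        Event("doc2", "acquired", date(2014, 3, 1)),
        Event("doc1", "founded",  date(2004, 7, 15)),
        Event("doc3", "sued",     date(2010, 11, 2)),
    ]

    for position, ev in build_timeline(events):
        print(position, ev.anchor.isoformat(), ev.doc_id, ev.mention)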

    Creación y Simulación de Metodologías de Análisis, Clasificación e Integración de Nuevos Requerimientos a Software Propietario [Creation and simulation of methodologies for analysing, classifying and integrating new requirements into proprietary software]

    Prioritising the new requirements to be implemented in a proprietary software product is fundamental for its maintenance, for preserving quality, and for observing the company's business rules and standards. Although prioritisation tools based on proven and well-known techniques exist, they require a prior scoring of each requirement. When the company receives requests from several clients of the same product, the factors affecting the company multiply; the available tools do not cover these aspects and make the scoring task much more complex. This research work comprises a survey of the methods for prioritising and selecting new requirements used by companies in the Rosario area, and the definition of a methodology for selecting a new requirement, which involves analysing and evaluating all of its implications for the software product and the company while respecting its business rules. The resulting methodology leads to the definition of the processes for building a tool for scoring and prioritising new requirements in proprietary software that receives requests from several clients at the same time, with scoring instruments that consider all the related aspects; the tool will provide current prioritisation techniques and will issue reports tailored to different perspectives within the company. Track: Software Engineering. Red de Universidades con Carreras en Informática (RedUNCI).
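    As a rough sketch of the kind of scoring such a tool could apply when requests come from several clients of one product (the criteria, weights and sample data below are assumptions, not the methodology defined in this work):

    # Simple weighted-scoring sketch for prioritising requirements requested by
    # several clients of the same product. Criteria, weights and data are invented
    # for illustration only.
    WEIGHTS = {                  # hypothetical scoring criteria and their weights
        "business_value": 0.4,
        "client_count":   0.3,   # how many clients asked for it (normalised)
        "effort":        -0.2,   # higher effort lowers the priority
        "risk":          -0.1,
    }

    def score(requirement: dict) -> float:
        """Weighted sum of the requirement's normalised criterion values (0..1)."""
        return sum(WEIGHTS[criterion] * requirement[criterion] for criterion in WEIGHTS)

    requirements = [
        {"id": "REQ-1", "business_value": 0.9, "client_count": 0.8, "effort": 0.6, "risk": 0.2},
        {"id": "REQ-2", "business_value": 0.5, "client_count": 1.0, "effort": 0.3, "risk": 0.1},
        {"id": "REQ-3", "business_value": 0.7, "client_count": 0.2, "effort": 0.9, "risk": 0.7},
    ]

    for req in sorted(requirements, key=score, reverse=True):
        print(f"{req['id']}: {score(req):+.2f}")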

    Representation and treatment of multiword expressions in Basque

    This paper describes the representation of Basque Multiword Lexical Units and the automatic processing of Multiword Expressions. After discussing and stating which kinds of multiword expressions we consider for processing at the current stage of the work, we present the representation schema of the corresponding lexical units in a general-purpose lexical database. Due to its expressive power, the schema can deal not only with fixed expressions but also with morphosyntactically flexible constructions. It also allows us to lemmatize word combinations as a unit and yet parse the components individually if necessary. Moreover, we describe HABIL, a tool for the automatic processing of these expressions, and give some evaluation results. This work must be placed in a general framework of written Basque processing tools, which currently ranges from the tokenization and segmentation of single words up to the syntactic tagging of general texts.
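    As an illustration of the kind of representation the abstract describes (assuming invented field names and examples; this is not the actual database schema nor the HABIL tool), a multiword entry can record a unit-level lemma while keeping its components and its degree of flexibility explicit:

    # Sketch of a lexical-database entry for a multiword lexical unit. Field names
    # and examples are assumptions made for illustration. The idea mirrors the
    # abstract: the combination is lemmatised as a unit while its components stay
    # available for individual parsing, and flexible constructions may inflect or
    # allow intervening words.
    from dataclasses import dataclass

    @dataclass
    class MWEntry:
        lemma: str                   # lemma assigned to the whole combination
        components: list[str]        # component words, still parseable on their own
        fixed: bool                  # True for frozen expressions
        allows_gaps: bool            # may other words intervene between components?
        inflecting_slot: int | None  # index of the component that inflects, if any

    entries = [
        # Frozen adverbial expression: behaves like a single, invariable unit.
        MWEntry("noizean behin", ["noizean", "behin"],
                fixed=True, allows_gaps=False, inflecting_slot=None),
        # Flexible light-verb construction: the verb inflects, words may intervene.
        MWEntry("lo egin", ["lo", "egin"],
                fixed=False, allows_gaps=True, inflecting_slot=1),
    ]

    for e in entries:
        kind = "fixed" if e.fixed else "flexible"
        print(f"{e.lemma!r:18} {kind:8} components={e.components}")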

    EUSLEM: A lemmatiser/tagger for Basque

    This paper presents relevant issues that have been considered in the design and development of a general-purpose lemmatiser/tagger for Basque (EUSLEM). The lemmatiser/tagger is conceived as a basic tool for other linguistic applications. It uses the lexical database and the morphological analyser previously developed and implemented. We describe the components used in the development of the lemmatiser/tagger and, finally, we point out possible further applications of this tool. 1. Introduction An automatic lemmatiser/tagger is a basic tool for applications such as automatic indexation, documental databases, syntactic and semantic analysis, analysis of text corpora, etc. Its job is to give the correct lemma of a text-word, as well as its grammatical category. This project is being carried out by two entities: a group from the Computer Science Faculty of the University of the Basque Country and UZEI (1), an association that works on Basque terminology and lexicography. It's not the fi…
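    As a toy sketch of the lemmatiser/tagger's core job only (the lexicon and the fallback below are invented; EUSLEM itself relies on a full lexical database and morphological analyser and must also disambiguate between competing analyses):

    # Toy sketch: for each text word, return its lemma and grammatical category.
    # The lexicon and the unknown-word fallback are assumptions for illustration.
    LEXICON = {                        # surface form -> (lemma, category)
        "etxean": ("etxe", "NOUN"),
        "etxeak": ("etxe", "NOUN"),
        "dago":   ("egon", "VERB"),
        "ederra": ("eder", "ADJ"),
    }

    def lemmatise(word: str) -> tuple[str, str]:
        """Return (lemma, category); unknown words fall back to themselves."""
        return LEXICON.get(word.lower(), (word, "UNKNOWN"))

    for w in "Etxean dago".split():
        lemma, cat = lemmatise(w)
        print(f"{w:8} -> {lemma:6} {cat}")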