    Building the Gold Standard for the surface syntax of Basque

    In this paper we present the construction of SF-EPEC, a syntactically annotated 300,000-word corpus that aims to be a Gold Standard for the surface syntactic processing of Basque. First, the tagset designed for this purpose is described; since Basque is an agglutinative language, complex syntactic tags were sometimes needed. We also account for the different phases in the construction of SF-EPEC.
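    As a rough illustration of what such surface-syntax annotations can look like, the sketch below encodes one short Basque sentence with Constraint-Grammar-style function tags (@SUBJ, @OBJ, @-FMAINV, @+FAUXV). The tag names, the example sentence and the data structure are illustrative assumptions, not the actual SF-EPEC tagset or file format.

    # Minimal sketch (not the SF-EPEC format): one way to represent a surface-syntax
    # annotation in which an agglutinative word form carries morphological features
    # together with its syntactic function tag. Tags and data are illustrative only.
    from dataclasses import dataclass, field

    @dataclass
    class Token:
        form: str                                       # word form as it appears in text
        lemma: str                                      # citation form
        morph: list[str] = field(default_factory=list)  # morphological features
        function: str = ""                              # surface syntactic function tag

    # "Gizonak etxea erosi du" ("The man has bought the house")
    sentence = [
        Token("Gizonak", "gizon", ["NOUN", "ERG", "SG"], "@SUBJ"),
        Token("etxea",   "etxe",  ["NOUN", "ABS", "SG"], "@OBJ"),
        Token("erosi",   "erosi", ["VERB", "PART"],      "@-FMAINV"),
        Token("du",      "*edun", ["AUX"],               "@+FAUXV"),
    ]

    for tok in sentence:
        print(f"{tok.form:10} {tok.lemma:8} {'+'.join(tok.morph):14} {tok.function}")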

    Funtzio sintaktikoen gold estandarra eskuz etiketatzeko gidalerroak [Guidelines for manually annotating the gold standard of syntactic functions]

    In this report we present the tags we use when annotating the gold standard of syntactic functions and the decisions taken during its annotation. The gold standard is a necessary resource for evaluating the rule-based surface syntactic parser (the one based on the Constraint Grammar formalism), and, moreover, it can be useful for developing and evaluating statistical parsers. The tags presented here follow the Constraint Grammar (CG) formalism (Karlsson et al., 1995). In fact, recent experiments show that good results have been obtained when parsing with CG (Karlsson et al., 1995; Samuelsson and Voutilainen, 1997; Tapanainen and Järvinen, 1997; Bick, 2000). Since annotating texts is very labour-intensive, Voutilainen (2012) proposes semi-automatic annotation methodologies; along those lines, and in order to lighten the annotation work, we have also developed semi-automatic resources (Arriola et al., 2013), but the remaining ambiguity (25%) will be resolved by manual annotation. This report defines the guidelines for carrying out that manual work: besides listing the abbreviations used in the analyses, it specifies the guidelines for assigning the function tags.
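    As a rough illustration of constraint-style disambiguation of the kind described above (not the actual parser, ruleset or data), the sketch below keeps only the function readings that a token's case could plausibly license; whatever the rule cannot resolve is left ambiguous, mirroring the residual ambiguity that is annotated by hand.

    # Illustrative sketch in the spirit of Constraint Grammar: each token starts
    # with several candidate function tags and a contextual constraint discards
    # readings. The case-to-function table, tags and data are invented for the
    # example; they are not the rules or tags used for the Basque gold standard.
    ALLOWED = {              # hypothetical case -> plausible surface functions
        "ERG": {"@SUBJ"},
        "ABS": {"@SUBJ", "@OBJ"},
        "DAT": {"@IOBJ"},
    }

    def constrain_by_case(tokens):
        """Discard function readings that the token's case cannot license."""
        for tok in tokens:
            allowed = ALLOWED.get(tok["case"], tok["functions"])
            kept = tok["functions"] & allowed
            if kept:                 # never delete the last surviving reading
                tok["functions"] = kept
        return tokens

    sentence = [
        {"form": "gizonak",  "case": "ERG", "functions": {"@SUBJ", "@OBJ"}},
        {"form": "etxea",    "case": "ABS", "functions": {"@SUBJ", "@OBJ"}},
        {"form": "lagunari", "case": "DAT", "functions": {"@IOBJ", "@OBJ"}},
    ]

    for tok in constrain_by_case(sentence):
        print(tok["form"], sorted(tok["functions"]))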

    SemEval-2015 Task 4: TimeLine: Cross-Document Event Ordering

    This paper describes the outcomes of the TimeLine task (Cross-Document Event Ordering), which was organised within the Time and Space track of SemEval-2015. Given a set of documents and a set of target entities, the task consisted of building a timeline for each entity by detecting, anchoring in time, and ordering the events involving that entity. The TimeLine task goes a step further than previous evaluation challenges by requiring participant systems to perform both event coreference and temporal relation extraction across documents. Four teams submitted the output of their systems to the four proposed subtracks, for a total of 13 runs, the best of which obtained an F1-score of 7.85 in the main track (timeline creation from raw text).
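    As a minimal sketch of the target output (assuming a simplified event representation and invented data, and omitting the cross-document event coreference step the task also requires), a timeline for one entity can be produced by anchoring each event to a date and sorting:

    # Toy timeline construction: events mentioning a target entity are anchored
    # to a time expression and ordered chronologically. Event records are invented
    # for illustration and do not come from the SemEval-2015 data.
    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class Event:
        doc_id: str        # source document
        mention: str       # event trigger as it appears in the text
        anchor: date       # time the event is anchored to

    def build_timeline(events: list[Event]) -> list[tuple[int, Event]]:
        """Order the entity's events by their time anchor and number the positions."""
        ordered = sorted(events, key=lambda e: e.anchor)
        return list(enumerate(ordered, start=1))

    events = [
        Event("doc2", "acquired", date(2014, 3, 1)),
        Event("doc1", "founded",  date(2004, 7, 15)),
        Event("doc3", "sued",     date(2010, 11, 2)),
    ]

    for position, ev in build_timeline(events):
        print(position, ev.anchor.isoformat(), ev.doc_id, ev.mention)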

    Creación y Simulación de Metodologías de Análisis, Clasificación e Integración de Nuevos Requerimientos a Software Propietario [Creation and simulation of methodologies for analysing, classifying and integrating new requirements into proprietary software]

    Prioritising the new requirements to be implemented in a proprietary software product is fundamental for its maintenance, for preserving quality, and for observing the company's business rules and standards. Although prioritisation tools based on proven and well-known techniques exist, they require a prior scoring of each requirement. When the company receives requests from several clients of the same product, the factors affecting the company multiply; the available tools do not cover these aspects and make the scoring task much more complex. This research work comprises a survey of the methods for prioritising and selecting new requirements used by companies in the Rosario area, and the definition of a methodology for selecting a new requirement, which involves analysing and evaluating all of its implications for the software product and the company while respecting its business rules. The resulting methodology leads to the definition of the processes for building a tool for scoring and prioritising new requirements in proprietary software that receives requests from several clients at the same time, with scoring instruments that consider all the related aspects; the tool will provide current prioritisation techniques and will issue reports tailored to different perspectives within the company. Track: Software Engineering. Red de Universidades con Carreras en Informática (RedUNCI).
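    As a rough sketch of the kind of scoring such a tool could apply when requests come from several clients of one product (the criteria, weights and sample data below are assumptions, not the methodology defined in this work):

    # Simple weighted-scoring sketch for prioritising requirements requested by
    # several clients of the same product. Criteria, weights and data are invented
    # for illustration only.
    WEIGHTS = {                  # hypothetical scoring criteria and their weights
        "business_value": 0.4,
        "client_count":   0.3,   # how many clients asked for it (normalised)
        "effort":        -0.2,   # higher effort lowers the priority
        "risk":          -0.1,
    }

    def score(requirement: dict) -> float:
        """Weighted sum of the requirement's normalised criterion values (0..1)."""
        return sum(WEIGHTS[criterion] * requirement[criterion] for criterion in WEIGHTS)

    requirements = [
        {"id": "REQ-1", "business_value": 0.9, "client_count": 0.8, "effort": 0.6, "risk": 0.2},
        {"id": "REQ-2", "business_value": 0.5, "client_count": 1.0, "effort": 0.3, "risk": 0.1},
        {"id": "REQ-3", "business_value": 0.7, "client_count": 0.2, "effort": 0.9, "risk": 0.7},
    ]

    for req in sorted(requirements, key=score, reverse=True):
        print(f"{req['id']}: {score(req):+.2f}")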

    Representation and treatment of multiword expressions in Basque

    This paper describes the representation of Basque Multiword Lexical Units and the automatic processing of Multiword Expressions. After discussing and stating which kinds of multiword expressions we consider for processing at the current stage of the work, we present the representation schema of the corresponding lexical units in a general-purpose lexical database. Due to its expressive power, the schema can deal not only with fixed expressions but also with morphosyntactically flexible constructions. It also allows us to lemmatize word combinations as a unit and yet parse the components individually if necessary. Moreover, we describe HABIL, a tool for the automatic processing of these expressions, and give some evaluation results. This work must be placed in a general framework of written Basque processing tools, which currently ranges from the tokenization and segmentation of single words up to the syntactic tagging of general texts.
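    As an illustration of the kind of representation the abstract describes (assuming invented field names and examples; this is not the actual database schema nor the HABIL tool), a multiword entry can record a unit-level lemma while keeping its components and its degree of flexibility explicit:

    # Sketch of a lexical-database entry for a multiword lexical unit. Field names
    # and examples are assumptions made for illustration. The idea mirrors the
    # abstract: the combination is lemmatised as a unit while its components stay
    # available for individual parsing, and flexible constructions may inflect or
    # allow intervening words.
    from dataclasses import dataclass

    @dataclass
    class MWEntry:
        lemma: str                   # lemma assigned to the whole combination
        components: list[str]        # component words, still parseable on their own
        fixed: bool                  # True for frozen expressions
        allows_gaps: bool            # may other words intervene between components?
        inflecting_slot: int | None  # index of the component that inflects, if any

    entries = [
        # Frozen adverbial expression: behaves like a single, invariable unit.
        MWEntry("noizean behin", ["noizean", "behin"],
                fixed=True, allows_gaps=False, inflecting_slot=None),
        # Flexible light-verb construction: the verb inflects, words may intervene.
        MWEntry("lo egin", ["lo", "egin"],
                fixed=False, allows_gaps=True, inflecting_slot=1),
    ]

    for e in entries:
        kind = "fixed" if e.fixed else "flexible"
        print(f"{e.lemma!r:18} {kind:8} components={e.components}")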

    EUSLEM: A lemmatiser/tagger for Basque

    This paper presents relevant issues that have been considered in the design and development of a general-purpose lemmatiser/tagger for Basque (EUSLEM). The lemmatiser/tagger is conceived as a basic tool for other linguistic applications. It uses the lexical database and the morphological analyser previously developed and implemented. We describe the components used in the development of the lemmatiser/tagger and, finally, we point out possible further applications of this tool. 1. Introduction An automatic lemmatiser/tagger is a basic tool for applications such as automatic indexation, documental databases, syntactic and semantic analysis, analysis of text corpora, etc. Its job is to give the correct lemma of a text-word, as well as its grammatical category. This project is being carried out by two entities: a group from the Computer Science Faculty of the University of the Basque Country and UZEI (1), an association that works on Basque terminology and lexicography. It's not the fi…
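    As a toy sketch of the lemmatiser/tagger's core job only (the lexicon and the fallback below are invented; EUSLEM itself relies on a full lexical database and morphological analyser and must also disambiguate between competing analyses):

    # Toy sketch: for each text word, return its lemma and grammatical category.
    # The lexicon and the unknown-word fallback are assumptions for illustration.
    LEXICON = {                        # surface form -> (lemma, category)
        "etxean": ("etxe", "NOUN"),
        "etxeak": ("etxe", "NOUN"),
        "dago":   ("egon", "VERB"),
        "ederra": ("eder", "ADJ"),
    }

    def lemmatise(word: str) -> tuple[str, str]:
        """Return (lemma, category); unknown words fall back to themselves."""
        return LEXICON.get(word.lower(), (word, "UNKNOWN"))

    for w in "Etxean dago".split():
        lemma, cat = lemmatise(w)
        print(f"{w:8} -> {lemma:6} {cat}")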