Un detector de la unidad central para textos en castellano

Bengoetxea Kortazar, Kepa; Iruskieta Quintian, Mikel

Authors: Kepa Bengoetxea Kortazar
Mikel Iruskieta Quintian
Publication date: 1 January 2018
Publisher: Sociedad Española para el Procesamiento del Lenguaje Natural

Abstract

In this paper we present the first automatic detector of the Central Unit (CU) for scientific abstracts in Spanish, based on machine learning techniques. The training and evaluation data were extracted from the RST Spanish Treebank, which is annotated under Rhetorical Structure Theory (RST). To detect the central units of a text, we use a bag-of-words model with Naive Bayes and SVM classifiers. Finally, we evaluate the performance of the classifiers and build the CU detector with the best-performing one.
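To make the bag-of-words idea concrete, here is a minimal sketch of a multinomial Naive Bayes classifier that labels text segments as central unit or not. It is not the authors' implementation: the class name `BagOfWordsNaiveBayes`, the whitespace tokenizer, the labels `CU` and `OTHER`, and the six English training segments are all invented for illustration, and the SVM variant mentioned in the abstract is not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class BagOfWordsNaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, segments, labels):
        # Count how many training segments each class has (for the prior).
        self.class_counts = Counter(labels)
        # Count word occurrences per class (the bag-of-words representation).
        self.word_counts = defaultdict(Counter)
        for segment, label in zip(segments, labels):
            self.word_counts[label].update(tokenize(segment))
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def log_score(self, segment, label):
        # log P(label) + sum of log P(word | label) over known words.
        total_segments = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_segments)
        counts = self.word_counts[label]
        denominator = sum(counts.values()) + len(self.vocabulary)
        for word in tokenize(segment):
            if word in self.vocabulary:  # skip words never seen in training
                score += math.log((counts[word] + 1) / denominator)
        return score

    def predict(self, segment):
        # Return the class with the highest log score.
        return max(
            self.class_counts,
            key=lambda label: self.log_score(segment, label),
        )


# Toy training data, invented for this example (not from the treebank).
train_segments = [
    "in this paper we present a new parser",
    "the aim of this work is to describe a corpus",
    "this article proposes a method for segmentation",
    "previous studies used small corpora",
    "results were evaluated with standard metrics",
    "the corpus contains medical texts",
]
train_labels = ["CU", "CU", "CU", "OTHER", "OTHER", "OTHER"]

model = BagOfWordsNaiveBayes().fit(train_segments, train_labels)
print(model.predict("in this paper we describe a detector"))
```

The sketch treats each segment as an unordered bag of word counts, so cue phrases such as "in this paper we" push a segment toward the central-unit class regardless of word order. A real detector would be trained on segments from the annotated treebank rather than on hand-written sentences.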

Available Versions

Repositorio Institucional de la Universidad de Alicante

oai:rua.ua.es:10045/74612

Last time updated on 17/04/2018

RUA

oai:rua.ua.es:10045/74612

Last time updated on 09/04/2020