Search CORE

75 research outputs found

Towards a Distant Reading of the Golden Age Hendecasyllable: Metrical Patterns, Frequencies and Historical Development

Author: Navarro Colorado Borja
Publication venue: Universidad de Sevilla - Secretariado de Recursos Audiovisuales y Nuevas Tecnologías
Publication date: 01/01/2016

In this paper, an analysis of the hendecasyllable meter in Golden Age Spanish sonnets is presented. As a novel contribution, a macroanalysis or computer-based “distant reading” approach is applied to a corpus of more than 70,000 hendecasyllables. Based on a formal definition of metrical pattern, I analyze the most frequent metrical patterns and their historical development. The results are not yet conclusive, but they show the main metrical preferences of the different authors and how these preferences evolve during the 16th and 17th centuries.
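The frequency analysis described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's pipeline: it assumes each scanned verse is encoded as an 11-character string with `+` for a stressed syllable and `-` for an unstressed one, and that each verse carries a period label (here the hypothetical labels `XVI` and `XVII`). Function names and the toy sample are invented.

```python
from collections import Counter, defaultdict


# Hypothetical encoding: one character per metrical syllable,
# '+' = stressed, '-' = unstressed (11 syllables for a hendecasyllable).
def stressed_positions(pattern: str) -> tuple[int, ...]:
    """Return the 1-based positions of the stressed syllables."""
    return tuple(i for i, mark in enumerate(pattern, start=1) if mark == "+")


def frequencies_by_period(verses):
    """Count metrical patterns per period; verses is an iterable of (period, pattern)."""
    counts = defaultdict(Counter)
    for period, pattern in verses:
        counts[period][pattern] += 1
    return counts


def relative_frequency(counter: Counter, top: int = 3):
    """Return the `top` most frequent patterns with their share of the period's verses."""
    total = sum(counter.values())
    return [(pattern, n / total) for pattern, n in counter.most_common(top)]


# Toy, invented sample (not data from the study).
sample = [
    ("XVI", "-+---+---+-"),
    ("XVI", "-+---+---+-"),
    ("XVI", "---+---+-+-"),
    ("XVII", "-+---+---+-"),
    ("XVII", "--+--+---+-"),
    ("XVII", "--+--+---+-"),
]

counts = frequencies_by_period(sample)
for period in sorted(counts):
    print(period, relative_frequency(counts[period]))
print(stressed_positions("-+---+---+-"))  # (2, 6, 10)
```

On a real corpus, comparing the per-period shares of each pattern is what makes shifts in metrical preference visible over time.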

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

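Multi-document event ordering using temporal relation inference and distributional semantic models (Ordenación de eventos multidocumento usando inferencia de relaciones temporales y modelos semánticos distribucionales)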

Authors: Navarro Colorado Borja
Saquete Boró Estela
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2017

This paper focuses on the contribution of temporal relation inference and distributional semantic models to the event ordering task. Our system automatically builds ordered timelines of events from different written texts in English by first performing temporal clustering and then semantic clustering. To determine temporal compatibility, inference is applied over the temporal relations between events, which are automatically extracted by a temporal information processing system. For semantic compatibility between events, we analyze two different distributional semantic models: LDA topic modeling and Word2Vec word embeddings. Both semantic models, combined with the temporal inference, were evaluated within the framework of SemEval 2015 Task 4 Track B. Experiments show that using both models improves on the current state of the art, a significant advance in the cross-document event ordering task.

This paper has been partially supported by the Spanish government, projects TIN2015-65100-R and TIN2015-65136-C2-2-R, and by PROMETEOII/2014/001.
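The two-stage idea (temporal compatibility first, semantic compatibility second) can be illustrated with a minimal Python sketch. It is not the authors' system: it assumes the extracted temporal relations have already been reduced to (earlier, later) event pairs, closes them under transitivity as a simple form of inference, and uses cosine similarity between event vectors as a stand-in for the LDA or Word2Vec representations. Event names and vectors are invented.

```python
import math
from graphlib import TopologicalSorter
from itertools import product


def infer_before(pairs):
    """Close a set of (earlier, later) event pairs under transitivity:
    if a happens before b and b before c, then a happens before c."""
    before = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(before), repeat=2):
            if b == c and (a, d) not in before:
                before.add((a, d))
                changed = True
    return before


def temporally_compatible(e1, e2, before):
    """Events may be grouped together only if neither is known to precede the other."""
    return (e1, e2) not in before and (e2, e1) not in before


def cosine(u, v):
    """Cosine similarity between two event vectors (e.g. averaged word embeddings)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def order_events(events, before):
    """Return the events in an order consistent with the inferred BEFORE relations."""
    graph = {e: set() for e in events}
    for a, b in before:
        graph.setdefault(b, set()).add(a)  # a is a predecessor of b
    return list(TopologicalSorter(graph).static_order())


# Toy input: three events with two extracted BEFORE relations (invented example).
before = infer_before({("announce", "sign"), ("sign", "close")})
print(order_events(["close", "announce", "sign"], before))  # ['announce', 'sign', 'close']
print(temporally_compatible("announce", "close", before))  # False
print(round(cosine([1.0, 0.2], [0.9, 0.3]), 3))
```

Running the cheap temporal check before the semantic one means similarity only has to be computed for event pairs that can actually co-occur on the timeline.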

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

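Towards a distant analysis of the Golden Age hendecasyllable: metrical patterns, frequencies and historical evolution (Hacia un análisis distante del endecasílabo áureo: patrones métricos, frecuencias y evolución histórica)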

Author: Navarro Colorado Borja
Publication venue: UNED - Universidad Nacional de Educación a Distancia
Publication date: 01/01/2016

Crossref

REVISTAS CIENTÍFICAS UNED. Servicio de Publicación y Difusión Digital. Biblioteca UNED

Enriched Digital Edition: a Multilevel Annotation Model for Golden-Age Spanish Poetry

Author: Navarro Colorado Borja
Publication venue: Universitat Autònoma de Barcelona
Publication date: 01/01/2024

This chapter presents a general model for the multilevel annotation of literary text corpora. "Multilevel" refers to combining, in a single corpus, information from different levels of linguistic or literary description, from data about words or syllables to thematic, textual or pragmatic questions. The ultimate goal of such a corpus is to fix one possible literary analysis, which is why it is regarded as an enriched digital edition. Four characteristics that a literary text corpus must meet are defended: interoperability, perspectivism, unity and clarity/simplicity. The main formalization problems of a multilevel corpus of this kind are discussed: the combination of different representation formalisms and, in the case of XML, improper nesting. Finally, a model for a corpus of Spanish Golden Age poetry is proposed.

Partially funded by the Spanish Ministry of Science and Innovation (Ministerio de Ciencia e Innovación) through the project “CORTEX: Conscious Text Generation” (PID2021-123956OB-I00): MCIN/AEI/10.13039/501100011033/ and “ERDF A way of making Europe”; and by the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) through the project NL4DISMIS: Natural Language Technologies for dealing with disinformation (CIPROM/2021/021)
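A common way to sidestep the XML improper-nesting problem is stand-off annotation, in which each level stores its own character offsets instead of being embedded as tags. The sketch below only illustrates that general technique and is not the model proposed in the chapter: it detects pairs of spans that partially overlap (for instance a phrase running across a verse boundary in an enjambment), which are exactly the cases that cannot be written as properly nested XML elements. Level names and offsets are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    level: str  # annotation level, e.g. "verse" or "phrase" (illustrative names)
    start: int  # character offset where the span begins (inclusive)
    end: int    # character offset where the span ends (exclusive)


def crosses(a: Span, b: Span) -> bool:
    """True if the spans partially overlap, so neither can be nested inside the other."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


def nesting_conflicts(spans):
    """List every pair of spans that could not be encoded as nested XML elements."""
    return [(a, b) for i, a in enumerate(spans) for b in spans[i + 1:] if crosses(a, b)]


spans = [
    Span("verse", 0, 40),
    Span("verse", 41, 80),
    Span("phrase", 25, 55),  # hypothetical phrase running across the verse boundary
    Span("word", 2, 9),      # contained in the first verse: no conflict
]

conflicts = nesting_conflicts(spans)
for a, b in conflicts:
    print(a.level, (a.start, a.end), "crosses", b.level, (b.start, b.end))
```

Keeping each level as offsets over the same base text lets crossing hierarchies coexist; they only become a problem when serialized inline as XML.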

Repositorio Institucional de la Universidad de Alicante

On Poetic Topic Modeling: Extracting Themes and Motifs From a Corpus of Spanish Poetry

Author: Navarro Colorado Borja
Publication venue: Frontiers Media SA
Publication date: 01/01/2018

This paper analyzes the application of LDA topic modeling to a corpus of poetry. First, it explains how the most coherent LDA-topics were established by running several tests and automatically evaluating the coherence of the resulting LDA-topics. Results show, on the one hand, that lemmatization is not advisable for a corpus of poetry because several poetic features are lost in the process and, on the other hand, that a standard LDA algorithm performs better than LF-LDA, a version of LDA specific to short texts. The resulting LDA-topics were then analyzed manually to define the relation between word topics and poems. The analysis shows that there are mainly two kinds of semantic relation: an LDA-topic can represent the subject or theme of a poem, but it can also represent a poetic motif. All these analyses were carried out on a large corpus of Golden Age Spanish sonnets. Finally, the paper presents the most relevant themes in this corpus, such as “love,” “religion,” “heroics,” “moral,” and “mockery,” and the most relevant motifs, such as “rhyme,” “marine,” “music,” and “painting.”

This work was supported by the BBVA Foundation: grants for research groups 2016, project Distant Reading Approach to Golden Age Spanish Sonnets (Ayudas fundación BBVA a equipos de investigación científica, proyecto Análisis distante de base computacional del soneto castellano del Siglo de Oro): http://adso.gplsi.es. It was also partially conducted in the context of the COST Action Distant Reading for European Literary History (CA16204 - Distant-Reading): www.distant-reading.net
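The abstract does not name the coherence measure used to select the most coherent LDA-topics. As one illustration, the sketch below computes the widely used UMass coherence score from document co-occurrence counts; it is an assumption, not necessarily the metric used in the paper, and the tokenized toy poems are invented. Because it works on surface tokens, the same function could be run on lemmatized and non-lemmatized versions of a corpus to compare the two preprocessing choices.

```python
import math


def umass_coherence(top_words, documents):
    """UMass topic coherence (Mimno et al., 2011).

    top_words: the topic's words, ordered from most to least probable.
    documents: iterable of token lists (e.g. one list per sonnet).
    Higher (less negative) scores mean the words co-occur more often.
    """
    docs = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all the given words.
        return sum(1 for d in docs if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            higher, lower = top_words[l], top_words[m]
            denom = doc_freq(higher)
            if denom == 0:
                raise ValueError(f"{higher!r} does not occur in the corpus")
            score += math.log((doc_freq(lower, higher) + 1) / denom)
    return score


# Invented, already tokenized toy "poems".
poems = [
    ["amor", "fuego", "llama", "pecho"],
    ["amor", "dolor", "ausencia"],
    ["mar", "nave", "tormenta"],
    ["amor", "fuego", "hielo"],
]

coherent = umass_coherence(["amor", "fuego"], poems)
mixed = umass_coherence(["amor", "nave"], poems)
print(coherent, mixed)  # the topic whose words co-occur scores higher
```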

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Directory of Open Access Journals

Frontiers - Publisher Connector

The Simplification of the Language of Public Administration: The Case of Ombudsman Institutions

Authors: González Delgado Gabriel
Navarro Colorado Borja
Publication venue: European Language Resources Association (ELRA)
Publication date: 21/05/2024

Language produced by public administrations has crucial implications for citizens’ lives. However, its syntactic complexity and use of legal jargon, among other factors, make it difficult for laypeople and certain target audiences to understand. The NLP task of Automatic Text Simplification (ATS) can help with the necessary simplification of this technical language. To that end, specialized parallel datasets of complex-simple pairs need to be developed to train ATS systems. In this position paper, an ongoing project is presented whose main objectives are (a) to extensively analyze the syntactic, lexical, and discursive features of the language of English-speaking ombudsmen, as samples of public administrative language, with special attention to the characteristics that hinder comprehension, and (b) to develop the OmbudsCorpus, a parallel corpus of complex-simple supra-sentential fragments from ombudsmen’s case reports that have been manually simplified by professionals and annotated with standardized simplification operations. This research aims to provide a deeper understanding of the simplification process and to improve the training of ATS systems specialized in administrative texts.

This paper has been partially funded by the Spanish Government through the R&D projects “CORTEX: Conscious Text Generation” (PID2021-123956OB-I00, funded by MCIN/AEI/10.13039/501100011033/ and by “ERDF A way of making Europe”) and “CLEAR.TEXT: Enhancing the modernization public sector organizations by deploying Natural Language Processing to make their digital content CLEARER to those with cognitive disabilities” (TED2021-130707B-I00), and by the Generalitat Valenciana through the project “NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation with grant reference (CIPROM/2021/21)”
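A parallel corpus like the one described can be modelled as a list of aligned complex-simple pairs together with their annotated operations. The sketch below is a hypothetical data layout, not the OmbudsCorpus format: the class and field names, the operation labels and the example sentences are invented, and the compression ratio is just one simple statistic such a corpus makes easy to compute.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SimplificationPair:
    """One complex-simple fragment pair in a parallel simplification corpus."""
    complex_text: str
    simple_text: str
    # Simplification operations applied by the professional editor.
    # The label set used here is hypothetical.
    operations: list[str] = field(default_factory=list)

    def compression_ratio(self) -> float:
        """Word count of the simplified fragment divided by that of the original."""
        return len(self.simple_text.split()) / len(self.complex_text.split())


def to_jsonl(pairs) -> str:
    """Serialize pairs as JSON Lines, one pair per line, e.g. as ATS training data."""
    return "\n".join(json.dumps(asdict(p), ensure_ascii=False) for p in pairs)


pair = SimplificationPair(
    complex_text="The complainant's request was not upheld by the investigating officer.",
    simple_text="The officer did not accept the request.",
    operations=["lexical_substitution", "passive_to_active"],
)
print(pair.compression_ratio())  # 0.7
print(to_jsonl([pair]))
```

Storing the operations alongside each pair is what would later allow operation-specific analyses or training signals, rather than only raw sentence alignments.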

Repositorio Institucional de la Universidad de Alicante

An approach to the recommendation of scientific articles according to their degree of specificity

Authors: Hernández Antonio
Navarro Colorado Borja
Tomás David
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/01/2015

This article presents a method for recommending scientific articles according to their degree of generality or specificity. The approach is based on the idea that readers with less expertise in a topic prefer general articles as an introduction to it, while more expert readers prefer more specific articles. Unlike other recommendation techniques, which focus on analyzing user profiles, our proposal relies purely on content analysis. We present two methods for recommending articles, both based on topic modeling. The first relies on the divergence of the topics present in the documents, while the second uses the similarity between these topics. Both measures made it possible to determine how general or specific an article is for recommendation purposes, and both outperformed a traditional information retrieval system.

This work has been partially funded by the following projects: ATTOS (TIN2012-38536-C03-03), LEGOLANG-UAGE (TIN2012-31224), FIRST (FP7-287607), DIIM2.0 (PROMETEOII/2014/001), and by the National Programme for Human Resources Mobility of the Spanish National R&D&i Plan (Programa Nacional de Movilidad de Recursos Humanos del Plan Nacional de I+D+i, CAS12/00113)
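The divergence-based method can be illustrated with a minimal sketch. The abstract does not give the exact measure, so this code assumes one plausible reading: an article whose topic distribution stays close to the corpus-wide average is treated as general, and one that diverges strongly from it as specific, using Jensen-Shannon divergence. The article names and topic proportions are invented; in practice they would come from a fitted topic model.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and finite even when q has zeros."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_by_generality(topic_dists):
    """Order articles from most general to most specific.

    topic_dists maps an article id to its topic distribution (summing to 1).
    Generality is approximated here as closeness to the corpus-average distribution.
    """
    n = len(topic_dists)
    mean = [sum(col) / n for col in zip(*topic_dists.values())]
    return sorted(topic_dists, key=lambda doc: js_divergence(topic_dists[doc], mean))


articles = {
    "survey": [0.30, 0.40, 0.30],
    "specialist": [0.90, 0.05, 0.05],
    "other": [0.10, 0.80, 0.10],
}
ranking = rank_by_generality(articles)
print(ranking)  # a novice would be shown the first entries, an expert the last
```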

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Metrical Annotation of a Large Corpus of Spanish Sonnets: Representation, Scansion and Evaluation

Authors: Navarro Colorado Borja
Ribes-Lafoz María
Sánchez Noelia
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2016

In order to analyze metrical and semantic aspects of Spanish poetry with computational techniques, we have developed a large corpus annotated with metrical information. In this paper we present and discuss the development of this corpus: the formal representation of metrical patterns, the semi-automatic annotation process based on a new automatic scansion system, the main annotation problems, and the evaluation, in which an inter-annotator agreement of 96% was obtained. The corpus is open and available.
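The abstract reports a 96% inter-annotator agreement without stating the formula. The sketch below shows two standard ways of measuring agreement on scansion, treating each verse's whole metrical pattern as one category: raw percentage agreement, and Cohen's kappa, which discounts agreement expected by chance. The `+`/`-` pattern encoding and the five annotated verses are invented for illustration; which measure produced the reported figure is not assumed.

```python
from collections import Counter


def observed_agreement(a, b):
    """Share of verses for which both annotators assigned the same metrical pattern."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned verse by verse")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    po = observed_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    pe = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return 1.0 if pe == 1 else (po - pe) / (1 - pe)


# Hypothetical encoding: '+' stressed, '-' unstressed, one mark per metrical syllable.
annotator_1 = ["-+---+---+-", "-+---+---+-", "---+---+-+-", "--+--+---+-", "-+---+---+-"]
annotator_2 = ["-+---+---+-", "-+---+---+-", "---+---+-+-", "--+--+---+-", "---+---+-+-"]

print(observed_agreement(annotator_1, annotator_2))  # 0.8
print(round(cohen_kappa(annotator_1, annotator_2), 4))
```

Kappa is lower than raw agreement here because a few patterns dominate, so some matches would be expected even from annotators guessing by frequency.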

Repositorio Institucional de la Universidad de Alicante

Cross-document event ordering through temporal, lexical and distributional knowledge

Authors: Navarro Colorado Borja
Saquete Boró Estela
Publication venue: Elsevier BV
Publication date: 01/01/2016

In this paper we present a system that automatically builds ordered timelines of events from different written texts in English. The system deals with problems such as automatic event extraction, cross-document temporal relation extraction and cross-document event coreference resolution. Its main characteristic is the application of three different types of knowledge (temporal, lexical-semantic and distributional-semantic) to anchor and order the events in the timeline. It has been evaluated within the framework of SemEval 2015. The proposed system outperforms the current state-of-the-art systems on all measures (by up to eight F1 points over other systems), a significant advance in the cross-document event ordering task.

This paper has been partially supported by the Spanish government, projects TIN2015-65100-R and TIN2015-65136-C2-2-R.
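The combination of temporal, lexical and distributional knowledge for anchoring events can be illustrated with a toy example. This is a minimal sketch under loud assumptions, not the published system: each event mention is assumed to arrive already anchored to a date, a small hand-written synonym list stands in for a lexical-semantic resource, short hand-made vectors stand in for distributional representations, and mentions from different documents are merged when they share a date and are lexically or distributionally related. Names, dates, vectors and the similarity threshold are all hypothetical.

```python
import math
from datetime import date

# Hypothetical synonym sets standing in for a lexical-semantic resource.
SYNONYM_SETS = [{"buy", "acquire", "purchase"}, {"announce", "unveil"}]


def lexically_related(a, b):
    return a == b or any(a in s and b in s for s in SYNONYM_SETS)


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def corefer(m1, m2, threshold=0.8):
    """Same event if the mentions share a temporal anchor and are related lexically
    or distributionally (the threshold is an arbitrary illustrative value)."""
    if m1["date"] != m2["date"]:
        return False
    return lexically_related(m1["lemma"], m2["lemma"]) or cosine(m1["vec"], m2["vec"]) >= threshold


def build_timeline(mentions):
    """Cluster coreferent mentions, then sort the clusters by their date anchor."""
    clusters = []
    for mention in mentions:
        for cluster in clusters:
            if corefer(cluster[0], mention):  # compare with the cluster's first mention
                cluster.append(mention)
                break
        else:
            clusters.append([mention])
    clusters.sort(key=lambda c: c[0]["date"])
    return [(c[0]["date"].isoformat(), sorted({m["doc"] for m in c}), c[0]["lemma"]) for c in clusters]


mentions = [
    {"doc": "doc1", "lemma": "acquire", "date": date(2014, 3, 1), "vec": [1.0, 0.0]},
    {"doc": "doc2", "lemma": "buy", "date": date(2014, 3, 1), "vec": [0.9, 0.1]},
    {"doc": "doc2", "lemma": "announce", "date": date(2014, 1, 15), "vec": [0.0, 1.0]},
]
for entry in build_timeline(mentions):
    print(entry)
```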

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref