18 research outputs found

    A computational ecosystem to support eHealth Knowledge Discovery technologies in Spanish

    The massive amount of biomedical information published online requires the development of automatic knowledge discovery technologies to make effective use of this available content. To foster and support this, the research community creates linguistic resources, such as annotated corpora, and designs shared evaluation campaigns and academic competitive challenges. This work describes an ecosystem that facilitates research and development in knowledge discovery in the biomedical domain, specifically for the Spanish language. To this end, several resources are developed and shared with the research community, including a novel semantic annotation model, an annotated corpus of 1045 sentences, and computational resources to build and evaluate automatic knowledge discovery techniques. Furthermore, a research task is defined with objective evaluation criteria, and an online evaluation environment is set up and maintained, enabling researchers interested in this task to obtain immediate feedback and compare their results with the state of the art. As a case study, we analyze the results of a competitive challenge based on these resources and provide guidelines for future research. The constructed ecosystem provides an effective learning and evaluation environment to encourage research in knowledge discovery in Spanish biomedical documents.

    This research has been partially supported by the University of Alicante and the University of Havana, the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects SIIA (PROMETEO/2018/089, PROMETEU/2018/089) and LIVING-LANG (RTI2018-094653-B-C22)

    Demo Application for the AutoGOAL Framework

    This paper introduces a web demo that showcases the main characteristics of the AutoGOAL framework. AutoGOAL is a Python framework for automatically finding the best way to solve a given task. It was designed mainly for automated machine learning (AutoML), but it can be used in any scenario where several possible strategies are available to solve a given computational task. In contrast with alternative frameworks, AutoGOAL can be applied seamlessly to Natural Language Processing as well as structured classification problems. This paper presents an overview of the framework’s design and its experimental evaluation on several machine learning problems, including two recent NLP challenges. The accompanying software demo is available online and the full source code is provided under the MIT open-source license.

    This research has been supported by a Carolina Foundation grant in agreement with the University of Alicante and the University of Havana. It has also been partially funded by both aforementioned universities, the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects LIVING-LANG (RTI2018-094653-B-C22) and SIIA (PROMETEO/2018/089, PROMETEU/2018/089)
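    To give a sense of how a task is posed to the framework, the sketch below follows the usage pattern shown in AutoGOAL's public documentation. The semantic type names (MatrixContinuousDense, Supervised, VectorCategorical) and the best_pipeline_/best_score_ attributes are taken from that documentation but may differ between AutoGOAL versions, so treat this as a hedged sketch rather than a verbatim recipe.

```python
# Minimal sketch of AutoGOAL's high-level AutoML interface, following the
# usage pattern in the framework's public documentation. The semantic type
# names below are assumptions that may vary across AutoGOAL versions.
import numpy as np

from autogoal.ml import AutoML
from autogoal.kb import MatrixContinuousDense, Supervised, VectorCategorical

# Toy data: any dense feature matrix with categorical labels would do.
X = np.random.rand(100, 10)
y = np.random.choice(["a", "b"], size=100)

# The task is described only by its input/output semantic types; AutoGOAL
# then searches the space of compatible pipelines automatically.
automl = AutoML(
    input=(MatrixContinuousDense, Supervised[VectorCategorical]),
    output=VectorCategorical,
)
automl.fit(X, y)
print(automl.best_pipeline_, automl.best_score_)
```

    The design point worth noting is that the user never names a concrete algorithm: declaring input and output types is what lets the same interface cover both NLP and structured classification problems.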

    Applying Human-in-the-Loop to construct a dataset for determining content reliability to combat fake news

    Annotated corpora are indispensable for training computational models in Natural Language Processing. However, for more complex semantic annotation processes, annotation is a costly, arduous, and time-consuming task, resulting in a shortage of resources for training Machine Learning and Deep Learning algorithms. Accordingly, this work proposes a methodology, based on the human-in-the-loop paradigm, for the semi-automatic annotation of complex tasks. The methodology is applied to the construction of a reliability dataset of Spanish news in order to combat disinformation and fake news. By implementing the proposed semi-automatic annotation methodology, we obtain a high-quality resource while increasing annotator efficacy and speed with fewer examples. The methodology consists of three incremental phases and results in the construction of the RUN dataset. The quality of the resource was evaluated through time reduction (an annotation-time reduction of almost 64% with respect to fully manual annotation), annotation quality (measuring annotation consistency and inter-annotator agreement), and performance, by training a model on the semi-automatically annotated RUN dataset (95% accuracy, 95% F1), validating the suitability of the proposal.

    This research work is funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by “ERDF A way of making Europe”, by the “European Union” or by the “European Union NextGenerationEU/PRTR” through the project TRIVIAL: Technological Resources for Intelligent VIral AnaLysis through NLP (PID2021-122263OB-C22) and the project SOCIALTRUST: Assessing trustworthiness in digital media (PDC2022-133146-C22). It is also funded by the Generalitat Valenciana, Spain, through the project NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation (CIPROM/2021/21), and the grant ACIF/2020/177
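    The abstract does not spell out the loop itself, so the following is a generic human-in-the-loop annotation sketch of the paradigm, not the paper's actual three-phase methodology. The model choice (TF-IDF plus logistic regression), the 0.9 confidence threshold, the batch size, and the human_review stub are all illustrative assumptions.

```python
# Generic human-in-the-loop annotation loop: a hedged sketch of the paradigm
# only. Model, threshold, batch size and human_review are assumptions, not
# the RUN paper's actual three-phase methodology.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def human_review(text):
    """Stand-in for the human annotator reviewing one uncertain example."""
    return input(f"Label for {text!r} (reliable/unreliable): ")

def hitl_annotate(seed_texts, seed_labels, pool, threshold=0.9, batch=100):
    texts, labels = list(seed_texts), list(seed_labels)
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    while pool:
        # Retrain on everything labelled so far, then take the next batch.
        model.fit(texts, labels)
        chunk, pool = pool[:batch], pool[batch:]
        for text in chunk:
            probs = model.predict_proba([text])[0]
            if probs.max() >= threshold:
                # Confident prediction: accept the machine label.
                labels.append(model.classes_[probs.argmax()])
            else:
                # Uncertain: route to the human, where effort matters most.
                labels.append(human_review(text))
            texts.append(text)
    return texts, labels
```

    The time savings the paper reports come from exactly this division of labour: the model absorbs the confident cases, so human attention is spent only on the examples it cannot resolve.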

    GPLSI-UH LETO V1.0: Learning Engine Through Ontologies

    LETO is an ontology learning framework designed to extract knowledge from a variety of sources. These sources may be structured and/or unstructured data, from which relevant information can be discovered, continuously updated, enriched, and integrated as part of a single semantic knowledge resource. The current version 1.0 is limited to the extraction of knowledge from unstructured data, i.e. natural language texts, following the semantic model published in [EGM2018]. Among this version’s functionalities are the extraction of entities and semantic relations from textual sources; the transformation of such information into linked elements through clustering techniques; and, finally, the generation of ontologies representative of the processed content. An API access point as well as a visual tool for the manipulation of processes and visualization of the obtained ontologies are provided [EMA2019].

    University of Alicante; University of Havana (Cuba); Ministerio de Educación, Cultura y Deporte and Ministerio de Economía y Competitividad (MINECO) through the projects LIVING-LANG (RTI2018-094653-B-C22) and INTEGER (RTI2018-094649-B-I00); Government of the Generalitat Valenciana through the project SIIA (PROMETEO/2018/089, PROMETEU/2018/089); support was also received from the COST Actions CA19134 “Distributed Knowledge Graphs” and CA19142 “Leading Platform for European Citizens, Industries, Academia and Policymakers in Media Accessibility”
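    LETO's actual API is not described here, so the following is a purely hypothetical toy sketch of the three-stage data flow the description names (entity/relation extraction, clustering of linked elements, ontology generation). Every function name and the demo input format are invented for illustration only.

```python
# Purely hypothetical sketch of the three-stage pipeline described above:
# extraction -> clustering -> ontology generation. None of these names come
# from LETO's real API; they only illustrate the data flow.
from collections import defaultdict

def extract_triples(sentences):
    # Toy stand-in for stage 1: a real system would run NER and relation
    # classification; here we only split pre-formatted "X | rel | Y" strings.
    return [tuple(s.split(" | ")) for s in sentences]

def cluster_entities(triples):
    # Toy stand-in for stage 2: group entity mentions by lowercased form,
    # standing in for the clustering techniques the description mentions.
    clusters = defaultdict(set)
    for subj, _, obj in triples:
        clusters[subj.lower()].add(subj)
        clusters[obj.lower()].add(obj)
    return clusters

def build_ontology(triples):
    # Toy stand-in for stage 3: emit canonicalized triples as the ontology.
    return {(s.lower(), r, o.lower()) for s, r, o in triples}

sentences = ["Asthma | is-a | disease", "asthma | affects | airways"]
triples = extract_triples(sentences)
print(cluster_entities(triples))
print(build_ontology(triples))
```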

    Overview of TASS 2018: Opinions, Health and Emotions

    This is an overview of the Workshop on Semantic Analysis (TASS) at the SEPLN conference held in Seville, Spain, in September 2018. This forum proposed four different semantic tasks on texts written in Spanish. Task 1 focuses on polarity classification; Task 2 encourages the development of aspect-based polarity classification systems; Task 3 provides a scenario for discovering knowledge from eHealth documents; finally, Task 4 concerns the automatic classification of news articles according to their safety level. The latter two tasks are new in this edition of TASS. We detail the approaches and results of the systems submitted by the different groups in each task, along with a discussion of them.

    This work has been partially supported by a grant from the Fondo Europeo de Desarrollo Regional (FEDER), the projects REDES (TIN2015-65136-C2-1-R, TIN2015-65136-C2-2-R) and SMART-DASCI (TIN2017-89517-P) from the Spanish Government, and “Plataforma Inteligente para Recuperación, Análisis y Representación de la Información Generada por Usuarios en Internet” (GRE16-01) from the University of Alicante. Eugenio Martínez Cámara was supported by the Spanish Government programme Juan de la Cierva Formación (FJCI-2016-28353)

    A corpus to support eHealth Knowledge Discovery technologies

    This paper presents and describes the eHealth-KD corpus, a collection of 1173 Spanish health-related sentences manually annotated with a general semantic structure that captures most of the content without resorting to domain-specific labels. The semantic representation is first defined and illustrated with example sentences from the corpus. Next, the paper summarizes the annotation process and provides key metrics of the corpus. Finally, three baseline implementations, supported by machine learning models, were designed to gauge the complexity of learning the corpus semantics. The resulting corpus was used as an evaluation scenario in TASS 2018 (Martínez-Cámara et al., 2018), and the findings obtained by the participants are discussed. The eHealth-KD corpus provides the first step in the design of a general-purpose semantic framework that can be used to extract knowledge from a variety of domains.

    This research has been supported by the University of Alicante and the University of Havana. It has also been partially funded by both aforementioned universities and the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) through the projects PROMETEO/2018/089, PROMETEU/2018/089; Social-Univ 2.0 (ENCARGO-INTERNOOMNI-1); and PINGVALUE3-18Y
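    The abstract does not state the distribution format, so the sketch below assumes BRAT-style standoff annotation (.ann files), a common choice for entity/relation corpora of this kind. The entity label and file layout in the comments are assumptions, and discontinuous spans (";"-separated offsets in BRAT) are deliberately ignored to keep the sketch minimal.

```python
# Hedged sketch: a minimal parser for BRAT-style standoff annotation (.ann),
# assuming the corpus ships entities as "T#" lines and binary relations as
# "R#" lines. Label names in the examples are assumptions, not the corpus's
# actual tag set; discontinuous spans are not handled.
def parse_ann(path):
    entities, relations = {}, []
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if fields[0].startswith("T"):
                # Entity line, e.g. "T1<TAB>Concept 0 6<TAB>asthma"
                label, start, end = fields[1].split()[:3]
                entities[fields[0]] = (label, int(start), int(end), fields[2])
            elif fields[0].startswith("R"):
                # Relation line, e.g. "R1<TAB>is-a Arg1:T1 Arg2:T2"
                rel, arg1, arg2 = fields[1].split()
                relations.append((rel, arg1.split(":")[1], arg2.split(":")[1]))
    return entities, relations
```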

    Detección de Idioma en Twitter (Language Detection on Twitter)

    The paper presents an alternative for identifying languages on Twitter without the need for training sets or aggregated information. This alternative is based on trigram-recognition and small-words techniques. The use of these algorithms is evaluated both individually and in a combined model. The effect of tweet pre-processing on language-identification accuracy is also analyzed. Finally, after a process of experimentation, the best of the studied alternatives is determined
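    The paper's exact algorithms are not reproduced here; the sketch below shows the classic trigram-profile approach this line of work builds on (Cavnar & Trenkle-style "out-of-place" ranking). The toy profiles, built from a handful of frequent function words, are assumptions standing in for profiles trained on large monolingual corpora.

```python
# Hedged sketch of trigram-profile language identification in the spirit of
# the approach described above (Cavnar & Trenkle-style out-of-place ranking).
# The profiles and sample text are toy assumptions, not the paper's data.
from collections import Counter

def trigram_profile(text, top=300):
    # Pad with spaces so word boundaries contribute trigrams too.
    text = f"  {text.lower()}  "
    grams = Counter(text[i:i + 3] for i in range(len(text) - 2))
    return [g for g, _ in grams.most_common(top)]

def out_of_place(doc_profile, lang_profile):
    # Sum of rank displacements; trigrams absent from the language profile
    # receive the maximum penalty.
    rank = {g: i for i, g in enumerate(lang_profile)}
    return sum(abs(i - rank.get(g, len(lang_profile)))
               for i, g in enumerate(doc_profile))

def detect(text, profiles):
    doc = trigram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))

# Toy usage: real profiles would be built from large monolingual corpora.
profiles = {
    "es": trigram_profile("el la los de que y en un una por con para"),
    "en": trigram_profile("the of and to in a is that it for on with"),
}
print(detect("el idioma de este tweet", profiles))  # -> "es"
```

    The small-words technique the abstract mentions plays a complementary role: short, highly frequent function words (like those seeding the toy profiles above) are strong language cues in texts as short as tweets.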