Automatic Discovery of Heterogeneous Machine Learning Pipelines: An Application to Natural Language Processing

Author: Almeida-Cruz Yudivian
Estévez-Velarde Suilan
Gutiérrez Yoan
Montoyo Andres
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020

This paper presents AutoGOAL, a system for automatic machine learning (AutoML) that uses heterogeneous techniques. In contrast with existing AutoML approaches, our contribution can automatically build machine learning pipelines that combine techniques and algorithms from different frameworks, including shallow classifiers, natural language processing tools, and neural networks. We define the heterogeneous AutoML optimization problem as the search for the best sequence of algorithms that transforms specific input data into the desired output. This provides a novel theoretical and practical approach to AutoML. Our proposal is experimentally evaluated in diverse machine learning problems and compared with alternative approaches, showing that it is competitive with other AutoML alternatives in standard benchmarks. Furthermore, it can be applied to novel scenarios, such as several NLP tasks, where existing alternatives cannot be directly deployed. The system is freely available and includes built-in compatibility with a large number of popular machine learning frameworks, which makes our approach useful for solving practical problems with relative ease. This research has been supported by a Carolina Foundation grant in agreement with University of Alicante and University of Havana. Moreover, it has also been partially funded by both aforementioned universities, the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects LIVING-LANG (RTI2018-094653-B-C22) and SIIA (PROMETEO/2018/089, PROMETEU/2018/089)
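The problem statement in this abstract, a search for the best sequence of algorithms that turns a given input into the desired output, can be illustrated with a small sketch. It does not use AutoGOAL's actual API: the `Algorithm` record, the type labels, the registry entries, and the placeholder `toy_score` are all hypothetical, and plain exhaustive enumeration stands in for whatever search strategy the system really uses.

```python
from typing import Callable, Iterator, List, NamedTuple


class Algorithm(NamedTuple):
    """Hypothetical description of one pipeline step and its data types."""
    name: str
    input_type: str
    output_type: str


def enumerate_pipelines(algorithms: List[Algorithm], source: str, target: str,
                        max_len: int = 4) -> Iterator[List[Algorithm]]:
    """Yield every algorithm sequence whose types chain from source to target."""
    def extend(prefix: List[Algorithm], current: str) -> Iterator[List[Algorithm]]:
        if prefix and current == target:
            yield list(prefix)
        if len(prefix) == max_len:
            return
        for algo in algorithms:
            # A step is applicable only if it accepts the current data type.
            if algo.input_type == current and algo not in prefix:
                yield from extend(prefix + [algo], algo.output_type)

    yield from extend([], source)


def best_pipeline(algorithms: List[Algorithm], source: str, target: str,
                  score: Callable[[List[Algorithm]], float]) -> List[Algorithm]:
    """Return the highest-scoring valid pipeline (raises ValueError if none exists)."""
    return max(enumerate_pipelines(algorithms, source, target), key=score)


# Illustrative registry mixing NLP tools, shallow classifiers and neural networks.
REGISTRY = [
    Algorithm("Tokenizer", "Text", "Tokens"),
    Algorithm("BagOfWords", "Tokens", "Matrix"),
    Algorithm("AverageEmbedding", "Tokens", "Matrix"),
    Algorithm("LogisticRegression", "Matrix", "Labels"),
    Algorithm("NeuralNetwork", "Matrix", "Labels"),
]


def toy_score(pipeline: List[Algorithm]) -> float:
    """Placeholder objective; a real one would train and validate the pipeline."""
    return -sum(len(step.name) for step in pipeline)


if __name__ == "__main__":
    for pipeline in enumerate_pipelines(REGISTRY, "Text", "Labels"):
        print(" -> ".join(step.name for step in pipeline))
```

Matching input and output types is what lets steps from unrelated libraries be chained; in practice the score would come from training and validating each candidate, which is far too costly to do exhaustively on a realistic registry.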

Repositorio Institucional de la Universidad de Alicante

Crossref

Differential evolution with thresheld convergence

Author: Bolufé-Röhler Antonio
Chen Stephen
Estévez-Velarde Suilan
Montgomery James
Piad-Morffis Alejandro
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/12/2015

During the search process of differential evolution (DE), each new solution may represent a new more promising region of the search space (exploration) or a better solution within the current region (exploitation). This concurrent exploitation can interfere with exploration since the identification of a new more promising region depends on finding a (random) solution in that region which is better than its target solution. Ideally, every sampled solution will have the same relative fitness with respect to its nearby local optimum – finding the best region to exploit then becomes the problem of finding the best random solution. However, differential evolution is characterized by an initial period of exploration followed by rapid convergence. Once the population starts converging, the difference vectors become shorter, more exploitation is performed, and an accelerating convergence occurs. This rapid convergence can occur well before the algorithm’s budget of function evaluations is exhausted; that is, the algorithm can converge prematurely. In thresheld convergence, early exploitation is “held” back by a threshold function, allowing a longer exploration phase. This paper presents a new adaptive thresheld convergence mechanism which helps DE achieve large performance improvements in multi-modal search spaces
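The idea of holding back early exploitation with a threshold can be sketched on top of a basic DE/rand/1/bin loop. This is only an illustration under stated assumptions, not the adaptive mechanism the paper proposes: the threshold is taken to be a minimum step length that decays linearly to zero, short difference vectors are stretched up to it, and the `sphere` objective, the `alpha` parameter, and all default values are hypothetical.

```python
import math
import random


def sphere(x):
    """Toy objective used only for illustration: sum of squares."""
    return sum(v * v for v in x)


def de_thresheld(f, dim=5, pop_size=20, gens=200, F=0.5, CR=0.9,
                 bounds=(-5.0, 5.0), alpha=0.1, seed=0):
    """DE/rand/1/bin with an assumed minimum-step threshold that decays over time."""
    rng = random.Random(seed)
    lo, hi = bounds
    diagonal = math.sqrt(dim) * (hi - lo)  # length of the search-space diagonal
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for g in range(gens):
        # Assumed schedule: the minimum step shrinks linearly to zero.
        threshold = alpha * diagonal * (1.0 - g / gens)
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            step = [F * (pop[b][k] - pop[c][k]) for k in range(dim)]
            norm = math.sqrt(sum(s * s for s in step))
            if 0.0 < norm < threshold:
                # Hold back exploitation: stretch short steps up to the threshold.
                step = [s * threshold / norm for s in step]
            mutant = [min(hi, max(lo, pop[a][k] + step[k])) for k in range(dim)]
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = [mutant[k] if (rng.random() < CR or k == j_rand) else pop[i][k]
                     for k in range(dim)]
            f_trial = f(trial)
            if f_trial <= fit[i]:  # greedy one-to-one selection
                pop[i], fit[i] = trial, f_trial
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


if __name__ == "__main__":
    best_x, best_f = de_thresheld(sphere)
    print(best_f)
```

While the threshold is large, every trial lies at least that far from its base vector, so the population cannot collapse into one region early; as it decays, ordinary DE behaviour takes over.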

The Australian National University

KD SENSO-MERGER: An architecture for semantic integration of heterogeneous data

Author: Abreu Salas José Ignacio
Estévez-Velarde Suilan
Gutiérrez Yoan
Montoyo Andres
Muñoz Rafael
Publication venue: Elsevier
Publication date: 19/01/2024

This paper presents KD SENSO-MERGER, a novel Knowledge Discovery (KD) architecture that is capable of semantically integrating heterogeneous data from various structured and unstructured sources (e.g. geolocations, demographic and socio-economic data, user reviews, and comments). This goal drives the main design approach of the architecture. It works by building internal representations that adapt and merge knowledge across multiple domains, ensuring that the knowledge base is continuously updated. To deal with the challenge of integrating heterogeneous data, this proposal puts forward the following solutions: (i) knowledge extraction, addressed via a plugin-based architecture of knowledge sensors; (ii) data integrity, tackled by an architecture designed to deal with uncertain or noisy information; (iii) scalability, also supported by the plugin-based architecture, since only knowledge relevant to the scenario is integrated by switching off non-relevant sensors. We also minimize the expert knowledge required, which may pose a bottleneck when integrating a fast-paced stream of new sources. As proof of concept, we developed a case study that deploys the architecture to integrate population census and economic data, municipal cartography, and Google Reviews to analyze the socio-economic contexts of educational institutions. The knowledge discovered enables us to answer questions that cannot be answered from individual sources. Thus, companies or public entities can discover patterns of behavior or relationships that would otherwise not be visible, which would allow them to extract valuable information for the decision-making process. This research is supported by the University of Alicante, Spain, the Spanish Ministry of Science and Innovation, the Generalitat Valenciana, Spain, and the European Regional Development Fund (ERDF) through the following funding: At the national level, the following projects were granted: TRIVIAL (PID2021-122263OB-C22); and CORTEX (PID2021-123956OB-I00), funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by “ERDF A way of making Europe”, by the “European Union” or by the “European Union NextGenerationEU/PRTR”. At the regional level, the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport), Spain, granted funding for NL4DISMIS (CIPROM/2021/21)

Repositorio Institucional de la Universidad de Alicante

Demo Application for the AutoGOAL Framework

Author: Almeida-Cruz Yudivian
Estévez-Velarde Suilan
Gutiérrez Yoan
Montoyo Andres
Muñoz Rafael
Piad-Morffis Alejandro
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020

This paper introduces a web demo that showcases the main characteristics of the AutoGOAL framework. AutoGOAL is a framework in Python for automatically finding the best way to solve a given task. It has been designed mainly for automatic machine learning (AutoML) but it can be used in any scenario where several possible strategies are available to solve a given computational task. In contrast with alternative frameworks, AutoGOAL can be applied seamlessly to Natural Language Processing as well as structured classification problems. This paper presents an overview of the framework’s design and experimental evaluation in several machine learning problems, including two recent NLP challenges. The accompanying software demo is available online and full source code is provided under the MIT open-source license. This research has been supported by a Carolina Foundation grant in agreement with University of Alicante and University of Havana. Moreover, it has also been partially funded by both aforementioned universities, the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects LIVING-LANG (RTI2018-094653-B-C22) and SIIA (PROMETEO/2018/089, PROMETEU/2018/089)

Repositorio Institucional de la Universidad de Alicante

Crossref

Overview of the eHealth Knowledge Discovery Challenge at IberLEF 2020

Author: Almeida-Cruz Yudivian
Cañizares-Diaz Hian
Estévez-Velarde Suilan
Gutiérrez Yoan
Montoyo Andres
Muñoz Rafael
Piad-Morffis Alejandro
Publication venue: CEUR
Publication date: 01/01/2020

This paper summarises the results of the third edition of the eHealth Knowledge Discovery (KD) challenge, hosted at the Iberian Language Evaluation Forum 2020. The eHealth-KD challenge proposes two computational tasks involving the identification of semantic entities and relations in natural language text, focusing on Spanish language health documents. In this edition, besides text extracted from medical sources, Wikipedia content was introduced into the corpus, and a novel transfer-learning evaluation scenario was designed that challenges participants to create systems that provide cross-domain generalisation. A total of eight teams participated with a variety of approaches including deep learning end-to-end systems as well as rule-based and knowledge-driven techniques. This paper analyses the most successful approaches and highlights the most interesting challenges for future research in this field. This research has been partially supported by the University of Alicante and University of Havana, the Generalitat Valenciana (Conselleria d’Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects SIIA (PROMETEO/2018/089, PROMETEU/2018/089) and LIVING-LANG (RTI2018-094653-B-C22)

Repositorio Institucional de la Universidad de Alicante

Applying Human-in-the-Loop to construct a dataset for determining content reliability to combat fake news

Author: Bonet-Jover Alba
Estévez-Velarde Suilan
Martínez-Barco Patricio
Piad-Morffis Alejandro
Saquete Boró Estela
Sepúlveda-Torres Robiert
Publication venue: Elsevier
Publication date: 20/09/2023

Annotated corpora are indispensable tools to train computational models in Natural Language Processing. However, for more complex semantic annotation processes, building them is a costly, arduous, and time-consuming task, resulting in a shortage of resources to train Machine Learning and Deep Learning algorithms. In response, this work proposes a methodology, based on the human-in-the-loop paradigm, for semi-automatic annotation of complex tasks. This methodology is applied in the construction of a reliability dataset of Spanish news so as to combat disinformation and fake news. We obtain a high-quality resource by implementing the proposed methodology for semi-automatic annotation, increasing annotator efficacy and speed, with fewer examples. The methodology consists of three incremental phases and results in the construction of the RUN dataset. The quality of the resource was evaluated through time reduction (annotation time reduced by almost 64% with respect to fully manual annotation), annotation quality (measuring consistency of annotation and inter-annotator agreement), and the performance of a model trained on the semi-automatically annotated RUN dataset (accuracy 95%, F1 95%), validating the suitability of the proposal. This research work is funded by MCIN/AEI/10.13039/501100011033 and, as appropriate, by “ERDF A way of making Europe”, by the “European Union” or by the “European Union NextGenerationEU/PRTR” through the project TRIVIAL: Technological Resources for Intelligent VIral AnaLysis through NLP (PID2021-122263OB-C22) and the project SOCIALTRUST: Assessing trustworthiness in digital media (PDC2022-133146-C22). It is also funded by Generalitat Valenciana, Spain through the project NL4DISMIS: Natural Language Technologies for dealing with dis- and misinformation (CIPROM/2021/21), and the grant ACIF/2020/177

Repositorio Institucional de la Universidad de Alicante

Generación procedural de ciudades: un enfoque jerárquico

Author: Consuegra Ayala Juan Pablo
Estévez Velarde Suilan
Katrib Mora Miguel
Leonard Méndez Ludwig
Piad Morffis Alejandro
Publication venue: Universidad de La Laguna
Publication date: 01/01/2017

Procedural city generation has been the subject of a large number of studies in recent years, driven by the growing level of detail that content in virtual reality applications requires. This work proposes a new approach to procedural city generation. To that end, it describes a way of representing the content to be generated and proposes a generation strategy that matches that representation

Repositorio Institucional de la Universidad de La Laguna

GPLSI-UH LETO V1.0: Learning Engine Through Ontologies

Author: Almeida-Cruz Yudivian
Estévez-Velarde Suilan
Gutiérrez Yoan
Montoyo Andres
Muñoz Rafael
Palomar Manuel
Piad-Morffis Alejandro
Valdés Pérez Daniel Alejandro
Publication date: 30/01/2021

LETO is an ontology learning framework designed to extract knowledge from a variety of sources. These sources may be structured and/or unstructured data, and from them we can discover, continuously update, enrich and integrate relevant information as part of a single semantic knowledge resource. The current version 1.0 is limited to the extraction of knowledge from unstructured data, i.e. natural language texts, following the semantic model published in [EGM2018]. Among this version’s functionalities are the extraction of entities and semantic relations from textual sources; the transformation of such information into linked elements through clustering techniques; and finally, the generation of representative ontologies of the processed content. An API access point is provided, as well as a visual tool for the manipulation of processes and visualization of the obtained ontologies [EMA2019]. This work was supported by the Universidad de Alicante; the Universidad de La Habana (Cuba); the Spanish Ministry of Education, Culture and Sport and the Ministry of Economy and Competitiveness (MINECO) through the projects LIVING-LANG (RTI2018-094653-B-C22) and INTEGER (RTI2018-094649-B-I00); the Generalitat Valenciana through the project SIIA (PROMETEO/2018/089, PROMETEU/2018/089); and the COST actions CA19134 - “Distributed Knowledge Graphs” and CA19142 - “Leading Platform for European Citizens, Industries, Academia and Policymakers in Media Accessibility”

Repositorio Institucional de la Universidad de Alicante

Resumen de TASS 2018: Opiniones, Salud y Emociones

Author: Almeida-Cruz Yudivian
Díaz Galiano Manuel Carlos
Estévez-Velarde Suilan
García Cumbreras Miguel Ángel
García Vega Manuel
Gutiérrez Yoan
Martínez Cámara Eugenio
Montejo Ráez Arturo
Montoyo Andres
Muñoz Rafael
Piad-Morffis Alejandro
Villena Román Julio
Publication venue: Sun SITE Central Europe
Publication date: 01/09/2018

This is an overview of the Workshop on Semantic Analysis at the SEPLN congress (TASS) held in Sevilla, Spain, in September 2018. This forum proposes to participants four different semantic tasks on texts written in Spanish. Task 1 focuses on polarity classification; Task 2 encourages the development of aspect-based polarity classification systems; Task 3 provides a scenario for discovering knowledge from eHealth documents; finally, Task 4 is about automatic classification of news articles according to safety. The latter two tasks are new in this edition of TASS. We detail the approaches and the results of the systems submitted by the different groups in each task, together with a discussion of them. This work has been partially supported by a grant from the Fondo Europeo de Desarrollo Regional (FEDER), the projects REDES (TIN2015-65136-C2-1-R, TIN2015-65136-C2-2-R) and SMART-DASCI (TIN2017-89517-P) from the Spanish Government, and “Plataforma Inteligente para Recuperación, Análisis y Representación de la Información Generada por Usuarios en Internet” (GRE16-01) from University of Alicante. Eugenio Martínez Cámara was supported by the Spanish Government Programme Juan de la Cierva Formación (FJCI-2016-28353)

Repositorio Institucional de la Universidad de Alicante

Descubrimiento Automático de Flujos de Aprendizaje de Máquina basado en Gramáticas Probabilísticas

Author: Estévez-Velarde Suilan
Publication venue: 'Universidad de Alicante Servicio de Publicaciones'
Publication date: 02/12/2021

Machine learning has gained ground and is now used in almost every area of daily life, helping to make decisions in finance, medicine, commerce and entertainment. The continuous development of new machine learning algorithms and techniques, and the wide range of tools and datasets available, have brought new opportunities and challenges for researchers and practitioners in both academia and industry. Selecting the best possible strategy to solve a machine learning problem is increasingly difficult, in part because it requires long experimentation times and deep technical knowledge. In this scenario, the research field of Automated Machine Learning (AutoML) has gained prominence, proposing strategies to progressively automate common tasks in the development of machine learning applications. The most common AutoML tools automatically select, from a restricted set of algorithms and parameters, the best strategy for a given dataset. However, practical problems often require combining and comparing heterogeneous algorithms implemented with different underlying technologies. One example is natural language processing, a scenario where the space of applicable techniques varies widely between tasks, from preprocessing to text representation and classification. Performing AutoML in such a heterogeneous scenario is complex because the required solution might include tools and libraries that are not compatible with each other. This would require all algorithms to agree on a common protocol that allows the output of one algorithm to be passed as input to any other. This research designs and implements an AutoML system that uses heterogeneous techniques. In contrast with existing AutoML approaches, our contribution can combine techniques and algorithms from different libraries and technologies, including classical machine learning algorithms, feature extraction, natural language processing tools, and diverse neural network architectures. We define the heterogeneous AutoML optimization problem as the search for the best sequence of algorithms that transforms specific input data into the desired output. This provides a novel theoretical and practical approach to AutoML. Our proposal is experimentally evaluated on diverse machine learning problems and compared with alternative approaches, showing that it is competitive with other AutoML alternatives on standard benchmarks. Furthermore, it can be applied to novel scenarios, such as several natural language processing tasks, where existing alternatives cannot be directly deployed. The system is freely available and includes built-in compatibility with a large number of popular machine learning frameworks, which makes our approach useful for solving practical problems with relative ease. The tool proposed in this research allows researchers and practitioners to quickly develop optimized baseline algorithms for diverse machine learning problems. In some scenarios, the solution provided by our system may be sufficient. However, AutoML systems should not attempt to replace human experts, but rather serve as complementary tools that allow researchers to quickly obtain better prototypes and insights into the most promising strategies for a specific problem. AutoML techniques open the door to revolutionizing the way machine learning research and development is carried out in academia and industry

Repositorio Institucional de la Universidad de Alicante