Overview of the Rest-Mex Task at IberLEF 2022: Recommendation System, Sentiment Analysis and Covid Semaphore Prediction for Mexican Tourist Texts

Authors: Aranda Ramón
Bustio-Martínez Lázaro
Díaz-Pacheco Ángel
Fajardo-Delgado Daniel
Guerrero-Rodríguez Rafael
Rodríguez-González Ansel Y.
Álvarez-Carmona Miguel Á.
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/09/2022

This paper presents the framework and results of the Rest-Mex task at IberLEF 2022. The task comprised three tracks: Recommendation System, Sentiment Analysis, and Covid Semaphore Prediction, all using texts about Mexican tourist places. The Recommendation System track consists of predicting the degree of satisfaction a tourist may have with a recommended destination in Nayarit, Mexico, based on the places the tourist has visited and their opinions. The Sentiment Analysis track predicts the polarity of an opinion issued by a tourist who traveled to the most representative places in Mexico, as well as the attraction the opinion refers to. We built corpora for both tracks from Spanish-language opinions on the TripAdvisor website. As a novelty, the Covid Semaphore Prediction track aims to predict the color of the Mexican Covid traffic light (semáforo) for each state from the Covid-related news in that state, using data from the Mexican Ministry of Health. This paper compares and discusses the participants' results for all three tracks.

Repositorio Institucional de la Universidad de Alicante

Overview of the Rest-Mex Task at IberLEF 2023: Research on Sentiment Analysis for Mexican Tourist Texts

Authors: Aranda Ramón
Bustio-Martínez Lázaro
Díaz-Pacheco Ángel
López-Monroy A. Pastor
Muñiz-Sánchez Victor
Rodríguez-González Ansel Y.
Sanchez-Vega Fernando
Álvarez-Carmona Miguel Á.
Publication venue: Sociedad Española para el Procesamiento del Lenguaje Natural
Publication date: 01/09/2023

This paper presents the framework and results of the Rest-Mex task at IberLEF 2023, which focused on sentiment analysis and text clustering of tourist texts. The study primarily addresses texts related to tourist destinations in Mexico, although this edition included data from Cuba and Colombia for the first time. The sentiment analysis task aims to predict the polarity of opinions expressed by tourists, while also classifying the type of place visited (tourist attraction, hotel, or restaurant) and the country in which it is located. The text clustering task aims to classify news articles related to tourism in Mexico. For both tasks, corpora were built from Spanish-language opinions extracted from TripAdvisor and from news articles in Mexican media. This article compares and discusses the results obtained by the participants in both sub-tasks. Additionally, it proposes a method to measure the easiness of a multi-class text classification corpus, along with an approach for system selection in a possible late fusion scheme. The authors thank the Mexican Academy of Tourism Research (AMIT) for supporting the project "Creation of a labeled database related to tourist destinations for training artificial intelligence models for classifying relevant topics" through the call "I Research Projects 2022", which gave rise to this work.

Repositorio Institucional de la Universidad de Alicante

An autoencoder-based representation for noise reduction in distant supervision of relation extraction

Authors: Bustio-Martínez Lázaro
García-Mendoza Juan Luis
Orihuela-Espina Felipe
Villaseñor-Pineda Luis
Publication venue: IOS Press
Publication date: 31/03/2022

University of Birmingham Research Portal

Evaluation of a new representation for noise reduction in distant supervision

Authors: Buscaldi Davide
Bustio-Martínez Lázaro
García-Mendoza Juan Luis
Orihuela-Espina Felipe
Villaseñor-Pineda Luis
Publication venue: Springer Science and Business Media LLC
Publication date: 01/10/2022

Distant Supervision is a relation extraction approach that allows automatic labeling of a dataset. However, this labeling introduces noise into the labels (e.g., when two entities in a sentence are automatically labeled with an invalid relation). Label noise makes the relation extraction task harder and is one of its main challenges. Until now, methods that incorporate a prior noise reduction step have not evaluated the performance of that step. This paper evaluates noise reduction using a new representation obtained with autoencoders. In addition, more information was incorporated into the input of the state-of-the-art autoencoder to improve the representation over which noise is reduced. Three methods were also proposed to select the instances considered real. As a result, the highest area under the ROC curve (AUC) values were obtained using the improved input combined with state-of-the-art anomaly detection methods. Moreover, the three proposed selection methods significantly improve on the existing method in the literature.
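The abstract reports noise-reduction quality as the area under the ROC curve (AUC). As a minimal illustration of that metric only, the sketch below computes AUC from anomaly scores using its pairwise (Mann-Whitney) interpretation: the probability that a randomly chosen noisy instance scores higher than a randomly chosen correctly labeled one, with ties counted as one half. The function name `roc_auc` and the score values are hypothetical; the scores could come from, for example, an anomaly detector applied to an autoencoder representation, but this is not the paper's pipeline.

```python
from itertools import product
from typing import Sequence


def roc_auc(noisy_scores: Sequence[float], clean_scores: Sequence[float]) -> float:
    """AUC as P(score of a noisy instance > score of a clean one); ties count 0.5.

    Assumption: a higher anomaly score means "more likely to be label noise".
    """
    if not noisy_scores or not clean_scores:
        raise ValueError("need at least one noisy and one clean instance")
    wins = 0.0
    for noisy, clean in product(noisy_scores, clean_scores):
        if noisy > clean:
            wins += 1.0
        elif noisy == clean:
            wins += 0.5
    return wins / (len(noisy_scores) * len(clean_scores))


# Hypothetical anomaly scores for instances known to be noisy vs. correctly labeled.
print(roc_auc([0.9, 0.3], [0.5, 0.1]))  # 3 of 4 pairs ranked correctly -> 0.75
```

This pairwise form is quadratic in the number of instances; a rank-based computation gives the same value in O(n log n) for larger evaluation sets.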

University of Birmingham Research Portal

Spiral - Imperial College Digital Repository

A multi-measure feature selection algorithm for efficacious intrusion detection

Authors: Bustio-Martínez Lázaro
Hernández-León Raudel
Herrera-Semenets Vitali
van den Berg J.
Publication venue: Elsevier BV
Publication date: 01/01/2021

Every day, the number of devices interacting through telecommunications networks grows, resulting in an increase in the volume of data and information generated. At the same time, a growing number of information security incidents is being observed, including unauthorized accesses, also called intrusions. As a consequence of these two developments, information and communication service providers require automated processes to detect and resolve such intrusions, and this must be done quickly to keep the related cybersecurity risks at acceptable levels. However, large volumes of data degrade the performance of the classifiers used in intrusion detection, which limits their applicability in practice. The research reported in this paper proposes a novel feature selection algorithm for intrusion detection scenarios. To this end, an extensive literature review was first conducted to identify issues in the reported feature selection algorithms. Based on the insights obtained, a new multi-measure feature selection algorithm was designed that uses qualitative information provided by multiple feature selection measures and reduces the dimensionality of the training data set. The proposed algorithm was then extensively tested on various data sets and provides greater efficacy than other feature selection algorithms used for intrusion detection. We conclude by outlining ideas for future research to further improve the algorithm.
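The abstract does not specify which measures are combined or how. Purely to illustrate the general idea of multi-measure feature selection, the sketch below scores each feature with three simple measures chosen here as assumptions (absolute Pearson correlation with a binary label, variance, and the gap between class means), ranks the features under each measure, and keeps the `k` features with the lowest summed rank. The function names are hypothetical, and this rank-aggregation scheme is a stand-in, not the authors' algorithm.

```python
from statistics import mean, pvariance
from typing import Callable, List, Sequence

Measure = Callable[[Sequence[float], Sequence[int]], float]


def pearson_abs(xs: Sequence[float], ys: Sequence[int]) -> float:
    """Absolute Pearson correlation between a feature column and the label."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return 0.0 if sx == 0 or sy == 0 else abs(cov / (sx * sy))


def variance_score(xs: Sequence[float], ys: Sequence[int]) -> float:
    """Unsupervised measure: constant features carry no information."""
    return pvariance(xs)


def class_mean_gap(xs: Sequence[float], ys: Sequence[int]) -> float:
    """Absolute difference between the feature means of classes 1 and 0."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return 0.0 if not pos or not neg else abs(mean(pos) - mean(neg))


def ranks(scores: List[float]) -> List[int]:
    """Rank 0 goes to the highest score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order):
        result[index] = position
    return result


def multi_measure_select(X: List[List[float]], y: List[int], k: int,
                         measures: Sequence[Measure] = (pearson_abs, variance_score,
                                                        class_mean_gap)) -> List[int]:
    """Return indices of the k features with the lowest rank summed over all measures."""
    columns = [[row[j] for row in X] for j in range(len(X[0]))]
    total = [0] * len(columns)
    for measure in measures:
        for j, r in enumerate(ranks([measure(col, y) for col in columns])):
            total[j] += r
    return sorted(range(len(columns)), key=lambda j: (total[j], j))[:k]


# Hypothetical data: feature 0 tracks the label, feature 1 is constant, feature 2 is noisy.
X = [[0, 5, 1.0], [1, 5, 0.0], [0, 5, 0.5], [1, 5, 0.7]]
y = [0, 1, 0, 1]
print(multi_measure_select(X, y, k=2))  # [0, 2]
```

Summing ranks rather than raw scores keeps measures with very different scales from dominating one another.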

TU Delft Repository

A Decision Tree Induction Algorithm for Efficient Rule Evaluation Using Shannon’s Expansion

Authors: Bustio-Martínez Lázaro
Hernández-León Raudel
Herrera-Semenets Vitali
van den Berg Jan
Publication venue: Springer
Publication date: 01/01/2023

Decision trees are among the most popular structures for decision-making and for representing a set of rules. However, when a rule set is represented as a decision tree, some quirks in its structure may degrade its performance. For example, duplicate sub-trees and rule filters that must be evaluated more than once can reduce efficiency. This paper presents a novel algorithm based on Shannon's expansion that guarantees the same rule filter is not evaluated more than once, even if it is repeated in other rules. This increases efficiency when rules are evaluated with the induced decision tree. Experiments demonstrated the viability of the proposed algorithm in processing-intensive scenarios, such as intrusion detection and data stream analysis.
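Shannon's expansion rewrites a Boolean function around one variable as f = x·f|x=1 + x̄·f|x=0, so a tree built by expanding one filter per level tests each filter at most once on any root-to-leaf path. The sketch below is a generic illustration of that property, not the paper's induction algorithm: it assumes each rule is a conjunction of named Boolean filters, and the filter names, rules, and record fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple, Union

Record = Dict[str, object]
# A rule is (rule_name, set of filter names that must all be true).
Rule = Tuple[str, FrozenSet[str]]


@dataclass(frozen=True)
class Leaf:
    fired: FrozenSet[str]  # names of the rules satisfied on this path


@dataclass(frozen=True)
class Node:
    var: str                   # filter tested at this node
    high: Union["Node", Leaf]  # cofactor f|var=1
    low: Union["Node", Leaf]   # cofactor f|var=0


Tree = Union[Node, Leaf]


def build(rules: List[Rule], order: List[str]) -> Tree:
    """Expand on the next filter still used by a live rule; it is never tested again below."""
    pending = [f for f in order if any(f in conds for _, conds in rules)]
    if not pending:
        return Leaf(frozenset(name for name, conds in rules if not conds))
    var, rest = pending[0], pending[1:]
    high = [(name, conds - {var}) for name, conds in rules]             # var is true
    low = [(name, conds) for name, conds in rules if var not in conds]  # var is false
    return Node(var, build(high, rest), build(low, rest))


def evaluate(tree: Tree, record: Record,
             filters: Dict[str, Callable[[Record], bool]],
             calls: Dict[str, int]) -> FrozenSet[str]:
    """Walk one path, counting how often each filter is actually evaluated."""
    node = tree
    while isinstance(node, Node):
        calls[node.var] = calls.get(node.var, 0) + 1
        node = node.high if filters[node.var](record) else node.low
    return node.fired


# Hypothetical filters and rules; "tcp" and "big" are shared by several rules.
filters = {
    "tcp": lambda r: r["proto"] == "tcp",
    "port22": lambda r: r["port"] == 22,
    "big": lambda r: r["bytes"] > 1000,
}
rules = [("r1", frozenset({"tcp", "port22"})),
         ("r2", frozenset({"tcp", "big"})),
         ("r3", frozenset({"big"}))]
tree = build(rules, ["tcp", "port22", "big"])
calls: Dict[str, int] = {}
print(sorted(evaluate(tree, {"proto": "tcp", "port": 22, "bytes": 5000}, filters, calls)))
# ['r1', 'r2', 'r3']
print(calls)  # {'tcp': 1, 'port22': 1, 'big': 1}
```

Evaluating the three rules independently would test `tcp` and `big` twice each; the expanded tree tests each once. This unreduced tree can grow exponentially with the number of filters, and it does not merge identical sub-trees, which is the other inefficiency the abstract mentions.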

TU Delft Repository

Red Light/Green Light: A Lightweight Algorithm for, Possibly, Fraudulent Online Behavior Change Detection

Authors: Bustio-Martínez Lázaro
Hernández-León Raudel
Herrera Semenets V.
van den Berg Jan
Publication venue: Springer
Publication date: 01/01/2022

Telecommunications services have become a constant in people's lives. This has inspired fraudsters to carry out malicious activities that cause economic losses to people and companies. Early detection of signs that suggest possible malicious activity would allow analysts to act in time and avoid unintended consequences. Modeling user behavior could help identify when a significant change takes place. Following this idea, this paper proposes an algorithm for online behavior change detection in telecommunication services. The experimental results show that the new algorithm can identify behavioral changes related to unforeseen events.
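The abstract does not describe how the proposed algorithm models user behavior. As a generic, hypothetical illustration of online behavior change detection, the sketch below swaps in a simple sliding-window test: it keeps a baseline window and a recent window of a per-user activity metric, and raises a "red light" when the recent mean departs from the baseline mean by more than `threshold` baseline standard deviations. The class name, window sizes, and threshold are assumptions, not values from the paper.

```python
from collections import deque
from statistics import mean, pstdev


class WindowChangeDetector:
    """Flags a change when a short recent window drifts away from a longer baseline.

    Hypothetical illustration only; not the Red Light/Green Light algorithm.
    """

    def __init__(self, baseline_size: int = 20, recent_size: int = 5,
                 threshold: float = 3.0, min_std: float = 1e-9) -> None:
        self.baseline: deque = deque(maxlen=baseline_size)
        self.recent: deque = deque(maxlen=recent_size)
        self.threshold = threshold
        self.min_std = min_std  # keeps the tolerance positive on a constant baseline

    def update(self, value: float) -> bool:
        """Consume one observation; return True ("red light") if a change is detected."""
        if len(self.recent) == self.recent.maxlen:
            # The oldest recent observation graduates into the baseline profile.
            self.baseline.append(self.recent.popleft())
        self.recent.append(value)
        if (len(self.baseline) < self.baseline.maxlen
                or len(self.recent) < self.recent.maxlen):
            return False  # "green light": not enough history yet
        spread = max(pstdev(self.baseline), self.min_std)
        return abs(mean(self.recent) - mean(self.baseline)) > self.threshold * spread


detector = WindowChangeDetector()
# Hypothetical per-interval activity counts for one user: stable, then a sudden burst.
stable = [9 if i % 2 == 0 else 11 for i in range(25)]
print(any([detector.update(v) for v in stable]))  # False
print(detector.update(100))                        # True
```

Each update costs time proportional to the window sizes and memory is bounded by them, which fits the "lightweight, online" setting; the trade-off is that a slow drift can be absorbed into the baseline before it is flagged.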

TU Delft Repository

Improved algorithm for parallel mining collaborative frequent itemsets in multiple data streams

Authors: A Gani
ALS Saabith
D Gunopulos
E Baccarelli
Fang’ai Liu
G Wu
Gurmeet Singh Manku
H Yu
HF Li
J Dean
J Han
JIN Che-Qing
Khanh-Chuong Duong
Lázaro Bustio-Martínez
MR Henzinger
MY Yeh
N MacBean
Qianqian Wang
Rakesh Agrawal
S Shamshirb
S Wang
T Bernecker
V Hristidis
Xin Wang
Y Xun
ZH Deng
Publication venue: Springer Science and Business Media LLC

Crossref