
2 research outputs found

Dockerising Terrier for The Open-Source IR Replicability Challenge

Authors: Câmara, Arthur Barbosa; Macdonald, Craig
Publication date: 01/07/2019

Reproducibility and replicability are key concepts in science, so it is important for information retrieval (IR) platforms to support reproducing and replicating experiments. In this paper, we describe the creation of a Docker container for Terrier within the framework of the OSIRRC 2019 challenge, which allows typical runs to be reproduced on TREC test collections such as Robust04, GOV2, and Core2018. We hope that the resulting Docker image will help others (re)produce baseline experiments on these test collections. Initiatives like OSIRRC are central to advancing reproducibility and replicability in IR. By making available not only the source code but also the exact same environment, and by standardising inputs and outputs, it becomes possible to compare approaches easily and thereby improve the quality of information retrieval research.

Enlighten

Aplicación de Técnicas de Recuperación de Información y Aprendizaje Automático a la Minería de Opiniones [Application of Information Retrieval and Machine Learning Techniques to Opinion Mining]

Author: Corujo Muíña, Manuel
Publication date: 01/01/2022

[Abstract]: Opinion Mining, also known as Sentiment Analysis, is devoted to the study of opinions and sentiments expressed in texts. It falls within the field of Natural Language Processing. This project consists of identifying the polarity (positive or negative) of a set of texts extracted from a social network, Twitter. Research in this field has increased in recent years thanks to the greater availability of evaluation resources and to the growing number of texts available for analysis as the use of social networks has expanded. It is therefore now easy to find a large number of texts, regardless of subject matter. The ability to filter tweets by age range, geography or hashtag (labels marked with #), among others, enables population studies of particular interest in fields such as politics and marketing, since it makes it possible to determine automatically the opinion that a product generates among the community of users.

Unlike more classical approaches, this work does not use the words in the texts as attributes. Instead, it uses only 24 attributes derived from the ranking produced by a search engine in response to a query, where the query is the text whose polarity is to be determined. The system works as follows. First, a search engine is used to build an index from a set of texts (tweets) whose polarity is already known. Then, using the same search engine, the text to be classified is issued as a query against that index, which returns a ranking of the indexed tweets most similar to the query. From this ranking, 24 attributes are extracted, which the classifier then uses to determine the polarity of the text. Several supervised learning algorithms were used as classifiers, such as Support Vector Machines, decision trees and Naïve Bayes.

Bachelor's thesis (UDC.FIC). Computer Engineering. Academic year 2021/202
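
The pipeline described in this abstract (index tweets of known polarity, issue the text to classify as a query, derive attributes from the ranking, classify) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy tweets, the TF-IDF cosine ranker standing in for the search engine, the five features (not the 24 attributes used in the thesis), and the score-comparison rule standing in for a trained classifier such as an SVM, decision tree or Naïve Bayes.

```python
import math
from collections import Counter

# Toy labelled collection (invented data): tweets whose polarity is known.
LABELLED = [
    ("i love this phone it is great", "pos"),
    ("great battery and great screen", "pos"),
    ("i hate this phone it is awful", "neg"),
    ("awful battery and terrible screen", "neg"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def vectorize(tokens, idf):
    """TF-IDF vector as a dict; terms unknown to the index are dropped."""
    tf = Counter(tokens)
    return {t: f * idf[t] for t, f in tf.items() if t in idf}


def build_index(docs):
    """Build the 'index': one TF-IDF vector per labelled tweet, plus IDF."""
    tokenized = [tokenize(text) for text, _ in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    idf = {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}
    return [vectorize(toks, idf) for toks in tokenized], idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, docs, vectors, idf, k=3):
    """Return the top-k (score, polarity) pairs for the query text."""
    q = vectorize(tokenize(query), idf)
    scored = [(cosine(q, v), label) for v, (_, label) in zip(vectors, docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def ranking_features(ranking):
    """Hypothetical ranking-derived attributes; no query words are used."""
    pos = [s for s, label in ranking if label == "pos"]
    neg = [s for s, label in ranking if label == "neg"]
    return {
        "pos_count": len(pos),
        "neg_count": len(neg),
        "pos_score_sum": sum(pos),
        "neg_score_sum": sum(neg),
        "top_label_is_pos": 1 if ranking and ranking[0][1] == "pos" else 0,
    }


def classify(features):
    """Stand-in for a trained classifier: compare summed scores."""
    if features["pos_score_sum"] >= features["neg_score_sum"]:
        return "pos"
    return "neg"


if __name__ == "__main__":
    vectors, idf = build_index(LABELLED)
    feats = ranking_features(rank("love the great screen", LABELLED, vectors, idf))
    print(feats, classify(feats))
```

In a fuller setting, the feature dictionaries computed for a training set would be fed to a supervised learner in place of the `classify` rule; the point of the sketch is only that the classifier sees attributes of the ranking, never the words of the text being classified.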

Repositorio da Universidade da Coruña