11 research outputs found
Topic Modeling for Analysing Similarity between Users in Twitter
Data mining in social networks is gaining importance because it enables more precise marketing campaigns. For example, Google analyses all of our data (videos we watch, terms we search for, web pages we visit, applications we download, etc.) to know us better and show us personalised advertising.
LDA is a generative statistical model for modeling documents. Several algorithms exist that, given a set of documents, produce an LDA model that could have generated those documents. With that model it is possible to observe the topics used in those documents and the most relevant words for each topic.
This work presents a first approach to data mining on Twitter. Using the Twitter API, tweets from several users and from their followers were downloaded. These tweets were then processed into documents, and Gensim's implementation of the Online LDA algorithm was applied to obtain the topics of the documents. Finally, the topics of the users were compared with those of their followers.
An analysis of the state of the art of data mining on Twitter is also provided.
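To illustrate the final comparison step, here is a minimal sketch of measuring similarity between two users once LDA has reduced each user's tweets to a topic distribution. The Hellinger distance shown is one common choice for comparing topic distributions; the abstract does not state which measure the work actually uses, and the distributions below are invented for illustration.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete topic distributions.

    Ranges from 0 (identical distributions) to 1 (disjoint support),
    so smaller values mean more similar users.
    """
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

# Hypothetical per-topic probabilities inferred by LDA for a user
# and one of their followers (3-topic model, values sum to 1).
user_topics     = [0.70, 0.20, 0.10]
follower_topics = [0.60, 0.25, 0.15]

print(hellinger(user_topics, follower_topics))  # small value -> similar interests
```

A user compared with themselves yields a distance of exactly 0, which makes the measure easy to sanity-check before running it over real follower graphs.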
UKP-SQuARE: An Interactive Tool for Teaching Question Answering
The exponential growth of question answering (QA) has made it an
indispensable topic in any Natural Language Processing (NLP) course.
Additionally, the breadth of QA derived from this exponential growth makes it
an ideal scenario for teaching related NLP topics such as information
retrieval, explainability, and adversarial attacks among others. In this paper,
we introduce UKP-SQuARE as a platform for QA education. This platform provides
an interactive environment where students can run, compare, and analyze various
QA models from different perspectives, such as general behavior,
explainability, and robustness. Therefore, students can get a first-hand
experience in different QA techniques during the class. Thanks to this, we
propose a learner-centered approach for QA education in which students
proactively learn theoretical concepts and acquire problem-solving skills
through interactive exploration, experimentation, and practical assignments,
rather than solely relying on traditional lectures. To evaluate the
effectiveness of UKP-SQuARE in teaching scenarios, we adopted it in a
postgraduate NLP course and surveyed the students after the course. Their
positive feedback shows the platform's effectiveness in their course and
invites wider adoption.
Comment: Accepted at the BEA workshop, ACL 202
MetaQA: Combining Expert Agents for Multi-Skill Question Answering
The recent explosion of question answering (QA) datasets and models has
increased the interest in the generalization of models across multiple domains
and formats by either training on multiple datasets or by combining multiple
models. Despite the promising results of multi-dataset models, some domains or
QA formats may require specific architectures, and thus the adaptability of
these models might be limited. In addition, current approaches for combining
models disregard cues such as question-answer compatibility. In this work, we
propose to combine expert agents with a novel, flexible, and training-efficient
architecture that considers questions, answer predictions, and
answer-prediction confidence scores to select the best answer among a list of
answer candidates. Through quantitative and qualitative experiments we show
that our model i) creates a collaboration between agents that outperforms
previous multi-agent and multi-dataset approaches in both in-domain and
out-of-domain scenarios, ii) is highly data-efficient to train, and iii) can be
adapted to any QA format. We release our code and a dataset of answer
predictions from expert agents for 16 QA datasets to foster future developments
of multi-agent systems at https://github.com/UKPLab/MetaQA.
Comment: Accepted at EACL 202
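The selection idea can be sketched as follows. In MetaQA the selection is learned jointly from the question, the answer predictions, and the confidence scores; the max-confidence baseline below is a deliberately simplified stand-in that only shows the data flow, and the agent names and scores are invented.

```python
def select_answer(candidates):
    """Pick one answer from a list of expert-agent predictions.

    `candidates` is a list of (agent_name, answer, confidence) triples.
    A trained selector would also condition on the question and on
    question-answer compatibility; here we just take the most confident agent.
    """
    _, answer, _ = max(candidates, key=lambda c: c[2])
    return answer

# Hypothetical predictions from three format-specific expert agents.
candidates = [
    ("extractive_agent",      "Paris",    0.91),
    ("multiple_choice_agent", "London",   0.33),
    ("abstractive_agent",     "in Paris", 0.74),
]
print(select_answer(candidates))  # -> Paris
```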
Let Me Know What to Ask: Interrogative-Word-Aware Question Generation
Question Generation (QG) is a Natural Language Processing (NLP) task that
aids advances in Question Answering (QA) and conversational assistants.
Existing models focus on generating a question based on a text and possibly the
answer to the generated question. They need to determine the type of
interrogative word to be generated while having to pay attention to the grammar
and vocabulary of the question. In this work, we propose
Interrogative-Word-Aware Question Generation (IWAQG), a pipelined system
composed of two modules: an interrogative word classifier and a QG model. The
first module predicts the interrogative word that is provided to the second
module to create the question. Owing to an increased recall of deciding the
interrogative words to be used for the generated questions, the proposed model
achieves new state-of-the-art results on the task of QG in SQuAD, improving
from 46.58 to 47.69 in BLEU-1, 17.55 to 18.53 in BLEU-4, 21.24 to 22.33 in
METEOR, and from 44.53 to 46.94 in ROUGE-L.
Comment: Accepted at the 2nd Workshop on Machine Reading for Question Answering (MRQA), EMNLP 201
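The two-module pipeline can be sketched as below. Both IWAQG modules are neural models; the rule-based classifier and template generator here are invented purely to make the interface between the modules concrete.

```python
def classify_interrogative(answer: str) -> str:
    """Hypothetical stand-in for module 1: predict the interrogative word
    from the target answer (the real module is a trained classifier)."""
    if answer[0].isupper():
        return "Who"       # crude proxy for a person-type named entity
    if answer.replace(" ", "").isdigit():
        return "How many"
    return "What"

def generate_question(context: str, answer: str) -> str:
    """Module 2 receives the predicted interrogative word and builds the
    question around it; a template replaces the neural QG model here."""
    wh_word = classify_interrogative(answer)
    return f"{wh_word} is described in: '{context}'?"

print(generate_question("Marie Curie won two Nobel Prizes", "Marie Curie"))
```

Feeding the predicted interrogative word to the generator is what raises the recall of choosing the right question type, which the abstract credits for the BLEU/METEOR/ROUGE gains.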
UKP-SQuARE v3: A Platform for Multi-Agent QA Research
The continuous development of Question Answering (QA) datasets has drawn the
research community's attention toward multi-domain models. A popular approach
is to use multi-dataset models, which are models trained on multiple datasets
to learn their regularities and prevent overfitting to a single dataset.
However, with the proliferation of QA models in online repositories such as
GitHub or Hugging Face, an alternative is becoming viable. Recent works have
demonstrated that combining expert agents can yield large performance gains
over multi-dataset models. To ease research in multi-agent models, we extend
UKP-SQuARE, an online platform for QA research, to support three families of
multi-agent systems: i) agent selection, ii) early-fusion of agents, and iii)
late-fusion of agents. We conduct experiments to evaluate their inference speed
and discuss the performance vs. speed trade-off compared to multi-dataset
models. UKP-SQuARE is open-source and publicly available at
http://square.ukp-lab.de
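Of the three families, late fusion is the easiest to sketch: each agent runs independently and their per-answer scores are merged afterwards. The additive fusion rule and the agent outputs below are illustrative assumptions, not the platform's exact implementation.

```python
from collections import defaultdict

def late_fusion(agent_predictions):
    """Late fusion of expert agents: combine per-answer scores after
    every agent has produced its own ranked predictions.

    `agent_predictions` maps agent name -> {answer_string: score}.
    Summing scores per answer is one simple fusion rule.
    """
    fused = defaultdict(float)
    for scores in agent_predictions.values():
        for answer, score in scores.items():
            fused[answer] += score
    return max(fused, key=fused.get)

# Hypothetical outputs from two agents answering the same question.
preds = {
    "span_agent": {"1989": 0.6, "1990": 0.4},
    "list_agent": {"1989": 0.5, "1991": 0.2},
}
print(late_fusion(preds))  # "1989" wins with combined score 1.1
```

Agent selection, by contrast, would route the question to a single agent before inference, and early fusion would merge agents' internal representations, which is why the three families sit at different points of the performance vs. inference-speed trade-off the abstract mentions.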
Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research
Many recent improvements in NLP stem from the development and use of large
pre-trained language models (PLMs) with billions of parameters. Large model
sizes make computational cost one of the main limiting factors for training
and evaluating such models, and have raised severe concerns about the
sustainability, reproducibility, and inclusiveness of PLM research. These
concerns are often based on personal experiences and observations, but so far
no large-scale survey has investigated them. In this work,
we provide a first attempt to quantify these concerns regarding three topics,
namely, environmental impact, equity, and impact on peer reviewing. By
conducting a survey with 312 participants from the NLP community, we capture
existing (dis)parities between and within groups with respect to seniority,
academia, and industry, as well as their impact on the peer-reviewing process.
For each topic, we provide an analysis and devise recommendations to mitigate
the disparities we found, some of which have already been successfully
implemented.
Finally, we discuss additional concerns raised by many participants in
free-text responses.
Regularization of Distinct Strategies for Unsupervised Question Generation
Unsupervised question answering (UQA) has been proposed to avoid the high cost of creating high-quality datasets for QA. One approach to UQA is to train a QA model with questions generated automatically. However, the generated questions are either too similar to a word sequence in the context or too drifted from the semantics of the context, thereby making it difficult to train a robust QA model. We propose a novel regularization method based on teacher-student architecture to avoid bias toward a particular question generation strategy and modulate the process of generating individual words when a question is generated. Our experiments demonstrate that we have achieved the goal of generating higher-quality questions for UQA across diverse QA datasets and tasks. We also show that this method can be useful for creating a QA model with few-shot learning
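The regularization idea is in the family of distillation-style losses: the student question generator is trained on the gold next word while being pulled toward the teacher's word distribution. The loss below is a generic sketch of that pattern with invented probabilities; the paper's exact formulation and weighting may differ.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions over the vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def regularized_loss(student_probs, teacher_probs, target_index, alpha=0.5):
    """Cross-entropy on the gold next word plus a KL term that modulates
    the student's per-word distribution toward the teacher's, weighted
    by alpha (assumed hyperparameter)."""
    cross_entropy = -math.log(student_probs[target_index])
    return cross_entropy + alpha * kl_divergence(teacher_probs, student_probs)

# Hypothetical next-word distributions over a 3-word vocabulary.
student = [0.7, 0.2, 0.1]
teacher = [0.6, 0.3, 0.1]
print(regularized_loss(student, teacher, target_index=0))
```

Because the KL term vanishes only when the two distributions agree, the student cannot collapse onto a single generation strategy, which is the bias the method is designed to avoid.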
UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA
Question Answering (QA) systems are increasingly deployed in applications where they support real-world decisions. However, state-of-the-art models rely on deep neural networks, which are difficult to interpret by humans. Inherently interpretable models or post hoc explainability methods can help users to comprehend how a model arrives at its prediction and, if successful, increase their trust in the system. Furthermore, researchers can leverage these insights to develop new methods that are more accurate and less biased. In this paper, we introduce SQuARE v2, the new version of SQuARE, to provide an explainability infrastructure for comparing models based on methods such as saliency maps and graph-based explanations. While saliency maps are useful to inspect the importance of each input token for the model’s prediction, graph-based explanations from external Knowledge Graphs enable the users to verify the reasoning behind the model prediction. In addition, we provide multiple adversarial attacks to compare the robustness of QA models. With these explainability methods and adversarial attacks, we aim to ease the research on trustworthy QA models. SQuARE is available on https://square.ukp-lab.de
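Token-level importance of the kind a saliency map conveys can be sketched without any neural machinery via occlusion: a token's importance is the drop in the model's answer score when that token is masked. The platform's saliency maps are computed by other methods (e.g. gradients); the toy scoring function below is an invented stand-in for a QA model.

```python
def occlusion_saliency(tokens, score_fn):
    """Leave-one-out saliency: importance of token i is how much the
    model's score drops when token i is replaced by a mask."""
    base = score_fn(tokens)
    return [base - score_fn(tokens[:i] + ["[MASK]"] + tokens[i + 1:])
            for i in range(len(tokens))]

# Hypothetical scorer standing in for a QA model's answer confidence:
# it only cares whether the word "capital" survives in the input.
score = lambda toks: 1.0 if "capital" in toks else 0.2

tokens = ["what", "is", "the", "capital", "of", "france"]
print(occlusion_saliency(tokens, score))  # "capital" gets the largest drop: 0.8
```

Seeing which input tokens actually drive the prediction is exactly the kind of inspection that lets users decide whether to trust a model's answer.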