11 research outputs found
Topic Modeling for Analysing Similarity between Users in Twitter
Data mining in social networks is gaining importance because it enables more precise marketing campaigns. For example, Google analyses all of our data (videos we watch, terms we search for, web pages we visit, applications we download, etc.) to know us better and show us personalised advertising.
LDA is a generative statistical model for modeling documents. Several algorithms exist that, given a set of documents, produce an LDA model that could have generated those documents. With that model it is possible to observe the topics used in those documents and the most relevant words for each topic.
This work presents a first approach to data mining on Twitter. Using the Twitter API, tweets from several users and from their followers were downloaded. These tweets were then processed into documents, and Gensim's implementation of the Online LDA algorithm was applied to obtain the topics of the documents. Finally, the topics of the users were compared with those of their followers.
An analysis of the state of the art of data mining on Twitter is also provided.
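To illustrate the final comparison step, here is a minimal sketch of measuring similarity between two users once LDA has reduced each user's tweets to a topic distribution. The Hellinger distance shown is one common choice for comparing topic distributions; the abstract does not state which measure the work actually uses, and the distributions below are invented for illustration.

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete topic distributions.

    Ranges from 0 (identical distributions) to 1 (disjoint support),
    so smaller values mean more similar users.
    """
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))

# Hypothetical per-topic probabilities inferred by LDA for a user
# and one of their followers (3-topic model, values sum to 1).
user_topics     = [0.70, 0.20, 0.10]
follower_topics = [0.60, 0.25, 0.15]

print(hellinger(user_topics, follower_topics))  # small value -> similar interests
```

A user compared with themselves yields a distance of exactly 0, which makes the measure easy to sanity-check before running it over real follower graphs.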
UKP-SQuARE: An Interactive Tool for Teaching Question Answering
The exponential growth of question answering (QA) has made it an
indispensable topic in any Natural Language Processing (NLP) course.
Additionally, the breadth of QA derived from this exponential growth makes it
an ideal scenario for teaching related NLP topics such as information
retrieval, explainability, and adversarial attacks among others. In this paper,
we introduce UKP-SQuARE as a platform for QA education. This platform provides
an interactive environment where students can run, compare, and analyze various
QA models from different perspectives, such as general behavior,
explainability, and robustness. Therefore, students can get a first-hand
experience in different QA techniques during the class. Thanks to this, we
propose a learner-centered approach for QA education in which students
proactively learn theoretical concepts and acquire problem-solving skills
through interactive exploration, experimentation, and practical assignments,
rather than solely relying on traditional lectures. To evaluate the
effectiveness of UKP-SQuARE in teaching scenarios, we adopted it in a
postgraduate NLP course and surveyed the students after the course. Their
positive feedback shows the platform's effectiveness in their course and
invites wider adoption.
Comment: Accepted at the BEA workshop, ACL 202
MetaQA: Combining Expert Agents for Multi-Skill Question Answering
The recent explosion of question answering (QA) datasets and models has
increased the interest in the generalization of models across multiple domains
and formats by either training on multiple datasets or by combining multiple
models. Despite the promising results of multi-dataset models, some domains or
QA formats may require specific architectures, and thus the adaptability of
these models might be limited. In addition, current approaches for combining
models disregard cues such as question-answer compatibility. In this work, we
propose to combine expert agents with a novel, flexible, and training-efficient
architecture that considers questions, answer predictions, and
answer-prediction confidence scores to select the best answer among a list of
answer candidates. Through quantitative and qualitative experiments we show
that our model i) creates a collaboration between agents that outperforms
previous multi-agent and multi-dataset approaches in both in-domain and
out-of-domain scenarios, ii) is highly data-efficient to train, and iii) can be
adapted to any QA format. We release our code and a dataset of answer
predictions from expert agents for 16 QA datasets to foster future developments
of multi-agent systems at https://github.com/UKPLab/MetaQA.
Comment: Accepted at EACL 202
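The selection idea can be sketched as follows. In MetaQA the selection is learned jointly from the question, the answer predictions, and the confidence scores; the max-confidence baseline below is a deliberately simplified stand-in that only shows the data flow, and the agent names and scores are invented.

```python
def select_answer(candidates):
    """Pick one answer from a list of expert-agent predictions.

    `candidates` is a list of (agent_name, answer, confidence) triples.
    A trained selector would also condition on the question and on
    question-answer compatibility; here we just take the most confident agent.
    """
    _, answer, _ = max(candidates, key=lambda c: c[2])
    return answer

# Hypothetical predictions from three format-specific expert agents.
candidates = [
    ("extractive_agent",      "Paris",    0.91),
    ("multiple_choice_agent", "London",   0.33),
    ("abstractive_agent",     "in Paris", 0.74),
]
print(select_answer(candidates))  # -> Paris
```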
Let Me Know What to Ask: Interrogative-Word-Aware Question Generation
Question Generation (QG) is a Natural Language Processing (NLP) task that
aids advances in Question Answering (QA) and conversational assistants.
Existing models focus on generating a question based on a text and possibly the
answer to the generated question. They need to determine the type of
interrogative word to be generated while having to pay attention to the grammar
and vocabulary of the question. In this work, we propose
Interrogative-Word-Aware Question Generation (IWAQG), a pipelined system
composed of two modules: an interrogative word classifier and a QG model. The
first module predicts the interrogative word that is provided to the second
module to create the question. Owing to an increased recall of deciding the
interrogative words to be used for the generated questions, the proposed model
achieves new state-of-the-art results on the task of QG in SQuAD, improving
from 46.58 to 47.69 in BLEU-1, 17.55 to 18.53 in BLEU-4, 21.24 to 22.33 in
METEOR, and from 44.53 to 46.94 in ROUGE-L.
Comment: Accepted at the 2nd Workshop on Machine Reading for Question Answering (MRQA), EMNLP 201
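The two-module pipeline can be sketched as below. Both IWAQG modules are neural models; the rule-based classifier and template generator here are invented purely to make the interface between the modules concrete.

```python
def classify_interrogative(answer: str) -> str:
    """Hypothetical stand-in for module 1: predict the interrogative word
    from the target answer (the real module is a trained classifier)."""
    if answer[0].isupper():
        return "Who"       # crude proxy for a person-type named entity
    if answer.replace(" ", "").isdigit():
        return "How many"
    return "What"

def generate_question(context: str, answer: str) -> str:
    """Module 2 receives the predicted interrogative word and builds the
    question around it; a template replaces the neural QG model here."""
    wh_word = classify_interrogative(answer)
    return f"{wh_word} is described in: '{context}'?"

print(generate_question("Marie Curie won two Nobel Prizes", "Marie Curie"))
```

Feeding the predicted interrogative word to the generator is what raises the recall of choosing the right question type, which the abstract credits for the BLEU/METEOR/ROUGE gains.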
UKP-SQuARE v3: A Platform for Multi-Agent QA Research
The continuous development of Question Answering (QA) datasets has drawn the
research community's attention toward multi-domain models. A popular approach
is to use multi-dataset models, which are models trained on multiple datasets
to learn their regularities and prevent overfitting to a single dataset.
However, with the proliferation of QA models in online repositories such as
GitHub or Hugging Face, an alternative is becoming viable. Recent works have
demonstrated that combining expert agents can yield large performance gains
over multi-dataset models. To ease research in multi-agent models, we extend
UKP-SQuARE, an online platform for QA research, to support three families of
multi-agent systems: i) agent selection, ii) early-fusion of agents, and iii)
late-fusion of agents. We conduct experiments to evaluate their inference speed
and discuss the performance vs. speed trade-off compared to multi-dataset
models. UKP-SQuARE is open-source and publicly available at
http://square.ukp-lab.de
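Of the three families, late fusion is the easiest to sketch: each agent runs independently and their per-answer scores are merged afterwards. The additive fusion rule and the agent outputs below are illustrative assumptions, not the platform's exact implementation.

```python
from collections import defaultdict

def late_fusion(agent_predictions):
    """Late fusion of expert agents: combine per-answer scores after
    every agent has produced its own ranked predictions.

    `agent_predictions` maps agent name -> {answer_string: score}.
    Summing scores per answer is one simple fusion rule.
    """
    fused = defaultdict(float)
    for scores in agent_predictions.values():
        for answer, score in scores.items():
            fused[answer] += score
    return max(fused, key=fused.get)

# Hypothetical outputs from two agents answering the same question.
preds = {
    "span_agent": {"1989": 0.6, "1990": 0.4},
    "list_agent": {"1989": 0.5, "1991": 0.2},
}
print(late_fusion(preds))  # "1989" wins with combined score 1.1
```

Agent selection, by contrast, would route the question to a single agent before inference, and early fusion would merge agents' internal representations, which is why the three families sit at different points of the performance vs. inference-speed trade-off the abstract mentions.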
Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research
Many recent improvements in NLP stem from the development and use of large
pre-trained language models (PLMs) with billions of parameters. Large model
sizes make computational cost one of the main limiting factors for training
and evaluating such models, and have raised severe concerns about the
sustainability, reproducibility, and inclusiveness of PLM research. These
concerns are often based on personal experiences and observations, but so far
no large-scale survey has investigated them. In this work,
we provide a first attempt to quantify these concerns regarding three topics,
namely, environmental impact, equity, and impact on peer reviewing. By
conducting a survey with 312 participants from the NLP community, we capture
existing (dis)parities between and within groups with respect to seniority,
academia, and industry, as well as their impact on the peer-reviewing process.
For each topic, we provide an analysis and devise recommendations to mitigate
the disparities we found, some of which have already been successfully
implemented.
Finally, we discuss additional concerns raised by many participants in
free-text responses.
Regularization of Distinct Strategies for Unsupervised Question Generation
Unsupervised question answering (UQA) has been proposed to avoid the high cost of creating high-quality datasets for QA. One approach to UQA is to train a QA model with questions generated automatically. However, the generated questions are either too similar to a word sequence in the context or too drifted from the semantics of the context, thereby making it difficult to train a robust QA model. We propose a novel regularization method based on teacher-student architecture to avoid bias toward a particular question generation strategy and modulate the process of generating individual words when a question is generated. Our experiments demonstrate that we have achieved the goal of generating higher-quality questions for UQA across diverse QA datasets and tasks. We also show that this method can be useful for creating a QA model with few-shot learning
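The regularization idea is in the family of distillation-style losses: the student question generator is trained on the gold next word while being pulled toward the teacher's word distribution. The loss below is a generic sketch of that pattern with invented probabilities; the paper's exact formulation and weighting may differ.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions over the vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def regularized_loss(student_probs, teacher_probs, target_index, alpha=0.5):
    """Cross-entropy on the gold next word plus a KL term that modulates
    the student's per-word distribution toward the teacher's, weighted
    by alpha (assumed hyperparameter)."""
    cross_entropy = -math.log(student_probs[target_index])
    return cross_entropy + alpha * kl_divergence(teacher_probs, student_probs)

# Hypothetical next-word distributions over a 3-word vocabulary.
student = [0.7, 0.2, 0.1]
teacher = [0.6, 0.3, 0.1]
print(regularized_loss(student, teacher, target_index=0))
```

Because the KL term vanishes only when the two distributions agree, the student cannot collapse onto a single generation strategy, which is the bias the method is designed to avoid.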
UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA
Question Answering (QA) systems are increasingly deployed in applications where they support real-world decisions. However, state-of-the-art models rely on deep neural networks, which are difficult to interpret by humans. Inherently interpretable models or post hoc explainability methods can help users to comprehend how a model arrives at its prediction and, if successful, increase their trust in the system. Furthermore, researchers can leverage these insights to develop new methods that are more accurate and less biased. In this paper, we introduce SQuARE v2, the new version of SQuARE, to provide an explainability infrastructure for comparing models based on methods such as saliency maps and graph-based explanations. While saliency maps are useful to inspect the importance of each input token for the model’s prediction, graph-based explanations from external Knowledge Graphs enable the users to verify the reasoning behind the model prediction. In addition, we provide multiple adversarial attacks to compare the robustness of QA models. With these explainability methods and adversarial attacks, we aim to ease the research on trustworthy QA models. SQuARE is available on https://square.ukp-lab.de
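Token-level importance of the kind a saliency map conveys can be sketched without any neural machinery via occlusion: a token's importance is the drop in the model's answer score when that token is masked. The platform's saliency maps are computed by other methods (e.g. gradients); the toy scoring function below is an invented stand-in for a QA model.

```python
def occlusion_saliency(tokens, score_fn):
    """Leave-one-out saliency: importance of token i is how much the
    model's score drops when token i is replaced by a mask."""
    base = score_fn(tokens)
    return [base - score_fn(tokens[:i] + ["[MASK]"] + tokens[i + 1:])
            for i in range(len(tokens))]

# Hypothetical scorer standing in for a QA model's answer confidence:
# it only cares whether the word "capital" survives in the input.
score = lambda toks: 1.0 if "capital" in toks else 0.2

tokens = ["what", "is", "the", "capital", "of", "france"]
print(occlusion_saliency(tokens, score))  # "capital" gets the largest drop: 0.8
```

Seeing which input tokens actually drive the prediction is exactly the kind of inspection that lets users decide whether to trust a model's answer.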