The Effects of Gender Signals and Performance in Online Product Reviews
This work quantifies the effects of signaling gender through gender-specific user names on the success of reviews written on the popular amazon.com shopping platform. Highly rated reviews play an important role in e-commerce since they are prominently displayed next to products. Differences in how reviews are perceived - consciously or unconsciously - with respect to gender signals can lead to crucial biases in determining which content and perspectives are represented among top reviews. To investigate this, we extract signals of author gender from user names to select reviews where the author's likely gender can be inferred. Using reviews authored by these gender-signaling authors, we train a deep learning classifier to quantify the gendered writing style (i.e., gendered performance) of reviews written by authors who do not send clear gender signals via their user name. We contrast the effects of gender signaling and performance on review helpfulness ratings using matching experiments, aiming to understand whether an advantage is to be gained by (not) signaling one's gender when posting reviews. While we find no general trend that gendered signals or performances influence overall review success, we find strong context-specific effects. For example, reviews in product categories such as Electronics or Computers are perceived as less helpful when authors signal that they are likely women, but are received as more helpful in categories such as Beauty or Clothing. Beyond these findings, we believe this general chain of tools could be deployed across various social media platforms.
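As a rough illustration of the chain of tools described above, the sketch below pairs a toy name-gender lexicon with a simple text classifier. The names, reviews, and lexicon entries are invented for the example, and the logistic-regression pipeline is a lightweight stand-in for the paper's deep learning classifier, not the authors' actual model.

```python
# Hypothetical sketch: infer a gender signal from a user name, then train a
# simple stand-in for the paper's deep-learning style classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy name list standing in for a real name-gender lexicon (assumption).
NAME_SIGNALS = {"emma": "f", "olivia": "f", "liam": "m", "noah": "m"}

def gender_signal(user_name: str) -> str | None:
    """Return 'f'/'m' if the user name sends a clear gender signal, else None."""
    return NAME_SIGNALS.get(user_name.strip().lower())

# Reviews whose authors send a clear gender signal form the training set ...
signaled = [("emma", "Lovely fabric, fits perfectly."),
            ("liam", "Solid GPU, runs every game I own."),
            ("olivia", "The moisturizer feels great on sensitive skin."),
            ("noah", "Easy to install the RAM, boots fast.")]
texts = [review for _, review in signaled]
labels = [gender_signal(name) for name, _ in signaled]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

# ... and the classifier then scores the "gendered performance" of reviews
# from authors whose user names send no clear signal.
print(clf.predict_proba(["Battery life is excellent and setup was quick."]))
```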
Are Emergent Abilities in Large Language Models just In-Context Learning?
Large language models have exhibited emergent abilities, demonstrating
exceptional performance across diverse tasks for which they were not explicitly
trained, including those that require complex reasoning abilities. The
emergence of such abilities carries profound implications for the future
direction of research in NLP, especially as the deployment of such models
becomes more prevalent. However, one key challenge is that the evaluation of
these abilities is often confounded by competencies that arise in models
through alternative prompting techniques, such as in-context learning and
instruction following, which also emerge as the models are scaled up. In this
study, we provide the first comprehensive examination of these emergent
abilities while accounting for various potentially biasing factors that can
influence the evaluation of models. We conduct rigorous tests on a set of 18
models, ranging from 60 million to 175 billion parameters, across a
comprehensive set of 22 tasks. Through an extensive series of over 1,000
experiments, we provide compelling evidence that emergent abilities can
primarily be ascribed to in-context learning. We find no evidence for the
emergence of reasoning abilities, providing valuable insights into the
underlying mechanisms driving the observed abilities and thus alleviating
safety concerns regarding their use.
Comment: Code available at https://github.com/UKPLab/on-emergence and data
available at https://tudatalib.ulb.tu-darmstadt.de/handle/tudatalib/393
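To make the confound concrete, here is a toy sketch of the zero-shot versus few-shot comparison that underlies such experiments: the same task is evaluated with and without in-context demonstrations, so gains attributable to in-context learning can be separated from genuinely emergent ability. The prompt format and the stand-in "model" are illustrative assumptions, not the paper's code (which is linked above).

```python
# Toy sketch: compare zero-shot (no demonstrations) and few-shot (with
# demonstrations) accuracy on the same task.
from typing import Callable

def build_prompt(question: str, demos: list[tuple[str, str]]) -> str:
    """Prepend the in-context demonstrations to the test question."""
    shots = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in demos)
    return f"{shots}Q: {question}\nA:"

def accuracy(model: Callable[[str], str],
             test: list[tuple[str, str]],
             demos: list[tuple[str, str]]) -> float:
    hits = sum(model(build_prompt(q, demos)).strip() == a for q, a in test)
    return hits / len(test)

# Stand-in "model": echoes the answer of the last demonstration, i.e. it can
# only succeed through in-context learning, never zero-shot.
def toy_model(prompt: str) -> str:
    answers = [line[3:] for line in prompt.splitlines()
               if line.startswith("A: ")]
    return answers[-1] if answers else ""

test = [("2+2", "4"), ("3+3", "6")]
demos = [("1+1", "2"), ("3+3", "6")]
print("zero-shot:", accuracy(toy_model, test, demos=[]))  # 0.0
print("few-shot :", accuracy(toy_model, test, demos))     # > 0 only via ICL
```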
UKP-SQuARE v3: A Platform for Multi-Agent QA Research
The continuous development of Question Answering (QA) datasets has drawn the
research community's attention toward multi-domain models. A popular approach
is to use multi-dataset models, which are models trained on multiple datasets
to learn their regularities and prevent overfitting to a single dataset.
However, with the proliferation of QA models in online repositories such as
GitHub or Hugging Face, an alternative is becoming viable. Recent works have
demonstrated that combining expert agents can yield large performance gains
over multi-dataset models. To ease research in multi-agent models, we extend
UKP-SQuARE, an online platform for QA research, to support three families of
multi-agent systems: i) agent selection, ii) early-fusion of agents, and iii)
late-fusion of agents. We conduct experiments to evaluate their inference speed
and discuss the performance vs. speed trade-off compared to multi-dataset
models. UKP-SQuARE is open-source and publicly available at
http://square.ukp-lab.de
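As an illustration of the third family, here is a minimal late-fusion sketch under assumed agent interfaces; the (answer, score) tuples and the weighted-sum merging rule are simplifications for exposition, not the platform's actual API.

```python
# Minimal late-fusion sketch: each expert agent returns candidate answers
# with confidence scores, and fusion merges the score lists.
from collections import defaultdict

def late_fusion(agent_outputs: list[list[tuple[str, float]]],
                weights: list[float] | None = None) -> list[tuple[str, float]]:
    """Merge (answer, score) lists from several QA agents by weighted sum."""
    weights = weights or [1.0] * len(agent_outputs)
    scores: dict[str, float] = defaultdict(float)
    for weight, output in zip(weights, agent_outputs):
        for answer, score in output:
            scores[answer] += weight * score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Two hypothetical expert agents (e.g., one tuned on SQuAD, one on Natural
# Questions) answering the same question:
squad_agent = [("Paris", 0.7), ("Lyon", 0.2)]
nq_agent = [("Paris", 0.5), ("Marseille", 0.3)]
print(late_fusion([squad_agent, nq_agent]))  # [('Paris', 1.2), ...]
```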
Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5
We compare sequential fine-tuning with a model for multi-task learning in a setting where we are interested in boosting performance on two tasks, one of which depends on the other. We test these models on the FigLang2022 shared task, which requires participants to predict language inference labels on figurative language along with corresponding textual explanations of the inference predictions. Our results show that while sequential multi-task learning can be tuned to be good at the first of the two target tasks, it performs less well on the second and additionally struggles with overfitting. Our findings show that simple sequential fine-tuning of text-to-text models is an extraordinarily powerful method for achieving cross-task knowledge transfer while simultaneously predicting multiple interdependent targets, so much so that our best model achieved the (tied) highest score on the task.
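A minimal sketch of such sequential fine-tuning with Hugging Face transformers follows, assuming T5 and toy stand-ins for the shared-task data; the task prefixes and examples are invented, and this is a sketch of the general recipe rather than the authors' training code.

```python
# Sequential fine-tuning sketch: train on task A (label prediction), then
# continue fine-tuning the same weights on task B (explanation generation).
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Toy stand-ins for the FigLang2022 data (assumption, not the real files).
task_a = Dataset.from_dict({
    "input": ["nli premise: He ran like the wind. hypothesis: He ran fast."],
    "target": ["entailment"],
})
task_b = Dataset.from_dict({
    "input": ["explain premise: He ran like the wind. hypothesis: He ran fast."],
    "target": ["Running like the wind means running very quickly."],
})

def preprocess(batch):
    enc = tok(batch["input"], truncation=True)
    enc["labels"] = tok(text_target=batch["target"],
                        truncation=True)["input_ids"]
    return enc

for name, data in [("task-a", task_a), ("task-b", task_b)]:
    trainer = Seq2SeqTrainer(
        model=model,  # the second stage reuses the weights from the first
        args=Seq2SeqTrainingArguments(output_dir=f"ckpt-{name}",
                                      num_train_epochs=1,
                                      per_device_train_batch_size=1),
        train_dataset=data.map(preprocess, batched=True),
        data_collator=DataCollatorForSeq2Seq(tok, model=model),
    )
    trainer.train()
```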
On emergence
Output files for the paper "Are Emergent Abilities in Large Language Models just In-Context Learning?"
UKP-SQuARE v2: Explainability and Adversarial Attacks for Trustworthy QA
Question Answering (QA) systems are increasingly deployed in applications where they support real-world decisions. However, state-of-the-art models rely on deep neural networks, which are difficult to interpret by humans. Inherently interpretable models or post hoc explainability methods can help users to comprehend how a model arrives at its prediction and, if successful, increase their trust in the system. Furthermore, researchers can leverage these insights to develop new methods that are more accurate and less biased. In this paper, we introduce SQuARE v2, the new version of SQuARE, to provide an explainability infrastructure for comparing models based on methods such as saliency maps and graph-based explanations. While saliency maps are useful to inspect the importance of each input token for the model's prediction, graph-based explanations from external Knowledge Graphs enable users to verify the reasoning behind the model prediction. In addition, we provide multiple adversarial attacks to compare the robustness of QA models. With these explainability methods and adversarial attacks, we aim to ease research on trustworthy QA models. SQuARE is available at https://square.ukp-lab.de
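As a concrete example of the saliency-map idea, here is a gradient-times-embedding sketch for a public extractive QA checkpoint. This is one common attribution recipe, not necessarily the exact method implemented in SQuARE, and the question/context pair is invented for the example.

```python
# Sketch: token-level saliency via gradient x embedding for extractive QA.
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

name = "distilbert-base-cased-distilled-squad"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)

inputs = tok("Where is UKP?", "UKP is in Darmstadt.", return_tensors="pt")
embeds = model.get_input_embeddings()(inputs["input_ids"])
embeds.retain_grad()  # non-leaf tensor, so grads must be retained explicitly
out = model(inputs_embeds=embeds, attention_mask=inputs["attention_mask"])

# Back-propagate from the most likely answer-span boundary logits.
(out.start_logits.max() + out.end_logits.max()).backward()
saliency = (embeds.grad * embeds).sum(-1).abs().squeeze(0)

for token, score in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]),
                        saliency.tolist()):
    print(f"{token:>12s}  {score:.4f}")
```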
UKP-SQUARE: An Online Platform for Question Answering Research
Recent advances in NLP and information retrieval have given rise to a diverse
set of question answering tasks that are of different formats (e.g.,
extractive, abstractive), require different model architectures (e.g.,
generative, discriminative), and setups (e.g., with or without retrieval).
Despite having a large number of powerful, specialized QA pipelines (which we
refer to as Skills) that consider a single domain, model or setup, there exists
no framework where users can easily explore and compare such pipelines and can
extend them according to their needs. To address this issue, we present
UKP-SQUARE, an extensible online QA platform for researchers which allows users
to query and analyze a large collection of modern Skills via a user-friendly
web interface and integrated behavioural tests. In addition, QA researchers can
develop, manage, and share their custom Skills using our microservices that
support a wide range of models (Transformers, Adapters, ONNX), datastores and
retrieval techniques (e.g., sparse and dense). UKP-SQUARE is available on
https://square.ukp-lab.de.
Comment: Accepted at ACL 2022 Demo Track
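A hypothetical sketch of querying a deployed Skill programmatically is shown below; the route and payload fields are illustrative assumptions for a generic extractive QA Skill, not the platform's documented API.

```python
# Hypothetical sketch of calling a Skill over HTTP; endpoint path, Skill id,
# and payload shape are assumptions, not the documented SQuARE API.
import requests

resp = requests.post(
    "https://square.ukp-lab.de/api/skill-query",   # hypothetical route
    json={"skill": "extractive-qa",                # hypothetical Skill id
          "query": "Who proposed the Transformer architecture?",
          "context": "The Transformer was proposed by Vaswani et al. (2017)."},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```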