Asking More Informative Questions for Grounded Retrieval
When a model is trying to gather information in an interactive setting, it
benefits from asking informative questions. However, in the case of a grounded
multi-turn image identification task, previous studies have been constrained to
polar yes/no questions, limiting how much information the model can gain in a
single turn. We present an approach that formulates more informative,
open-ended questions. In doing so, we discover that off-the-shelf visual
question answering (VQA) models often make presupposition errors, which
standard information gain question selection methods fail to account for. To
address this issue, we propose a method that can incorporate presupposition
handling into both question selection and belief updates. Specifically, we use
a two-stage process, where the model first filters out images which are
irrelevant to a given question, then updates its beliefs about which image the
user intends. Through self-play and human evaluations, we show that our method
is successful in asking informative open-ended questions, increasing accuracy
over the past state-of-the-art by 14%, while resulting in 48% more efficient
games in human evaluations.
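
The two-stage process described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `update_beliefs`, the hard `threshold` cutoff, and the per-image `relevance` and `answer_likelihood` scores (which a VQA model would supply in practice) are all assumptions made for illustration.

```python
import numpy as np

def update_beliefs(prior, relevance, answer_likelihood, threshold=0.5):
    """Hypothetical two-stage belief update over candidate images.

    prior             -- current belief P(image) for each candidate
    relevance         -- assumed score that the question applies to each image
    answer_likelihood -- assumed P(observed answer | image) for each candidate
    """
    prior = np.asarray(prior, dtype=float)
    relevance = np.asarray(relevance, dtype=float)
    answer_likelihood = np.asarray(answer_likelihood, dtype=float)

    # Stage 1: filter out images for which the question is judged irrelevant,
    # i.e. its presupposition does not hold for that image.
    mask = (relevance >= threshold).astype(float)

    # Stage 2: Bayesian update on the remaining images.
    unnormalized = prior * mask * answer_likelihood
    total = unnormalized.sum()
    if total == 0.0:
        # Nothing survives the filter; keep the previous beliefs unchanged.
        return prior
    return unnormalized / total

# Four candidate images with a uniform prior; the third is filtered out.
posterior = update_beliefs(
    prior=[0.25, 0.25, 0.25, 0.25],
    relevance=[0.9, 0.8, 0.1, 0.95],
    answer_likelihood=[0.5, 0.25, 0.9, 0.25],
)
print(posterior)
```

In this toy run the third image has the highest answer likelihood but receives zero posterior mass because the question is judged irrelevant to it, which is the kind of presupposition error the filtering stage is meant to catch.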
PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue Twisters Automatically
Tongue twisters are meaningful sentences that are difficult to pronounce. The
process of automatically generating tongue twisters is challenging since the
generated utterance must satisfy two conditions at once: phonetic difficulty
and semantic meaning. Furthermore, phonetic difficulty is itself hard to
characterize and is expressed in natural tongue twisters through a
heterogeneous mix of phenomena such as alliteration and homophony. In this
paper, we propose PANCETTA: Phoneme Aware Neural Completion to Elicit Tongue
Twisters Automatically. We leverage phoneme representations to capture the
notion of phonetic difficulty, and we train language models to generate
original tongue twisters on two proposed task settings. To do this, we curate a
dataset called PANCETTA, consisting of existing English tongue twisters.
Through automatic and human evaluation, as well as qualitative analysis, we
show that PANCETTA generates novel, phonetically difficult, fluent, and
semantically meaningful tongue twisters.
Comment: EACL 2023. Code at https://github.com/sedrickkeh/PANCETT
EUREKA: EUphemism Recognition Enhanced through Knn-based methods and Augmentation
We introduce EUREKA, an ensemble-based approach for performing automatic euphemism detection. We (1) identify and correct potentially mislabelled rows in the dataset, (2) curate an expanded corpus called EuphAug, (3) leverage model representations of Potentially Euphemistic Terms (PETs), and (4) explore using representations of semantically close sentences to aid in classification. Using our augmented dataset and kNN-based methods, EUREKA achieved state-of-the-art results on the public leaderboard of the Euphemism Detection Shared Task, ranking first with a macro F1 score of 0.881.
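
As a rough illustration of how representations of semantically close sentences can aid classification, the sketch below runs a plain cosine-similarity kNN vote over sentence vectors. It is not EUREKA's actual pipeline: the function `knn_predict`, the value of `k`, and the toy two-dimensional vectors (standing in for model representations) are assumptions.

```python
from collections import Counter

import numpy as np

def knn_predict(train_vecs, train_labels, query_vec, k=3):
    """Label a query by majority vote among its k most similar training vectors."""
    train = np.asarray(train_vecs, dtype=float)
    query = np.asarray(query_vec, dtype=float)

    # Cosine similarity: normalize every vector, then take dot products.
    train_unit = train / np.linalg.norm(train, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    sims = train_unit @ query_unit

    # Indices of the k nearest neighbours, most similar first.
    nearest = np.argsort(-sims)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy vectors standing in for sentence representations (1 = euphemistic, 0 = literal).
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.2]]
labels = [1, 1, 0, 0, 1]
print(knn_predict(vecs, labels, [1.0, 0.05]))
```

A nearest-neighbour vote of this kind needs no additional training, so it can be combined with a fine-tuned classifier in an ensemble.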
PINEAPPLE: Personifying INanimate Entities by Acquiring Parallel Personification data for Learning Enhanced generation
A personification is a figure of speech that endows inanimate entities with
properties and actions typically seen as requiring animacy. In this paper, we
explore the task of personification generation. To this end, we propose
PINEAPPLE: Personifying INanimate Entities by Acquiring Parallel
Personification data for Learning Enhanced generation. We curate a corpus of
personifications called PersonifCorp, together with automatically generated
de-personified literalizations of these personifications. We demonstrate the
usefulness of this parallel corpus by training a seq2seq model to personify a
given literal input. Both automatic and human evaluations show that fine-tuning
with PersonifCorp leads to significant gains in personification-related
qualities such as animacy and interestingness. A detailed qualitative analysis
also highlights key strengths and imperfections of PINEAPPLE over baselines,
demonstrating a strong ability to generate diverse and creative
personifications that enhance the overall appeal of a sentence.
Comment: Accepted to COLING 2022; official GitHub repo at
https://github.com/sedrickkeh/PINEAPPL
NewsPanda: Media Monitoring for Timely Conservation Action
Non-governmental organizations for environmental conservation have a
significant interest in monitoring conservation-related media and getting
timely updates about infrastructure construction projects as they may cause
massive impact to key conservation areas. Such monitoring, however, is
difficult and time-consuming. We introduce NewsPanda, a toolkit which
automatically detects and analyzes online articles related to environmental
conservation and infrastructure construction. We fine-tune a BERT-based model
using active learning methods and noise correction algorithms to identify
articles that are relevant to conservation and infrastructure construction. For
the identified articles, we perform further analysis, extracting keywords and
finding potentially related sources. NewsPanda has been successfully deployed
by the World Wide Fund for Nature teams in the UK, India, and Nepal since
February 2022. It currently monitors over 80,000 websites and 1,074
conservation sites across India and Nepal, saving more than 30 hours of human
effort weekly. We have now scaled it up to cover 60,000 conservation sites
globally.
Comment: Accepted to IAAI-23: 35th Annual Conference on Innovative
Applications of Artificial Intelligence. Winner of IAAI Deployed Application
Award. Code at https://github.com/NewsPanda-WWF-CMU/weekly-pipelin