A Continuously Growing Dataset of Sentential Paraphrases
A major challenge in paraphrase research is the lack of parallel corpora. In
this paper, we present a new method to collect large-scale sentential
paraphrases from Twitter by linking tweets through shared URLs. The main
advantage of our method is its simplicity: unlike previous work, it requires
neither a classifier nor a human in the loop to select data before annotation
and the subsequent application of paraphrase identification algorithms. We
present the largest human-labeled paraphrase corpus to date of 51,524 sentence
pairs and the first cross-domain benchmarking for automatic paraphrase
identification. In addition, we show that more than 30,000 new sentential
paraphrases can be easily and continuously captured every month at ~70%
precision, and demonstrate their utility for downstream NLP tasks through
phrasal paraphrase extraction. We make our code and data freely available.
Comment: 11 pages, accepted to EMNLP 2017
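The URL-linking idea above can be sketched in a few lines: group tweets by the URL they share, then pair the distinct texts within each group as paraphrase candidates. This is a minimal illustration, not the authors' actual pipeline; the `(text, url)` input format and the deduplication step are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def candidate_paraphrase_pairs(tweets):
    """Group tweets by the URL they share, then pair distinct texts
    within each group as sentential-paraphrase candidates.

    `tweets` is an iterable of (text, url) tuples -- a hypothetical
    input format for illustration only.
    """
    by_url = defaultdict(list)
    for text, url in tweets:
        by_url[url].append(text)

    pairs = []
    for url, texts in by_url.items():
        # Drop exact duplicates (e.g., retweets) before pairing.
        unique = sorted(set(texts))
        pairs.extend(combinations(unique, 2))
    return pairs

tweets = [
    ("Scientists find water on Mars", "http://example.com/a"),
    ("Water discovered on Mars, scientists say", "http://example.com/a"),
    ("Stock markets rally today", "http://example.com/b"),
]
print(candidate_paraphrase_pairs(tweets))
```

Pairs produced this way are only candidates; per the abstract, roughly 70% of them are true paraphrases, so a labeling or filtering step still follows.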
Paraphrase Types for Generation and Detection
Current approaches in paraphrase generation and detection heavily rely on a
single general similarity score, ignoring the intricate linguistic properties
of language. This paper introduces two new tasks to address this shortcoming by
considering paraphrase types - specific linguistic perturbations at particular
text positions. We name these tasks Paraphrase Type Generation and Paraphrase
Type Detection. Our results suggest that while current techniques perform well
in a binary classification scenario, i.e., paraphrased or not, the inclusion of
fine-grained paraphrase types poses a significant challenge. While most
approaches are good at generating and detecting generally similar semantic
content, they fail to understand the intrinsic linguistic variables they
manipulate. Models trained to generate and identify paraphrase types also
show improvements on tasks without them. In addition, scaling these models
further improves their ability to understand paraphrase types. We believe
paraphrase types can unlock a new paradigm for developing paraphrase models and
solving tasks in the future.
Comment: Published at EMNLP 2023
Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands
To understand diverse natural language commands, virtual assistants today are
trained with numerous labor-intensive, manually annotated sentences. This paper
presents a methodology and the Genie toolkit that can handle new compound
commands with significantly less manual effort. We advocate formalizing the
capability of virtual assistants with a Virtual Assistant Programming Language
(VAPL) and using a neural semantic parser to translate natural language into
VAPL code. Genie needs only a small realistic set of input sentences for
validating the neural model. Developers write templates to synthesize data;
Genie uses crowdsourced paraphrases and data augmentation, along with the
synthesized data, to train a semantic parser. We also propose design principles
that make VAPL languages amenable to natural language translation. We apply
these principles to revise ThingTalk, the language used by the Almond virtual
assistant. We use Genie to build the first semantic parser that can support
compound virtual assistant commands with unquoted free-form parameters. Genie
achieves 62% accuracy on realistic user inputs. We demonstrate Genie's
generality by showing a 19% and 31% improvement over the previous state of the
art on a music skill, aggregate functions, and access control.
Comment: To appear in PLDI 2019
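The template-driven data synthesis Genie describes can be sketched as an expansion step that turns (utterance, code) templates into synthetic training pairs. The templates, slot values, and target syntax below are hypothetical stand-ins (the ThingTalk-like code strings only approximate the real language); the sketch shows the expansion mechanism, not Genie's actual toolkit.

```python
from itertools import product

# Hypothetical (utterance template, code template) pairs and slot values.
TEMPLATES = [
    ("play ${song} by ${artist}",
     '@spotify.play(song="${song}", artist="${artist}")'),
    ("put on ${song}",
     '@spotify.play(song="${song}")'),
]
SLOTS = {
    "song": ["Hey Jude", "Yesterday"],
    "artist": ["The Beatles"],
}

def synthesize(templates, slots):
    """Expand each template over every combination of slot values,
    yielding (sentence, code) pairs for semantic-parser training."""
    examples = []
    for utt, code in templates:
        # Only fill the slots this particular utterance mentions.
        names = [n for n in slots if "${%s}" % n in utt]
        for values in product(*(slots[n] for n in names)):
            s, c = utt, code
            for n, v in zip(names, values):
                s = s.replace("${%s}" % n, v)
                c = c.replace("${%s}" % n, v)
            examples.append((s, c))
    return examples

for sentence, code in synthesize(TEMPLATES, SLOTS):
    print(sentence, "->", code)
```

In Genie's actual pipeline, pairs like these are further combined with crowdsourced paraphrases and data augmentation before training the neural semantic parser.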
A Survey of Current Datasets for Vision and Language Research
Integrating vision and language has long been a dream in work on artificial
intelligence (AI). In the past two years, we have witnessed an explosion of
work that brings together vision and language from images to videos and beyond.
The available corpora have played a crucial role in advancing this area of
research. In this paper, we propose a set of quality metrics for evaluating and
analyzing the vision & language datasets and categorize them accordingly. Our
analyses show that the most recent datasets have been using more complex
language and more abstract concepts; however, each has different strengths and
weaknesses.
Comment: To appear in EMNLP 2015, short proceedings. Dataset analysis and
discussion expanded, including an initial examination into reporting bias for
one of them. F.F. and N.M. contributed equally to this work.
Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA
Large language models (e.g., GPT-4) are uniquely capable of producing highly
rated text simplification, yet current human evaluation methods fail to provide
a clear understanding of systems' specific strengths and weaknesses. To address
this limitation, we introduce SALSA, an edit-based human annotation framework
that enables holistic and fine-grained text simplification evaluation. We
develop twenty-one linguistically grounded edit types, covering the full
spectrum of success and failure across dimensions of conceptual, syntactic and
lexical simplicity. Using SALSA, we collect 19K edit annotations on 840
simplifications, revealing discrepancies in the distribution of simplification
strategies performed by fine-tuned models, prompted LLMs, and humans, and find
that GPT-3.5 performs more high-quality edits than humans but still exhibits frequent
errors. Using our fine-grained annotations, we develop LENS-SALSA, a
reference-free automatic simplification metric, trained to predict sentence-
and word-level quality simultaneously. Additionally, we introduce word-level
quality estimation for simplification and report promising baseline results.
Our data, new metric, and annotation toolkit are available at
https://salsa-eval.com.
Comment: Accepted to EMNLP 2023
An End-to-end Neural Natural Language Interface for Databases
The ability to extract insights from new data sets is critical for decision
making. Visual interactive tools play an important role in data exploration
since they provide non-technical users with an effective way to visually
compose queries and comprehend the results. Natural language has recently
gained traction as an alternative query interface to databases with the
potential to enable non-expert users to formulate complex questions and
information needs efficiently and effectively. However, understanding natural
language questions and translating them accurately to SQL is a challenging
task, and thus Natural Language Interfaces for Databases (NLIDBs) have not yet
made their way into practical tools and commercial products.
In this paper, we present DBPal, a novel data exploration tool with a natural
language interface. DBPal leverages recent advances in deep models to make
query understanding more robust in the following ways: First, DBPal uses a deep
model to translate natural language statements to SQL, making the translation
process more robust to paraphrasing and other linguistic variations. Second, to
support the users in phrasing questions without knowing the database schema and
the query features, DBPal provides a learned auto-completion model that
suggests partial query extensions to users during query formulation and thus
helps users write complex queries.
LENS: A Learnable Evaluation Metric for Text Simplification
Training learnable metrics using modern language models has recently emerged
as a promising method for the automatic evaluation of machine translation.
However, existing human evaluation datasets for text simplification have
limited annotations that are based on unitary or outdated models, making them
unsuitable for this approach. To address these issues, we introduce the
SimpEval corpus that contains: SimpEval_past, comprising 12K human ratings on
2.4K simplifications of 24 past systems, and SimpEval_2022, a challenging
simplification benchmark consisting of over 1K human ratings of 360
simplifications including GPT-3.5 generated text. Training on SimpEval, we
present LENS, a Learnable Evaluation Metric for Text Simplification. Extensive
empirical results show that LENS correlates much better with human judgment
than existing metrics, paving the way for future progress in the evaluation of
text simplification. We also introduce Rank and Rate, a human evaluation
framework that rates simplifications from several models in a list-wise manner
using an interactive interface, which ensures both consistency and accuracy in
the evaluation process and is used to create the SimpEval datasets.
Comment: Accepted at ACL 2023