13 research outputs found
Syn-QG: Syntactic and Shallow Semantic Rules for Question Generation
Question Generation (QG) is fundamentally a simple syntactic transformation;
however, many aspects of semantics influence what questions are good to form.
We implement this observation by developing Syn-QG, a set of transparent
syntactic rules leveraging universal dependencies, shallow semantic parsing,
lexical resources, and custom rules which transform declarative sentences into
question-answer pairs. We utilize PropBank argument descriptions and VerbNet
state predicates to incorporate shallow semantic content, which helps generate
questions of a descriptive nature and produce inferential and semantically
richer questions than existing systems. In order to improve syntactic fluency
and eliminate grammatically incorrect questions, we employ back-translation
over the output of these syntactic rules. A set of crowd-sourced evaluations
shows that our system can generate a larger number of highly grammatical and
relevant questions than previous QG systems and that back-translation
drastically improves grammaticality at a slight cost of generating irrelevant
questions.
Comment: Some of the results in the paper were incorrect
DUQGen: Effective Unsupervised Domain Adaptation of Neural Rankers by Diversifying Synthetic Query Generation
State-of-the-art neural rankers pre-trained on large task-specific training
data such as MS-MARCO have been shown to exhibit strong performance on various
ranking tasks without domain adaptation, also called zero-shot. However,
zero-shot neural ranking may be sub-optimal, as it does not take advantage of
the target domain information. Unfortunately, acquiring sufficiently large and
high-quality target training data to improve a modern neural ranker can be
costly and time-consuming. To address this problem, we propose a new approach
to unsupervised domain adaptation for ranking, DUQGen, which addresses a
critical gap in prior literature, namely how to automatically generate both
effective and diverse synthetic training data to fine-tune a modern neural
ranker for a new domain. Specifically, DUQGen produces a more effective
representation of the target domain by identifying clusters of similar
documents; and generates a more diverse training dataset by probabilistic
sampling over the resulting document clusters. Our extensive experiments, over
the standard BEIR collection, demonstrate that DUQGen consistently outperforms
all zero-shot baselines and substantially outperforms the SOTA baselines on 16
out of 18 datasets, for an average of 4% relative improvement across all
datasets. We complement our results with a thorough analysis for more in-depth
understanding of the proposed method's performance and to identify promising
areas for further improvements.
Comment: NAACL 2024 Main Conference
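The cluster-then-sample idea described in this abstract can be illustrated with a minimal sketch. It is not DUQGen's implementation: the abstract does not state the sampling distribution, so the size-based weighting, the `alpha` flattening exponent, and the function name `sample_documents` are all assumptions made for illustration. Cluster assignments are taken as given (for example, from any clustering of document embeddings).

```python
import random
from collections import defaultdict


def sample_documents(cluster_of, k, alpha=0.5, seed=0):
    """Pick up to k distinct documents from clustered data.

    cluster_of: dict mapping document id -> cluster id (assumed precomputed).
    alpha: hypothetical exponent; a cluster is drawn with weight size**alpha,
           so alpha < 1 gives small clusters a better chance than uniform
           sampling over documents would, and alpha = 0 treats all clusters equally.
    """
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for doc_id, cluster_id in cluster_of.items():
        clusters[cluster_id].append(doc_id)

    picked = []
    while len(picked) < k and clusters:
        ids = list(clusters)
        weights = [len(clusters[c]) ** alpha for c in ids]
        chosen = rng.choices(ids, weights=weights, k=1)[0]
        # Draw one document uniformly from the chosen cluster, without replacement.
        members = clusters[chosen]
        picked.append(members.pop(rng.randrange(len(members))))
        if not members:
            del clusters[chosen]
    return picked


docs = {"d1": 0, "d2": 0, "d3": 0, "d4": 1, "d5": 2}
print(sample_documents(docs, k=3))
```

The sampled documents would then be passed to a query generator to produce synthetic (query, document) training pairs; that step is omitted here.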
An Interactive Query Generation Assistant using LLM-based Prompt Modification and User Feedback
While search is the predominant method of accessing information, formulating
effective queries remains a challenging task, especially for situations where
the users are not familiar with a domain, or searching for documents in other
languages, or looking for complex information such as events, which are not
easily expressible as queries. Providing example documents or passages of
interest might be easier for a user; however, such query-by-example scenarios
are prone to concept drift, and are highly sensitive to the query generation
method. This demo illustrates complementary approaches of using LLMs
interactively, assisting and enabling the user to provide edits and feedback at
all stages of the query formulation process. The proposed Query Generation
Assistant is a novel search interface which supports automatic and interactive
query generation over a monolingual or multilingual document collection.
Specifically, the proposed assistive interface enables the users to refine the
queries generated by different LLMs, to provide feedback on the retrieved
documents or passages, and is able to incorporate the users' feedback as
prompts to generate more effective queries. The proposed interface is a
valuable experimental tool for exploring fine-tuning and prompting of LLMs for
query generation to qualitatively evaluate the effectiveness of retrieval and
ranking models, and for conducting Human-in-the-Loop (HITL) experiments for
complex search tasks where users struggle to formulate queries without such
assistance.
Comment: Intelligence Advanced Research Projects Activity (IARPA) BETTER
Research Program
CANDLE: Decomposing Conditional and Conjunctive Queries for Task-Oriented Dialogue Systems
Domain-specific dialogue systems generally determine user intents by relying
on sentence-level classifiers which mainly focus on single action sentences.
Such classifiers are not designed to effectively handle complex queries
composed of conditional and sequential clauses that represent multiple actions.
We attempt to decompose such queries into smaller single-action sub-queries
that are reasonable for intent classifiers to understand in a dialogue
pipeline. We release CANDLE (Conditional & AND type Expressions), a dataset
consisting of 3124 utterances manually tagged with conditional and sequential
labels, and demonstrate this decomposition by training two baseline taggers.
Automatic Construction of Evaluation Suites for Natural Language Generation Datasets
Machine learning approaches applied to NLP are often evaluated by summarizing
their performance in a single number, for example accuracy. Since most test
sets are constructed as an i.i.d. sample from the overall data, this approach
overly simplifies the complexity of language and encourages overfitting to the
head of the data distribution. As such, rare language phenomena or text about
underrepresented groups are not equally included in the evaluation. To
encourage more in-depth model analyses, researchers have proposed the use of
multiple test sets, also called challenge sets, that assess specific
capabilities of a model. In this paper, we develop a framework based on this
idea which is able to generate controlled perturbations and identify subsets in
text-to-scalar, text-to-text, or data-to-text settings. By applying this
framework to the GEM generation benchmark, we propose an evaluation suite made
of 80 challenge sets, demonstrate the kinds of analyses that it enables and
shed light on the limits of current generation models.
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
We present NusaCrowd, a collaborative initiative to collect and unify
existing resources for Indonesian languages, including opening access to
previously non-public resources. Through this initiative, we have brought
together 137 datasets and 118 standardized data loaders. The quality of the
datasets has been assessed manually and automatically, and their value is
demonstrated through multiple experiments. NusaCrowd's data collection enables
the creation of the first zero-shot benchmarks for natural language
understanding and generation in Indonesian and the local languages of
Indonesia. Furthermore, NusaCrowd brings the creation of the first multilingual
automatic speech recognition benchmark in Indonesian and the local languages of
Indonesia. Our work strives to advance natural language processing (NLP)
research for languages that are under-represented despite being widely spoken.
A Bird's-Eye Tutorial of Graph Attention Architectures
Graph Neural Networks (GNNs) have shown tremendous strides in performance for
graph-structured problems especially in the domains of natural language
processing, computer vision and recommender systems. Inspired by the success of
the transformer architecture, there has been an ever-growing body of work on
attention variants of GNNs attempting to advance the state of the art in many
of these problems. Incorporating "attention" into graph mining has been viewed
as a way to overcome the noisiness, heterogeneity and complexity associated with
graph-structured data as well as to encode soft-inductive bias. It is hence
crucial and advantageous to study these variants from a bird's-eye view to
assess their strengths and weaknesses. We provide a systematic and focused
tutorial centered around attention-based GNNs in the hope of benefiting researchers
dealing with graph-structured problems. Our tutorial looks at GNN variants from
the point of view of the attention function and iteratively builds the reader's
understanding of different graph attention variants.
Comment: 8 pages, Tutorial
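As a concrete instance of an "attention function" over a graph, the sketch below follows the widely used GAT-style scoring: a LeakyReLU over a learned linear score of the two endpoint features, normalized with a softmax over each node's neighbors. The abstract does not say which formulations the tutorial covers, so this is one illustrative variant under assumptions: features are taken as already linearly projected, the score vectors `a_src` and `a_dst` are hypothetical fixed values rather than learned parameters, and every node must have at least one neighbor (for example, a self-loop).

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def attention_weights(h, neighbors, a_src, a_dst):
    """Return {i: {j: alpha_ij}} with a softmax over j in neighbors[i].

    The unnormalized score is LeakyReLU(a_src . h[i] + a_dst . h[j]).
    """
    alphas = {}
    for i, nbrs in neighbors.items():
        scores = [leaky_relu(dot(a_src, h[i]) + dot(a_dst, h[j])) for j in nbrs]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas[i] = {j: e / total for j, e in zip(nbrs, exps)}
    return alphas


def aggregate(h, alphas):
    """New feature of node i = attention-weighted sum of its neighbors' features."""
    dim = len(next(iter(h.values())))
    return {
        i: [sum(w * h[j][d] for j, w in row.items()) for d in range(dim)]
        for i, row in alphas.items()
    }


h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [0, 1, 2], 1: [1, 0], 2: [2]}
alphas = attention_weights(h, neighbors, a_src=[0.5, -0.5], a_dst=[1.0, 0.25])
print(aggregate(h, alphas))
```

Other graph attention variants differ mainly in how the unnormalized score is computed (for example, a scaled dot product instead of the additive form used here), while the neighbor-wise softmax and weighted aggregation stay the same.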
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
We introduce GEM, a living benchmark for natural language Generation (NLG),
its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly
evolving ecosystem of automated metrics, datasets, and human evaluation
standards. Due to this moving target, new models often still evaluate on
divergent anglo-centric corpora with well-established, but flawed, metrics.
This disconnect makes it challenging to identify the limitations of current
models and opportunities for progress. Addressing this limitation, GEM provides
an environment in which models can easily be applied to a wide set of tasks and
in which evaluation strategies can be tested. Regular updates to the benchmark
will help NLG research become more multilingual and evolve the challenge
alongside models. This paper serves as the description of the data for which we
are organizing a shared task at our ACL 2021 Workshop and to which we invite
the entire NLG community to participate.