Syn-QG: Syntactic and Shallow Semantic Rules for Question Generation
Question Generation (QG) is fundamentally a simple syntactic transformation;
however, many aspects of semantics influence what questions are good to form.
We implement this observation by developing Syn-QG, a set of transparent
syntactic rules leveraging universal dependencies, shallow semantic parsing,
lexical resources, and custom rules which transform declarative sentences into
question-answer pairs. We utilize PropBank argument descriptions and VerbNet
state predicates to incorporate shallow semantic content, which helps generate
questions of a descriptive nature and produce inferential and semantically
richer questions than existing systems. In order to improve syntactic fluency
and eliminate grammatically incorrect questions, we employ back-translation
over the output of these syntactic rules. A set of crowd-sourced evaluations
shows that our system can generate a larger number of highly grammatical and
relevant questions than previous QG systems and that back-translation
drastically improves grammaticality at a slight cost of generating irrelevant
questions.
Comment: Some of the results in the paper were incorrect
An Interactive Query Generation Assistant using LLM-based Prompt Modification and User Feedback
While search is the predominant method of accessing information, formulating
effective queries remains a challenging task, especially when users are
unfamiliar with a domain, are searching for documents in other languages, or
are looking for complex information such as events, which is not easily
expressible as a query. Providing example documents or passages of interest
might be easier for a user; however, such query-by-example scenarios are prone
to concept drift and are highly sensitive to the query generation method.
This demo illustrates complementary approaches of using LLMs
interactively, assisting and enabling the user to provide edits and feedback at
all stages of the query formulation process. The proposed Query Generation
Assistant is a novel search interface which supports automatic and interactive
query generation over a monolingual or multilingual document collection.
Specifically, the proposed assistive interface enables the users to refine the
queries generated by different LLMs, to provide feedback on the retrieved
documents or passages, and is able to incorporate the users' feedback as
prompts to generate more effective queries. The proposed interface is a
valuable experimental tool for exploring fine-tuning and prompting of LLMs for
query generation, for qualitatively evaluating the effectiveness of retrieval and
ranking models, and for conducting Human-in-the-Loop (HITL) experiments for
complex search tasks where users struggle to formulate queries without such
assistance.
Comment: Intelligence Advanced Research Projects Activity (IARPA) BETTER
Research Program
CANDLE: Decomposing Conditional and Conjunctive Queries for Task-Oriented Dialogue Systems
Domain-specific dialogue systems generally determine user intents by relying
on sentence-level classifiers which mainly focus on single action sentences.
Such classifiers are not designed to effectively handle complex queries
composed of conditional and sequential clauses that represent multiple actions.
We attempt to decompose such queries into smaller single-action sub-queries
that are reasonable for intent classifiers to understand in a dialogue
pipeline. We release CANDLE (Conditional & AND type Expressions), a dataset
consisting of 3124 utterances manually tagged with conditional and sequential
labels, and demonstrate this decomposition by training two baseline taggers.
Automatic Construction of Evaluation Suites for Natural Language Generation Datasets
Machine learning approaches applied to NLP are often evaluated by summarizing
their performance in a single number, for example accuracy. Since most test
sets are constructed as an i.i.d. sample from the overall data, this approach
overly simplifies the complexity of language and encourages overfitting to the
head of the data distribution. As such, rare language phenomena or text about
underrepresented groups are not equally included in the evaluation. To
encourage more in-depth model analyses, researchers have proposed the use of
multiple test sets, also called challenge sets, that assess specific
capabilities of a model. In this paper, we develop a framework based on this
idea which is able to generate controlled perturbations and identify subsets in
text-to-scalar, text-to-text, or data-to-text settings. By applying this
framework to the GEM generation benchmark, we propose an evaluation suite made
of 80 challenge sets, demonstrate the kinds of analyses that it enables and
shed light on the limits of current generation models.
NusaCrowd: Open Source Initiative for Indonesian NLP Resources
We present NusaCrowd, a collaborative initiative to collect and unify
existing resources for Indonesian languages, including opening access to
previously non-public resources. Through this initiative, we have brought
together 137 datasets and 118 standardized data loaders. The quality of the
datasets has been assessed manually and automatically, and their value is
demonstrated through multiple experiments. NusaCrowd's data collection enables
the creation of the first zero-shot benchmarks for natural language
understanding and generation in Indonesian and the local languages of
Indonesia. Furthermore, NusaCrowd enables the creation of the first multilingual
automatic speech recognition benchmark in Indonesian and the local languages of
Indonesia. Our work strives to advance natural language processing (NLP)
research for languages that are under-represented despite being widely spoken.
GEMv2: Multilingual NLG benchmarking in a single line of code
Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables comparison on equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation, which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version of the Generation, Evaluation, and Metrics Benchmark introduces a modular infrastructure for dataset, model, and metric developers to benefit from each other's work. GEMv2 supports 40 documented datasets in 51 languages. Models for all datasets can be evaluated online, and our interactive data card creation and rendering tools make it easier to add new datasets to the living benchmark.
Peer reviewed