Iterative Edit-based Unsupervised Sentence Simplification
We present a new iterative approach towards unsupervised edit-based sentence simplification. Our approach is guided by a scoring function to select simplified sentences generated after iteratively performing word and phrase-level edits on the complex sentence. The scoring function measures different aspects of simplification: fluency, simplicity, and preservation of meaning. As a result, unlike past approaches, our method is controllable and interpretable and does not require a parallel training set since it is unsupervised. At the same time, using the Newsela and WikiLarge datasets, we experimentally show that our solution is nearly as effective as state-of-the-art supervised approaches.
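The scoring-guided loop described above can be sketched in a few lines. Everything here is a toy stand-in, not the authors' actual system: the fluency, simplicity, and meaning-preservation scorers are simple heuristics (a real implementation would use a language model and embeddings), and word deletion is the only edit operation shown.

```python
def fluency(sentence):
    # Toy proxy: prefer sentences near a target length (a real system uses an LM).
    words = sentence.split()
    return 1.0 / (1.0 + abs(len(words) - 10))

def simplicity(sentence):
    # Toy proxy: prefer shorter average word length.
    words = sentence.split()
    return 1.0 / (1.0 + sum(len(w) for w in words) / max(len(words), 1))

def meaning_preservation(original, candidate):
    # Toy proxy: word-overlap ratio (a real system would use sentence embeddings).
    o, c = set(original.lower().split()), set(candidate.lower().split())
    return len(o & c) / max(len(o), 1)

def score(original, candidate):
    # Combined scoring function over the three aspects of simplification.
    return fluency(candidate) * simplicity(candidate) * meaning_preservation(original, candidate)

def delete_edits(sentence):
    # Candidate edits: drop one word at a time (phrase-level edits omitted).
    words = sentence.split()
    for i in range(len(words)):
        yield " ".join(words[:i] + words[i + 1:])

def simplify(sentence, max_iters=10):
    # Iteratively apply the best-scoring edit; stop when no edit improves the score.
    current = sentence
    for _ in range(max_iters):
        best, best_score = current, score(sentence, current)
        for cand in delete_edits(current):
            s = score(sentence, cand)
            if s > best_score:
                best, best_score = cand, s
        if best == current:
            break
        current = best
    return current
```

Because edits are applied greedily and each one is an explicit operation, every step of the simplification can be inspected, which is the source of the controllability and interpretability the abstract describes.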
GRS: Combining Generation and Revision in Unsupervised Sentence Simplification
Text simplification is a natural language processing task that rewrites a given text to reduce its structural and lexical complexity while preserving the underlying meaning.
Existing text simplification approaches can be classified into generative and revision-based methods. Revision-based strategies iteratively simplify a given text in multiple steps through explicit edit operations such as word deletion and lexical substitution. In contrast, generative approaches produce a simplified sentence from a complex sentence in one step: they have no explicit edit operations but learn implicit ones from data. Revision-based methods are more controllable and interpretable than generative models, while generative models can apply more complex edits (such as paraphrasing) to a given text than revision-based methods can.
We propose GRS: an unsupervised approach to sentence simplification that combines text generation and text revision. We start from an iterative framework in which an input sentence is revised using explicit edit operations such as word deletion, and we add paraphrasing as a new edit operation. This allows us to combine the advantages of generative and revision-based approaches: paraphrasing captures complex edit operations, while the use of explicit edit operations in an iterative manner provides controllability and interpretability.
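The key idea above is treating paraphrasing as just another edit operation alongside explicit revisions. A minimal sketch of that design, with entirely hypothetical toy tables standing in for a lexical-simplification resource and a generative paraphrase model:

```python
# Toy stand-ins: a real system would use a lexical resource and a
# neural paraphrase model instead of these lookup tables.
SIMPLE_LEXICON = {"utilize": "use", "commence": "begin", "terminate": "end"}
PARAPHRASES = {"in spite of the fact that": "although"}

def deletion(sentence):
    # Explicit edit: drop one word at a time.
    words = sentence.split()
    return [" ".join(words[:i] + words[i + 1:]) for i in range(len(words))]

def substitution(sentence):
    # Explicit edit: replace a complex word with a simpler synonym.
    return [sentence.replace(hard, easy)
            for hard, easy in SIMPLE_LEXICON.items() if hard in sentence]

def paraphrase(sentence):
    # Generative edit: rewrite a phrase (here a trivial lookup, not a model).
    return [sentence.replace(src, tgt)
            for src, tgt in PARAPHRASES.items() if src in sentence]

# Paraphrasing sits in the same registry as the explicit edits, so the
# iterative loop can pick whichever operation yields the best candidate.
EDIT_OPERATIONS = [deletion, substitution, paraphrase]

def candidates(sentence):
    # Pool candidates from every edit operation; the operation that produced
    # each candidate could be logged for interpretability.
    pool = []
    for op in EDIT_OPERATIONS:
        pool.extend(op(sentence))
    return pool
```

Because each candidate is tagged with the operation that produced it, the system keeps the step-by-step interpretability of revision-based methods while gaining the expressiveness of generation.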
We demonstrate the advantages of GRS compared to existing methods. To evaluate our model, we use the Newsela and ASSET datasets, which contain high-quality complex-simple sentence pairs and are commonly used in the literature. The Newsela dataset contains 1,840 news articles re-written for children at five different readability standards. The ASSET dataset comprises 2,359 sentences from English Wikipedia. GRS outperforms all unsupervised methods on the Newsela dataset and bridges the gap between revision-based and generative models on the ASSET dataset.
Can Knowledge Graphs Simplify Text?
Knowledge Graph (KG)-to-Text Generation has seen recent improvements in
generating fluent and informative sentences which describe a given KG. As KGs
are widespread across multiple domains and contain important entity-relation
information, and as text simplification aims to reduce the complexity of a text
while preserving the meaning of the original text, we propose KGSimple, a novel
approach to unsupervised text simplification which infuses KG-established
techniques in order to construct a simplified KG path and generate a concise
text which preserves the original input's meaning. Through an iterative,
sampling-based KG-first approach, our model is capable of simplifying text when
starting from a KG by learning to keep important information while harnessing
KG-to-text generation to output fluent and descriptive sentences. We evaluate
various settings of the KGSimple model on currently-available KG-to-text
datasets, demonstrating its effectiveness compared to unsupervised text
simplification models which start with a given complex text. Our code is
available on GitHub.
Comment: Accepted as a Main Conference Long Paper at CIKM 202
Automatic and Human-AI Interactive Text Generation
In this tutorial, we focus on text-to-text generation, a class of natural
language generation (NLG) tasks, that takes a piece of text as input and then
generates a revision that is improved according to some specific criteria
(e.g., readability or linguistic styles), while largely retaining the original
meaning and the length of the text. This includes many useful applications,
such as text simplification, paraphrase generation, style transfer, etc. In
contrast to text summarization and open-ended text completion (e.g., story),
the text-to-text generation tasks we discuss in this tutorial are more
constrained in terms of semantic consistency and targeted language styles. This
level of control makes these tasks ideal testbeds for studying the ability of
models to generate text that is both semantically adequate and stylistically
appropriate. Moreover, these tasks are interesting from a technical standpoint,
as they require complex combinations of lexical and syntactical
transformations, stylistic control, and adherence to factual knowledge -- all
at once. With a special focus on text simplification and revision, this
tutorial aims to provide an overview of the state-of-the-art natural language
generation research from four major aspects -- Data, Models, Human-AI
Collaboration, and Evaluation -- and to discuss and showcase a few significant
and recent advances: (1) the use of non-autoregressive approaches; (2) the shift
from fine-tuning to prompting with large language models; (3) the development
of new learnable metrics and fine-grained human evaluation frameworks; (4) a
growing body of studies and datasets on non-English languages; (5) the rise of
HCI+NLP+Accessibility interdisciplinary research to create real-world writing
assistant systems.
Comment: To appear at ACL 2024, Tutorial
Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA
Large language models (e.g., GPT-4) are uniquely capable of producing highly
rated text simplification, yet current human evaluation methods fail to provide
a clear understanding of systems' specific strengths and weaknesses. To address
this limitation, we introduce SALSA, an edit-based human annotation framework
that enables holistic and fine-grained text simplification evaluation. We
develop twenty-one linguistically grounded edit types, covering the full
spectrum of success and failure across dimensions of conceptual, syntactic and
lexical simplicity. Using SALSA, we collect 19K edit annotations on 840
simplifications, revealing discrepancies in the distribution of simplification
strategies performed by fine-tuned models, prompted LLMs and humans, and find
GPT-3.5 performs more quality edits than humans, but still exhibits frequent
errors. Using our fine-grained annotations, we develop LENS-SALSA, a
reference-free automatic simplification metric, trained to predict sentence-
and word-level quality simultaneously. Additionally, we introduce word-level
quality estimation for simplification and report promising baseline results.
Our data, new metric, and annotation toolkit are available at
https://salsa-eval.com.
Comment: Accepted to EMNLP 202
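An edit-level annotation scheme like SALSA's can be modeled as a simple record type plus an aggregation into a sentence-level score. The fields and labels below are illustrative only (the actual framework defines twenty-one edit types across its three dimensions):

```python
from dataclasses import dataclass

@dataclass
class EditAnnotation:
    edit_type: str    # toy labels, e.g. "word deletion", "paraphrase"
    dimension: str    # "conceptual" | "syntactic" | "lexical"
    success: bool     # whether the edit improves the simplification
    span: tuple       # character offsets of the edit in the simplified sentence

def sentence_quality(annotations):
    # Toy aggregate: fraction of edits judged successful. A learned metric
    # like LENS-SALSA would predict sentence- and word-level quality jointly.
    if not annotations:
        return 0.0
    return sum(a.success for a in annotations) / len(annotations)
```

Keeping the span on each annotation is what makes word-level quality estimation possible: every judgment is anchored to the exact tokens an edit touched, rather than to the sentence as a whole.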
Expertise Style Transfer: A New Task Towards Better Communication between Experts and Laymen
The curse of knowledge can impede communication between experts and laymen.
We propose a new task of expertise style transfer and contribute a manually
annotated dataset with the goal of alleviating such cognitive biases. Solving
this task not only simplifies the professional language, but also improves the
accuracy and expertise level of laymen's descriptions using simple words. This is
a challenging task, unaddressed in previous work, as it requires the models to
have expert intelligence in order to modify text with a deep understanding of
domain knowledge and structures. We establish the benchmark performance of five
state-of-the-art models for style transfer and text simplification. The results
demonstrate a significant gap between machine and human performance. We also
discuss the challenges of automatic evaluation, to provide insights into future
research directions. The dataset is publicly available at
https://srhthu.github.io/expertise-style-transfer.
Comment: 11 pages, 6 figures; To appear in ACL 202