
    Iterative Edit-based Unsupervised Sentence Simplification

    We present a new iterative approach to unsupervised edit-based sentence simplification. Our approach is guided by a scoring function that selects among simplified sentences generated by iteratively performing word- and phrase-level edits on the complex sentence. The scoring function measures three aspects of simplification: fluency, simplicity, and meaning preservation. As a result, unlike past approaches, our method is controllable and interpretable, and, being unsupervised, it does not require a parallel training set. At the same time, experiments on the Newsela and WikiLarge datasets show that our solution is nearly as effective as state-of-the-art supervised approaches.
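    The iterative edit-and-score idea can be sketched as a simple hill-climbing loop. The three scorers below are toy stand-ins (assumptions for illustration): a real system would use a language model for fluency, a lexical-complexity measure for simplicity, and semantic similarity for meaning preservation.

```python
# Toy scorers standing in for the paper's fluency / simplicity /
# meaning-preservation components (these heuristics are assumptions).
def fluency(sentence):
    return 1.0 / (1 + abs(len(sentence.split()) - 10))

def simplicity(sentence):
    return 1.0 / (1 + sum(len(w) > 7 for w in sentence.split()))

def meaning(original, candidate):
    a, b = set(original.lower().split()), set(candidate.lower().split())
    return len(a & b) / max(len(a), 1)

def score(original, candidate, weights=(1.0, 1.0, 1.0)):
    wf, ws, wm = weights
    return (wf * fluency(candidate)
            + ws * simplicity(candidate)
            + wm * meaning(original, candidate))

def deletion_candidates(sentence):
    """One edit type: delete each word in turn."""
    words = sentence.split()
    for i in range(len(words)):
        yield " ".join(words[:i] + words[i + 1:])

def simplify(sentence, max_iters=10):
    """Greedily apply the best-scoring edit until no edit improves the score."""
    current = sentence
    for _ in range(max_iters):
        best = max(deletion_candidates(current),
                   key=lambda c: score(sentence, c),
                   default=current)
        if score(sentence, best) <= score(sentence, current):
            break  # no edit improves the score; stop iterating
        current = best
    return current
```

    Because each candidate is produced by an explicit edit, the final output can be traced back through the sequence of edits that produced it, which is the source of the method's controllability and interpretability.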

    GRS: Combining Generation and Revision in Unsupervised Sentence Simplification

    Text simplification is a natural language processing task that alters a given text to reduce its structural and lexical complexity while preserving the underlying meaning. Existing text simplification approaches can be classified into generative and revision-based methods. Revision-based strategies iteratively simplify a given text in multiple steps through explicit edit operations such as word deletion and lexical substitution. Generative approaches, by contrast, produce a simplified sentence from a complex sentence in one step; they have no explicit edit operations but learn implicit ones from data. Revision-based methods are more controllable and interpretable than generative models, whereas generative models can apply more complex edits (such as paraphrasing) to a given text. We propose GRS: an unsupervised approach to sentence simplification that combines text generation and text revision. We start with an iterative framework in which an input sentence is revised using explicit edit operations such as word deletion, and we add paraphrasing as a new edit operation. This allows us to combine the advantages of generative and revision-based approaches: paraphrasing captures complex edit operations, and applying explicit edit operations iteratively provides controllability and interpretability. We demonstrate the advantages of GRS compared to existing methods. To evaluate our model, we use the Newsela and ASSET datasets, which contain high-quality complex-simple sentence pairs and are commonly used in the literature. The Newsela dataset contains 1,840 news articles re-written for children at five different readability standards. The ASSET dataset comprises 2,359 sentences from English Wikipedia. GRS outperforms all unsupervised methods on the Newsela dataset and bridges the gap between revision-based and generative models on the ASSET dataset.
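    The key structural idea is that revision edits and a generative paraphraser can share one candidate-generation interface. The sketch below is an assumption-laden illustration: the substitution table is a toy lookup, and the paraphrase stub stands in for the neural paraphrasing model GRS actually uses.

```python
# GRS-style candidate generation: revision edits (deletion, lexical
# substitution) plus paraphrasing as an additional edit operation.
def delete_word(sentence):
    words = sentence.split()
    return [" ".join(words[:i] + words[i + 1:]) for i in range(len(words))]

def substitute(sentence):
    # Toy complex->simple lexicon; a real system would use a learned one.
    easy = {"utilize": "use", "commence": "begin", "terminate": "end"}
    words = sentence.split()
    results = []
    for i, w in enumerate(words):
        if w.lower() in easy:
            results.append(" ".join(words[:i] + [easy[w.lower()]] + words[i + 1:]))
    return results

def paraphrase(sentence):
    # Stub for the generative component; a real system would sample
    # rewrites from a sequence-to-sequence paraphraser here.
    return []

EDIT_OPERATIONS = [delete_word, substitute, paraphrase]

def candidates(sentence):
    """All sentences reachable from `sentence` in one edit step."""
    results = []
    for op in EDIT_OPERATIONS:
        results.extend(op(sentence))
    return results
```

    A scoring function can then rank this pooled candidate set each iteration, so the controller treats paraphrases exactly like any other explicit edit.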

    Can Knowledge Graphs Simplify Text?

    Knowledge Graph (KG)-to-Text Generation has seen recent improvements in generating fluent and informative sentences that describe a given KG. Since KGs are widespread across multiple domains and contain important entity-relation information, and since text simplification aims to reduce the complexity of a text while preserving its meaning, we propose KGSimple, a novel approach to unsupervised text simplification that infuses KG-established techniques to construct a simplified KG path and generate a concise text preserving the original input's meaning. Through an iterative, sampling-based, KG-first approach, our model is capable of simplifying text when starting from a KG by learning to keep important information while harnessing KG-to-text generation to output fluent and descriptive sentences. We evaluate various settings of the KGSimple model on currently available KG-to-text datasets, demonstrating its effectiveness compared to unsupervised text simplification models that start from a given complex text. Our code is available on GitHub. (Accepted as a Main Conference Long Paper at CIKM 2023.)
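    A KG-first pipeline can be caricatured in two steps: prune the graph to its most important triples, then verbalize what remains. The degree heuristic and templates below are illustrative assumptions; KGSimple learns which graph operations to apply and uses a trained KG-to-text generator rather than templates.

```python
from collections import Counter

def prune_triples(triples, keep=2):
    """Rank (head, relation, tail) triples by total entity degree and
    keep the most connected ones -- a toy stand-in for learned pruning."""
    degree = Counter()
    for head, _, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    ranked = sorted(triples, key=lambda t: degree[t[0]] + degree[t[2]],
                    reverse=True)
    return ranked[:keep]

def verbalize(triples):
    """Template-based verbalization of the surviving triples."""
    return " ".join(
        f"{h.replace('_', ' ')} {r.replace('_', ' ')} {t.replace('_', ' ')}."
        for h, r, t in triples)

graph = [
    ("Marie_Curie", "won", "the_Nobel_Prize"),
    ("Marie_Curie", "worked_in", "physics"),
    ("the_Nobel_Prize", "is_awarded_in", "Stockholm"),
]
```

    Starting from the graph rather than the complex sentence means the simplification operates on discrete facts, so dropping a triple cleanly removes a unit of content instead of leaving a dangling clause.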

    Automatic and Human-AI Interactive Text Generation

    In this tutorial, we focus on text-to-text generation, a class of natural language generation (NLG) tasks that takes a piece of text as input and generates a revision improved according to some specific criteria (e.g., readability or linguistic style), while largely retaining the original meaning and length of the text. This covers many useful applications, such as text simplification, paraphrase generation, and style transfer. In contrast to text summarization and open-ended text completion (e.g., story generation), the text-to-text generation tasks we discuss are more constrained in terms of semantic consistency and targeted language styles. This level of control makes these tasks ideal testbeds for studying the ability of models to generate text that is both semantically adequate and stylistically appropriate. Moreover, these tasks are interesting from a technical standpoint, as they require complex combinations of lexical and syntactic transformations, stylistic control, and adherence to factual knowledge, all at once. With a special focus on text simplification and revision, this tutorial provides an overview of state-of-the-art natural language generation research from four major aspects (Data, Models, Human-AI Collaboration, and Evaluation) and discusses and showcases several significant recent advances: (1) the use of non-autoregressive approaches; (2) the shift from fine-tuning to prompting with large language models; (3) the development of a new learnable metric and a fine-grained human evaluation framework; (4) a growing body of studies and datasets on non-English languages; (5) the rise of HCI+NLP+Accessibility interdisciplinary research to create real-world writing-assistant systems. (To appear at ACL 2024, Tutorial.)

    Dancing Between Success and Failure: Edit-level Simplification Evaluation using SALSA

    Large language models (e.g., GPT-4) are uniquely capable of producing highly rated text simplifications, yet current human evaluation methods fail to provide a clear understanding of systems' specific strengths and weaknesses. To address this limitation, we introduce SALSA, an edit-based human annotation framework that enables holistic and fine-grained text simplification evaluation. We develop twenty-one linguistically grounded edit types, covering the full spectrum of success and failure across the dimensions of conceptual, syntactic, and lexical simplicity. Using SALSA, we collect 19K edit annotations on 840 simplifications, revealing discrepancies in the distribution of simplification strategies performed by fine-tuned models, prompted LLMs, and humans, and we find that GPT-3.5 performs more quality edits than humans but still exhibits frequent errors. Using our fine-grained annotations, we develop LENS-SALSA, a reference-free automatic simplification metric trained to predict sentence- and word-level quality simultaneously. Additionally, we introduce word-level quality estimation for simplification and report promising baseline results. Our data, new metric, and annotation toolkit are available at https://salsa-eval.com. (Accepted to EMNLP 2023.)
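    Edit-level evaluation of this kind boils down to a record per edit and an aggregation over linguistic dimensions. The sketch below illustrates that shape; the field names and values are hypothetical and do not reproduce SALSA's actual schema or its twenty-one edit types.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class EditAnnotation:
    span: tuple        # (start, end) offsets in the simplified sentence
    edit_type: str     # e.g. "deletion", "paraphrase", "reordering"
    dimension: str     # "conceptual", "syntactic", or "lexical"
    quality: int       # +1 for a successful edit, -1 for an error

def summarize(annotations):
    """Net edit quality per linguistic dimension."""
    totals = defaultdict(int)
    for ann in annotations:
        totals[ann.dimension] += ann.quality
    return dict(totals)
```

    Because every judgment is attached to a specific span and edit type, the same annotations support both a system-level profile of strengths and weaknesses and word-level supervision for training a metric.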

    Expertise Style Transfer: A New Task Towards Better Communication between Experts and Laymen

    The curse of knowledge can impede communication between experts and laymen. We propose a new task of expertise style transfer and contribute a manually annotated dataset with the goal of alleviating such cognitive biases. Solving this task not only simplifies professional language but also improves the accuracy and expertise level of laymen's descriptions using simple words. This is a challenging task, unaddressed in previous work, as it requires models to have expert intelligence in order to modify text with a deep understanding of domain knowledge and structures. We establish the benchmark performance of five state-of-the-art models for style transfer and text simplification. The results demonstrate a significant gap between machine and human performance. We also discuss the challenges of automatic evaluation, to provide insights into future research directions. The dataset is publicly available at https://srhthu.github.io/expertise-style-transfer. (11 pages, 6 figures; to appear in ACL 2020.)