Data-driven sentence simplification: Survey and benchmark
Sentence Simplification (SS) aims to modify a sentence to make it easier to read and understand. To do so, several rewriting transformations can be performed, such as replacement, reordering, and splitting. Executing these transformations while keeping sentences grammatical, preserving their main idea, and generating simpler output is a challenging problem that is still far from solved. In this article, we survey research on SS, focusing on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is the dominant paradigm nowadays. We also include a benchmark of different approaches on common datasets so as to compare them and highlight their strengths and limitations. We expect that this survey will serve as a starting point for researchers interested in the task and help spark new ideas for future developments.
Vicinity-driven paragraph and sentence alignment for comparable corpora
Parallel corpora have driven great progress in the field of Text Simplification. However, most sentence alignment algorithms either support only a limited range of alignment types or simply ignore valuable clues present in comparable documents. We address this problem by introducing a new set of flexible vicinity-driven paragraph and sentence alignment algorithms that support 1-N, N-1, N-N and long-distance null alignments without the need for hard-to-replicate supervised models.
Text-based Editing of Talking-head Video
Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user only has to edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.
Automatically Evaluating Opinion Prevalence in Opinion Summarization
When faced with a large number of product reviews, it is not clear that a human can remember all of them and weight opinions representatively to write a good reference summary. We propose an automatic metric to test the prevalence of the opinions that a summary expresses, based on counting the number of reviews that are consistent with each statement in the summary, while discrediting trivial or redundant statements. To formulate this opinion prevalence metric, we consider several existing methods to score the factual consistency of a summary statement with respect to each individual source review. On a corpus of Amazon product reviews, we gather multiple human judgments of the opinion consistency, to determine which automatic metric best expresses consistency in product reviews. Using the resulting opinion prevalence metric, we show that a human-authored summary has only slightly better opinion prevalence than randomly selected extracts from the source reviews, and previous extractive and abstractive unsupervised opinion summarization methods perform worse than humans. We demonstrate room for improvement with a greedy construction of extractive summaries with twice the opinion prevalence achieved by humans. Finally, we show that preprocessing source reviews by simplification can raise the opinion prevalence achieved by existing abstractive opinion summarization systems to the level of human performance.
Comment: The 6th Workshop on e-Commerce and NLP (KDD 2023)
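The counting idea described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything below is an assumption for illustration only: the function names `opinion_prevalence` and `toy_consistent` are hypothetical, the consistency check is a naive word-overlap stand-in for the factual consistency scorers the authors compare, and redundancy is detected by exact (case-insensitive) repetition.

```python
from typing import Callable, Sequence


def toy_consistent(review: str, statement: str) -> bool:
    """Hypothetical stand-in for a consistency scorer:
    true when every word of the statement occurs in the review."""
    return set(statement.lower().split()) <= set(review.lower().split())


def opinion_prevalence(
    statements: Sequence[str],
    reviews: Sequence[str],
    is_consistent: Callable[[str, str], bool] = toy_consistent,
    is_trivial: Callable[[str], bool] = lambda s: False,
) -> float:
    """Average, over summary statements, of the fraction of reviews
    consistent with the statement. Trivial or repeated statements
    score zero (an assumed way of discounting them)."""
    if not statements or not reviews:
        return 0.0
    seen = set()
    total = 0.0
    for statement in statements:
        key = statement.strip().lower()
        if key in seen or is_trivial(statement):
            seen.add(key)
            continue  # redundant or trivial: contributes nothing
        seen.add(key)
        support = sum(1 for r in reviews if is_consistent(r, statement))
        total += support / len(reviews)
    return total / len(statements)


reviews = [
    "battery life is great",
    "great battery but poor screen",
    "screen is poor",
]
summary = ["great battery", "poor screen", "great battery"]
score = opinion_prevalence(summary, reviews)
```

In this toy example, the two distinct statements are each supported by two of the three reviews, and the repeated statement earns no credit, so the summary is penalized for redundancy.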
- …