10 research outputs found

    Data-to-Text Generation with Content Selection and Planning

    Full text link
    Recent advances in data-to-text generation have led to the use of large-scale datasets and neural network models which are trained end-to-end, without explicitly modeling what to say and in what order. In this work, we present a neural network architecture which incorporates content selection and planning without sacrificing end-to-end training. We decompose the generation task into two stages. Given a corpus of data records (paired with descriptive documents), we first generate a content plan highlighting which information should be mentioned and in which order and then generate the document while taking the content plan into account. Automatic and human-based evaluation experiments show that our model outperforms strong baselines improving the state-of-the-art on the recently released RotoWire dataset.Comment: Added link to cod

    Learning to Select, Track, and Generate for Data-to-Text

    Full text link
    We propose a data-to-text generation model with two modules, one for tracking and the other for text generation. Our tracking module selects and keeps track of salient information and memorizes which record has been mentioned. Our generation module generates a summary conditioned on the state of tracking module. Our model is considered to simulate the human-like writing process that gradually selects the information by determining the intermediate variables while writing the summary. In addition, we also explore the effectiveness of the writer information for generation. Experimental results show that our model outperforms existing models in all evaluation metrics even without writer information. Incorporating writer information further improves the performance, contributing to content planning and surface realization.Comment: ACL 201

    Sentence-Level Content Planning and Style Specification for Neural Text Generation

    Full text link
    Building effective text generation systems requires three critical components: content selection, text planning, and surface realization, and traditionally they are tackled as separate problems. Recent all-in-one style neural generation models have made impressive progress, yet they often produce outputs that are incoherent and unfaithful to the input. To address these issues, we present an end-to-end trained two-step generation model, where a sentence-level content planner first decides on the keyphrases to cover as well as a desired language style, followed by a surface realization decoder that generates relevant and coherent text. For experiments, we consider three tasks from domains with diverse topics and varying language styles: persuasive argument construction from Reddit, paragraph generation for normal and simple versions of Wikipedia, and abstract generation for scientific articles. Automatic evaluation shows that our system can significantly outperform competitive comparisons. Human judges further rate our system generated text as more fluent and correct, compared to the generations by its variants that do not consider language style.Comment: Accepted as a long paper to EMNLP 201

    Automatic Generation of Sports News

    Get PDF
    Nesta dissertação foi desenvolvido um sistema de geração de linguagem natural, que a partir de dados de um determinado jogo de futebol, é capaz de criar uma notícia com o rescaldo desse jogo, automaticamente

    Reactive Content Selection in the Generation of Real-time Soccer Commentary

    No full text
    MIKE is an automatic commentary system that generates a commentary of a simulated soccer game in English, French, or Japanese. One of the major technical challenges..

    The role of terminology and local grammar in video annotation

    Get PDF
    The linguistic annotation' of video sequences is an intellectually challenging task involving the investigation of how images and words are linked .together, a task that is ultimately financially rewarding in that the eventual automatic retrieval of video (sequences) can be much less time consuming, subjective and expensive than when retrieved manually. Much effort has been focused on automatic or semi-automatic annotation. Computational linguistic methods of video annotation rely on collections of collateral text in the form of keywords and proper nouns. Keywords are often used in a particular order indicating an identifiable pattern which is often limited and can subsequently be used to annotate the portion of a video where such a pattern occurred. Once' the relevant keywords and patterns have been stored, they can then be used to annotate the remainder of the video, excluding all collateral text which does not match the keywords or patterns. A new method of video annotation is presented in this thesis. The method facilitates a) annotation extraction of specialist terms within a corpus of collateral text; b) annotation identification of frequently used linguistic patterns to use in repeating key events within the data-set. The use of the method has led to the development of a system that can automatically assign key words and key patterns to a number of frames that are found in the commentary text approximately contemporaneous to the selected number of frames. The system does not perform video analysis; it only analyses the collateral text. The method is based on corpus linguistics and is mainly frequency based - frequency of occurrence of a key word or key pattern is taken as the basis of its representation. No assumptions are made about the grammatical structure of the language used in the collateral text, neither is a lexica of key words refined. Our system has been designed to annotate videos of football matches in English a!ld Arabic, and also cricket videos in English. The system has also been designed to retrieve annotated clips. The system not only provides a simple search method for annotated clips retrieval, it also provides complex, more advanced search methods.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
    corecore