Data-to-Text Generation with Content Selection and Planning
Recent advances in data-to-text generation have led to the use of large-scale
datasets and neural network models which are trained end-to-end, without
explicitly modeling what to say and in what order. In this work, we present a
neural network architecture which incorporates content selection and planning
without sacrificing end-to-end training. We decompose the generation task into
two stages. Given a corpus of data records (paired with descriptive documents),
we first generate a content plan highlighting which information should be
mentioned and in which order and then generate the document while taking the
content plan into account. Automatic and human-based evaluation experiments
show that our model outperforms strong baselines, improving the state of the art
on the recently released RotoWire dataset.
Comment: Added link to code
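The two-stage decomposition (first a content plan saying what to mention and in which order, then text conditioned on that plan) can be illustrated with a deliberately simplified, non-neural Python sketch. It is not the paper's model, which learns both stages end to end. The `Record` type, the `SALIENCE` weights and the template realizer are hypothetical stand-ins, chosen only to show that the second stage consumes an ordered plan rather than the raw records.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    entity: str
    rtype: str   # record type, e.g. points (PTS) or assists (AST)
    value: int

# Hypothetical salience weights standing in for a learned content-selection component.
SALIENCE = {"PTS": 3.0, "AST": 2.0, "REB": 1.5, "TO": 0.5}

def plan_content(records, k=3):
    """Stage 1: select the k most salient records; their order is the content plan."""
    ranked = sorted(records, key=lambda r: SALIENCE.get(r.rtype, 0.0) * r.value, reverse=True)
    return ranked[:k]

def realize(plan):
    """Stage 2: generate text from the ordered plan only (toy template realizer)."""
    return " ".join(f"{r.entity} had {r.value} {r.rtype}." for r in plan)

records = [
    Record("Jones", "PTS", 30),
    Record("Smith", "AST", 11),
    Record("Jones", "TO", 4),
    Record("Lee", "REB", 12),
]
plan = plan_content(records, k=3)
print(realize(plan))
```

Because the realizer only sees `plan`, changing what is selected or how it is ordered changes the output text without touching the generation step, which is the separation the abstract describes.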
Learning to Select, Track, and Generate for Data-to-Text
We propose a data-to-text generation model with two modules, one for tracking
and the other for text generation. Our tracking module selects and keeps track
of salient information and memorizes which record has been mentioned. Our
generation module generates a summary conditioned on the state of the tracking
module. Our model can be seen as simulating a human-like writing process that
gradually selects information by determining intermediate variables
while writing the summary. We also explore the effectiveness of
writer information for generation. Experimental results show that our model
outperforms existing models on all evaluation metrics, even without writer
information. Incorporating writer information further improves performance,
contributing to both content planning and surface realization.
Comment: ACL 201
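A minimal sketch of the tracking idea, assuming records are plain `(entity, attribute, value)` tuples and that salience comes from a caller-supplied scoring function (both assumptions; the paper's tracker is a learned neural module, and writer information is not modeled here). The `Tracker` class and `generate_summary` function are hypothetical and only show how memorizing which records have been mentioned keeps the generator from repeating itself.

```python
from dataclasses import dataclass, field

@dataclass
class Tracker:
    """Hypothetical tracking module: remembers which records were already mentioned."""
    records: list
    mentioned: set = field(default_factory=set)

    def select(self, score):
        """Return the index of the best-scoring unmentioned record, or None if all are used."""
        candidates = [i for i in range(len(self.records)) if i not in self.mentioned]
        if not candidates:
            return None
        best = max(candidates, key=lambda i: score(self.records[i]))
        self.mentioned.add(best)  # memorize that this record has now been mentioned
        return best

def generate_summary(records, score, max_sentences):
    """Toy generator: emits one template sentence per record the tracker selects."""
    tracker = Tracker(records)
    sentences = []
    for _ in range(max_sentences):
        idx = tracker.select(score)
        if idx is None:  # every record has been mentioned; stop early
            break
        entity, attribute, value = records[idx]
        sentences.append(f"{entity} recorded {value} {attribute}.")
    return " ".join(sentences), tracker.mentioned

records = [("Jones", "points", 30), ("Smith", "assists", 11), ("Lee", "rebounds", 12)]
text, mentioned = generate_summary(records, score=lambda r: r[2], max_sentences=5)
print(text)
```

With `score=lambda r: r[2]`, each record is mentioned once in descending order of value, and the loop ends after three sentences even though `max_sentences` is 5, because the tracker reports that nothing unmentioned remains.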
Sentence-Level Content Planning and Style Specification for Neural Text Generation
Building effective text generation systems requires three critical
components: content selection, text planning, and surface realization, and
traditionally they are tackled as separate problems. Recent all-in-one style
neural generation models have made impressive progress, yet they often produce
outputs that are incoherent and unfaithful to the input. To address these
issues, we present an end-to-end trained two-step generation model, where a
sentence-level content planner first decides on the keyphrases to cover as well
as a desired language style, followed by a surface realization decoder that
generates relevant and coherent text. For experiments, we consider three tasks
from domains with diverse topics and varying language styles: persuasive
argument construction from Reddit, paragraph generation for normal and simple
versions of Wikipedia, and abstract generation for scientific articles.
Automatic evaluation shows that our system can significantly outperform
competitive comparison systems. Human judges further rate the text generated by
our system as more fluent and correct than the text generated by its variants
that do not consider language style.
Comment: Accepted as a long paper to EMNLP 201
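The split between a sentence-level content planner (keyphrases plus a style per sentence) and a surface realizer can be sketched as follows. This is a toy, rule-based illustration under stated assumptions: the `SentencePlan` type, the fixed number of keyphrases per sentence, the `"simple"`/`"normal"` style labels and the template wording are all hypothetical, whereas the paper's planner and decoder are trained neural components.

```python
from dataclasses import dataclass

@dataclass
class SentencePlan:
    keyphrases: list  # keyphrases this sentence should cover
    style: str        # hypothetical style label: "simple" or "normal"

def plan_sentences(keyphrases, per_sentence, style_for):
    """Planner: split keyphrases into sentence-sized groups and pick a style for each."""
    plans = []
    for i in range(0, len(keyphrases), per_sentence):
        group = keyphrases[i:i + per_sentence]
        plans.append(SentencePlan(group, style_for(len(plans))))
    return plans

def realize(plans):
    """Toy surface realizer: one sentence per plan, wording chosen by the style."""
    sentences = []
    for p in plans:
        body = " and ".join(p.keyphrases)
        if p.style == "simple":
            sentences.append(f"It is about {body}.")
        else:
            sentences.append(f"This passage discusses {body}.")
    return " ".join(sentences)

plans = plan_sentences(["solar power", "cost", "storage"], per_sentence=2,
                       style_for=lambda i: "simple" if i == 0 else "normal")
print(realize(plans))
```

The planner fixes both coverage (which keyphrases go in which sentence) and style before any text is produced, so the realizer only has to turn each plan into a sentence.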
Automatic Generation of Sports News
In this dissertation, a natural language generation system was developed that, given data from a particular football match, can automatically create a news article recapping that match.
Reactive Content Selection in the Generation of Real-time Soccer Commentary
MIKE is an automatic commentary system that generates commentary on a simulated soccer game in English, French, or Japanese. One of the major technical challenges...
The role of terminology and local grammar in video annotation
The linguistic annotation of video sequences is an intellectually challenging task involving the investigation of how images and words are linked together, a task that is ultimately financially rewarding in that the eventual automatic retrieval of video (sequences) can be much less time-consuming, subjective and expensive than manual retrieval. Much effort has been focused on automatic or semi-automatic annotation. Computational linguistic methods of video annotation rely on collections of collateral text in the form of keywords and proper nouns. Keywords are often used in a particular order, forming an identifiable and often limited pattern that can subsequently be used to annotate the portion of a video where that pattern occurred. Once the relevant keywords and patterns have been stored, they can be used to annotate the remainder of the video, excluding all collateral text that does not match the keywords or patterns. A new method of video annotation is presented in this thesis. The method facilitates a) annotation extraction of specialist terms within a corpus of collateral text; and b) annotation identification of frequently used linguistic patterns that recur with key events within the data set. The method has been used to develop a system that automatically assigns key words and key patterns, found in commentary text approximately contemporaneous with a selected set of frames, to those frames. The system does not perform video analysis; it only analyses the collateral text. The method is based on corpus linguistics and is mainly frequency-based: the frequency of occurrence of a key word or key pattern is taken as the basis of its representation. No assumptions are made about the grammatical structure of the language used in the collateral text, nor is a lexicon of key words refined.
Our system has been designed to annotate videos of football matches in English and Arabic, as well as cricket videos in English. The system has also been designed to retrieve annotated clips: in addition to a simple search method for retrieving annotated clips, it provides more complex, advanced search methods.
EThOS - Electronic Theses Online Service, GB, United Kingdom
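The frequency-based approach described above (key words and key patterns taken from commentary text, then attached to roughly contemporaneous frames) can be sketched in Python. Everything concrete here is an assumption for illustration, not the thesis's implementation: commentary is a list of `(timestamp_in_seconds, text)` pairs, tokenization is whitespace splitting, key patterns are word bigrams, a key must occur at least `min_count` times, and frames are sampled at a hypothetical 25 fps with a window of 2 s either side. As in the abstract, no grammar or curated lexicon is used and no video analysis is performed.

```python
from collections import Counter

FPS = 25          # hypothetical frame rate of the video
WINDOW_S = 2.0    # hypothetical +/- window (seconds) standing in for "approximately contemporaneous"

def ngrams(tokens, n):
    """All contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def extract_keys(commentary, min_count=2, n=2):
    """Frequency-based key words and key patterns; no grammar or lexicon is assumed."""
    word_counts, pattern_counts = Counter(), Counter()
    for _, text in commentary:
        tokens = text.lower().split()
        word_counts.update(tokens)
        pattern_counts.update(ngrams(tokens, n))
    keywords = {w for w, c in word_counts.items() if c >= min_count}
    patterns = {p for p, c in pattern_counts.items() if c >= min_count}
    return keywords, patterns

def annotate_frames(commentary, keywords, patterns, n=2):
    """Attach the keys found in each commentary line to the frames near its timestamp."""
    annotations = []
    for t, text in commentary:
        tokens = text.lower().split()
        found = {w for w in tokens if w in keywords}
        found |= {" ".join(p) for p in ngrams(tokens, n) if p in patterns}
        if found:
            start = max(0, int((t - WINDOW_S) * FPS))
            end = int((t + WINDOW_S) * FPS)
            annotations.append(((start, end), sorted(found)))
    return annotations

# Hypothetical commentary: (timestamp in seconds, text)
commentary = [
    (10.0, "corner kick taken by Silva"),
    (35.5, "Silva scores a goal"),
    (62.0, "another corner kick for Porto"),
]
keywords, patterns = extract_keys(commentary)
for frames, keys in annotate_frames(commentary, keywords, patterns):
    print(frames, keys)
```

Here "corner", "kick" and "silva" each appear twice and become key words, and "corner kick" becomes the only key pattern; each commentary line then labels the frame range around its timestamp with whichever of those keys it contains.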