Generating multiple summaries based on computational model of perspective
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (leaves 87-92). Every story about an event offers a unique perspective on that event. A popular sporting event, such as a Major League Baseball game, is followed by several summary articles that show different points of view. The goal of this research is to build a computational model of perspective and a system for automatically generating multiple summary articles showing different perspectives. My approach is to take a neutral summary article, reorder its content based on event features extracted from the description of the game, and produce two new summaries showing the local team perspectives. I will present an initial user survey that validated the hypothesis that content ordering has a significant effect on users' perception of perspective. I will also discuss collecting and analyzing a parallel corpus of baseball game data and summary articles showing local team perspectives. I will then describe the reordering algorithm, the implementation of the system, and a user study to evaluate the output of the system. by Alice H. Oh. Ph.D.
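The content-reordering idea can be illustrated with a minimal sketch. The scoring function and sentences below are hypothetical stand-ins, not the author's actual algorithm or data: sentences from a neutral summary are scored by relevance to one team and re-sorted so that team-relevant content leads.

```python
def reorder_for_perspective(sentences, team, score):
    """Stable-sort summary sentences so those most relevant to
    `team` (per the supplied scoring function) come first."""
    return sorted(sentences, key=lambda s: -score(s, team))

# Toy scoring: count mentions of the team name. This stands in for
# the event features extracted from the game description.
def mention_score(sentence, team):
    return sentence.lower().count(team.lower())

neutral = [
    "Attendance was 35,000 on a cold night.",
    "The Red Sox rallied in the ninth inning.",
    "The Yankees' starter struck out ten.",
]
red_sox_view = reorder_for_perspective(neutral, "Red Sox", mention_score)
```

Because the sort is stable, sentences with equal scores keep their original (neutral) order, so only team-relevant content is promoted.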
TRANSLATING VISUALIZATION INTERACTION INTO NATURAL LANGUAGE
Richly interactive visualization tools are increasingly popular for data exploration and analysis in a wide variety of domains.
Recent advances in data collection and storage call for more complex analytical tasks to make sense of readily available datasets, and more sophisticated tools are needed to complete those tasks. However, as these visualization tools grow more complex, it becomes increasingly difficult to learn interaction sequences, recall past queries asked of a visualization, and correctly interpret visual states while foraging the data. Moreover, the high interactivity of such tools makes it harder to connect low-level acquired information to the higher-level analytical questions and hypotheses needed to support, reason about, and eventually present insights. This makes studying the usability of complex interactive visualizations, both while foraging and while making sense of data, an essential part of visual analytics research. This research can be approached in at least two major ways. One can study new techniques and guidelines for designing complex interactive visualizations that are easy to use and understand. Alternatively, one can keep the capabilities of existing complex visualizations but provide supporting capabilities that increase their usability. The latter is an emerging area of research in visual analytics, and it is the focus of this dissertation.
This dissertation describes six contributions to the field of visual analytics. The first contribution is an architecture of a query-to-question supporting system that automatically records user interactions and presents them contextually using natural written language. The architecture takes into account the domain knowledge of experts/designers and uses natural language generation (NLG) techniques to translate and transcribe a progression of interactive visualization states into a log of text that can be visualized.
The second contribution is query-to-question (Q2Q), an implemented system that translates low-level user interactions into high-level analytical questions and presents them as a log of styled text that complements and effectively extends the functionality of visualization tools.
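The translation step Q2Q performs can be sketched minimally. The event names and question templates here are hypothetical illustrations, not Q2Q's actual vocabulary, which is supplied by the visualization designer:

```python
# Hypothetical interaction-to-question templates; in Q2Q the
# mapping encodes the designer's domain knowledge.
TEMPLATES = {
    "filter": "Which records have {field} = {value}?",
    "sort":   "How do the records rank by {field}?",
    "zoom":   "What happens in the {field} range {value}?",
}

def translate(event):
    """Render one logged low-level interaction event as a
    high-level analytical question."""
    template = TEMPLATES[event["action"]]
    return template.format(field=event["field"], value=event["value"])

log = [
    {"action": "filter", "field": "year", "value": 2008},
    {"action": "sort", "field": "price", "value": None},
]
questions = [translate(e) for e in log]
```

Running the full interaction log through such a mapping yields the styled textual log that accompanies the visualization.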
The third contribution is a demonstration that accompanying a visualization with a textual translation of user interaction improves the usability of the visualization. The presence of the translation interface produces considerable improvements in learnability, efficiency, and memorability, measured by speed and the length of interaction sequences that users perform, along with a modest decrease in error ratio.
The fourth contribution is a set of design guidelines for translating user interactions into natural language, taking into account variation in user knowledge and roles, the types of data being visualized, and the types of interaction supported.
The fifth contribution is a history organizer interface that enables users to organize their analytical process. The structured textual translations output from Q2Q are input into a history organizer tool (HOT) that imposes reordering, sequencing, and grouping of the translated interactions. HOT provides a reasoning framework for users to organize and present hypotheses and insight acquired from a visualization.
The sixth contribution is a demonstration of the efficiency of a suite of arrangement options for organizing questions asked in a visualization. Integration of query translation and history organization improves users' speed, error ratio, and the number of reordering actions performed while organizing translated interactions. Overall, this dissertation contributes to the analysis and discovery of user storytelling patterns and behaviours, thereby paving the way to the creation of more intelligent, effective, and user-oriented visual analysis presentation tools.
Automatic Image Captioning with Style
This thesis connects two core topics in machine learning, vision
and language. The problem of choice is image caption generation:
automatically constructing natural language descriptions of image
content. Previous research into image caption generation has
focused on generating purely descriptive captions; I focus on
generating visually relevant captions with a distinct linguistic
style. Captions with style have the potential to ease
communication and add a new layer of personalisation.
First, I consider naming variations in image captions, and
propose a method for predicting context-dependent names that
takes into account visual and linguistic information. This method
makes use of a large-scale image caption dataset, which I also
use to explore naming conventions and report naming conventions
for hundreds of animal classes. Next I propose the SentiCap
model, which relies on recent advances in artificial neural
networks to generate visually relevant image captions with
positive or negative sentiment. To balance descriptiveness and
sentiment, the SentiCap model dynamically switches between two
recurrent neural networks, one tuned for descriptive words and
one for sentiment words. As the first published model for
generating captions with sentiment, SentiCap has influenced a
number of subsequent works. I then investigate the sub-task of
modelling styled sentences without images. The specific task
chosen is sentence simplification: rewriting news article
sentences to make them easier to understand.
For this task I design a neural sequence-to-sequence model that can work with
limited training data, using novel adaptations for word copying and sharing
word embeddings. Finally, I present SemStyle, a system for generating visually
relevant image captions in the style of an arbitrary text corpus. A shared
term space allows a neural network for vision and content planning to
communicate with a network for styled language generation. SemStyle achieves
competitive results in human and automatic evaluations of descriptiveness and
style.
As a whole, this thesis presents two complete systems for styled
caption generation that are first of their kind and demonstrate,
for the first time, that automatic style transfer for image
captions is achievable. Contributions also include novel ideas
for object naming and sentence simplification. This thesis opens
up inquiries into highly personalised image captions; large scale
visually grounded concept naming; and more generally, styled text
generation with content control.
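The switching mechanism described above for SentiCap can be sketched abstractly. The word lists, gate, and probabilities below are toy stand-ins, not the trained model: in the real system both streams and the gate are recurrent neural networks conditioned on the image and the generated prefix.

```python
import random

def generate_switched(descriptive_words, sentiment_words,
                      switch_prob, n, seed=0):
    """Toy stand-in for SentiCap's mechanism: at each step a gate
    chooses whether the next word comes from the descriptive
    stream or the sentiment stream."""
    rng = random.Random(seed)  # seeded for reproducibility
    caption = []
    for _ in range(n):
        stream = (sentiment_words if rng.random() < switch_prob
                  else descriptive_words)
        caption.append(rng.choice(stream))
    return caption

words = generate_switched(
    ["dog", "park", "running"], ["happy", "lovely"],
    switch_prob=0.4, n=5)
```

The key design point the sketch captures is that descriptiveness and sentiment are balanced per word, not per caption, so sentiment words can be interleaved without displacing visually grounded content.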
Generating Natural Language Summaries from Multiple On-Line Sources: Language Reuse and Regeneration
The abundance of news wire on the World-Wide Web has resulted in at least four major problems, which seem to present the most interesting challenges to users and researchers alike: size, heterogeneity, change, and disagreement. Size: several hundred newspapers and news agencies maintain their Web sites, with thousands of news stories in each. Heterogeneity: some of the data related to news is in structured format (e.g., tables); more exists in semi-structured format (e.g., Web pages, encyclopedias, textual databases); while the rest is in textual form (e.g., newswire). Change: most Web sites, and certainly all news sources, change on a daily basis. Disagreement: different sources present conflicting, or at least different, views of the same event. We have approached the second, third, and fourth of these problems from the point of view of text generation. We have developed a system, SUMMONS, which, when coupled with appropriate information extraction technology, generates a specific genre of natural language summaries of a particular event (which we call briefings) in a restricted domain. The briefings are concise; they contain facts from multiple and heterogeneous sources, incorporate evolving information, and highlight agreements and contradictions among sources on the same topic. We have developed novel techniques and algorithms for combining data from multiple sources at the conceptual level (using natural language understanding), for identifying new information on a given topic, and for presenting the information in natural language form to the user. We have named the framework that we developed for these problems language reuse and regeneration (LRR). Its novelty lies in the ability to produce text by collating together text already written by humans on the Web.
The main features of LRR are increased robustness through a simplified parsing/generation component, leverage of text already written by humans, and facilities for the inclusion of structured data in computer-generated text. The present thesis contains an introduction to LRR and its use in multi-document summarization. We have paid special attention to the techniques for producing conceptual summaries of multiple sources, to the creation and use of an LRR-based lexicon for text generation, to a methodology used to identify new and old information in threads of documents, and to the generation of fluent natural language text using all the components above. The thesis contains evaluations of the different components of SUMMONS as well as certain aspects of LRR as a methodology. A review of the relevant literature is included as a separate chapter.
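The source-combination step can be sketched minimally. The flat slot/value fact representation below is a hypothetical simplification; SUMMONS operates on the conceptual output of information-extraction systems. Facts about the same slot from multiple sources are compared, and agreements versus contradictions are labeled for the briefing:

```python
from collections import defaultdict

def compare_sources(reports):
    """Group per-source values by slot and label each slot as an
    agreement (all sources concur) or a contradiction."""
    slots = defaultdict(dict)
    for source, facts in reports.items():
        for slot, value in facts.items():
            slots[slot][source] = value
    result = {}
    for slot, by_source in slots.items():
        label = "agreement" if len(set(by_source.values())) == 1 else "contradiction"
        result[slot] = (label, by_source)
    return result

# Toy example: two wire services report on the same event.
reports = {
    "Reuters": {"casualties": 4, "location": "Miami"},
    "AP":      {"casualties": 5, "location": "Miami"},
}
status = compare_sources(reports)
```

A generation component would then verbalize contradicted slots with source attribution ("Reuters reported four casualties, while AP reported five") and agreed slots directly.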