Hierarchically-Attentive RNN for Album Summarization and Storytelling
We address the problem of end-to-end visual storytelling. Given a photo
album, our model first selects the most representative (summary) photos, and
then composes a natural language story for the album. For this task, we make
use of the Visual Storytelling dataset and a model composed of three
hierarchically-attentive Recurrent Neural Nets (RNNs) to: encode the album
photos, select representative (summary) photos, and compose the story.
Automatic and human evaluations show our model achieves better performance on
selection, generation, and retrieval than baselines. Comment: To appear at EMNLP-2017 (7 pages).
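The attention-based selection of representative photos described above can be sketched with a soft-attention toy example. This is an illustrative numpy sketch under invented dimensions, not the authors' implementation: photo encodings and the query vector stand in for the RNN states in the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy album: 5 photos, each encoded as a 4-d feature vector
# (stand-ins for the RNN photo encodings; values are random here).
rng = np.random.default_rng(0)
photos = rng.normal(size=(5, 4))

# A learned query would come from the story decoder's state;
# a random vector is used purely for illustration.
query = rng.normal(size=4)

# Attention scores and normalized weights over the album photos.
scores = photos @ query
weights = softmax(scores)

# The most-attended photo plays the role of a "summary" photo;
# the context vector is the attention-weighted average of encodings.
summary_idx = int(np.argmax(weights))
context = weights @ photos
```

The same pattern (score, softmax, weighted sum) is repeated at each level of a hierarchically-attentive model.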
Use of Donepezil in Alzheimer's Disease: Suggested Practice Guidelines
Alzheimer’s disease (AD) is the cause of 60-80% of dementia cases and is estimated to affect over 5 million people over age 65 in the US (2014 Alzheimer’s disease, 2014). As baby boomers reach age 65 and longevity increases, an estimated 16 million persons will have Alzheimer’s by 2050. Eighty-two percent of persons with AD are over age 85. In 2014, AD was the 6th leading cause of death and the highest-cost disease in the US (2014 Alzheimer’s disease, 2014).
The predicted increase in the prevalence and incidence of Alzheimer’s disease, together with longer disease duration, represents a growing burden on families and society and a need for effective management over many years. The purpose of this study is to determine the effect of donepezil, a cholinesterase inhibitor, at various stages of the disease compared to placebo, and to determine best clinical practice guidelines.
The earlier the onset of donepezil therapy, the more significant the effect on preserving cognitive function and delaying nursing home placement. Persons at all stages (mild to very severe) retain the ability to respond to donepezil. At end-stage disease, it may be appropriate to discontinue donepezil.
These findings indicate that health care providers need to improve screening and the early initiation of donepezil in the management of Alzheimer’s disease, as well as monitoring of benefits and harms at later stages of the disease.
Solving Visual Madlibs with Multiple Cues
This paper focuses on answering fill-in-the-blank style multiple choice
questions from the Visual Madlibs dataset. Previous approaches to Visual
Question Answering (VQA) have mainly used generic image features from networks
trained on the ImageNet dataset, despite the wide scope of questions. In
contrast, our approach employs features derived from networks trained for
specialized tasks of scene classification, person activity prediction, and
person and object attribute prediction. We also present a method for selecting
sub-regions of an image that are relevant for evaluating the appropriateness of
a putative answer. Visual features are computed both from the whole image and
from local regions, while sentences are mapped to a common space using a simple
normalized canonical correlation analysis (CCA) model. Our results show a
significant improvement over the previous state of the art, and indicate that
answering different question types benefits from examining a variety of image
cues and carefully choosing informative image sub-regions.
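The normalized CCA step mentioned above, which maps visual features and sentence features into a common space, can be sketched with a plain linear CCA on synthetic data. This is a generic whitening-plus-SVD formulation of CCA, not the paper's exact normalized variant; all dimensions and data below are invented for illustration.

```python
import numpy as np

def cca(X, Y, k=2, eps=1e-8):
    """Plain linear CCA via whitening + SVD of the cross-covariance."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    Cxx = X.T @ X / n + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric PSD matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    # Columns of Px, Py project each view into the shared space.
    return Wx @ U[:, :k], Wy @ Vt.T[:, :k], s[:k]

# Two synthetic "views" driven by a shared 2-d latent signal,
# standing in for image features (X) and sentence features (Y).
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))
X = z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))
Y = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))
Px, Py, corrs = cca(X, Y)
```

Because both views share the same latent signal, the leading canonical correlations come out close to 1; matching images and answers then reduces to nearest-neighbor search in the shared space.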
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Referring expressions are natural language constructions used to identify
particular objects within a scene. In this paper, we propose a unified
framework for the tasks of referring expression comprehension and generation.
Our model is composed of three modules: speaker, listener, and reinforcer. The
speaker generates referring expressions, the listener comprehends referring
expressions, and the reinforcer introduces a reward function to guide sampling
of more discriminative expressions. The listener-speaker modules are trained
jointly in an end-to-end learning framework, allowing the modules to be aware
of one another during learning while also benefiting from the discriminative
reinforcer's feedback. We demonstrate that this unified framework and training
achieves state-of-the-art results for both comprehension and generation on
three referring expression datasets. Project and demo page:
https://vision.cs.unc.edu/refer
Comment: Some typos fixed; comprehension results on refcocog updated; more human evaluation results added.
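The reinforcer's role, rewarding the sampling of more discriminative expressions, follows the general REINFORCE recipe: sample from the policy, then nudge log-probabilities in proportion to the reward. The toy below illustrates only that recipe with a 3-way softmax policy; the action space, reward, and learning rate are all invented and bear no relation to the paper's model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy "speaker": a softmax policy over 3 candidate expressions.
# The (invented) reward is 1 only for the most discriminative one, index 2.
rng = np.random.default_rng(0)
logits = np.zeros(3)

for _ in range(500):
    p = softmax(logits)
    a = rng.choice(3, p=p)           # sample an expression
    reward = 1.0 if a == 2 else 0.0  # reinforcer's feedback
    # REINFORCE update: grad log p(a) for a softmax policy = one_hot(a) - p.
    logits += 0.5 * (np.eye(3)[a] - p) * reward

final_p = softmax(logits)  # probability mass concentrates on action 2
```

In the paper the reward comes from a learned discriminativeness signal rather than a fixed target, but the gradient estimator has this same shape.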
Visual to Sound: Generating Natural Sound for Videos in the Wild
As two of the five traditional human senses (sight, hearing, taste, smell,
and touch), vision and sound are basic sources through which humans understand
the world. Often correlated during natural events, these two modalities combine
to jointly affect human perception. In this paper, we pose the task of
generating sound given visual input. Such capabilities could help enable
applications in virtual reality (generating sound for virtual scenes
automatically) or provide additional accessibility to images or videos for
people with visual impairments. As a first step in this direction, we apply
learning-based methods to generate raw waveform samples given input video
frames. We evaluate our models on a dataset of videos containing a variety of
sounds (such as ambient sounds and sounds from people/animals). Our experiments
show that the generated sounds are fairly realistic and have good temporal
synchronization with the visual inputs. Comment: Project page:
http://bvision11.cs.unc.edu/bigpen/yipin/visual2sound_webpage/visual2sound.htm
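Generating raw waveform samples from video frames, as described above, is at heart an autoregressive loop conditioned on per-frame visual features. The sketch below replaces the paper's learned networks with a fixed linear model; the weights, feature sizes, and sample rate are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
frame_feats = rng.normal(size=(4, 8))   # 4 video frames, 8-d features each
samples_per_frame = 100                 # audio samples emitted per frame
ar_weights = np.array([0.6, 0.3])       # next sample depends on previous 2
vis_weights = rng.normal(size=8) * 0.01 # toy visual-conditioning weights

wave = [0.0, 0.0]                       # seed context
for f in frame_feats:
    cond = float(vis_weights @ f)       # conditioning term for this frame
    for _ in range(samples_per_frame):
        nxt = ar_weights @ np.array(wave[-2:]) + cond
        wave.append(float(np.tanh(nxt)))  # squash samples into [-1, 1]

wave = np.array(wave[2:])               # 4 * 100 = 400 generated samples
```

A learned model replaces the linear predictor with a deep network, but the structure (previous samples plus visual conditioning in, next sample out) is the same.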