Hierarchically-Attentive RNN for Album Summarization and Storytelling
We address the problem of end-to-end visual storytelling. Given a photo
album, our model first selects the most representative (summary) photos, and
then composes a natural language story for the album. For this task, we make
use of the Visual Storytelling dataset and a model composed of three
hierarchically-attentive Recurrent Neural Nets (RNNs) to: encode the album
photos, select representative (summary) photos, and compose the story.
Automatic and human evaluations show our model achieves better performance on
selection, generation, and retrieval than baselines. Comment: To appear at EMNLP-2017 (7 pages).
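The attention-based selection of representative photos described above can be sketched with a soft-attention toy example. This is an illustrative numpy sketch under invented dimensions, not the authors' implementation: photo encodings and the query vector stand in for the RNN states in the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy album: 5 photos, each encoded as a 4-d feature vector
# (stand-ins for the RNN photo encodings; values are random here).
rng = np.random.default_rng(0)
photos = rng.normal(size=(5, 4))

# A learned query would come from the story decoder's state;
# a random vector is used purely for illustration.
query = rng.normal(size=4)

# Attention scores and normalized weights over the album photos.
scores = photos @ query
weights = softmax(scores)

# The most-attended photo plays the role of a "summary" photo;
# the context vector is the attention-weighted average of encodings.
summary_idx = int(np.argmax(weights))
context = weights @ photos
```

The same pattern (score, softmax, weighted sum) is repeated at each level of a hierarchically-attentive model.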
Use of Donepezil in Alzheimer's Disease: Suggested Practice Guidelines
Alzheimer’s disease (AD) is the cause of 60-80% of dementia cases and is estimated to affect over 5 million people over age 65 in the US (2014 Alzheimer’s disease, 2014). As baby boomers reach age 65 and longevity increases, an estimated 16 million persons will have Alzheimer’s by 2050. Eighty-two percent of persons with AD are over age 85. In 2014, AD was the 6th leading cause of death and the highest-cost disease in the US (2014 Alzheimer’s disease, 2014).
The predicted increase in the prevalence and incidence of Alzheimer’s disease, together with longer disease duration, represents a growing burden on families and society and a need for effective management over many years. The purpose of this study is to determine the effect of donepezil, a cholinesterase inhibitor, at various stages of the disease compared to placebo, and to determine best clinical practice guidelines.
The earlier the onset of donepezil therapy, the more significant the effect on preserving cognitive function and delaying nursing home placement. Persons at all stages (mild to very severe) retain the ability to respond to donepezil. At end-stage disease, it may be appropriate to discontinue donepezil.
These findings indicate that health care providers need to improve screening and the early initiation of donepezil in the management of Alzheimer’s disease, as well as monitoring of benefits and harms at later stages of the disease.
Solving Visual Madlibs with Multiple Cues
This paper focuses on answering fill-in-the-blank style multiple choice
questions from the Visual Madlibs dataset. Previous approaches to Visual
Question Answering (VQA) have mainly used generic image features from networks
trained on the ImageNet dataset, despite the wide scope of questions. In
contrast, our approach employs features derived from networks trained for
specialized tasks of scene classification, person activity prediction, and
person and object attribute prediction. We also present a method for selecting
sub-regions of an image that are relevant for evaluating the appropriateness of
a putative answer. Visual features are computed both from the whole image and
from local regions, while sentences are mapped to a common space using a simple
normalized canonical correlation analysis (CCA) model. Our results show a
significant improvement over the previous state of the art, and indicate that
answering different question types benefits from examining a variety of image
cues and carefully choosing informative image sub-regions.
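The normalized CCA step mentioned above, which maps visual features and sentence features into a common space, can be sketched with a plain linear CCA on synthetic data. This is a generic whitening-plus-SVD formulation of CCA, not the paper's exact normalized variant; all dimensions and data below are invented for illustration.

```python
import numpy as np

def cca(X, Y, k=2, eps=1e-8):
    """Plain linear CCA via whitening + SVD of the cross-covariance."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = X.shape[0]
    Cxx = X.T @ X / n + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric PSD matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy)
    # Columns of Px, Py project each view into the shared space.
    return Wx @ U[:, :k], Wy @ Vt.T[:, :k], s[:k]

# Two synthetic "views" driven by a shared 2-d latent signal,
# standing in for image features (X) and sentence features (Y).
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))
X = z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))
Y = z @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))
Px, Py, corrs = cca(X, Y)
```

Because both views share the same latent signal, the leading canonical correlations come out close to 1; matching images and answers then reduces to nearest-neighbor search in the shared space.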
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Referring expressions are natural language constructions used to identify
particular objects within a scene. In this paper, we propose a unified
framework for the tasks of referring expression comprehension and generation.
Our model is composed of three modules: speaker, listener, and reinforcer. The
speaker generates referring expressions, the listener comprehends referring
expressions, and the reinforcer introduces a reward function to guide sampling
of more discriminative expressions. The listener-speaker modules are trained
jointly in an end-to-end learning framework, allowing the modules to be aware
of one another during learning while also benefiting from the discriminative
reinforcer's feedback. We demonstrate that this unified framework and training
achieves state-of-the-art results for both comprehension and generation on
three referring expression datasets. Project and demo page:
https://vision.cs.unc.edu/refer
Comment: Some typos fixed; comprehension results on refcocog updated; more human evaluation results added.
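The reinforcer's role, rewarding the sampling of more discriminative expressions, follows the general REINFORCE recipe: sample from the policy, then nudge log-probabilities in proportion to the reward. The toy below illustrates only that recipe with a 3-way softmax policy; the action space, reward, and learning rate are all invented and bear no relation to the paper's model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy "speaker": a softmax policy over 3 candidate expressions.
# The (invented) reward is 1 only for the most discriminative one, index 2.
rng = np.random.default_rng(0)
logits = np.zeros(3)

for _ in range(500):
    p = softmax(logits)
    a = rng.choice(3, p=p)           # sample an expression
    reward = 1.0 if a == 2 else 0.0  # reinforcer's feedback
    # REINFORCE update: grad log p(a) for a softmax policy = one_hot(a) - p.
    logits += 0.5 * (np.eye(3)[a] - p) * reward

final_p = softmax(logits)  # probability mass concentrates on action 2
```

In the paper the reward comes from a learned discriminativeness signal rather than a fixed target, but the gradient estimator has this same shape.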
Visual to Sound: Generating Natural Sound for Videos in the Wild
As two of the five traditional human senses (sight, hearing, taste, smell,
and touch), vision and sound are basic sources through which humans understand
the world. Often correlated during natural events, these two modalities combine
to jointly affect human perception. In this paper, we pose the task of
generating sound given visual input. Such capabilities could help enable
applications in virtual reality (generating sound for virtual scenes
automatically) or provide additional accessibility to images or videos for
people with visual impairments. As a first step in this direction, we apply
learning-based methods to generate raw waveform samples given input video
frames. We evaluate our models on a dataset of videos containing a variety of
sounds (such as ambient sounds and sounds from people/animals). Our experiments
show that the generated sounds are fairly realistic and have good temporal
synchronization with the visual inputs. Comment: Project page:
http://bvision11.cs.unc.edu/bigpen/yipin/visual2sound_webpage/visual2sound.htm
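Generating raw waveform samples from video frames, as described above, is at heart an autoregressive loop conditioned on per-frame visual features. The sketch below replaces the paper's learned networks with a fixed linear model; the weights, feature sizes, and sample rate are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
frame_feats = rng.normal(size=(4, 8))   # 4 video frames, 8-d features each
samples_per_frame = 100                 # audio samples emitted per frame
ar_weights = np.array([0.6, 0.3])       # next sample depends on previous 2
vis_weights = rng.normal(size=8) * 0.01 # toy visual-conditioning weights

wave = [0.0, 0.0]                       # seed context
for f in frame_feats:
    cond = float(vis_weights @ f)       # conditioning term for this frame
    for _ in range(samples_per_frame):
        nxt = ar_weights @ np.array(wave[-2:]) + cond
        wave.append(float(np.tanh(nxt)))  # squash samples into [-1, 1]

wave = np.array(wave[2:])               # 4 * 100 = 400 generated samples
```

A learned model replaces the linear predictor with a deep network, but the structure (previous samples plus visual conditioning in, next sample out) is the same.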