Harvesting Discriminative Meta Objects with Deep CNN Features for Scene Classification
Recent work on scene classification still makes use of generic CNN features
in a rudimentary manner. In this ICCV 2015 paper, we present a novel pipeline
built upon deep CNN features to harvest discriminative visual objects and parts
for scene classification. We first use a region proposal technique to generate
a set of high-quality patches potentially containing objects, and apply a
pre-trained CNN to extract generic deep features from these patches. Then we
perform both unsupervised and weakly supervised learning to screen these
patches and discover discriminative ones representing category-specific objects
and parts. We further apply discriminative clustering enhanced with local CNN
fine-tuning to aggregate similar objects and parts into groups, called meta
objects. A scene image representation is constructed by pooling the feature
response maps of all the learned meta objects at multiple spatial scales. We
have confirmed that the scene image representation obtained using this new
pipeline is capable of delivering state-of-the-art performance on two popular
scene benchmark datasets, MIT Indoor 67~\cite{MITIndoor67} and
Sun397~\cite{Sun397}.
Comment: To appear in ICCV 2015
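To make the shape of the pipeline concrete, here is a minimal Python sketch of its four stages. This is not the authors' implementation: the region proposal and pre-trained CNN are faked with random features, the unsupervised/weakly supervised screening is replaced by a crude norm-based filter, and plain k-means with single-scale max-pooling stands in for the discriminative clustering (with local fine-tuning) and multi-scale pooling. All function names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def propose_patch_features(image, n_patches=50, dim=128):
    # Stand-in for region proposals + CNN features; here just noise.
    return rng.normal(size=(n_patches, dim))

def screen_patches(features, keep=20):
    # Crude stand-in for the screening step: keep the patches with
    # the largest feature norms as "discriminative" candidates.
    order = np.argsort(-np.linalg.norm(features, axis=1))
    return features[order[:keep]]

def build_meta_objects(all_features, n_meta=8):
    # Group similar patches into "meta objects"; plain k-means stands
    # in for the paper's discriminative clustering with fine-tuning.
    km = KMeans(n_clusters=n_meta, n_init=10, random_state=0)
    km.fit(all_features)
    return km.cluster_centers_

def image_representation(features, meta_objects):
    # Pool each meta object's response over the image's patches
    # (max-pooling at one scale; the paper pools at multiple scales).
    responses = features @ meta_objects.T   # (patches, meta)
    return responses.max(axis=0)            # (meta,)

# Toy "dataset" of 10 images.
feats = [screen_patches(propose_patch_features(None)) for _ in range(10)]
meta = build_meta_objects(np.vstack(feats))
reps = np.stack([image_representation(f, meta) for f in feats])
print(reps.shape)  # (10, 8): one pooled meta-object vector per image
```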
Systems analysis of guard cell membrane transport for enhanced stomatal dynamics and water use efficiency
Stomatal transpiration is at the centre of a crisis in water availability and crop production that is expected to unfold over the next 20-30 years. Global water usage has increased 6-fold in the past 100 years, twice as fast as the human population, and is expected to double again before 2030, driven mainly by irrigation and agriculture. Guard cell membrane transport is integral to controlling stomatal aperture and offers important targets for genetic manipulation to improve crop performance. However, its complexity presents a formidable barrier to exploring such possibilities. With few exceptions, mutations that increase water use efficiency have commonly been found to do so with substantial costs to the rate of carbon assimilation, reflecting the trade-off in CO2 availability with suppressed stomatal transpiration. One approach yet to be explored in any detail relies on quantitative systems analysis of the guard cell. Our deep knowledge of transport and homeostasis in these cells gives real substance to the prospect of ‘reverse engineering’ stomatal responses, using in silico design to direct genetic manipulation for improved water use and crop yields. Here we address this problem with a focus on stomatal kinetics, taking advantage of the OnGuard software and models of the stomatal guard cell (www.psrg.org.uk) recently developed for exploring stomatal physiology. Our analysis suggests that manipulations of single transporter populations are likely to have unforeseen consequences. Channel gating, especially of the dominant K+ channels, appears to be the most favorable target for experimental manipulation.
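As a back-of-envelope illustration of why gating can be a more leveraged target than transporter numbers, the toy sketch below (emphatically not the OnGuard model; every parameter is invented) compares doubling a K+ channel population against shifting its gating voltage, using a standard Boltzmann open-probability curve:

```python
import numpy as np

def k_current(V, g_max, V_half, slope=10.0):
    # Boltzmann open probability times driving force (E_K taken as -120 mV).
    p_open = 1.0 / (1.0 + np.exp(-(V - V_half) / slope))
    return g_max * p_open * (V - (-120.0))

V = np.linspace(-150, 0, 301)                            # membrane voltage, mV
base = k_current(V, g_max=1.0, V_half=-60.0)
more_channels = k_current(V, g_max=2.0, V_half=-60.0)    # 2x channel population
shifted_gate = k_current(V, g_max=1.0, V_half=-80.0)     # gating shifted -20 mV

# Doubling the population scales the current uniformly everywhere,
# whereas shifting gating changes *where* the channels open, reshaping
# the voltage dependence (and hence the kinetics) of the response.
print(more_channels.max() / base.max())   # ~2.0, a uniform scaling
print(shifted_gate[100] / base[100])      # much larger relative change near threshold
```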
RAN4IQA: Restorative Adversarial Nets for No-Reference Image Quality Assessment
Inspired by the free-energy brain theory, which implies that the human visual
system (HVS) tends to reduce uncertainty and restore perceptual details upon
seeing a distorted image, we propose restorative adversarial net (RAN), a
GAN-based model for no-reference image quality assessment (NR-IQA). RAN, which
mimics the process of HVS, consists of three components: a restorator, a
discriminator and an evaluator. The restorator restores and reconstructs input
distorted image patches, while the discriminator distinguishes the
reconstructed patches from the pristine distortion-free patches. After
restoration, we observe that the perceptual distance between the restored and
the distorted patches is monotonic with respect to the distortion level. We
further define Gain of Restoration (GoR) based on this phenomenon. The
evaluator predicts perceptual score by extracting feature representations from
the distorted and restored patches to measure GoR. Eventually, the quality
score of an input image is estimated by a weighted sum of the patch scores.
Experimental results on Waterloo Exploration, LIVE and TID2013 show the
effectiveness and generalization ability of RAN compared to the
state-of-the-art NR-IQA models.
Comment: AAAI'18
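A minimal structural sketch of the three components in PyTorch may help fix the data flow in mind; all layer sizes here are invented, and the real architectures, training losses, and learned patch weighting differ:

```python
import torch
import torch.nn as nn

class Restorator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1))
    def forward(self, x):            # distorted patch -> restored patch
        return self.net(x)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(1))
    def forward(self, x):            # patch -> "pristine vs. restored" logit
        return self.net(x)

class Evaluator(nn.Module):
    def __init__(self):
        super().__init__()
        self.feat = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(), nn.Flatten())
        self.head = nn.LazyLinear(1)
    def forward(self, distorted, restored):
        # Features from both patches, so the head can exploit the
        # gain of restoration (GoR) between them.
        f = torch.cat([self.feat(distorted), self.feat(restored)], dim=1)
        return self.head(f)

patches = torch.randn(8, 3, 32, 32)            # 8 distorted patches
restored = Restorator()(patches)
realism = Discriminator()(restored)            # adversarial signal at training time
scores = Evaluator()(patches, restored).squeeze(1)
weights = torch.softmax(torch.ones(8), dim=0)  # uniform here; the paper weights patches
image_score = (weights * scores).sum()
print(float(image_score))
```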
Speaker-following Video Subtitles
We propose a new method for improving the presentation of subtitles in video
(e.g. TV and movies). With conventional subtitles, the viewer has to constantly
look away from the main viewing area to read the subtitles at the bottom of the
screen, which disrupts the viewing experience and causes unnecessary eyestrain.
Our method places on-screen subtitles next to the respective speakers to allow
the viewer to follow the visual content while simultaneously reading the
subtitles. We use novel identification algorithms to detect the speakers based
on audio and visual information. Then the placement of the subtitles is
determined using global optimization. A comprehensive usability study indicated
that our subtitle placement method outperformed both conventional
fixed-position subtitling and another previous dynamic subtitling method in
terms of enhancing the overall viewing experience and reducing eyestrain.
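The placement step can be pictured as a small combinatorial optimization: choose one candidate on-screen slot per subtitle so that each subtitle sits near its speaker while subtitles do not collide. The sketch below is a toy stand-in under assumed cost terms (speaker distance plus an overlap penalty), not the paper's actual objective or solver:

```python
import itertools
import numpy as np

speakers = np.array([[0.3, 0.4], [0.7, 0.5]])     # speaker (x, y) per subtitle
candidates = np.array([[0.2, 0.3], [0.3, 0.6],    # candidate subtitle slots
                       [0.7, 0.3], [0.7, 0.7]])

def cost(assignment):
    pos = candidates[list(assignment)]
    near = np.linalg.norm(pos - speakers, axis=1).sum()  # stay near each speaker
    gap = np.linalg.norm(pos[0] - pos[1])
    overlap = max(0.0, 0.2 - gap) * 10.0                 # penalize collisions
    return near + overlap

# Exhaustive search over distinct slot assignments (global optimum for this toy).
best = min(itertools.permutations(range(len(candidates)), 2), key=cost)
print(best, cost(best))
```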
Collaborative Deep Reinforcement Learning for Joint Object Search
We examine the problem of joint top-down active search of multiple objects
under interaction, e.g., a person riding a bicycle, cups on a table, etc.
Such objects under interaction often can provide contextual cues to each other
to facilitate more efficient search. By treating each detector as an agent, we
present the first collaborative multi-agent deep reinforcement learning
algorithm to learn the optimal policy for joint active object localization,
which effectively exploits such beneficial contextual information. We learn
inter-agent communication through cross connections with gates between the
Q-networks, which is facilitated by a novel multi-agent deep Q-learning
algorithm with joint exploitation sampling. We verify our proposed method on
multiple object detection benchmarks. Not only does our model help to improve
the performance of state-of-the-art active localization models, it also reveals
interesting co-detection patterns that are intuitively interpretable.
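The gated cross-connection idea can be sketched for two agents as follows; the layer sizes and gating form are invented, and the joint exploitation sampling and full training loop are omitted, so treat this as an assumption-laden illustration rather than the paper's architecture:

```python
import torch
import torch.nn as nn

class GatedQPair(nn.Module):
    def __init__(self, obs_dim=16, hidden=32, n_actions=4):
        super().__init__()
        self.enc1 = nn.Linear(obs_dim, hidden)
        self.enc2 = nn.Linear(obs_dim, hidden)
        self.gate1 = nn.Linear(hidden, hidden)  # gates agent 2 -> agent 1 message
        self.gate2 = nn.Linear(hidden, hidden)  # gates agent 1 -> agent 2 message
        self.q1 = nn.Linear(hidden, n_actions)
        self.q2 = nn.Linear(hidden, n_actions)

    def forward(self, o1, o2):
        h1, h2 = torch.relu(self.enc1(o1)), torch.relu(self.enc2(o2))
        # Each agent receives the other's hidden state, modulated by a
        # learned sigmoid gate, before predicting its own Q-values.
        m1 = h1 + torch.sigmoid(self.gate1(h2)) * h2
        m2 = h2 + torch.sigmoid(self.gate2(h1)) * h1
        return self.q1(m1), self.q2(m2)

net = GatedQPair()
q_a, q_b = net(torch.randn(5, 16), torch.randn(5, 16))
print(q_a.shape, q_b.shape)  # torch.Size([5, 4]) per agent: one Q-value per action
```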