7,812 research outputs found
Adaptive Information Cluster at Dublin City University
The Adaptive Information Cluster (AIC) is a collaboration between Dublin City University and University College Dublin, and in the AIC at DCU, we investigate and develop as one stream of our research activities, various content analysis tools that can automatically index and structure video information. This includes movies or CCTV footage and the motivation is to support useful searching and browsing features for the envisaged end-users of such systems. We bring in the HCI perspective to this highly-technically-oriented research by brainstorming, generating scenarios, sketching and prototyping the user-interfaces to the resulting video retrieval systems we develop, and we conduct usability studies to better understand the usage and opinions of such systems so as to guide the future direction of our technological research
Recurrent Scene Parsing with Perspective Understanding in the Loop
Objects may appear at arbitrary scales in perspective images of a scene,
posing a challenge for recognition systems that process images at a fixed
resolution. We propose a depth-aware gating module that adaptively selects the
pooling field size in a convolutional network architecture according to the
object scale (inversely proportional to the depth) so that small details are
preserved for distant objects while larger receptive fields are used for those
nearby. The depth gating signal is provided by stereo disparity or estimated
directly from monocular input. We integrate this depth-aware gating into a
recurrent convolutional neural network to perform semantic segmentation. Our
recurrent module iteratively refines the segmentation results, leveraging the
depth and semantic predictions from the previous iterations.
Through extensive experiments on four popular large-scale RGB-D datasets, we
demonstrate this approach achieves competitive semantic segmentation
performance with a model which is substantially more compact. We carry out
extensive analysis of this architecture including variants that operate on
monocular RGB but use depth as side-information during training, unsupervised
gating as a generic attentional mechanism, and multi-resolution gating. We find
that gated pooling for joint semantic segmentation and depth yields
state-of-the-art results for quantitative monocular depth estimation
- …