Substructure and Boundary Modeling for Continuous Action Recognition
This paper introduces a probabilistic graphical model for continuous action
recognition with two novel components: a substructure transition model and a
discriminative boundary model. The first component encodes the sparse and
global temporal transition prior between action primitives in a state-space model
to handle the large spatial-temporal variations within an action class. The
second component enforces the action duration constraint in a discriminative
way to locate the transition boundaries between actions more accurately. The
two components are integrated into a unified graphical structure to enable
effective training and inference. Our comprehensive experimental results on
both public and in-house datasets show that, with the capability to incorporate
additional information that had not been explicitly or efficiently modeled by
previous methods, our proposed algorithm achieved significantly improved
performance for continuous action recognition.
Comment: Detailed version of the CVPR 2012 paper. 15 pages, 6 figure
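To make the idea of a sparse transition prior between action primitives concrete, here is a minimal sketch of Viterbi decoding over a chain model in which most transitions are forbidden. It is not the paper's model: the discriminative boundary (duration) component is not modeled, and the three primitives, the left-to-right transition matrix, and the emission scores are all hypothetical values chosen for illustration.

```python
import numpy as np

def viterbi(log_emis, log_trans, log_init):
    """Most likely state path for a chain model, computed in the log domain."""
    T, S = log_emis.shape
    score = log_init + log_emis[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best score ending in state i at t-1, then moving to j
        cand = score[:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_emis[t]
    # Trace the best path backwards from the best final state
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical sparse prior: only self-loops and i -> i+1 moves are allowed.
# Zero probabilities become -inf in the log domain, which forbids the move.
with np.errstate(divide="ignore"):
    log_trans = np.log(np.array([[0.8, 0.2, 0.0],
                                 [0.0, 0.8, 0.2],
                                 [0.0, 0.0, 1.0]]))
    log_init = np.log(np.array([1.0, 0.0, 0.0]))

# Hypothetical per-frame likelihoods for 5 frames and 3 primitives
log_emis = np.log(np.array([[0.9, 0.05, 0.05],
                            [0.6, 0.3, 0.1],
                            [0.1, 0.8, 0.1],
                            [0.3, 0.4, 0.3],
                            [0.1, 0.1, 0.8]]))

path = viterbi(log_emis, log_trans, log_init)
```

Because backward and skip transitions have zero prior probability, the decoded path can only stay in a primitive or advance to the next one, regardless of how noisy the per-frame scores are.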
AMC: Attention guided Multi-modal Correlation Learning for Image Search
Given a user's query, traditional image search systems rank images according
to the query's relevance to a single modality (e.g., image content or surrounding
text). Nowadays, an increasing number of images on the Internet are available
with associated metadata in rich modalities (e.g., titles, keywords, tags,
etc.), which can be exploited to better measure similarity with queries. In
this paper, we leverage visual and textual modalities for image search by
learning their correlation with the input query. According to the intent of the
query, an attention mechanism can be introduced to adaptively balance the importance of
different modalities. We propose a novel Attention guided Multi-modal
Correlation (AMC) learning method which consists of a jointly learned hierarchy
of intra- and inter-attention networks. Conditioned on the query's intent,
intra-attention networks (i.e., visual intra-attention network and language
intra-attention network) attend to informative parts within each modality; a
multi-modal inter-attention network promotes the importance of the most
query-relevant modalities. In experiments, we evaluate AMC models on the search
logs from two real-world image search engines and show a significant boost in
the ranking of user-clicked images in search results. Additionally, we extend
AMC models to the caption ranking task on the COCO dataset and achieve competitive
results compared with recent state-of-the-art methods.
Comment: CVPR 201
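The inter-attention idea, weighting modalities by their relevance to the query, can be illustrated with a generic soft-attention sketch. This is not the AMC architecture: the bilinear scoring matrix `W`, the embedding size, and the random embeddings are assumptions made for illustration, and the intra-attention networks are not modeled at all.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fuse_modalities(query, visual, text, W):
    """Weight each modality embedding by a query-conditioned softmax score."""
    modalities = np.stack([visual, text])   # shape (2, d)
    scores = modalities @ (W @ query)       # one relevance score per modality
    weights = softmax(scores)               # attention weights, sum to 1
    fused = weights @ modalities            # weighted sum, shape (d,)
    return fused, weights

rng = np.random.default_rng(0)
d = 4                                       # hypothetical embedding size
query, visual, text = rng.normal(size=(3, d))
W = np.eye(d)  # hypothetical scoring matrix; it would be learned in practice

fused, weights = fuse_modalities(query, visual, text, W)
```

The fused vector is a convex combination of the two modality embeddings, so a query that aligns more with one modality shifts the representation toward it.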
Deep Networks for Image Super-Resolution with Sparse Prior
Deep learning techniques have been successfully applied in many areas of
computer vision, including low-level image restoration problems. For image
super-resolution, several models based on deep neural networks have been
recently proposed and attained superior performance that overshadows all
previous handcrafted models. The question then arises whether large-capacity
and data-driven models have become the dominant solution to the ill-posed
super-resolution problem. In this paper, we argue that domain expertise
represented by the conventional sparse coding model is still valuable, and it
can be combined with the key ingredients of deep learning to achieve further
improved results. We show that a sparse coding model particularly designed for
super-resolution can be incarnated as a neural network, and trained in a
cascaded structure from end to end. The interpretation of the network based on
sparse coding leads to much more efficient and effective training, as well as a
reduced model size. Our model is evaluated on a wide range of images, and shows
a clear advantage over existing state-of-the-art methods in terms of both
restoration accuracy and human subjective quality.
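One common way to view sparse coding as a neural network is to unroll the iterative shrinkage-thresholding algorithm (ISTA) into a fixed number of layers. The sketch below assumes that reading; the abstract does not name the solver, so this is an illustration, not the paper's network. The weights here are computed from a given dictionary `D`, whereas a trained network would learn them end to end. The names `unrolled_ista`, `lam`, and `n_layers` are hypothetical.

```python
import numpy as np

def soft_threshold(x, theta):
    """Element-wise shrinkage, the activation function of each layer."""
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def unrolled_ista(y, D, lam, n_layers):
    """Run n_layers ISTA steps for min_z 0.5*||y - D z||^2 + lam*||z||_1."""
    L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant of the gradient
    W = D.T / L                             # input (feed-forward) weights
    S = np.eye(D.shape[1]) - D.T @ D / L    # recurrent weights between layers
    z = soft_threshold(W @ y, lam / L)      # first layer, starting from z = 0
    for _ in range(n_layers - 1):
        z = soft_threshold(W @ y + S @ z, lam / L)
    return z

def objective(y, D, z, lam):
    """Sparse coding objective minimized by ISTA."""
    return 0.5 * np.sum((y - D @ z) ** 2) + lam * np.sum(np.abs(z))

# Hypothetical data: a random dictionary with 8 atoms for 5-D signals
rng = np.random.default_rng(0)
D = rng.normal(size=(5, 8))
y = rng.normal(size=5)
z_shallow = unrolled_ista(y, D, lam=0.1, n_layers=1)
z_deep = unrolled_ista(y, D, lam=0.1, n_layers=50)
```

Each layer is a linear map followed by a shrinkage nonlinearity, which is why the iteration can be read as a feed-forward network whose depth equals the number of ISTA steps.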