"Mental Rotation" by Optimizing Transforming Distance
The human visual system is able to recognize objects despite transformations
that can drastically alter their appearance. To this end, much effort has been
devoted to the invariance properties of recognition systems. Invariance can be
engineered (e.g. convolutional nets), or learned from data explicitly (e.g.
temporal coherence) or implicitly (e.g. by data augmentation). One idea that
has not, to date, been explored is the integration of latent variables which
permit a search over a learned space of transformations. Motivated by evidence
that people mentally simulate transformations in space while comparing
examples, so-called "mental rotation", we propose a transforming distance.
Here, a trained relational model actively transforms pairs of examples so that
they are maximally similar in some feature space yet respect the learned
transformational constraints. We apply our method to nearest-neighbour problems
on the Toronto Face Database and NORB.
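The idea of a transforming distance can be illustrated with a toy sketch. This is not the authors' method: the abstract describes a trained relational model searching a learned space of transformations, whereas the sketch below assumes a fixed, hand-specified family (planar rotations) and a plain grid search. The names `rotate`, `sq_dist`, and `transforming_distance` are hypothetical.

```python
import math

def rotate(points, theta):
    """Rotate 2D points about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def sq_dist(a, b):
    """Sum of squared distances between corresponding points."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2 for (ax, ay), (bx, by) in zip(a, b))

def transforming_distance(a, b, steps=360):
    """Smallest squared distance between b and any rotated copy of a.

    Assumption: a grid search over rotation angles stands in for the
    search over a learned transformation space described in the abstract.
    """
    best = min(range(steps),
               key=lambda i: sq_dist(rotate(a, 2 * math.pi * i / steps), b))
    theta = 2 * math.pi * best / steps
    return sq_dist(rotate(a, theta), b), theta

a = [(1.0, 0.0), (0.0, 2.0), (-1.0, 0.0)]
b = rotate(a, math.pi / 2)                  # same shape, rotated by 90 degrees
plain = sq_dist(a, b)                       # distance without any transformation
dist, theta = transforming_distance(a, b)   # distance after the best rotation
```

In a nearest-neighbour setting, `dist` would replace `plain` when ranking candidates, so two examples that differ only by a rotation are treated as close.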
Parsing Occluded People by Flexible Compositions
This paper presents an approach to parsing humans when there is significant
occlusion. We model humans using a graphical model which has a tree structure
building on recent work [32, 6] and exploit the connectivity prior that, even
in the presence of occlusion, the visible nodes form a connected subtree of the
graphical model. We call each connected subtree a flexible composition of
object parts. This involves a novel method for learning occlusion cues. During
inference we need to search over a mixture of different flexible models. By
exploiting part sharing, we show that this inference can be done extremely
efficiently requiring only twice as many computations as searching for the
entire object (i.e., not modeling occlusion). We evaluate our model on the
standard benchmark "We Are Family" Stickmen dataset and obtain significant
performance improvements over the best alternative algorithms.
Comment: CVPR 15 Camera Read
Relational Reasoning Network (RRN) for Anatomical Landmarking
Accurately identifying anatomical landmarks is a crucial step in deformation
analysis and surgical planning for craniomaxillofacial (CMF) bones. Available
methods require segmentation of the object of interest for precise landmarking.
Unlike those, our purpose in this study is to perform anatomical landmarking
using the inherent relation of CMF bones without explicitly segmenting them. We
propose a new deep network architecture, called relational reasoning network
(RRN), to accurately learn the local and the global relations of the landmarks.
Specifically, we are interested in learning landmarks in the CMF region: mandible,
maxilla, and nasal bones. The proposed RRN works in an end-to-end manner,
utilizing learned relations of the landmarks based on dense-block units and
without the need for segmentation. Given a few landmarks as input, the
proposed system accurately and efficiently localizes the remaining landmarks on
the aforementioned bones. For a comprehensive evaluation of RRN, we used
cone-beam computed tomography (CBCT) scans of 250 patients. The proposed system
identifies the landmark locations very accurately even when there are severe
pathologies or deformations in the bones. The proposed RRN has also revealed
unique relationships among the landmarks that help us reason about the
informativeness of the landmark points. RRN is invariant to the order of
landmarks, and it allowed us to discover the optimal configurations (number and
location) of landmarks to be localized within the object of interest
(mandible) or nearby objects (maxilla and nasal). To the best of our knowledge,
this is the first algorithm of its kind to find anatomical relations of
objects using deep learning.
Comment: 10 pages, 6 Figures, 3 Table
Learning Visual Question Answering by Bootstrapping Hard Attention
Attention mechanisms in biological perception are thought to select subsets
of perceptual information for more sophisticated processing which would be
prohibitive to perform on all sensory inputs. In computer vision, however,
there has been relatively little exploration of hard attention, where some
information is selectively ignored, in spite of the success of soft attention,
where information is re-weighted and aggregated, but never filtered out. Here,
we introduce a new approach for hard attention and find it achieves very
competitive performance on recently released visual question answering
datasets, equalling and in some cases surpassing similar soft attention
architectures while entirely ignoring some features. Even though the hard
attention mechanism is thought to be non-differentiable, we found that the
feature magnitudes correlate with semantic relevance, and provide a useful
signal for our mechanism's attentional selection criterion. Because hard
attention selects important features of the input information, it can also be
more efficient than analogous soft attention mechanisms. This is especially
important for recent approaches that use non-local pairwise operations, whereby
computational and memory costs are quadratic in the size of the set of
features.
Comment: ECCV 201
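The selection criterion mentioned in the abstract (feature magnitudes as a signal for what to keep) can be sketched as a top-k filter. This is a minimal illustration under assumptions, not the paper's implementation: the L2 norm as the magnitude measure, the fixed budget `k`, and the function name `hard_attention` are all hypothetical choices.

```python
import math

def hard_attention(features, k):
    """Keep the k feature vectors with the largest L2 norm and drop the rest.

    Unlike soft attention, the dropped features are not re-weighted;
    they are removed from all later processing.
    """
    norms = [math.sqrt(sum(x * x for x in f)) for f in features]
    ranked = sorted(range(len(features)), key=lambda i: norms[i], reverse=True)
    kept_idx = sorted(ranked[:k])           # restore the original ordering
    return kept_idx, [features[i] for i in kept_idx]

features = [[0.1, 0.0], [3.0, 4.0], [0.5, 0.5], [1.0, 2.0]]
kept_idx, kept = hard_attention(features, k=2)

# A pairwise (non-local) operation now touches k*k pairs instead of n*n.
pairs_before = len(features) ** 2
pairs_after = len(kept) ** 2
```

With four input features and `k=2`, the pairwise cost falls from 16 pairs to 4, which is the quadratic saving the abstract refers to.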
Constrained Deep Transfer Feature Learning and its Applications
Feature learning with deep models has achieved impressive results for both
data representation and classification for various vision tasks. Deep feature
learning, however, typically requires a large amount of training data, which
may not be feasible for some application domains. Transfer learning can
alleviate this problem by transferring data from a data-rich
source domain to a data-scarce target domain. Existing transfer learning methods
typically perform one-shot transfer learning and often ignore the specific
properties that the transferred data must satisfy. To address these issues, we
introduce a constrained deep transfer feature learning method that performs
transfer learning and feature learning simultaneously. Transfer is carried out
iteratively in a progressively improving feature space, which narrows the gap
between the target domain and the source domain and makes the transfer of data
from the source domain to the target domain more effective.
Furthermore, we propose to exploit the target domain knowledge and incorporate
such prior knowledge as a constraint during transfer learning to ensure that
the transferred data satisfies certain properties of the target domain. To
demonstrate the effectiveness of the proposed constrained deep transfer feature
learning method, we apply it to thermal feature learning for eye detection by
transferring from the visible domain. We also apply the proposed method to
cross-view facial expression recognition as a second application. The
experimental results demonstrate the effectiveness of the proposed method for
both applications.
Comment: International Conference on Computer Vision and Pattern Recognition, 201