Fast, invariant representation for human action in the visual system
Humans can effortlessly recognize others' actions in the presence of complex
transformations, such as changes in viewpoint. Several studies have located the
regions in the brain involved in invariant action recognition; however, the
underlying neural computations remain poorly understood. We use
magnetoencephalography (MEG) decoding and a dataset of well-controlled,
naturalistic videos of five actions (run, walk, jump, eat, drink) performed by
different actors at different viewpoints to study the computational steps used
to recognize actions across complex transformations. In particular, we ask when
the brain discounts changes in 3D viewpoint relative to when it initially
discriminates between actions. We measure the latency difference between
invariant and non-invariant action decoding when subjects view full videos as
well as form-depleted and motion-depleted stimuli. Our results show no
difference in decoding latency or temporal profile between invariant and
non-invariant action recognition in full videos. However, when either form or
motion information is removed from the stimulus set, we observe a decrease and
delay in invariant action decoding. Our results suggest that the brain
recognizes actions and builds invariance to complex transformations at the same
time, and that both form and motion information are crucial for fast, invariant
action recognition.
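The latency comparison described above can be sketched as time-resolved decoding: train a classifier at each timepoint and ask when accuracy first rises above chance, with invariant decoding defined as training on one viewpoint and testing on another. This is a minimal illustration on synthetic data, using a nearest-centroid classifier as a stand-in for whatever decoder the study actually used; all dimensions and the signal-onset time are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def decode_timecourse(X_train, y_train, X_test, y_test):
    """Per-timepoint nearest-centroid decoding accuracy.
    X arrays have shape (trials, sensors, timepoints)."""
    n_t = X_train.shape[2]
    classes = np.unique(y_train)
    acc = np.empty(n_t)
    for t in range(n_t):
        # Class centroids in sensor space at this timepoint.
        centroids = np.stack([X_train[y_train == c, :, t].mean(0)
                              for c in classes])
        d = ((X_test[:, None, :, t] - centroids[None]) ** 2).sum(-1)
        acc[t] = (classes[d.argmin(1)] == y_test).mean()
    return acc

# Synthetic stand-in for MEG data: 5 actions, 2 viewpoints,
# with an action-specific signal appearing at timepoint 20.
n_trials, n_sensors, n_time = 100, 30, 50
y = rng.integers(0, 5, n_trials)
view = rng.integers(0, 2, n_trials)
X = rng.normal(size=(n_trials, n_sensors, n_time))
X[:, :5, 20:] += 3 * y[:, None, None]

# Invariant decoding: train on viewpoint 0, test on viewpoint 1.
# (A within-viewpoint split would give the non-invariant timecourse.)
acc_inv = decode_timecourse(X[view == 0], y[view == 0],
                            X[view == 1], y[view == 1])
latency = int(np.argmax(acc_inv > 0.5))  # first timepoint well above chance
```

In this toy setup the viewpoint label carries no signal, so invariant accuracy jumps at the onset of the action signal; in real MEG data the interesting question is whether the invariant latency lags the non-invariant one.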
Multi-View Face Recognition From Single RGBD Models of the Faces
This work takes important steps towards solving the following problem of current interest: Assuming that each individual in a population can be modeled by a single frontal RGBD face image, is it possible to carry out face recognition for such a population using multiple 2D images captured from arbitrary viewpoints? Although the general problem as stated above is extremely challenging, it encompasses subproblems that can be addressed today. The subproblems addressed in this work relate to: (1) generating a large set of viewpoint-dependent face images from a single RGBD frontal image for each individual; (2) using hierarchical approaches based on view-partitioned subspaces to represent the training data; and (3) based on these hierarchical approaches, using a weighted voting algorithm to integrate the evidence collected from multiple images of the same face as recorded from different viewpoints. We evaluate our methods on three datasets: a dataset of 10 people that we created and two publicly available datasets which include a total of 48 people. In addition to providing important insights into the nature of this problem, our results show that we are able to successfully recognize faces with accuracies of 95% or higher, outperforming existing state-of-the-art face recognition approaches based on deep convolutional neural networks.
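The weighted-voting step in subproblem (3) can be sketched as follows: each probe image yields a vector of identity scores (here an arbitrary similarity matrix), and per-image weights let a confident, near-frontal view count for more than an extreme off-angle one. The scores, weights, and subject count below are all made up for illustration; the abstract does not specify how the weights are derived.

```python
import numpy as np

def weighted_vote(scores, weights):
    """Fuse per-image identity scores into one decision.
    scores:  (n_images, n_subjects) similarity matrix, one row per probe view.
    weights: (n_images,) confidence assigned to each probe image."""
    fused = (weights[:, None] * scores).sum(axis=0)
    return int(fused.argmax())

# Toy example: 3 probe images of the same face, 4 enrolled subjects.
scores = np.array([[0.20, 0.70, 0.05, 0.05],
                   [0.10, 0.50, 0.30, 0.10],
                   [0.40, 0.30, 0.20, 0.10]])  # last row: a poor, off-angle view
weights = np.array([1.0, 1.0, 0.3])            # down-weight the poor view
print(weighted_vote(scores, weights))          # → 1 (subject 1 wins)
```

Down-weighting the third image prevents its spurious preference for subject 0 from overturning the evidence from the two reliable views.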
Boosted Random ferns for object detection
In this paper we introduce Boosted Random Ferns (BRFs) to rapidly build discriminative classifiers for learning and detecting object categories. At the core of our approach we use standard random ferns, but we introduce four main innovations that let us bring ferns from the instance to the category level while retaining efficiency. First, we define binary features in the histogram-of-oriented-gradients domain (as opposed to the intensity domain), allowing for a better representation of intra-class variability. Second, neither the positions where ferns are evaluated within the sliding window nor the locations of the binary features within each fern are chosen completely at random; instead we use a boosting strategy to pick the most discriminative combination of them. This is further enhanced by our third contribution, which adapts the boosting strategy to enable sharing of binary features among different ferns, yielding high recognition rates at low computational cost. Finally, we show that training can be performed online, for sequentially arriving images. Overall, the resulting classifier can be trained very efficiently, densely evaluated for all image locations in about 0.1 seconds, and provides detection rates similar to those of competing approaches that require significantly slower, more expensive processing. We demonstrate the effectiveness of our approach through thorough experimentation on publicly available datasets, comparing against the state of the art on tasks of both 2D detection and 3D multi-view estimation.
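The basic building block of this approach, a random fern, can be sketched in a few lines: each fern applies m binary comparisons to a feature vector, the resulting bits index one of 2^m bins, and a log-likelihood-ratio lookup table per fern is summed into a detection score. The sketch below trains plain (non-boosted) ferns on synthetic feature vectors standing in for HOG descriptors; the boosting-based feature selection and sharing that the paper contributes are not shown, and all sizes are illustrative.

```python
import numpy as np

class Fern:
    """One random fern: m binary comparisons on a feature vector -> 2**m bins."""
    def __init__(self, m, dim, rng):
        self.a = rng.integers(0, dim, m)  # random index pairs to compare
        self.b = rng.integers(0, dim, m)

    def index(self, x):
        bits = (x[self.a] > x[self.b]).astype(int)
        return int(bits @ (1 << np.arange(len(bits))))

def train_ferns(X_pos, X_neg, n_ferns=10, m=6, seed=0):
    """Fit per-fern log-likelihood-ratio tables from positive/negative samples."""
    rng = np.random.default_rng(seed)
    dim = X_pos.shape[1]
    ferns, tables = [], []
    for _ in range(n_ferns):
        f = Fern(m, dim, rng)
        hp = np.ones(2 ** m)  # Laplace-smoothed bin histograms
        hn = np.ones(2 ** m)
        for x in X_pos:
            hp[f.index(x)] += 1
        for x in X_neg:
            hn[f.index(x)] += 1
        tables.append(np.log(hp / hp.sum()) - np.log(hn / hn.sum()))
        ferns.append(f)
    return ferns, tables

def score(x, ferns, tables):
    """Detection score: sum of per-fern log-likelihood ratios."""
    return sum(t[f.index(x)] for f, t in zip(ferns, tables))

# Synthetic demo: positives have elevated values in the first 10 dimensions.
rng = np.random.default_rng(1)
X_pos = rng.normal(size=(200, 20)); X_pos[:, :10] += 2.0
X_neg = rng.normal(size=(200, 20))
ferns, tables = train_ferns(X_pos, X_neg)
sp = np.mean([score(x, ferns, tables) for x in X_pos])
sn = np.mean([score(x, ferns, tables) for x in X_neg])
```

Because a fern's score is just table lookups indexed by cheap comparisons, dense evaluation over all sliding-window positions is fast, which is what makes the ~0.1 s whole-image evaluation quoted in the abstract plausible.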
- …