85,372 research outputs found
A characterization of visual feature recognition
technical reportNatural human interfaces are a key to realizing the dream of ubiquitous computing. This implies that embedded systems must be capable of sophisticated perception tasks. This paper analyzes the nature of a visual feature recognition workload. Visual feature recognition is a key component of a number of important applications, e.g. gesture based interfaces, lip tracking to augment speech recognition, smart cameras, automated surveillance systems, robotic vision, etc. Given the power sensitive nature of the embedded space and the natural conflict between low-power and high-performance implementations, a precise understanding of these algorithms is an important step developing efficient visual feature recognition applications for the embedded space. In particular, this work analyzes the performance characteristics of flesh toning, face detection and face recognition codes based on well known algorithms. We also show how the problem can be decomposed into a pipeline of filters that have efficient implementations as stream processors
Unsupervised learning of clutter-resistant visual representations from natural videos
Populations of neurons in inferotemporal cortex (IT) maintain an explicit
code for object identity that also tolerates transformations of object
appearance e.g., position, scale, viewing angle [1, 2, 3]. Though the learning
rules are not known, recent results [4, 5, 6] suggest the operation of an
unsupervised temporal-association-based method e.g., Foldiak's trace rule [7].
Such methods exploit the temporal continuity of the visual world by assuming
that visual experience over short timescales will tend to have invariant
identity content. Thus, by associating representations of frames from nearby
times, a representation that tolerates whatever transformations occurred in the
video may be achieved. Many previous studies verified that such rules can work
in simple situations without background clutter, but the presence of visual
clutter has remained problematic for this approach. Here we show that temporal
association based on large class-specific filters (templates) avoids the
problem of clutter. Our system learns in an unsupervised way from natural
videos gathered from the internet, and is able to perform a difficult
unconstrained face recognition task on natural images: Labeled Faces in the
Wild [8]
“I Look in Your Eyes, Honey”: Internal Face Features Induce Spatial Frequency Preference for Human Face Processing
Numerous psychophysical experiments found that humans preferably rely on a narrow
band of spatial frequencies for recognition of face identity. A recently
conducted theoretical study by the author suggests that this frequency
preference reflects an adaptation of the brain's face processing
machinery to this specific stimulus class (i.e., faces). The purpose of the
present study is to examine this property in greater detail and to specifically
elucidate the implication of internal face features (i.e., eyes, mouth, and
nose). To this end, I parameterized Gabor filters to match the spatial receptive
field of contrast sensitive neurons in the primary visual cortex (simple and
complex cells). Filter responses to a large number of face images were computed,
aligned for internal face features, and response-equalized
(“whitened”). The results demonstrate that the frequency
preference is caused by internal face features. Thus, the psychophysically
observed human frequency bias for face processing seems to be specifically
caused by the intrinsic spatial frequency content of internal face features
A robust illumination-invariant face recognition based on fusion of thermal IR, maximum filter and visible image
Face recognition has many challenges especially in real life detection, whereby to maintain consistency in getting an accurate recognition is almost impossible. Even for well-established state-of-the-art algorithms or methods will produce low accuracy in recognition if it was conducted under poor or bad lighting. To create a more robust face recognition with illumination invariant, this paper proposed an algorithm using a triple fusion approach. We are also implementing a hybrid method that combines the active approach by implementing thermal infrared imaging and also the passive approach of Maximum Filter and visual image. These approaches allow us to improve the image pre-processing as well as feature extraction and face detection, even if we capture a person’s face image in total darkness. In our experiment, Extended Yale B database are tested with Maximum Filter and compared against other state-of-the-filters. We have conduct-ed several experiments on mid-wave and long-wave thermal Infrared performance during pre-processing and saw that it is capable to improve recognition beyond what meets the eye. In our experiment, we found out that PCA eigenface cannot be produced in a poor or bad illumination. Mid-wave thermal creates the heat signature in the body and the Maximum Filter maintains the fine edges that are easily used by any classifiers such as SVM, OpenCV or even kNN together with Euclidian distance to perform face recognition. These configurations have been assembled for a face recognition portable robust system and the result showed that creating fusion between these processed image illumination invariants during preprocessing show far better results than just using visible image, thermal image or maximum filtered image separately
Nonlocal contrast calculated by the second order visual mechanisms and its significance in identifying facial emotions [version 2; peer review: 2 approved]
Background: Previously obtained results indicate that faces are /preattentively/ detected in the visual scene very fast, and information on facial expression is rapidly extracted at the lower levels of the visual system. At the same time different facial attributes make different contributions in facial expression recognition. However, it is known, among the preattentive mechanisms there are none that would be selective for certain facial features, such as eyes or mouth. The aim of our study was to identify a candidate for the role of such a mechanism. Our assumption was that the most informative areas of the image are those characterized by spatial heterogeneity, particularly with nonlocal contrast changes. These areas may be identified /in the human visual system/ by the second-order visual /mechanisms/ filters selective to contrast modulations of brightness gradients. Methods: We developed a software program imitating the operation of these /mechanisms/ filters and finding areas of contrast heterogeneity in the image. Using this program, we extracted areas with maximum, minimum and medium contrast modulation amplitudes from the initial face images, then we used these to make three variants of one and the same face. The faces were demonstrated to the observers along with other objects synthesized the same way. The participants had to identify faces and define facial emotional expressions. Results: It was found that the greater is the contrast modulation amplitude of the areas shaping the face, the more precisely the emotion is identified. Conclusions: The results suggest that areas with a greater increase in nonlocal contrast are more informative in facial images, and the second-order visual /mechanisms/ filters can claim the role of /filters/ elements that detect areas of interest, attract visual attention and are windows through which subsequent levels of visual processing receive valuable information
FAME: Face Association through Model Evolution
We attack the problem of learning face models for public faces from
weakly-labelled images collected from web through querying a name. The data is
very noisy even after face detection, with several irrelevant faces
corresponding to other people. We propose a novel method, Face Association
through Model Evolution (FAME), that is able to prune the data in an iterative
way, for the face models associated to a name to evolve. The idea is based on
capturing discriminativeness and representativeness of each instance and
eliminating the outliers. The final models are used to classify faces on novel
datasets with possibly different characteristics. On benchmark datasets, our
results are comparable to or better than state-of-the-art studies for the task
of face identification.Comment: Draft version of the stud
- …