Predicting human gaze using low-level saliency combined with face detection
Under natural viewing conditions, human observers shift their gaze to allocate processing resources to subsets of the visual input. Many computational models try to predict such voluntary eye and attentional shifts. Although the important role of high-level stimulus properties (e.g., semantic information) in search stands undisputed, most models are based on low-level image properties. We here demonstrate that a combined model of face detection and low-level saliency significantly outperforms a low-level model in predicting locations humans fixate on, based on eye-movement recordings of humans observing photographs of natural scenes, most of which contained at least one person. Observers, even when not instructed to look for anything particular, fixate on a face with a probability of over 80% within their first two fixations; furthermore, they exhibit more similar scanpaths when faces are present. Remarkably, our model's predictive performance in images that do not contain faces is not impaired, and is even improved in some cases by spurious face detector responses.
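The combination described in this abstract can be sketched as adding a face channel to a low-level saliency map. This is a minimal illustration only: the uniform box response and the `face_weight` parameter are assumptions for demonstration, not the paper's fitted combination.

```python
import numpy as np

def combined_map(saliency, face_boxes, face_weight=1.0):
    """Add a face-detection channel to a low-level saliency map.

    `face_weight` and the flat in-box response are illustrative
    assumptions; the actual model fits how the channels combine.
    """
    face_channel = np.zeros_like(saliency, dtype=float)
    for x, y, w, h in face_boxes:
        face_channel[y:y + h, x:x + w] = 1.0  # flat response inside each detection
    combined = saliency + face_weight * face_channel
    peak = combined.max()
    return combined / peak if peak > 0 else combined  # rescale so the peak is 1
```

Fixation predictions would then be read off the peaks of the combined map, so detected faces dominate low-level conspicuity wherever both are present.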
Image usefulness of compressed surveillance footage with different scene contents
The police use both subjective (i.e. police staff) and automated (e.g. face recognition systems) methods for the completion of visual tasks (e.g. person identification). Image quality for police tasks has been defined as the image usefulness, or image suitability of the visual material to satisfy a visual task. It is not necessarily affected by any artefact that may affect the visual image quality (i.e. decrease fidelity), as long as these artefacts do not affect the relevant useful information for the task. The capture of useful information will be affected by the unconstrained conditions commonly encountered by CCTV systems, such as variations in illumination and high compression levels. The main aim of this thesis is to investigate aspects of image quality and video compression that may affect the completion of police visual tasks/applications with respect to CCTV imagery. This is accomplished by investigating three specific police areas/tasks utilising: 1) the human visual system (HVS) for a face recognition task, 2) automated face recognition systems, and 3) automated human detection systems.
These systems (HVS and automated) were assessed with defined scene content properties, and video compression, i.e. H.264/MPEG-4 AVC. The performance of imaging systems/processes (e.g. subjective investigations, performance of compression algorithms) is affected by scene content properties. No other investigation has been identified that takes scene content properties into consideration to the same extent. Results have shown that the HVS is more sensitive to compression effects in comparison to the automated systems. In automated face recognition systems, `mixed lightness' scenes were the most affected and `low lightness' scenes were the least affected by compression. In contrast, for the HVS face recognition task, `low lightness' scenes were the most affected and `medium lightness' scenes the least affected. For the automated human detection systems, `close distance' and `run approach' are some of the most commonly affected scenes. Findings have the potential to broaden the methods used for testing imaging systems for security applications.
A cognitive template for human face detection
Faces are highly informative social stimuli, yet before any information can be accessed, the face must first be detected in the visual field. A detection template that serves this purpose must be able to accommodate the wide variety of face images we encounter, but how this generality could be achieved remains unknown. In this study, we investigate whether statistical averages of previously encountered faces can form the basis of a general face detection template. We provide converging evidence from a range of methods—human similarity judgements and PCA-based image analysis of face averages (Experiments 1–3), human detection behaviour for faces embedded in complex scenes (Experiments 4 and 5), and simulations with a template-matching algorithm (Experiments 6 and 7)—to examine the formation, stability and robustness of statistical image averages as cognitive templates for human face detection. We integrate these findings with existing knowledge of face identification, ensemble coding, and the development of face perception.
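The core idea of this abstract, a detection template built by averaging previously encountered faces and matched against a scene, can be sketched as follows. This is a toy stand-in under stated assumptions (pixel-wise averaging, exhaustive normalized cross-correlation), not the paper's actual simulation code.

```python
import numpy as np

def average_template(faces):
    """Pixel-wise average of previously encountered faces (same shape)."""
    return np.mean(np.stack(faces), axis=0)

def detect_face(scene, template):
    """Slide the template over the scene; return the best-matching
    position and score by normalized cross-correlation."""
    th, tw = template.shape
    t = template - template.mean()
    t = t / (np.linalg.norm(t) + 1e-8)
    best_score, best_pos = -np.inf, (0, 0)
    for y in range(scene.shape[0] - th + 1):
        for x in range(scene.shape[1] - tw + 1):
            win = scene[y:y + th, x:x + tw]
            w = win - win.mean()
            w = w / (np.linalg.norm(w) + 1e-8)
            score = float((t * w).sum())  # correlation in [-1, 1]
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score
```

The generality question the study raises corresponds here to how well a single averaged template correlates with individual faces it was not averaged from.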
The visual processing of human faces and bodies as visual stimuli in natural scenes
How faces are recognized and detected has been the focus of an extensive corpus of research. As such, it is now well established that human faces can be detected rapidly in a visual scene and that they automatically capture a viewer's attention over other objects. However, under natural viewing conditions the human face is attached to a substantial cue, the human body. The evidence to date of a similar attentional processing advantage for human bodies is less clear. This is remarkable given the social significance and evidence of neural specificity for these stimuli. Additionally, most previous investigations of preferential attention towards faces and bodies have presented these stimuli in simple displays, namely uniform colour backgrounds (Bindemann, Scheepers, Ferguson & Burton, 2010). Therefore, this thesis aimed to address the relationship between attention and face and body processing in natural scenes directly by assessing the consequences of numerous experimental manipulations in both a visual search paradigm and an additional singleton paradigm. The first line of enquiry examined participants' ability to detect face and body stimuli in comparison to other objects in natural scenes. Subsequent experiments examined whether faces and bodies captured attention when they were task-irrelevant. In line with previous research, the main findings indicate that human faces do have attentional advantages and capture attention in both natural and grey scenes. They also indicate that human bodies (without the head) do not have detection advantages over other objects, nor do they capture attention in a bottom-up manner. Any biases or detection advantages observed for body targets arise because they are larger in size than other objects or because they are odd stimuli in that scene. Human full-body targets (including the face), which are perceived on a day-to-day basis, capture attention partly because they include a face and partly because they are large objects in the scene.
These findings modify claims about person perception: the detection of a full body in natural scenes is facilitated by attention capture by faces, while any advantages from bodies are the result of attention capture by their large size rather than a body-specific attentional advantage. Future investigations into face and body processing should use natural backgrounds to gain a more realistic insight into face and body processing in the real world.
Ingroup and outgroup differences in face detection
Humans show improved recognition for faces from their own social group relative to faces from another social group. Yet before faces can be recognized, they must first be detected in the visual field. Here, we tested whether humans also show an ingroup bias at the earliest stage of face processing – the point at which the presence of a face is first detected. To this end, we measured viewers' ability to detect ingroup (Black and White) and outgroup faces (Asian, Black, and White) in everyday scenes. Ingroup faces were detected with greater speed and accuracy relative to outgroup faces (Experiment 1). Removing face hue impaired detection generally, but the ingroup detection advantage was undiminished (Experiment 2). This same pattern was replicated by a detection algorithm using face templates derived from human data (Experiment 3). These findings demonstrate that the established ingroup bias in face processing can extend to the early process of detection. This effect is ‘colour blind’, in the sense that group membership effects are independent of general effects of image hue. Moreover, it can be captured by tuning visual templates to reflect the statistics of observers' social experience. We conclude that group bias in face detection is both a visual and a social phenomenon.
MUSCLE movie-database: a multimodal corpus with rich annotation for dialogue and saliency detection
What are the Visual Features Underlying Rapid Object Recognition?
Research progress in machine vision has been very significant in recent years. Robust face detection and identification algorithms are already readily available to consumers, and modern computer vision algorithms for generic object recognition are now coping with the richness and complexity of natural visual scenes. Unlike early vision models of object recognition that emphasized the role of figure-ground segmentation and spatial information between parts, recent successful approaches are based on the computation of loose collections of image features without prior segmentation or any explicit encoding of spatial relations. While these models remain simplistic models of visual processing, they suggest that, in principle, bottom-up activation of a loose collection of image features could support the rapid recognition of natural object categories and provide an initial coarse visual representation before more complex visual routines and attentional mechanisms take place. Focusing on biologically plausible computational models of (bottom-up) pre-attentive visual recognition, we review some of the key visual features that have been described in the literature. We discuss the consistency of these feature-based representations with classical theories from visual psychology and test their ability to account for human performance on a rapid object categorization task.
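The "loose collection of image features" idea this abstract reviews can be illustrated with an order-less bag-of-features representation: local patches are quantized against a codebook and pooled into a histogram that discards all spatial relations. The patch size and nearest-neighbour quantizer below are illustrative choices, not any specific model from the paper.

```python
import numpy as np

def bag_of_features(image, codebook, patch=4):
    """Order-less feature histogram: quantize non-overlapping local
    patches against a codebook (rows = flattened codewords) and pool
    counts, discarding all spatial layout between patches."""
    hist = np.zeros(len(codebook), dtype=float)
    H, W = image.shape
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            p = image[y:y + patch, x:x + patch].ravel()
            # nearest codeword = the quantization step
            idx = int(np.argmin(((codebook - p) ** 2).sum(axis=1)))
            hist[idx] += 1
    return hist / hist.sum()  # a classifier can then read the category off this histogram
```

Because the histogram contains no positional information, any success of a classifier trained on it supports the claim that rapid categorization need not depend on explicit spatial relations between parts.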
SALSA: A Novel Dataset for Multimodal Group Behavior Analysis
Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging due to the difficulty in extracting behavioral cues such as target locations, their speaking activity and head/body pose due to crowdedness and presence of extreme occlusions. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under the poster presentation and cocktail party contexts, presenting difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) to alleviate these problems we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising the microphone, accelerometer, bluetooth and infrared sensors. In addition to raw data, we also provide annotations concerning individuals' personality as well as their position, head, body orientation and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.
Comment: 14 pages, 11 figures
RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been released. These have propelled advances in areas from reconstruction to gesture recognition. In this paper we explore the field, reviewing datasets across eight categories: semantics, object pose estimation, camera tracking, scene reconstruction, object tracking, human actions, faces and identification. By extracting relevant information in each category we help researchers to find appropriate data for their needs, and we consider which datasets have succeeded in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which are currently underexplored, and suggest that future directions may include synthetic data and dense reconstructions of static and dynamic scenes.
Comment: 8 pages excluding references (CVPR style)