3,692 research outputs found

    Finding Faces in Cluttered Scenes using Random Labeled Graph Matching

    An algorithm for locating quasi-frontal views of human faces in cluttered scenes is presented. The algorithm works by coupling a set of local feature detectors with a statistical model of the mutual distances between facial features. It is invariant with respect to translation, rotation (in the plane), and scale, and can handle partial occlusions of the face. On a challenging database with complicated and varied backgrounds, the algorithm achieved a correct localization rate of 95% in images where the face appeared quasi-frontally.
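    The core idea of scoring feature constellations against a statistical model of mutual distances can be sketched as follows. This is a minimal illustration, not the paper's method: the feature set, the use of distance ratios (to the inter-ocular distance, for scale invariance), and all model means and standard deviations are assumptions for the example.

    ```python
    import math

    FEATURES = ["left_eye", "right_eye", "nose", "mouth"]

    # Illustrative Gaussian model of pairwise distances, expressed as
    # ratios to the inter-ocular distance so the score is invariant to
    # translation, in-plane rotation, and scale. Values are made up.
    MODEL = {  # (mean, std) of distance ratio
        ("left_eye", "nose"): (0.85, 0.10),
        ("right_eye", "nose"): (0.85, 0.10),
        ("left_eye", "mouth"): (1.20, 0.15),
        ("right_eye", "mouth"): (1.20, 0.15),
        ("nose", "mouth"): (0.60, 0.10),
    }

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def constellation_score(points):
        """Log-likelihood (up to a constant) of a candidate feature layout."""
        scale = dist(points["left_eye"], points["right_eye"])
        if scale == 0:
            return float("-inf")
        score = 0.0
        for (a, b), (mu, sigma) in MODEL.items():
            r = dist(points[a], points[b]) / scale
            score += -0.5 * ((r - mu) / sigma) ** 2
        return score

    # A roughly face-like layout should outscore a scrambled one.
    face = {"left_eye": (0, 0), "right_eye": (10, 0),
            "nose": (5, 7), "mouth": (5, 12)}
    scrambled = {"left_eye": (0, 0), "right_eye": (10, 0),
                 "nose": (30, 1), "mouth": (2, 40)}
    print(constellation_score(face) > constellation_score(scrambled))  # True
    ```

    In the full algorithm, candidate constellations would be generated by local feature detectors and the best-scoring arrangement taken as the face hypothesis.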

    Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition

    This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the system's robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their faces. Furthermore, the method does not rely on external annotations, consistent with cognitive development. Instead, it uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method using a large multi-person face-to-face interaction dataset. The results show good performance in a speaker-dependent setting; however, in a speaker-independent setting the proposed method yields significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions.

    Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
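    The cross-modal supervision idea can be sketched minimally: an acoustic voice-activity signal provides pseudo-labels for face crops, which then train a purely visual classifier. This is a toy illustration of the principle, not the paper's pipeline; the energy threshold and the scalar "mouth-motion feature" are assumptions for the example.

    ```python
    # Audio-derived pseudo-labels supervise a visual classifier, so no
    # external (human) annotations are needed.

    def audio_pseudo_label(frame_energy, threshold=0.5):
        """Crude acoustic voice-activity detector standing in for the audio modality."""
        return 1 if frame_energy > threshold else 0

    def build_training_set(face_features, frame_energies):
        """Pair each per-frame visual feature with an audio-derived label."""
        return [(f, audio_pseudo_label(e))
                for f, e in zip(face_features, frame_energies)]

    # Toy data: a mouth-motion feature per video frame, plus audio energy.
    feats = [0.9, 0.1, 0.8, 0.05]
    energies = [0.9, 0.1, 0.7, 0.2]
    dataset = build_training_set(feats, energies)
    print(dataset)  # [(0.9, 1), (0.1, 0), (0.8, 1), (0.05, 0)]
    ```

    The resulting (feature, label) pairs would train a visual model that can then detect the active speaker from faces alone, even when the audio is too noisy to be useful.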

    Searching for a talking face: the effect of degrading the auditory signal

    Previous research (e.g. McGurk and MacDonald, 1976) suggests that faces and voices are bound automatically, but recent evidence suggests that attention is involved in the task of searching for a talking face (Alsius and Soto-Faraco, 2011). We hypothesised that the processing demands of the stimuli may affect the amount of attentional resources required, and investigated what effect degrading the auditory stimulus had on the time taken to locate a talking face. Twenty participants were presented with between 2 and 4 faces articulating different sentences, and had to decide which of these faces matched the sentence that they heard. The results showed that in the least demanding auditory condition (clear speech in quiet), search times did not significantly increase when the number of faces increased. However, when speech was presented in background noise or was processed to simulate the information provided by a cochlear implant, search times increased as the number of faces increased. Thus, it seems that the amount of attentional resources required varies according to the processing demands of the auditory stimuli, and when processing load is increased, faces need to be individually attended to in order to complete the task. Based on these results we would expect cochlear-implant users to find the task of locating a talking face more attentionally demanding than normal-hearing listeners.

    Spacing affects some but not all visual searches: Implications for theories of attention and crowding

    We investigated the effect of varying interstimulus spacing on an upright-among-inverted face search and a red–green among green–red bisected disk search. Both tasks are classic examples of serial search; however, spacing affects them very differently: as spacing increased, face discrimination performance improved significantly, whereas performance on the bisected disks remained poor. (No effect of spacing was observed for either a red-among-green or an L-among-+ search task, two classic examples of parallel search.) In a second experiment, we precued the target location so that attention was no longer a limiting factor: both serial search tasks were now equally affected by spacing, a result we attribute to a more classical form of crowding. The observed spacing effect in visual search suggests that for certain tasks, serial search may result from local neuronal competition between target and distractors, soliciting attentional resources; in other cases, serial search must occur for another reason, for example, because an item-by-item, attention-mediated recognition must take place. We speculate that this distinction may be based on whether or not there exist neuronal populations tuned to the relevant target–distractor distinction, and we discuss the possible relations between this spacing effect in visual search and other forms of crowding.

    "Not the Usual Suspects": A Study of Factors Reducing the Effectiveness of CCTV

    Previous research on the effectiveness of Closed Circuit Television (CCTV) has focused on critically assessing police and government claims that CCTV is effective in reducing crime. This paper presents a field study that investigates the relationship between CCTV system design and the performance of operator tasks. We carried out structured observations and interviews with 13 managers and 38 operators at 13 CCTV control rooms. A number of failures were identified, including the poor configuration of technology, poor quality video recordings, and a lack of system integration. Stakeholder communication was poor, and there were too many cameras and too few operators. These failures have been previously identified by researchers; however, no design improvements have been made to control rooms in the last decade. We identify a number of measures to improve operator performance, and contribute a set of recommendations for security managers and practitioners. Security Journal (2010) 23, 134-154. doi:10.1057/sj.2008.2; published online 6 October 200

    Robust manipulability-centric object detection in time-of-flight camera point clouds

    This paper presents a method for robustly identifying the manipulability of objects in a scene based on the capabilities of the manipulator. The method uses a directed histogram search of a 3D point cloud generated by a time-of-flight camera, exploiting the logical connection between objects and their supporting surface to facilitate scene segmentation. Once segmented, the points above the supporting surface are searched, again with a directed histogram, and potentially manipulable objects are identified. Finally, the manipulable objects in the scene are identified as those from the potential-object set that are within the manipulator's capabilities. It is shown empirically that the method robustly detects the supporting surface with ±15 mm accuracy and successfully discriminates between graspable and non-graspable objects in cluttered and complex scenes.
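    The supporting-surface step can be illustrated with a simple height histogram: a table or floor plane produces a sharp peak in the distribution of point heights. This is a hedged sketch of the general idea, not the paper's implementation; the bin width and synthetic scene are assumptions (the 15 mm bin merely echoes the reported accuracy figure).

    ```python
    import numpy as np

    def find_support_height(points, bin_width=0.015):
        """Estimate the supporting-surface height (metres) from an (N, 3) cloud.

        Histograms the z-coordinates and returns the centre of the
        dominant bin, where a planar surface concentrates its points.
        """
        z = points[:, 2]
        bins = np.arange(z.min(), z.max() + bin_width, bin_width)
        counts, edges = np.histogram(z, bins=bins)
        peak = np.argmax(counts)
        return 0.5 * (edges[peak] + edges[peak + 1])

    # Synthetic scene: a dense plane at z = 0.70 m plus a small object above it.
    rng = np.random.default_rng(0)
    table = np.column_stack([rng.uniform(0, 1, (5000, 2)),
                             rng.normal(0.70, 0.003, 5000)])
    obj = np.column_stack([rng.uniform(0.4, 0.5, (200, 2)),
                           rng.uniform(0.70, 0.85, 200)])
    cloud = np.vstack([table, obj])
    h = find_support_height(cloud)
    print(abs(h - 0.70) < 0.015)  # True
    ```

    Points above the recovered height would then be clustered into candidate objects and filtered against the manipulator's reach and gripper constraints.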

    Low-cost interactive active monocular range finder

    This paper describes a low-cost interactive active monocular range finder and illustrates the effect of introducing interactivity to the range acquisition process. The range finder consists of only one camera and a laser pointer, to which three LEDs are attached. When a user scans the laser along the surfaces of objects, the camera captures the image of the spots (one from the laser, and the others from the LEDs), and triangulation is carried out using the camera's viewing direction and the optical axis of the laser. The user interaction allows the range finder to acquire range data in which the sampling rate varies across the object depending on the underlying surface structures. Moreover, the processes of separating objects from the background and/or finding parts in the object can be achieved using the operator's knowledge of the objects.
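    The underlying triangulation amounts to intersecting the camera's viewing ray to the laser spot with the laser's optical axis. A minimal 2D sketch, with illustrative geometry and variable names (the real system works in 3D and estimates the laser pose from the attached LEDs):

    ```python
    # Intersect two 2D rays: cam_pos + t*cam_dir = laser_pos + s*laser_dir.
    # The intersection point is the laser spot on the object surface.

    def triangulate(cam_pos, cam_dir, laser_pos, laser_dir):
        (cx, cy), (dx, dy) = cam_pos, cam_dir
        (lx, ly), (ex, ey) = laser_pos, laser_dir
        denom = dx * ey - dy * ex
        if abs(denom) < 1e-12:
            raise ValueError("rays are parallel: no unique intersection")
        # Solve the 2x2 linear system for the camera-ray parameter t.
        t = ((lx - cx) * ey - (ly - cy) * ex) / denom
        return (cx + t * dx, cy + t * dy)

    # Camera at the origin viewing the spot; laser 0.2 m to the right,
    # firing straight ahead; the spot lies at (0.2, 1.0).
    spot = triangulate((0.0, 0.0), (0.2, 1.0), (0.2, 0.0), (0.0, 1.0))
    print(spot)  # (0.2, 1.0)
    ```

    The baseline between camera and laser is what gives the depth signal: the farther the surface, the smaller the angle between the two rays.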
