2,475 research outputs found

    SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

    Get PDF
    Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., cocktail party ) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging due to the difficulty in extracting behavioral cues such as target locations, their speaking activity and head/body pose due to crowdedness and presence of extreme occlusions. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under the poster presentation and cocktail party contexts presenting difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) To alleviate these problems we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising the microphone, accelerometer, bluetooth and infrared sensors. In addition to raw data, we also provide annotations concerning individuals' personality as well as their position, head, body orientation and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.Comment: 14 pages, 11 figure

    On realistic target coverage by autonomous drones

    Get PDF
    Low-cost mini-drones with advanced sensing and maneuverability enable a new class of intelligent sensing systems. To achieve the full potential of such drones, it is necessary to develop new enhanced formulations of both common and emerging sensing scenarios. Namely, several fundamental challenges in visual sensing are yet to be solved including (1) fitting sizable targets in camera frames; (2) positioning cameras at effective viewpoints matching target poses; and (3) accounting for occlusion by elements in the environment, including other targets. In this article, we introduce Argus, an autonomous system that utilizes drones to collect target information incrementally through a two-tier architecture. To tackle the stated challenges, Argus employs a novel geometric model that captures both target shapes and coverage constraints. Recognizing drones as the scarcest resource, Argus aims to minimize the number of drones required to cover a set of targets. We prove this problem is NP-hard, and even hard to approximate, before deriving a best-possible approximation algorithm along with a competitive sampling heuristic which runs up to 100Ă— faster according to large-scale simulations. To test Argus in action, we demonstrate and analyze its performance on a prototype implementation. Finally, we present a number of extensions to accommodate more application requirements and highlight some open problems

    Infrared face recognition: a comprehensive review of methodologies and databases

    Full text link
    Automatic face recognition is an area with immense practical potential which includes a wide range of commercial and law enforcement applications. Hence it is unsurprising that it continues to be one of the most active research areas of computer vision. Even after over three decades of intense research, the state-of-the-art in face recognition continues to improve, benefitting from advances in a range of different research fields such as image processing, pattern recognition, computer graphics, and physiology. Systems based on visible spectrum images, the most researched face recognition modality, have reached a significant level of maturity with some practical success. However, they continue to face challenges in the presence of illumination, pose and expression changes, as well as facial disguises, all of which can significantly decrease recognition accuracy. Amongst various approaches which have been proposed in an attempt to overcome these limitations, the use of infrared (IR) imaging has emerged as a particularly promising research direction. This paper presents a comprehensive and timely review of the literature on this subject. Our key contributions are: (i) a summary of the inherent properties of infrared imaging which makes this modality promising in the context of face recognition, (ii) a systematic review of the most influential approaches, with a focus on emerging common trends as well as key differences between alternative methodologies, (iii) a description of the main databases of infrared facial images available to the researcher, and lastly (iv) a discussion of the most promising avenues for future research.Comment: Pattern Recognition, 2014. arXiv admin note: substantial text overlap with arXiv:1306.160

    The cognitive neuroscience of visual working memory

    Get PDF
    Visual working memory allows us to temporarily maintain and manipulate visual information in order to solve a task. The study of the brain mechanisms underlying this function began more than half a century ago, with Scoville and Milner’s (1957) seminal discoveries with amnesic patients. This timely collection of papers brings together diverse perspectives on the cognitive neuroscience of visual working memory from multiple fields that have traditionally been fairly disjointed: human neuroimaging, electrophysiological, behavioural and animal lesion studies, investigating both the developing and the adult brain

    Development of a text reading system on video images

    Get PDF
    Since the early days of computer science researchers sought to devise a machine which could automatically read text to help people with visual impairments. The problem of extracting and recognising text on document images has been largely resolved, but reading text from images of natural scenes remains a challenge. Scene text can present uneven lighting, complex backgrounds or perspective and lens distortion; it usually appears as short sentences or isolated words and shows a very diverse set of typefaces. However, video sequences of natural scenes provide a temporal redundancy that can be exploited to compensate for some of these deficiencies. Here we present a complete end-to-end, real-time scene text reading system on video images based on perspective aware text tracking. The main contribution of this work is a system that automatically detects, recognises and tracks text in videos of natural scenes in real-time. The focus of our method is on large text found in outdoor environments, such as shop signs, street names and billboards. We introduce novel efficient techniques for text detection, text aggregation and text perspective estimation. Furthermore, we propose using a set of Unscented Kalman Filters (UKF) to maintain each text regionÂżs identity and to continuously track the homography transformation of the text into a fronto-parallel view, thereby being resilient to erratic camera motion and wide baseline changes in orientation. The orientation of each text line is estimated using a method that relies on the geometry of the characters themselves to estimate a rectifying homography. This is done irrespective of the view of the text over a large range of orientations. We also demonstrate a wearable head-mounted device for text reading that encases a camera for image acquisition and a pair of headphones for synthesized speech output. Our system is designed for continuous and unsupervised operation over long periods of time. It is completely automatic and features quick failure recovery and interactive text reading. It is also highly parallelised in order to maximize the usage of available processing power and to achieve real-time operation. We show comparative results that improve the current state-of-the-art when correcting perspective deformation of scene text. The end-to-end system performance is demonstrated on sequences recorded in outdoor scenarios. Finally, we also release a dataset of text tracking videos along with the annotated ground-truth of text regions

    Scene analysis in the natural environment

    Get PDF
    The problem of scene analysis has been studied in a number of different fields over the past decades. These studies have led to a number of important insights into problems of scene analysis, but not all of these insights are widely appreciated. Despite this progress, there are also critical shortcomings in current approaches that hinder further progress. Here we take the view that scene analysis is a universal problem solved by all animals, and that we can gain new insight by studying the problems that animals face in complex natural environments. In particular, the jumping spider, songbird, echolocating bat, and electric fish, all exhibit behaviors that require robust solutions to scene analysis problems encountered in the natural environment. By examining the behaviors of these seemingly disparate animals, we emerge with a framework for studying analysis comprising four essential properties: 1) the ability to solve ill-posed problems, 2) the ability to integrate and store information across time and modality, 3) efficient recovery and representation of 3D scene structure, and 4) the use of optimal motor actions for acquiring information to progress towards behavioral goals

    Neural mechanisms underlying neurooptometric rehabilitation following traumatic brain injury

    Get PDF
    Mild to severe traumatic brain injuries have lasting effects on everyday functioning. Issues relating to sensory problems are often overlooked or not addressed until well after the onset of the injury. In particular, vision problems related to ambient vision and the magnocellular pathway often result in posttrauma vision syndrome or visual midline shift syndrome. Symptoms from these syndromes are not restricted to the visual domain. Patients commonly experience proprioceptive, kinesthetic, vestibular, cognitive, and language problems. Neurooptometric rehabilitation often entails the use of corrective lenses, prisms, and binasal occlusion to accommodate the unstable magnocellular system. However, little is known regarding the neural mechanisms engaged during neurooptometric rehabilitation, nor how these mechanisms impact other domains. Event-related potentials from noninvasive electrophysiological recordings can be used to assess rehabilitation progress in patients. In this case report, high-density visual event-related potentials were recorded from one patient with posttrauma vision syndrome and secondary visual midline shift syndrome during a pattern reversal task, both with and without prisms. Results indicate that two factors occurring during the end portion of the P148 component (168–256 milliseconds poststimulus onset) map onto two separate neural systems that were engaged with and without neurooptometric rehabilitation. Without prisms, neural sources within somatosensory, language, and executive brain regions engage inefficient magnocellular system processing. However, when corrective prisms were worn, primary visual areas were appropriately engaged. The impact of using early neurooptometric rehabilitation for posttrauma vision syndrome, visual midline shift syndrome, and other similar subtle vision disorders to support neural reorganization is discussed
    • …
    corecore