
    SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

    Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging because crowdedness and extreme occlusions make it difficult to extract behavioral cues such as target locations, speaking activity and head/body pose. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under the poster presentation and cocktail party contexts, presenting difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) to alleviate these problems, we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising a microphone, accelerometer, Bluetooth and infrared sensors. In addition to raw data, we also provide annotations concerning individuals' personality as well as their position, head and body orientation, and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.
    Comment: 14 pages, 11 figures

    Real-time human ambulation, activity, and physiological monitoring: taxonomy of issues, techniques, applications, challenges and limitations

    Automated methods of real-time, unobtrusive human ambulation, activity, and wellness monitoring and data analysis using various algorithmic techniques have been subjects of intense research. The general aim is to devise effective means of addressing the demands of assisted living, rehabilitation, and clinical observation and assessment through sensor-based monitoring. These studies have produced a large body of literature. This paper presents a holistic articulation of the research studies and offers comprehensive insights along four main axes: distribution of existing studies; monitoring device framework and sensor types; data collection, processing and analysis; and applications, limitations and challenges. The aim is to present a systematic and comprehensive study of the literature in the area in order to identify research gaps and prioritize future research directions.

    Low-cost natural interface based on head movements

    Sometimes people look for freedom in the virtual world. However, not everyone can interact with a computer in the same way. Nowadays, almost every job requires interaction with computerized systems, so people with physical impairments do not have the same freedom to control a mouse, a keyboard or a touchscreen. In recent years, some government programs to help people with reduced mobility suffered badly during the global economic crisis, and some of those programs were even cut to reduce costs. This paper focuses on the development of a touchless human-computer interface, which allows anyone to control a computer without using a keyboard, mouse or touchscreen. By reusing Microsoft Kinect sensors from old video game consoles, a low-cost, easy-to-use, open-source interface was developed, allowing control of a computer using only head, eye or mouth movements, with the possibility of complementary sound commands. Similar commercial solutions are already available, but they are so expensive that their price tends to be a real obstacle to their purchase; on the other hand, free solutions usually do not offer the freedom that people with reduced mobility need. The present solution tries to address these drawbacks. © 2015 Published by Elsevier B.V.

    A Mimetic Strategy to Engage Voluntary Physical Activity In Interactive Entertainment

    We describe the design and implementation of a vision-based interactive entertainment system that makes use of both involuntary and voluntary control paradigms. Unintentional input to the system from a potential viewer is used to drive attention-getting output and encourage the transition to voluntary interactive behaviour. The iMime system consists of a character animation engine based on the interaction metaphor of a mime performer that simulates non-verbal communication strategies, without spoken dialogue, to capture and hold the attention of a viewer. The system was developed in the context of a project studying care of dementia sufferers. Care for a dementia sufferer can place unreasonable demands on the time and attentional resources of their caregivers or family members. Our study contributes to the eventual development of a system aimed at providing relief to dementia caregivers, while at the same time serving as a source of pleasant interactive entertainment for viewers. The work reported here is also aimed at a more general study of the design of interactive entertainment systems involving a mixture of voluntary and involuntary control.
    Comment: 6 pages, 7 figures, ECAG08 workshop

    RGB-D datasets using Microsoft Kinect or similar sensors: a survey

    RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, and the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.

    Regularizing Deep Networks by Modeling and Predicting Label Structure

    We construct custom regularization functions for use in supervised training of deep neural networks. Our technique is applicable when the ground-truth labels themselves exhibit internal structure; we derive a regularizer by learning an autoencoder over the set of annotations. Training thereby becomes a two-phase procedure. The first phase models labels with an autoencoder. The second phase trains the actual network of interest by attaching an auxiliary branch that must predict output via a hidden layer of the autoencoder. After training, we discard this auxiliary branch. We experiment in the context of semantic segmentation, demonstrating that this regularization strategy leads to consistent accuracy boosts over baselines, both when training from scratch and in combination with ImageNet pretraining. Gains are also consistent over different choices of convolutional network architecture. As our regularizer is discarded after training, our method has zero cost at test time; the performance improvements are essentially free. We are simply able to learn better network weights by building an abstract model of the label space, and then training the network to understand this abstraction alongside the original task.
    Comment: to appear at CVPR 201
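    The second-phase objective described in this abstract can be illustrated with a small sketch. This is only one possible reading, not the authors' implementation: it assumes a linear decoder frozen after phase one, squared-error losses, and a hypothetical weighting parameter `aux_weight`; none of these details are stated in the abstract, and all function names are invented for illustration.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Sum of squared differences (assumed loss; the abstract names none)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def phase_two_loss(
    main_pred: Sequence[float],
    aux_code: Sequence[float],
    label: Sequence[float],
    decoder: Matrix,
    aux_weight: float = 0.5,  # hypothetical trade-off weight
) -> float:
    """Task loss plus an auxiliary term routed through the label autoencoder.

    `decoder` stands in for the decoder half of the phase-one autoencoder and
    is treated as fixed here. The auxiliary branch emits `aux_code`, which is
    decoded back into label space and compared with the ground truth.
    """
    task_loss = squared_error(main_pred, label)
    aux_pred = matvec(decoder, aux_code)
    aux_loss = squared_error(aux_pred, label)
    return task_loss + aux_weight * aux_loss


if __name__ == "__main__":
    # Toy decoder mapping a 2-d hidden code to a 3-d label vector.
    toy_decoder = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    toy_label = [1.0, 0.0, 1.0]
    print(phase_two_loss([1.0, 0.0, 0.0], [1.0, 0.0], toy_label, toy_decoder))
```

    At test time only the main prediction would be used, which is consistent with the abstract's statement that the auxiliary branch is discarded after training.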