
    Human mobility monitoring in very low resolution visual sensor network

    This paper proposes an automated system for monitoring mobility patterns using a network of very low resolution visual sensors (30×30 pixels). The use of very low resolution sensors reduces privacy concerns, cost, computational requirements and power consumption. The core of our proposed system is a robust people tracker that uses the low resolution videos provided by the visual sensor network. The distributed processing architecture of our tracking system allows all image processing tasks to be performed on the digital signal controller in each visual sensor. In this paper, we experimentally show that reliable tracking of people is possible using very low resolution imagery. We also compare the performance of our tracker against a state-of-the-art tracking method and show that our method outperforms it. Moreover, mobility statistics such as the total distance traveled and average speed derived from the trajectories are compared with those derived from ground truth given by Ultra-Wide Band sensors. The results of this comparison show that the trajectories from our system are accurate enough to obtain useful mobility statistics.
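
    As a rough illustration of the mobility statistics mentioned above, the following sketch derives the total distance traveled and the average speed from a timestamped 2D trajectory; the array layout, units, and sampling rate are our assumptions, not details taken from the paper.

        import numpy as np

        def mobility_stats(track, timestamps):
            """Total distance (m) and average speed (m/s) of a 2D trajectory.

            track: (N, 2) array of ground-plane positions in meters (assumed).
            timestamps: (N,) array of acquisition times in seconds (assumed).
            """
            steps = np.diff(track, axis=0)            # per-frame displacement vectors
            step_len = np.linalg.norm(steps, axis=1)  # Euclidean step lengths
            total_distance = step_len.sum()
            elapsed = timestamps[-1] - timestamps[0]
            avg_speed = total_distance / elapsed if elapsed > 0 else 0.0
            return total_distance, avg_speed

        # Hypothetical 4-point track sampled at 1 Hz
        track = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]])
        t = np.array([0.0, 1.0, 2.0, 3.0])
        print(mobility_stats(track, t))  # (3.0, 1.0)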

    Robust Deep Multi-Modal Sensor Fusion using Fusion Weight Regularization and Target Learning

    Sensor fusion has wide applications in many domains, including health care and autonomous systems. While the advent of deep learning has enabled promising multi-modal fusion of high-level features and end-to-end sensor fusion solutions, existing deep learning based sensor fusion techniques, including deep gating architectures, are not always resilient, leading to the issue of fusion weight inconsistency. We propose deep multi-modal sensor fusion architectures with enhanced robustness, particularly in the presence of sensor failures. At the core of our gating architectures are fusion weight regularization and fusion target learning, which operate on auxiliary unimodal sensing networks appended to the main fusion model. The proposed regularized gating architectures outperform existing deep learning architectures, with and without gating, under both clean and corrupted sensory inputs resulting from sensor failures. The demonstrated improvements are particularly pronounced when one or more sensory modalities are corrupted.
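
    To make the gating idea concrete, here is a minimal sketch of a gated multi-modal fusion layer with a penalty that pulls the fusion weights toward a target distribution (e.g., zero weight for a modality simulated as failed). The layer sizes, the squared-error penalty, and all names are our assumptions, not the authors' exact architecture.

        import torch
        import torch.nn as nn

        class GatedFusion(nn.Module):
            """Weighted fusion of per-modality features via a learned gating network."""

            def __init__(self, num_modalities, feat_dim):
                super().__init__()
                # The gate maps concatenated features to one weight per modality.
                self.gate = nn.Sequential(
                    nn.Linear(num_modalities * feat_dim, num_modalities),
                    nn.Softmax(dim=-1),
                )

            def forward(self, feats):
                # feats: list of (batch, feat_dim) unimodal feature tensors
                stacked = torch.stack(feats, dim=1)             # (batch, M, D)
                w = self.gate(stacked.flatten(1))               # (batch, M) fusion weights
                fused = (w.unsqueeze(-1) * stacked).sum(dim=1)  # weighted sum over modalities
                return fused, w

        def fusion_weight_penalty(w, target):
            """Regularize fusion weights toward a target distribution, e.g. uniform
            weights on clean inputs or zero weight on a corrupted modality."""
            return ((w - target) ** 2).mean()

    During training, one could corrupt a modality on purpose (e.g., zero its input) and set its target weight to zero, so the gate learns to down-weight failed sensors; this is one plausible reading of fusion target learning, not necessarily the authors'.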

    Cognitive visual tracking and camera control

    Cognitive visual tracking is the process of observing and understanding the behaviour of a moving person. This paper presents an efficient solution to extract, in real time, high-level information from an observed scene and generate the most appropriate commands for a set of pan-tilt-zoom (PTZ) cameras in a surveillance scenario. Such a high-level feedback control loop, which is the main novelty of our work, serves to reduce uncertainties in the observed scene and to maximize the amount of information extracted from it. It is implemented with a distributed camera system, using SQL tables as virtual communication channels and Situation Graph Trees for knowledge representation, inference and high-level camera control. A set of experiments in a surveillance scenario shows the effectiveness of our approach and its potential for real applications of cognitive vision.
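
    The use of SQL tables as virtual communication channels can be sketched as a small command-queue table: the high-level controller enqueues PTZ commands and each camera process polls for its own unconsumed rows. The schema and column names below are illustrative assumptions, not the paper's actual design.

        import sqlite3

        conn = sqlite3.connect("cameras.db")
        conn.execute("""CREATE TABLE IF NOT EXISTS ptz_commands (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            camera_id TEXT, pan REAL, tilt REAL, zoom REAL,
            consumed INTEGER DEFAULT 0)""")

        def send_command(cam, pan, tilt, zoom):
            """High-level control loop writes a command into the shared table."""
            conn.execute(
                "INSERT INTO ptz_commands (camera_id, pan, tilt, zoom) VALUES (?, ?, ?, ?)",
                (cam, pan, tilt, zoom))
            conn.commit()

        def poll_commands(cam):
            """A camera process reads, then marks as consumed, its pending commands."""
            rows = conn.execute(
                "SELECT id, pan, tilt, zoom FROM ptz_commands "
                "WHERE camera_id = ? AND consumed = 0", (cam,)).fetchall()
            for cmd_id, *_ in rows:
                conn.execute("UPDATE ptz_commands SET consumed = 1 WHERE id = ?", (cmd_id,))
            conn.commit()
            return rows

        send_command("cam1", pan=30.0, tilt=-10.0, zoom=2.0)
        print(poll_commands("cam1"))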

    SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

    Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging, owing to the difficulty of extracting behavioral cues such as target locations, speaking activity and head/body pose under crowdedness and extreme occlusions. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under poster presentation and cocktail party contexts that present difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) to alleviate these problems, we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising microphone, accelerometer, Bluetooth and infrared sensors. In addition to raw data, we also provide annotations concerning each individual's personality as well as their position, head and body orientation, and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.

    NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding

    Research on depth-based human activity analysis has achieved outstanding performance and demonstrated the effectiveness of 3D representation for action recognition. The existing depth-based and RGB+D-based action recognition benchmarks have a number of limitations, including the lack of large-scale training samples, realistic numbers of distinct class categories, diversity in camera views, varied environmental conditions, and variety of human subjects. In this work, we introduce a large-scale dataset for RGB+D human action recognition, which is collected from 106 distinct subjects and contains more than 114 thousand video samples and 8 million frames. The dataset contains 120 different action classes, including daily, mutual, and health-related activities. We evaluate the performance of a series of existing 3D activity analysis methods on this dataset and show the advantage of applying deep learning methods to 3D-based human action recognition. Furthermore, we investigate a novel one-shot 3D activity recognition problem on our dataset, and propose a simple yet effective Action-Part Semantic Relevance-aware (APSR) framework for this task, which yields promising results for recognition of the novel action classes. We believe the introduction of this large-scale dataset will enable the community to apply, adapt, and develop various data-hungry learning techniques for depth-based and RGB+D-based human activity understanding. The dataset is available at: http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp
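
    At its simplest, the one-shot protocol described above can be approximated by nearest-neighbor matching of learned embeddings against a single exemplar per novel class. The generic baseline below is our illustration of that setup, not the authors' APSR framework.

        import numpy as np

        def one_shot_classify(query_emb, exemplar_embs, exemplar_labels):
            """Assign the label of the most cosine-similar exemplar.

            query_emb: (D,) embedding of an unseen action sample.
            exemplar_embs: (K, D) one embedding per novel class.
            """
            q = query_emb / np.linalg.norm(query_emb)
            e = exemplar_embs / np.linalg.norm(exemplar_embs, axis=1, keepdims=True)
            return exemplar_labels[int(np.argmax(e @ q))]

        # Hypothetical 4-D embeddings for three novel classes
        rng = np.random.default_rng(0)
        exemplars = rng.normal(size=(3, 4))
        query = exemplars[1] + 0.05 * rng.normal(size=4)  # near the "wave" exemplar
        print(one_shot_classify(query, exemplars, ["drink", "wave", "fall"]))  # "wave"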

    Behavior analysis for aging-in-place using similarity heatmaps

    The demand for healthcare services from a growing population of older adults is confronted with a shortage of skilled caregivers and a constant increase in healthcare costs. In addition, the strong preference of the elderly to live independently has been driving much research on "ambient-assisted living" (AAL) systems to support aging-in-place. In this paper, we propose to employ a low-resolution image sensor network for behavior analysis of a home occupant. A network of 10 low-resolution cameras (30×30 pixels) is installed in the service flat of an elderly person, from which the user's mobility tracks are extracted using a maximum likelihood tracker. We propose a novel measure, based on heatmaps and the Earth mover's distance (EMD), to find similar patterns of behavior between each pair of days from the user's detected positions. We then use an exemplar-based approach to identify the sleeping, eating, and sitting activities and the walking patterns of the elderly user over two weeks of real-life recordings. The proposed system achieves an overall accuracy of about 94%.
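
    A day-to-day similarity measure of this kind can be sketched by binning the detected positions into occupancy heatmaps and comparing the heatmaps with the Earth mover's distance. The sketch below uses the POT optimal-transport library and a grid resolution of our choosing; the paper's actual parameters are not given here.

        import numpy as np
        import ot  # POT: Python Optimal Transport

        def day_heatmap(positions, bins=16):
            """Normalized occupancy histogram over a bins x bins floor grid.
            positions: (N, 2) detected (x, y) locations, assumed scaled to [0, 1)."""
            h, _, _ = np.histogram2d(positions[:, 0], positions[:, 1],
                                     bins=bins, range=[[0, 1], [0, 1]])
            return (h / h.sum()).ravel()

        def heatmap_emd(h1, h2, bins=16):
            """EMD between two flattened heatmaps, with Euclidean distance
            between grid-cell centers as the ground cost."""
            centers = (np.indices((bins, bins)).reshape(2, -1).T + 0.5) / bins
            cost = ot.dist(centers, centers, metric="euclidean")
            return ot.emd2(h1, h2, cost)

        rng = np.random.default_rng(1)
        day_a = day_heatmap(rng.uniform(0, 1, size=(500, 2)))
        day_b = day_heatmap(rng.uniform(0, 1, size=(500, 2)))
        print(heatmap_emd(day_a, day_b))  # small EMD => similar daily behavior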

    Autonomous real-time surveillance system with distributed IP cameras

    An autonomous Internet Protocol (IP) camera based object tracking and behaviour identification system, capable of running in real time on an embedded system with limited memory and processing power, is presented in this paper. The main contribution of this work is the integration of processor-intensive image processing algorithms on an embedded platform capable of running in real time for monitoring the behaviour of pedestrians. The Algorithm Based Object Recognition and Tracking (ABORAT) system architecture presented here was developed on an Intel PXA270-based development board clocked at 520 MHz. The platform was connected to a commercial stationary IP-based camera in a remote monitoring station for intelligent image processing. The system is capable of detecting moving objects and their shadows in a complex environment with varying lighting intensity and moving foliage. Objects moving close to each other are also detected, and their trajectories are extracted and fed into an unsupervised neural network for autonomous classification. The novel intelligent video system presented is also capable of performing simple analytic functions such as tracking and generating alerts when objects enter or leave regions or cross tripwires superimposed on the live video by the operator.
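
    The tripwire alerts mentioned above reduce to a segment-intersection test between consecutive track points and the operator-drawn line. The orientation-based test below is the standard geometric check, sketched by us rather than taken from the ABORAT implementation.

        def _orient(p, q, r):
            """Sign of the cross product (q - p) x (r - p): >0 left turn, <0 right, 0 collinear."""
            return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

        def crosses_tripwire(prev_pos, cur_pos, wire_a, wire_b):
            """True if the step from prev_pos to cur_pos strictly crosses segment (wire_a, wire_b)."""
            d1 = _orient(wire_a, wire_b, prev_pos)
            d2 = _orient(wire_a, wire_b, cur_pos)
            d3 = _orient(prev_pos, cur_pos, wire_a)
            d4 = _orient(prev_pos, cur_pos, wire_b)
            return d1 * d2 < 0 and d3 * d4 < 0  # endpoints on the wire are excluded

        # A track step crossing a vertical wire at x = 1
        print(crosses_tripwire((0, 0), (2, 1), (1, -5), (1, 5)))  # True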

    Human behavioural analysis with self-organizing map for ambient assisted living

    This paper presents a system for automatically classifying the resting location of a moving object in an indoor environment. The system uses an unsupervised neural network, a Self-Organising Feature Map (SOFM), fully implemented on a low-cost, low-power automated home-based surveillance system capable of monitoring the activity levels of elderly people living alone and independently. The proposed system runs on an embedded platform with a specialised ceiling-mounted video sensor for intelligent activity monitoring. The system has the ability to learn resting locations, to measure overall activity levels and to detect specific events such as potential falls. First-order motion information, including first-order moving-average smoothing, is generated from the 2D image coordinates (trajectories). A novel edge-based object detection algorithm capable of running at a reasonable speed on the embedded platform has been developed. The classification is dynamic and achieved in real time, using the SOFM together with a probabilistic model. Experimental results show less than 20% classification error, demonstrating that our approach is more robust than others in the literature while keeping power consumption minimal. The head location of the subject is also estimated by a novel approach capable of running on any resource-limited platform with power constraints.
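
    A minimal sketch of the SOFM idea applied to resting locations: a small map is trained on 2D positions and each new position is assigned to its best-matching node. The map size, learning schedule, and synthetic data are our assumptions and much simpler than the embedded implementation described.

        import numpy as np

        def train_sofm(data, rows=4, cols=4, iters=2000, lr0=0.5, sigma0=1.5, seed=0):
            """Train a tiny self-organizing feature map on 2D points.
            Returns a (rows, cols, 2) grid of prototype locations."""
            rng = np.random.default_rng(seed)
            w = rng.uniform(data.min(), data.max(), size=(rows, cols, 2))
            grid = np.stack(np.indices((rows, cols)), axis=-1)  # node grid coordinates
            for t in range(iters):
                x = data[rng.integers(len(data))]
                frac = t / iters
                lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 1e-3
                # Best-matching unit: node whose prototype is nearest to the sample
                bmu = np.unravel_index(np.argmin(((w - x) ** 2).sum(-1)), (rows, cols))
                # Gaussian neighborhood pulls nearby nodes toward the sample
                d2 = ((grid - np.array(bmu)) ** 2).sum(-1)
                h = np.exp(-d2 / (2 * sigma ** 2))[..., None]
                w += lr * h * (x - w)
            return w

        def classify(w, x):
            """Map a 2D position to its best-matching node (resting-location cluster)."""
            return np.unravel_index(np.argmin(((w - x) ** 2).sum(-1)), w.shape[:2])

        # Two hypothetical resting spots (e.g., sofa and bed)
        rng = np.random.default_rng(1)
        pts = np.vstack([rng.normal([1, 1], 0.1, (100, 2)),
                         rng.normal([4, 3], 0.1, (100, 2))])
        w = train_sofm(pts)
        print(classify(w, np.array([1.0, 1.0])), classify(w, np.array([4.0, 3.0])))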