3,344 research outputs found

    Human behavioural analysis with self-organizing map for ambient assisted living

    This paper presents a system for automatically classifying the resting locations of a moving object in an indoor environment. The system uses an unsupervised neural network (a Self-Organising Feature Map) fully implemented on a low-cost, low-power automated home-based surveillance system, capable of monitoring the activity levels of elders living alone. The proposed system runs on an embedded platform with a specialised ceiling-mounted video sensor for intelligent activity monitoring. The system is able to learn resting locations, to measure overall activity levels and to detect specific events such as potential falls. First-order motion information, including first-order moving-average smoothing, is generated from the 2D image coordinates (trajectories). A novel edge-based object detection algorithm capable of running at a reasonable speed on the embedded platform has been developed. The classification is dynamic and achieved in real time, using a SOFM combined with a probabilistic model. Experimental results show less than 20% classification error, demonstrating the robustness of our approach over others in the literature, with minimal power consumption. The head location of the subject is also estimated by a novel approach capable of running on any resource-limited platform with power constraints.
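
    As a rough illustration of the core idea, the sketch below trains a small Self-Organising Feature Map on 2D trajectory points so that its nodes settle on frequently visited (resting) locations; the grid size, learning-rate and neighbourhood schedules are illustrative assumptions, not the paper's parameters.

    ```python
    import numpy as np

    # Minimal SOFM sketch: cluster 2D image coordinates (trajectory points)
    # into candidate resting locations. All hyperparameters are assumptions.

    def train_sofm(points, grid=(4, 4), iters=2000, lr0=0.5, sigma0=1.5, seed=0):
        rng = np.random.default_rng(seed)
        n_nodes = grid[0] * grid[1]
        nodes = rng.uniform(points.min(0), points.max(0), size=(n_nodes, 2))
        coords = np.array([(i, j) for i in range(grid[0])
                           for j in range(grid[1])], dtype=float)
        for t in range(iters):
            x = points[rng.integers(len(points))]
            bmu = np.argmin(((nodes - x) ** 2).sum(1))   # best-matching unit
            frac = t / iters
            lr = lr0 * (1.0 - frac)                      # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3         # shrinking neighbourhood
            h = np.exp(-((coords - coords[bmu]) ** 2).sum(1) / (2 * sigma ** 2))
            nodes += lr * h[:, None] * (x - nodes)       # pull neighbourhood to sample
        return nodes

    def classify(point, nodes):
        """Index of the learned resting location nearest to a point."""
        return int(np.argmin(((nodes - point) ** 2).sum(1)))
    ```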

    Hierarchical fuzzy logic based approach for object tracking

    In this paper a novel tracking approach based on fuzzy concepts is introduced. A methodology for both single- and multiple-object tracking is presented. The aim of this methodology is to use these concepts as a tool to reduce the complexity usually involved in object tracking problems while maintaining the needed accuracy. Several dynamic fuzzy sets are constructed according to both the kinematic and non-kinematic properties that distinguish the object to be tracked: while the kinematic fuzzy sets model the object's motion pattern, the non-kinematic fuzzy sets model the object's appearance. The tracking task is performed through the fusion of these fuzzy models by means of an inference engine, so that the object detection and matching steps are performed exclusively using inference rules on fuzzy sets. In the multiple-object methodology, each object is associated with a confidence degree, and a hierarchical implementation is performed based on that confidence degree.
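
    A minimal sketch of the fusion idea, not the paper's actual rule base: a kinematic fuzzy set over the distance to the predicted position and a non-kinematic (appearance) fuzzy set over histogram similarity are combined with a min (AND) inference rule to score candidate detections. All membership-function parameters are assumptions.

    ```python
    import numpy as np

    def triangular(x, a, b, c):
        """Triangular membership function peaking at b, zero outside [a, c]."""
        return np.maximum(np.minimum((x - a) / (b - a + 1e-9),
                                     (c - x) / (c - b + 1e-9)), 0.0)

    def match_score(candidate_pos, candidate_hist, predicted_pos, model_hist):
        # Kinematic fuzzy set: "close to the motion-model prediction".
        dist = np.linalg.norm(candidate_pos - predicted_pos)
        mu_kin = triangular(dist, -40.0, 0.0, 40.0)
        # Non-kinematic fuzzy set: "looks like the tracked object"
        # (histogram intersection of normalised colour histograms).
        sim = np.minimum(candidate_hist, model_hist).sum()
        mu_app = triangular(sim, 0.2, 1.0, 1.8)
        # Fuzzy AND of both models; the best-scoring candidate is the match.
        return min(mu_kin, mu_app)
    ```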

    Articulated human tracking and behavioural analysis in video sequences

    Recently, there has been a dramatic growth of interest in the observation and tracking of human subjects through video sequences. Arguably, the principal impetus has come from the perceived demand for technological surveillance; however, applications in entertainment, intelligent domiciles and medicine are also increasing. This thesis examines human articulated tracking and the classification of human movement, first separately and then as a sequential process. First, this thesis considers the development and training of a 3D model of human body structure and dynamics. To process video sequences, an observation model is also designed with a multi-component likelihood based on edge, silhouette and colour. This is defined on the articulated limbs, and visible from a single or multiple cameras, each of which may be calibrated from that sequence. Second, for behavioural analysis, we develop a methodology in which actions and activities are described by semantic labels generated from a Movement Cluster Model (MCM). Third, a Hierarchical Partitioned Particle Filter (HPPF) was developed for human tracking that allows a multi-level parameter search consistent with the body structure. This tracker relies on the articulated motion prediction provided by the MCM at the pose or limb level. Fourth, tracking and movement analysis are integrated to generate a probabilistic activity description with action labels. The implemented algorithms for tracking and behavioural analysis are tested extensively and independently against ground truth on human tracking and surveillance datasets. Dynamic models are shown to predict and generate synthetic motion, while the MCM recovers both periodic and non-periodic activities, defined either on the whole body or at the limb level. Tracking results are comparable with the state of the art, and the integrated behaviour analysis adds to the value of the approach. (Funded by the Overseas Research Students Awards Scheme, ORSAS.)
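
    The HPPF itself partitions the search across body-part subsets; the sketch below shows only the underlying bootstrap particle-filter loop (resample, predict, weight) on a generic pose vector, with a placeholder likelihood standing in for the edge/silhouette/colour observation model.

    ```python
    import numpy as np

    # One step of a plain bootstrap particle filter over a pose vector.
    # `likelihood(pose) -> float` is a stand-in for an image observation model.

    def particle_filter_step(particles, weights, likelihood,
                             noise_std=0.05, rng=None):
        rng = rng or np.random.default_rng()
        n = len(particles)
        # Resample hypotheses proportionally to their current weights.
        idx = rng.choice(n, size=n, p=weights / weights.sum())
        particles = particles[idx]
        # Predict: diffuse particles with Gaussian noise. (A learned motion
        # model such as the thesis's MCM would supply a stronger prior here.)
        particles = particles + rng.normal(0.0, noise_std, particles.shape)
        # Weight each hypothesis against the observation.
        weights = np.array([likelihood(p) for p in particles])
        return particles, weights
    ```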

    Backing off: hierarchical decomposition of activity for 3D novel pose recovery

    For model-based 3D human pose estimation, even simple models of the human body lead to high-dimensional state spaces. Where the class of activity is known a priori, low-dimensional activity models learned from training data make possible a thorough and efficient search for the best pose. Conversely, searching for solutions in the full state space places no restriction on the class of motion to be recovered, but is both difficult and expensive. This paper explores a potential middle ground between these approaches, using the hierarchical Gaussian process latent variable model to learn activity at different hierarchical scales within the human skeleton. We show that by training on full-body activity data, then descending through the hierarchy in stages and exploring subtrees independently of one another, novel poses may be recovered. Experimental results on motion capture data and monocular video sequences demonstrate the utility of the approach, and comparisons are drawn with existing low-dimensional activity models. © 2009. The copyright of this document resides with its authors.
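
    A toy version of the "backing off" strategy, with PCA deliberately substituted for the hierarchical Gaussian process latent variable model to keep the sketch short: a full-body low-dimensional model provides a coarse estimate, then each skeletal subtree is re-estimated independently in its own latent space. The subtree column indices and latent dimensions are assumptions.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    def fit_hierarchy(poses, subtrees, dim=3):
        """poses: (N, D) joint-angle vectors; subtrees: {name: column indices}."""
        models = {"body": PCA(n_components=dim).fit(poses)}
        for name, cols in subtrees.items():
            models[name] = (cols, PCA(n_components=dim).fit(poses[:, cols]))
        return models

    def recover_pose(models, observed, subtrees):
        # Stage 1: coarse full-body reconstruction.
        body = models["body"]
        est = body.inverse_transform(body.transform(observed[None]))[0]
        # Stage 2: descend the hierarchy, re-estimating each subtree on its
        # own, so novel combinations of known part motions can be recovered.
        for name in subtrees:
            cols, pca = models[name]
            est[cols] = pca.inverse_transform(
                pca.transform(observed[None, cols]))[0]
        return est

    # e.g. subtrees = {"left_arm": [3, 4, 5], "right_arm": [6, 7, 8]}  (assumed)
    ```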

    A Deeper Look into DeepCap

    Human performance capture is a highly important computer vision problem with many applications in movie production and virtual/augmented reality. Many previous performance capture approaches either required expensive multi-view setups or did not recover dense space-time-coherent geometry with frame-to-frame correspondences. We propose a novel deep learning approach for monocular dense human performance capture. Our method is trained in a weakly supervised manner based on multi-view supervision, completely removing the need for training data with 3D ground-truth annotations. The network architecture is based on two separate networks that disentangle the task into a pose estimation step and a non-rigid surface deformation step. Extensive qualitative and quantitative evaluations show that our approach outperforms the state of the art in terms of quality and robustness. This work is an extended version of DeepCap in which we provide more detailed explanations, comparisons and results, as well as applications.
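
    The disentangled two-network design might be skeletonised as below; this is not DeepCap's actual architecture, and the feature dimension, joint count and vertex count are placeholder assumptions. One network regresses skeletal pose, the other per-vertex non-rigid offsets of a template mesh.

    ```python
    import torch.nn as nn

    class PoseNet(nn.Module):
        """Regresses skeletal pose (axis-angle per joint) from an image feature."""
        def __init__(self, feat_dim=512, n_joints=24):
            super().__init__()
            self.head = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                      nn.Linear(256, n_joints * 3))
        def forward(self, feat):
            return self.head(feat)

    class DeformNet(nn.Module):
        """Predicts per-vertex non-rigid displacements of a template mesh."""
        def __init__(self, feat_dim=512, n_verts=5000):
            super().__init__()
            self.n_verts = n_verts
            self.head = nn.Sequential(nn.Linear(feat_dim, 1024), nn.ReLU(),
                                      nn.Linear(1024, n_verts * 3))
        def forward(self, feat):
            return self.head(feat).view(-1, self.n_verts, 3)

    # Weak supervision would compare multi-view reprojections of the posed,
    # deformed template against silhouettes/keypoints, avoiding 3D labels.
    ```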

    Globally Continuous and Non-Markovian Crowd Activity Analysis from Videos

    Automatically recognizing activities in video is a classic problem in computer vision and helps to understand behaviors, describe scenes and detect anomalies. We propose an unsupervised method for such purposes. Given video data, we discover recurring activity patterns that appear, peak, wane and disappear over time. By using non-parametric Bayesian methods, we learn coupled spatial and temporal patterns with minimal prior knowledge. To model the temporal changes of patterns, previous works compute Markovian progressions or locally continuous motifs, whereas we model time in a globally continuous and non-Markovian way. Visually, the patterns depict the flows of major activities. Temporally, each pattern has its own unique appearance-disappearance cycles. To compute compact pattern representations, we also propose a hybrid sampling method. By combining these patterns with detailed environment information, we interpret the semantics of activities and report anomalies. Our method also fits the data better and detects anomalies that were difficult to detect previously.
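
    A much-simplified stand-in for the non-parametric Bayesian machinery: a Dirichlet-process Gaussian mixture over simple spatio-temporal motion features infers the number of activity patterns from the data, and low-likelihood samples can then be flagged as anomalies. The feature layout and the component cap are assumptions.

    ```python
    from sklearn.mixture import BayesianGaussianMixture

    def discover_patterns(features, max_patterns=20, seed=0):
        """features: (N, 5) array of [x, y, dx, dy, t] samples from video."""
        dpgmm = BayesianGaussianMixture(
            n_components=max_patterns,  # upper bound; unused components vanish
            weight_concentration_prior_type="dirichlet_process",
            covariance_type="full",
            random_state=seed,
        ).fit(features)
        labels = dpgmm.predict(features)
        return dpgmm, labels

    # Anomaly reporting (illustrative): flag samples the model explains poorly.
    # scores = dpgmm.score_samples(features); anomalies = scores < threshold
    ```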

    A vision-based approach to fire detection

    This paper presents a vision-based method for fire detection from fixed surveillance smart cameras. The method integrates several well-known techniques, properly adapted to cope with the challenges related to the actual deployment of the vision system. Concretely, background subtraction is performed with a context-based learning mechanism so as to attain higher accuracy and robustness. The computational cost of a frequency analysis of potential fire regions is reduced by focusing its operation with an attentive mechanism. For fast discrimination between fire regions and fire-coloured moving objects, a new colour-based model of fire's appearance and a new wavelet-based model of fire's frequency signature are proposed. To reduce the false alarm rate due to the presence of fire-coloured moving objects, the category and behaviour of each moving object are taken into account in the decision-making. To estimate the expected object size in the image plane and to generate geo-referenced alarms, the camera-world mapping is approximated with a GPS-based calibration process. Experimental results demonstrate the ability of the proposed method to detect fires with an average success rate of 93.1% at a processing rate of 10 Hz, which is often sufficient for real-life applications.
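
    The first stages of such a pipeline can be sketched with standard OpenCV building blocks; the colour thresholds below are illustrative, not the paper's learned colour or wavelet models.

    ```python
    import cv2

    # Background subtractor for isolating moving regions.
    backsub = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=False)

    def fire_candidates(frame_bgr):
        # 1) Background subtraction keeps only moving pixels.
        motion = backsub.apply(frame_bgr)
        # 2) A coarse colour model keeps fire-coloured (red/orange) pixels.
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        colour = cv2.inRange(hsv, (0, 80, 150), (35, 255, 255))
        # 3) Candidates are moving AND fire-coloured; a frequency (flicker)
        #    analysis over time would follow to reject fire-coloured objects.
        return cv2.bitwise_and(motion, colour)
    ```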

    Efficient 3D Reconstruction, Streaming and Visualization of Static and Dynamic Scene Parts for Multi-client Live-telepresence in Large-scale Environments

    Despite the impressive progress of telepresence systems for room-scale scenes with static and dynamic scene entities, expanding their capabilities to scenarios with larger dynamic environments beyond a fixed size of a few square metres remains challenging. In this paper, we aim at sharing 3D live-telepresence experiences in large-scale environments beyond room scale, with both static and dynamic scene entities, at practical bandwidth requirements, based only on light-weight scene capture with a single moving consumer-grade RGB-D camera. To this end, we present a system built upon a novel hybrid volumetric scene representation: a voxel-based representation for the static contents, which not only stores the reconstructed surface geometry but also contains information about the object semantics and their accumulated dynamic movement over time, is combined with a point-cloud-based representation for the dynamic scene parts, where the separation from static parts is achieved based on semantic and instance information extracted from the input frames. Static and dynamic content are streamed independently yet simultaneously; potentially moving but currently static scene entities are seamlessly integrated into the static model until they become dynamic again, and static and dynamic data are fused at the remote client. As a result, our system achieves VR-based live-telepresence at close to real-time rates. Our evaluation demonstrates the potential of our novel approach in terms of visual quality and performance, and includes ablation studies regarding the involved design choices.
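
    The hybrid representation can be illustrated with a toy data structure: a sparse voxel grid (a dict keyed by integer cell) accumulates static geometry with per-voxel semantic labels, while dynamically labelled points stay in a raw point cloud. The 5 cm voxel size and the label set are assumptions.

    ```python
    import numpy as np

    VOXEL = 0.05  # metres (assumed voxel edge length)

    def split_frame(points, labels, dynamic_labels=frozenset({"person"})):
        """Separate a labelled point cloud into static voxels and dynamic points.

        points: (N, 3) array; labels: length-N list of semantic class names.
        """
        static_voxels, dynamic_points = {}, []
        for p, lab in zip(points, labels):
            if lab in dynamic_labels:
                dynamic_points.append(p)           # streamed as a point cloud
            else:
                key = tuple((p // VOXEL).astype(int))
                static_voxels[key] = lab           # fused into persistent model
        return static_voxels, np.array(dynamic_points)
    ```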

    Development of a text reading system on video images

    Since the early days of computer science, researchers have sought to devise a machine that could automatically read text to help people with visual impairments. The problem of extracting and recognising text in document images has been largely resolved, but reading text from images of natural scenes remains a challenge. Scene text can present uneven lighting, complex backgrounds or perspective and lens distortion; it usually appears as short sentences or isolated words and shows a very diverse set of typefaces. However, video sequences of natural scenes provide a temporal redundancy that can be exploited to compensate for some of these deficiencies. Here we present a complete end-to-end, real-time scene text reading system on video images based on perspective-aware text tracking. The main contribution of this work is a system that automatically detects, recognises and tracks text in videos of natural scenes in real time. The focus of our method is on large text found in outdoor environments, such as shop signs, street names and billboards. We introduce novel efficient techniques for text detection, text aggregation and text perspective estimation. Furthermore, we propose using a set of Unscented Kalman Filters (UKF) to maintain each text region's identity and to continuously track the homography transformation of the text into a fronto-parallel view, thereby being resilient to erratic camera motion and wide-baseline changes in orientation. The orientation of each text line is estimated using a method that relies on the geometry of the characters themselves to estimate a rectifying homography, irrespective of the view of the text over a large range of orientations. We also demonstrate a wearable head-mounted device for text reading that encases a camera for image acquisition and a pair of headphones for synthesized speech output. Our system is designed for continuous and unsupervised operation over long periods of time. It is completely automatic and features quick failure recovery and interactive text reading. It is also highly parallelised in order to maximise the usage of available processing power and to achieve real-time operation. We show comparative results that improve on the current state of the art when correcting perspective deformation of scene text. The end-to-end system performance is demonstrated on sequences recorded in outdoor scenarios. Finally, we also release a dataset of text-tracking videos along with annotated ground truth of the text regions.
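
    The fronto-parallel rectification step can be sketched with standard OpenCV calls: given the four tracked corners of a text region (as maintained, for instance, by a UKF), a homography maps it to an axis-aligned rectangle before recognition. The output size is an assumption.

    ```python
    import cv2
    import numpy as np

    def rectify_text(frame, corners, out_w=320, out_h=64):
        """Warp a tracked text region to a fronto-parallel view.

        corners: (4, 2) float32 array, ordered TL, TR, BR, BL.
        """
        dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
        H = cv2.getPerspectiveTransform(np.float32(corners), dst)
        return cv2.warpPerspective(frame, H, (out_w, out_h))
    ```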