45 research outputs found

    Multi-view Face Pose Classification by Boosting with Weak Hypothesis Fusion Using Visual and Infrared Images

    This paper proposes a novel method for multi-view face pose classification through sequential learning and sensor fusion. The basic idea is to use face images observed in the visual and thermal infrared (IR) bands, with the same sampling weights, in a multi-class boosting structure. The main contribution of this paper is a multi-class AdaBoost classification framework in which information obtained from the visual and infrared bands interactively complements each other. This is achieved by learning weak hypotheses for the visual and IR bands independently and then fusing the optimized hypothesis sub-ensembles. In addition, an effective feature descriptor is introduced for thermal IR images. Experiments are conducted on a visual and thermal IR image dataset containing 4844 face images in 5 different poses. Results show a significant increase in classification rate compared with an existing multi-class AdaBoost algorithm, SAMME, trained on visual or infrared images alone, as well as with a simple baseline classification-fusion algorithm.
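The fusion scheme described above can be illustrated with a minimal sketch: at each boosting round, weak hypotheses are trained independently on the visual and IR features under the same (shared) sample weights, and the better-performing one joins the ensemble with a SAMME-style weight. This is an assumed simplification for illustration (the paper fuses optimized sub-ensembles rather than single stumps); all function names are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def samme_fused(X_vis, X_ir, y, n_classes, n_rounds=10):
    """SAMME-style multi-class boosting where each round trains weak
    hypotheses on visual and IR features under shared sample weights
    and keeps the lower-error hypothesis."""
    n = len(y)
    w = np.full(n, 1.0 / n)            # shared sampling weights
    ensemble = []                      # (alpha, band, learner) triples
    for _ in range(n_rounds):
        best = None
        for band, X in (("vis", X_vis), ("ir", X_ir)):
            tree = DecisionTreeClassifier(max_depth=3, random_state=0)
            tree.fit(X, y, sample_weight=w)
            err = np.average(tree.predict(X) != y, weights=w)
            if best is None or err < best[0]:
                best = (err, band, tree)
        err, band, tree = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        # SAMME weight: extra log(K-1) term for the multi-class case
        alpha = np.log((1 - err) / err) + np.log(n_classes - 1)
        miss = tree.predict(X_vis if band == "vis" else X_ir) != y
        w *= np.exp(alpha * miss)
        w /= w.sum()
        ensemble.append((alpha, band, tree))
    return ensemble

def predict_fused(ensemble, X_vis, X_ir, n_classes):
    """Weighted vote over the fused ensemble."""
    votes = np.zeros((len(X_vis), n_classes))
    for alpha, band, tree in ensemble:
        pred = tree.predict(X_vis if band == "vis" else X_ir)
        votes[np.arange(len(pred)), pred] += alpha
    return votes.argmax(axis=1)
```

Because both bands share one weight vector, a sample misclassified by the chosen band is up-weighted for both bands in the next round, which is how the two modalities complement each other.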

    Image Classification by Multi-Class Boosting of Visual and Infrared Fusion with Applications to Object Pose Recognition

    This paper proposes a novel method for multi-view object pose classification through sequential learning and sensor fusion. The basic idea is to use images observed in the visual and infrared (IR) bands, with the same sampling weights, under a multi-class boosting framework. The main contribution of this paper is a multi-class AdaBoost classification framework in which visual and infrared information interactively complement each other. This is achieved by learning weak hypotheses for the visual and infrared bands independently and then fusing the optimized hypothesis sub-ensembles. Experiments are conducted on several image datasets, including a set of visual and thermal IR images containing 4844 face images in 5 different poses. Results show a significant increase in classification rate compared with an existing multi-class AdaBoost algorithm, SAMME, trained on visual or infrared images alone, as well as with a simple baseline classification-fusion algorithm.

    Riemannian Manifold-Based Support Vector Machine for Human Activity Classification in Images

    This paper addresses the issue of classification of human activities in still images. We propose a novel method where part-based features focusing on human and object interaction are utilized for activity representation, and classification is designed on manifolds by exploiting the underlying Riemannian geometry. The main contributions of the paper include: (a) representing human activity by appearance features from image patches containing hands, and by structural features formed from the distances between the torso and patch centers; (b) formulating an SVM kernel function based on geodesics on Riemannian manifolds under the log-Euclidean metric; (c) applying a multi-class SVM classifier on the manifold under the one-against-all strategy. Experiments were conducted on a dataset containing 2750 images in 7 classes of activities from 10 subjects. Results show good performance (average classification rate of 95.83%, false positive rate of 0.71%, false negative rate of 4.24%). Comparisons with three other related classifiers provide further support for the proposed method.
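The geodesic kernel in contribution (b) can be sketched concretely. Under the log-Euclidean metric, the geodesic distance between two symmetric positive-definite (SPD) feature matrices is the Frobenius norm of the difference of their matrix logarithms, and an RBF-style kernel built on that distance can be passed to an SVM as a precomputed Gram matrix. This is a minimal sketch of the standard log-Euclidean construction, not the paper's exact implementation; `gamma` and the function names are assumptions.

```python
import numpy as np

def spd_log(S):
    """Matrix logarithm of an SPD matrix via eigendecomposition
    (real and symmetric for SPD input)."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.log(vals)) @ vecs.T

def log_euclidean_dist(A, B):
    """Geodesic distance under the log-Euclidean metric:
    ||log(A) - log(B)||_F."""
    return np.linalg.norm(spd_log(A) - spd_log(B), "fro")

def geodesic_rbf_kernel(covs_a, covs_b, gamma=0.5):
    """Gram matrix of exp(-gamma * d(A,B)^2), usable as a
    precomputed kernel in a multi-class SVM."""
    K = np.empty((len(covs_a), len(covs_b)))
    for i, A in enumerate(covs_a):
        for j, B in enumerate(covs_b):
            K[i, j] = np.exp(-gamma * log_euclidean_dist(A, B) ** 2)
    return K
```

One design point of the log-Euclidean metric is that, after taking matrix logarithms once per sample, distances reduce to Euclidean operations, so the Gram matrix is cheap to build compared with the affine-invariant metric.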

    Riemannian Manifold-Based Modeling and Classification Methods for Video Activities with Applications to Assisted Living and Smart Home

    This thesis mainly focuses on visual-information-based daily activity classification, anomaly detection, and video tracking using visual sensors. The main reasons for adopting visual-information-based methods are: (i) vision plays a major role in the recognition/classification of activities, which is a fundamental issue in a human-centric system; (ii) visual sensor-based analysis may offer high performance with minimum disturbance to individuals' daily lives.
Manifolds are employed for efficient modeling and low-dimensional representation of video activities for the following reasons: (a) the nonlinear nature of manifolds enables effective description of the dynamic processes of human activities involving non-planar movement, which lie on a nonlinear manifold rather than in a vector space; (b) many video features of human activities may be effectively described by low-dimensional data points on a Riemannian manifold while still maintaining important properties such as topology and geometry; (c) Riemannian geometry provides a way to measure the distances/dissimilarities between different activities on the nonlinear manifold, and hence is a suitable tool for classification and tracking.
In this thesis, six different methods for visual analysis of human activities are introduced, including fall detection in video, activity classification in images and video, and video tracking using a single camera and multiple cameras. In terms of theoretical contributions, the use of Riemannian manifolds was investigated for the mathematical modeling of video activities, and new methods were developed for characterizing and distinguishing different activities. Experiments on real-world video/image datasets were conducted to evaluate the performance of each method. Results, comparisons, and evaluations showed that the methods achieved state-of-the-art performance.
From the perspective of application, the methods have a wide range of potential applications such as assisted living, smart homes, eHealthcare, smart vehicles, office automation, safety systems and services, security systems, situation-aware human-computer interfaces, and robot learning.

    Visual Object Tracking and Classification Using Multiple Sensor Measurements

    Multiple sensor measurement has gained popularity for computer vision tasks such as visual object tracking and visual pattern classification. The main idea is that multiple sensors may provide rich and redundant information, due to wide spatial or frequency coverage of the scene, which is advantageous over single sensor measurement for learning object models/features and inferring target states/attributes in complex scenarios.
This thesis mainly addresses two problems, both exploiting multiple sensor measurements. One is video object tracking through occlusions using multiple uncalibrated cameras with overlapping fields of view; the other is multi-class image classification through sensor fusion of visual-band and thermal infrared (IR) cameras.
Paper A proposes a multi-view tracker that operates in an alternate mode with online learning on Riemannian manifolds via cross-view appearance mapping. The mapping of object appearance between views is achieved by projective transformations estimated from the warped vertical axes of the tracked object, combining multi-view geometric constraints. A similarity metric is defined on Riemannian manifolds as the shortest geodesic distance between a candidate object and a set of mapped references from multiple views. Based on this metric, a multi-view maximum likelihood (ML) criterion is introduced for inferring the object state.
Paper B proposes a visual-IR fusion-based classifier using multi-class boosting with sub-ensemble learning. In this scheme, a multi-class AdaBoost classification framework is presented in which information obtained from the visual and thermal IR bands interactively complements each other. This is accomplished by learning weak hypotheses for the visual and IR bands independently and then fusing them as a sub-ensemble.
The proposed methods are shown to be effective and to improve performance over closely related previous approaches, as demonstrated through experiments on real-world datasets.
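Two pieces of Paper A's pipeline lend themselves to a small sketch: mapping points between views with a projective transformation, and the ML-style inference that selects the candidate whose shortest distance to any mapped reference is smallest. This is a schematic illustration only; the actual estimation of the homography from warped vertical axes, and the manifold-valued appearance features, are omitted, and the function names are hypothetical.

```python
import numpy as np

def apply_homography(H, pts):
    """Map 2-D points between camera views with a 3x3 projective
    transformation (lift to homogeneous coordinates, transform,
    then dehomogenize)."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

def ml_state(candidates, references, dist):
    """ML-style inference: score each candidate by its shortest
    distance to the set of mapped references (from all views) and
    return the index of the best-scoring candidate."""
    scores = [min(dist(c, r) for r in references) for c in candidates]
    return int(np.argmin(scores))
```

In the paper's setting `dist` would be a geodesic distance on a Riemannian manifold of appearance descriptors; here any distance function can be plugged in.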

    Exploiting Riemannian Manifolds for Daily Activity Classification in Video Towards Health Care

    This paper addresses the problem of classifying activities of daily living in video. The proposed method uses a two-layer tree structure, where each node of the tree holds a Riemannian manifold that corresponds to different part-based covariance features. In the first layer, activities are classified according to the dynamics of the upper body parts. In the second layer, activities are further classified according to the appearance of local image patches at the hands in key frames, where the interacting objects are likely to be attached. The novelties of this paper include: (i) characterizing the motion of upper body parts by a covariance matrix of the distances between each pair of key points and the orientations of the lines that connect them; (ii) describing human-object interaction by the appearance of local regions around the hands in key frames, selected based on the proximity of the hands to other key points; (iii) formulating a pairwise geodesics-based kernel for activity classification on Riemannian manifolds under the log-Euclidean metric. Experiments were conducted on a video dataset containing a total of 426 video events (activities) from 4 classes. The proposed method is shown to be effective, achieving high classification accuracy (93.79% on average) and a low false alarm rate (1.99% on average) overall, as well as for each individual class.
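Novelty (i) above can be sketched as follows: build a per-frame feature vector from the pairwise distances and line orientations of the upper-body key points, then summarize a clip by the covariance of those features, yielding an SPD matrix that lives on a Riemannian manifold. This is a minimal sketch under assumed conventions (key points as 2-D coordinates, a small diagonal regularizer to keep the matrix SPD); the function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def pose_feature(keypoints):
    """Per-frame feature: for each pair of key points, the distance
    between them and the orientation of the connecting line.
    `keypoints` is a (k, 2) array of 2-D coordinates."""
    feats = []
    for i, j in combinations(range(len(keypoints)), 2):
        d = keypoints[j] - keypoints[i]
        feats.append(np.hypot(d[0], d[1]))    # pairwise distance
        feats.append(np.arctan2(d[1], d[0]))  # orientation of the line
    return np.array(feats)

def motion_covariance(frames):
    """Covariance of the per-frame features over a clip: an SPD
    descriptor on the manifold of SPD matrices."""
    F = np.array([pose_feature(f) for f in frames])
    C = np.cov(F, rowvar=False)
    return C + 1e-6 * np.eye(C.shape[0])      # regularize to keep SPD
```

Two such descriptors can then be compared with a geodesic distance (e.g. log-Euclidean) to drive the kernel of novelty (iii).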

    Human fall detection in videos via boosting and fusing statistical features of appearance, shape and motion dynamics on Riemannian manifolds with applications to assisted living

    This paper addresses issues in fall detection from videos. It is commonly observed that a falling person undergoes large appearance change, shape deformation, and physical displacement; thus the focus here is on the analysis of these dynamic features, which vary drastically in the camera view while a person falls to the ground. A novel approach is proposed that performs such analysis on Riemannian manifolds, detecting falls from a single camera with arbitrary view angles. The main novelties of this paper include: (a) representing the dynamic appearance, shape, and motion of a target person each as points moving on a different Riemannian manifold; (b) characterizing the dynamics of the different features by computing velocity statistics of their corresponding manifold points, based on geodesic distances; (c) employing a feature weighting approach, where each statistical feature is weighted according to its mutual information; (d) fusing statistical features learned from the different manifolds with a two-stage ensemble learning strategy under a boosting framework. Experiments have been conducted on two video datasets for fall detection. Tests, evaluations, and comparisons with 6 state-of-the-art methods provide support for the effectiveness of the proposed method.
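Novelty (c), mutual-information-based feature weighting, can be sketched with a standard estimator: weight each statistical feature by its estimated mutual information with the fall / no-fall label, normalized to sum to one. This is an illustrative stand-in (using scikit-learn's generic MI estimator, not the paper's own procedure); the function name is hypothetical.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mi_weights(X, y):
    """Weight each column of the feature matrix X by its mutual
    information with the labels y, normalized to sum to one.
    Falls back to uniform weights if all MI estimates are zero."""
    mi = mutual_info_classif(X, y, random_state=0)
    total = mi.sum()
    if total <= 0:
        return np.full(len(mi), 1.0 / len(mi))
    return mi / total
```

The resulting weights can then scale each statistical feature before the boosting-based fusion stage, so uninformative features contribute little to the ensemble.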

    Human Fall Detection in Videos by Fusing Statistical Features of Shape and Motion Dynamics on Riemannian Manifolds

    This paper addresses issues in fall detection in videos. We propose a novel method to detect human falls from arbitrary view angles by analyzing the dynamic shape and motion of image regions of human bodies on Riemannian manifolds. The proposed method exploits time-dependent dynamic features on smooth manifolds, based on the observation that human falls often involve drastic shape changes and abrupt motions compared with other activities. The main novelties of this paper include: (a) representing videos of human activities by dynamic shape points and motion points moving on two separate unit n-spheres, i.e., two simple Riemannian manifolds; (b) characterizing the dynamic shape and motion of each video activity by computing velocity statistics on the two manifolds, based on geodesic distances; (c) combining the statistical features of dynamic shape and motion learned from their corresponding manifolds via mutual information. Experiments were conducted on three video datasets, containing 400 videos of 5 activities, 100 videos of 4 activities, and 768 videos of 3 activities, respectively, where videos were captured by cameras at different view angles. Our test results show a high detection rate (99.38% on average) and a low false alarm rate (1.84% on average). Comparisons with eight state-of-the-art methods provide further support for the proposed method.
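Novelties (a) and (b) rest on a simple geometric fact: on a unit n-sphere the geodesic (great-circle) distance between two points is the angle between them, so frame-to-frame "velocities" are the geodesic distances between consecutive manifold points, which can then be summarized by simple statistics. The sketch below illustrates that idea under assumed conventions (unit vectors as rows, mean/std/max as the statistics); the exact statistics used in the paper may differ.

```python
import numpy as np

def sphere_geodesic(u, v):
    """Geodesic (great-circle) distance between two points on a unit
    n-sphere: the angle between the unit vectors."""
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

def velocity_stats(points):
    """Velocity statistics of a trajectory on the sphere: geodesic
    distances between consecutive points, summarized by moments.
    `points` is a (T, n) array of unit vectors."""
    v = np.array([sphere_geodesic(points[t], points[t + 1])
                  for t in range(len(points) - 1)])
    return {"mean": v.mean(), "std": v.std(), "max": v.max()}
```

Since falls produce abrupt shape and motion changes, their trajectories yield large velocity statistics on both spheres, which is what separates them from ordinary activities in the fused feature space.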