61 research outputs found

    View and Illumination Invariant Object Classification Based on 3D Color Histogram Using Convolutional Neural Networks

    Get PDF
    Object classification is an important step in visual recognition and semantic analysis of visual content. In this paper, we propose a method for classification of objects that is invariant to illumination color, illumination direction and viewpoint, based on the 3D color histogram. The 3D color histogram of an image is represented as a 2D image that captures the color composition while preserving the neighborhood information of color bins, realizing the visual cues necessary for object classification. The ability of a convolutional neural network (CNN) to learn invariant visual patterns is then exploited for object classification. The efficacy of the proposed method is demonstrated on the Amsterdam Library of Object Images (ALOI) dataset, captured under various illumination conditions and angles of view.
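    As a rough illustration of the representation described above, the sketch below builds a 3D RGB histogram and lays its slices out as a 2D image; the bin count, tiling layout, and normalization are illustrative assumptions rather than the paper's exact construction.

```python
# A minimal sketch (not the authors' exact construction) of turning a 3D RGB
# colour histogram into a 2D image that a CNN can consume.
import numpy as np

def histogram_image(rgb, bins=16, tile_grid=(4, 4)):
    """Build a (bins*tile_rows) x (bins*tile_cols) image from a 3D histogram."""
    pixels = rgb.reshape(-1, 3).astype(np.float64)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    hist /= hist.sum() + 1e-12                      # normalise to a distribution
    rows, cols = tile_grid                          # bins == rows * cols slices
    out = np.zeros((bins * rows, bins * cols))
    for b in range(bins):                           # one R-G slice per blue bin
        r, c = divmod(b, cols)
        out[r * bins:(r + 1) * bins, c * bins:(c + 1) * bins] = hist[:, :, b]
    return out

if __name__ == "__main__":
    img = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)   # stand-in image
    print(histogram_image(img).shape)               # (64, 64) 2D histogram image
```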

    Computer Vision-based Accident Detection in Traffic Surveillance

    Full text link
    Computer vision-based accident detection through video surveillance has become a beneficial but daunting task. In this paper, a neoteric framework for detection of road accidents is proposed. The proposed framework capitalizes on Mask R-CNN for accurate object detection, followed by an efficient centroid-based object tracking algorithm for surveillance footage. The probability of an accident is determined based on speed and trajectory anomalies in a vehicle after an overlap with other vehicles. The proposed framework provides a robust method to achieve a high Detection Rate and a low False Alarm Rate on general road-traffic CCTV surveillance footage. The framework was evaluated under diverse conditions such as broad daylight, low visibility, rain, hail, and snow using the proposed dataset. It was found to be effective and paves the way for the development of general-purpose real-time vehicular accident detection algorithms. (Accepted at the 10th ICCCNT, 2019.)
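    The sketch below is a rough, illustrative take on the centroid tracking and overlap-plus-anomaly test described above; the association rule, IoU threshold, and speed-drop criterion are assumptions, not the paper's exact formulation.

```python
# A rough sketch of centroid-based tracking and a simple overlap + speed-drop
# heuristic; thresholds and the matching rule are illustrative only.
import numpy as np

def centroid(box):                                   # box = (x1, y1, x2, y2)
    return np.array([(box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0])

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

class Track:
    """Recent centroids of one detected vehicle."""
    def __init__(self, box):
        self.box, self.history = box, [centroid(box)]

    def update(self, box):
        self.box = box
        self.history.append(centroid(box))

    def speed(self):                                 # pixels per frame
        if len(self.history) < 2:
            return 0.0
        return float(np.linalg.norm(self.history[-1] - self.history[-2]))

def associate(tracks, detections, max_dist=50.0):
    """Greedy nearest-centroid matching of new detections to existing tracks."""
    for det in detections:
        c = centroid(det)
        match = min(tracks, key=lambda t: np.linalg.norm(t.history[-1] - c),
                    default=None)
        if match is not None and np.linalg.norm(match.history[-1] - c) < max_dist:
            match.update(det)
        else:
            tracks.append(Track(det))
    return tracks

def accident_score(t1, t2, iou_thr=0.05, speed_drop=0.5):
    """Heuristic: boxes overlap and vehicle t1's speed fell sharply afterwards."""
    if iou(t1.box, t2.box) < iou_thr or len(t1.history) < 3:
        return 0.0
    prev_speed = float(np.linalg.norm(t1.history[-2] - t1.history[-3]))
    return 1.0 if t1.speed() < speed_drop * (prev_speed + 1e-9) else 0.0
```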

    Bag of Deep Features for Instructor Activity Recognition in Lecture Room

    Get PDF
    This paper has been presented at the 25th International Conference on MultiMedia Modeling (MMM2019). This research aims to explore contextual visual information in the lecture room, to assist an instructor in articulating the effectiveness of the delivered lecture. The objective is to enable a self-evaluation mechanism for the instructor to improve lecture productivity by understanding their activities. A teacher's effectiveness has a remarkable impact on uplifting students' performance, helping them succeed academically and professionally. Therefore, the process of lecture evaluation can significantly contribute to improving academic quality and governance. In this paper, we propose a vision-based framework to recognize the activities of the instructor for self-evaluation of the delivered lectures. The proposed approach uses motion templates of instructor activities and describes them through a Bag-of-Deep-Features (BoDF) representation. Deep spatio-temporal features extracted from motion templates are utilized to compile a visual vocabulary. The visual vocabulary for instructor activity recognition is quantized to optimize the learning model. A Support Vector Machine classifier is used to generate the model and predict the instructor activities. We evaluated the proposed scheme on a self-captured lecture room dataset, IAVID-1. Eight instructor activities (pointing towards the student, pointing towards the board or screen, idle, interacting, sitting, walking, using a mobile phone and using a laptop) are recognized with 85.41% accuracy. As a result, the proposed framework enables instructor activity recognition without human intervention. Sergio A. Velastin has received funding from the Universidad Carlos III de Madrid, the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 600371, el Ministerio de Economía, Industria y Competitividad (COFUND2014-51509), el Ministerio de Educación, Cultura y Deporte (CEI-15-17) and Banco Santander.
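    A compact sketch of the Bag-of-Deep-Features idea follows: deep descriptors of motion templates are clustered into a visual vocabulary, each sample is quantized into a word histogram, and an SVM is trained on the histograms. The descriptor arrays, vocabulary size, and classes below are stand-ins; the paper's actual deep network and the IAVID-1 data are not reproduced.

```python
# Bag-of-Deep-Features pipeline sketch: vocabulary via k-means, word histograms,
# then an SVM classifier. Descriptors here are synthetic placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def build_vocabulary(descriptor_sets, k=64, seed=0):
    """descriptor_sets: list of (n_i, d) arrays, one per training video."""
    all_desc = np.vstack(descriptor_sets)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(all_desc)

def bodf_histogram(vocab, descriptors):
    """Quantise a video's descriptors into a normalised visual-word histogram."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / (hist.sum() + 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # stand-in deep descriptors for 20 videos of 2 activity classes
    train_desc = [rng.normal(size=(30, 128)) + (i % 2) for i in range(20)]
    labels = np.array([i % 2 for i in range(20)])
    vocab = build_vocabulary(train_desc, k=16)
    X = np.array([bodf_histogram(vocab, d) for d in train_desc])
    clf = SVC(kernel="linear").fit(X, labels)
    print("training accuracy:", clf.score(X, labels))
```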

    Human activity learning for assistive robotics using a classifier ensemble

    Get PDF
    Assistive robots in ambient assisted living environments can be equipped with learning capabilities to effectively learn and execute human activities. This paper proposes a human activity learning (HAL) system for application in assistive robotics. An RGB-depth sensor is used to acquire information about human activities, and a set of statistical, spatial and temporal features encoding key aspects of those activities is extracted from the acquired data. Redundant features are removed, and the relevant features are used in the HAL model. An ensemble of three individual classifiers (support vector machines (SVMs), K-nearest neighbour, and random forest) is employed to learn the activities, and its performance improves on that of methods using a single classifier. The approach is evaluated on an experimental dataset created for this work and on a benchmark dataset, the Cornell Activity Dataset (CAD-60). Experimental results show that the overall performance achieved by the proposed system is comparable to the state of the art and that it has the potential to benefit assistive-robot applications by reducing the time spent learning activities.
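    The sketch below shows one plausible way to assemble such an ensemble (SVM, K-nearest neighbour, and random forest combined by soft voting) using scikit-learn; the synthetic feature matrix and the hyper-parameters are illustrative only, not those of the HAL system.

```python
# Three-classifier ensemble (SVM, KNN, random forest) combined by soft voting.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# stand-in feature matrix in place of the statistical/spatial/temporal features
X, y = make_classification(n_samples=400, n_features=30, n_classes=3,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

ensemble = VotingClassifier(
    estimators=[("svm", SVC(kernel="rbf", probability=True)),
                ("knn", KNeighborsClassifier(n_neighbors=5)),
                ("rf", RandomForestClassifier(n_estimators=100, random_state=0))],
    voting="soft")                      # average class probabilities
ensemble.fit(X_tr, y_tr)
print("ensemble accuracy:", ensemble.score(X_te, y_te))
```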

    Human behavioral analysis using evolutionary algorithms and deep learning

    No full text
    Human behavior analysis refers to the use of machine learning techniques and computer vision to recognize and classify human behavior. One may classify human behavior into gestures, events, actions, and activities based on the duration for which the subject's motion is analyzed. Depending on the region of interest, this can be further classified into facial expressions, hand gestures, or upper- or full-body actions. In the last decade, recognizing actions in videos has gained a lot of interest in the computer vision research community due to its applications in ambient assisted living, health monitoring, video analytics, sports analysis, robotics, and automatic video surveillance.

    Efficient Approaches for Human Action Recognition Using Deep Learning

    No full text
    The objective of this research work is to address some of the issues affecting vision-based human action recognition. The subject's gait characteristics, appearance, execution speed, capturing conditions, and modality of the observations affect the spatial and temporal visual information captured in an observation used for action recognition. To address these distortions in the subject's visual information, deep learning approaches are used to learn discriminative features for human action recognition.

    The thesis begins with an action recognition approach that is unaffected by visual factors, due to the use of visual markers for obtaining accurate motion information. The motion characteristics of human actions are considered in computing a new motion capture (MOCAP) action representation, which is in turn used by a stacked autoencoder for action recognition. This representation is computed from the skeletal information in a MOCAP observation after it is normalized by the corresponding subject's T-pose (i.e., reference pose). This work addresses the inconsistency in speed and motion of limbs across observations by considering a subject-independent representation of actions and the tolerance of the stacked autoencoder to noise and distortions in the input representation.

    As obtaining accurate motion information by tracking visual markers may not be feasible in many scenarios, the next approach considers motion information captured by a depth camera to recognize the fall action. A new temporal template capturing the subject's pose over a given period of time is used by a convolutional neural network (CNN) for the detection of the fall action. The illumination invariance of depth information and the ability of the CNN to learn the local patterns associated with each action minimize the impact of the subject's gait characteristics and execution speed on the overall performance. Since the existing and earlier temporal templates cannot assign higher significance to motion in the beginning and middle frames of the observation, we propose new motion history images emphasizing motion in these temporal regions. The convolutional neural network (ConvNet) features extracted from these motion history images, computed from depth and RGB video streams, are used to recognize the human actions. By considering multi-modal features, the illumination invariance of depth and the precise pose information of the subject from RGB video are utilized for action recognition. Finally, evidence across classifiers using different temporal templates is combined for efficient recognition of human actions irrespective of the location of their key poses in the temporal regions.

    Even though the earlier approaches run in real time and have high performance, the sensitivity of temporal templates to angle-of-view limits their application to observations captured in an unconstrained environment. We therefore propose a view-independent approach to action recognition for videos captured by a regular digital camera using a convolutional neural network. The action bank representation of videos, which contains similar local patterns for videos of the same action, is given as input to the CNN to learn the linear patterns associated with each class for action recognition. Since the initial weights of a CNN affect its performance after training with a back-propagation algorithm (BPA), we combine evidence across multiple CNN classifiers to minimize the impact of the solution being stuck in a local minimum. We consider the outputs of the binary-coded classifier as the evidence value associated with the prediction, thereby assigning high confidence (≈ 1) to accurate predictions and low confidence (≈ 0) to incorrect predictions. As a result, combining evidence across classifiers leads to selecting predictions with high confidence, thereby improving the overall performance. Since the effectiveness of the above technique depends on the complementary information obtained from the implicit diversity of the CNN classifiers, we propose an approach to initialize the weights using genetic algorithms (GA) that can optimize the convolutional neural network after training with the back-propagation algorithm. The convolution masks in the CNN architecture are considered as the GA string, whose fitness is computed as the accuracy of the CNN classifier after training with the back-propagation algorithm for a fixed number of epochs. As a result, a CNN training algorithm that combines the global and local search capabilities of the GA and the back-propagation algorithm, respectively, is proposed to identify initial weights that achieve better performance. A near-ideal performance is achieved when evidence across the classifiers (of candidate solutions) is combined using fusion rules for action recognition, due to the high mean and low standard deviation of the CNN classifiers' accuracies in comparison to random weight initialization.

    In summary, this thesis proposes new methods for human action recognition by using domain-specific action representations as input to deep learning models. A MOCAP action representation generated from the characteristics of the recognized actions is used by a stacked autoencoder to recognize human actions. The new temporal template of depth video, capturing the subject's pose over a given time period, is used to detect the fall event and to recognize human actions by the CNN from the local patterns associated with each action. The convolutional neural network (ConvNet) features extracted from the RGB and depth temporal templates emphasizing motion in the beginning and middle frames of video observations are used for human action recognition. Finally, a view-independent action recognition model using action bank features is optimized by a) increasing the complementary information across multiple CNN classifiers through unique weight initialization and b) combining the global and local search capabilities of the GA and the back-propagation algorithm, respectively, to identify the initial weights in order to achieve better performance.
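    As a small illustration of the evidence-combination step described above, the sketch below fuses per-class confidences from several hypothetical classifiers with simple sum and product rules; the scores and the number of classes are made up for the example.

```python
# Illustrative evidence fusion across classifiers (not the thesis code): each
# classifier contributes a per-class confidence in [0, 1], and the sum or
# product rule picks the class with the largest fused evidence.
import numpy as np

def fuse(evidences, rule="sum"):
    """evidences: (n_classifiers, n_classes) array of per-class confidences."""
    evidences = np.asarray(evidences, dtype=float)
    if rule == "sum":
        fused = evidences.sum(axis=0)
    elif rule == "product":
        fused = evidences.prod(axis=0)
    else:
        raise ValueError(f"unknown rule: {rule}")
    return int(np.argmax(fused)), fused

if __name__ == "__main__":
    # three hypothetical CNN classifiers scoring four action classes
    outputs = [[0.10, 0.85, 0.03, 0.02],
               [0.20, 0.60, 0.15, 0.05],
               [0.05, 0.10, 0.70, 0.15]]
    for rule in ("sum", "product"):
        label, fused = fuse(outputs, rule)
        print(rule, "->", label, fused.round(2))
```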

    Classification of human actions using pose-based features and stacked auto encoder

    No full text
    In this paper, we propose a method for classification of human actions using pose-based features. We demonstrate that statistical information about the key movements of actions can be utilized to design an efficient input representation using fuzzy membership functions. The ability of the stacked autoencoder to learn the underlying features of the input data is exploited to recognize human actions. The efficacy of the proposed approach is demonstrated on the CMU MOCAP and Berkeley MHAD datasets.
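    A minimal PyTorch sketch of the stacked-autoencoder idea follows: the encoder is pretrained to reconstruct pose-based feature vectors and a softmax layer is then attached for classification. The layer sizes, the 60-dimensional feature vector, and the 10 classes are placeholders, not the paper's configuration.

```python
# Stacked autoencoder with a classification head; dimensions are placeholders.
import torch
import torch.nn as nn

class StackedAutoencoderClassifier(nn.Module):
    def __init__(self, in_dim=60, hidden=(32, 16), n_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden[0]), nn.Sigmoid(),
            nn.Linear(hidden[0], hidden[1]), nn.Sigmoid())
        self.decoder = nn.Sequential(
            nn.Linear(hidden[1], hidden[0]), nn.Sigmoid(),
            nn.Linear(hidden[0], in_dim))
        self.classifier = nn.Linear(hidden[1], n_classes)

    def reconstruct(self, x):                 # used during unsupervised pretraining
        return self.decoder(self.encoder(x))

    def forward(self, x):                     # used during supervised fine-tuning
        return self.classifier(self.encoder(x))

if __name__ == "__main__":
    model = StackedAutoencoderClassifier()
    x = torch.randn(8, 60)                    # a batch of pose-feature vectors
    recon_loss = nn.functional.mse_loss(model.reconstruct(x), x)
    logits = model(x)
    print(recon_loss.item(), logits.shape)    # scalar loss, torch.Size([8, 10])
```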

    Human action recognition using genetic algorithms and convolutional neural networks

    No full text
    In this paper, an approach for human action recognition using genetic algorithms (GA) and deep convolutional neural networks (CNN) is proposed. We demonstrate that initializing the weights of a convolutional neural network (CNN) classifier based on solutions generated by genetic algorithms (GA) minimizes the classification error. A gradient descent algorithm is used to train the CNN classifiers (to find a local minimum) during the fitness evaluations of the GA chromosomes. The global search capabilities of genetic algorithms and the local search ability of the gradient descent algorithm are exploited to find a solution that is closer to the global optimum. We show that combining the evidence of classifiers generated using genetic algorithms helps to improve the performance. We demonstrate the efficacy of the proposed classification system for human action recognition on the UCF50 dataset.
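    The sketch below shows the overall shape of such a GA loop: each chromosome encodes a candidate initial weight vector, and in the paper's setting the fitness would be the CNN's accuracy after a few epochs of gradient descent. Here a toy fitness (distance to a fixed target vector) stands in for that training step so the example runs on its own; population size, crossover, and mutation settings are illustrative.

```python
# Schematic GA over candidate initial weights; the fitness function is a
# stand-in for "train the CNN briefly and report validation accuracy".
import numpy as np

rng = np.random.default_rng(0)
TARGET = rng.normal(size=50)                      # stand-in for "good" init weights

def fitness(chromosome):
    # Placeholder: replace with CNN accuracy after brief back-propagation training.
    return -np.linalg.norm(chromosome - TARGET)

def evolve(pop_size=20, n_genes=50, generations=30, mut_sigma=0.1, elite=4):
    pop = rng.normal(size=(pop_size, n_genes))
    for _ in range(generations):
        scores = np.array([fitness(c) for c in pop])
        order = np.argsort(scores)[::-1]          # best chromosomes first
        parents = pop[order[:elite]]
        children = []
        while len(children) < pop_size - elite:
            a, b = parents[rng.integers(elite, size=2)]
            mask = rng.random(n_genes) < 0.5      # uniform crossover
            child = np.where(mask, a, b) + rng.normal(scale=mut_sigma, size=n_genes)
            children.append(child)
        pop = np.vstack([parents, children])
    return pop[np.argmax([fitness(c) for c in pop])]

if __name__ == "__main__":
    best = evolve()
    print("best fitness:", fitness(best))
```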

    Human Action Recognition Based on Recognition of Linear Patterns in Action Bank Features Using Convolutional Neural Networks

    No full text
    In this paper, we propose a deep convolutional network architecture for recognizing human actions in videos using action bank features. Action bank features, computed against a predefined set of videos known as an action bank, contain linear patterns representing the similarity of the video to the action bank videos. Due to the independence of the patterns across action bank features, a convolutional neural network with linear masks is considered to capture the local patterns associated with each action. The knowledge gained through training is used to assign an action label to videos during testing. Experiments conducted on the UCF50 dataset demonstrate the effectiveness of the proposed approach in capturing and recognizing these linear local patterns.
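    The fragment below sketches the general idea of sliding linear (1D) convolution masks over an action bank feature vector; the feature length, mask width, channel counts, and pooling are assumptions for illustration and do not reproduce the paper's architecture.

```python
# 1D convolution ("linear mask") over an action bank feature vector; all
# dimensions are illustrative placeholders.
import torch
import torch.nn as nn

class LinearMaskCNN(nn.Module):
    def __init__(self, n_classes=50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=9, stride=3), nn.ReLU(),
            nn.AdaptiveMaxPool1d(64), nn.Flatten(),
            nn.Linear(8 * 64, n_classes))

    def forward(self, x):                        # x: (batch, feature_length)
        return self.net(x.unsqueeze(1))          # add a single input channel

if __name__ == "__main__":
    model = LinearMaskCNN(n_classes=50)          # e.g. the 50 UCF50 classes
    features = torch.randn(4, 5000)              # placeholder action bank vectors
    print(model(features).shape)                 # torch.Size([4, 50])
```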

    One-shot periodic activity recognition using convolutional neural networks

    No full text
    Activities capture vital facts for the semantic analysis of human behavior. In this paper, we propose a method for recognizing human activities based on periodic actions from a single instance using convolutional neural networks (CNN). The height of the foot above the ground is considered as a feature to discriminate human locomotion activities. The periodic nature of the actions in these activities is exploited to generate training cases from a single instance using a sliding window. The capability of a convolutional neural network to learn local visual patterns is also exploited for human activity recognition. Experiments on the Carnegie Mellon University (CMU) MOCAP dataset demonstrate the effectiveness of the proposed approach.
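    A small sketch of the single-instance expansion described above follows: a sliding window turns one periodic foot-height trajectory into many training cases. The synthetic signal, window length, and stride are placeholders.

```python
# Expanding one periodic instance into many training cases with a sliding window.
import numpy as np

def sliding_windows(signal, window, stride):
    """Return all contiguous windows of `signal` as rows of a 2D array."""
    n = (len(signal) - window) // stride + 1
    return np.stack([signal[i * stride:i * stride + window] for i in range(n)])

if __name__ == "__main__":
    t = np.linspace(0, 10 * np.pi, 600)
    foot_height = np.abs(np.sin(t))               # stand-in periodic foot trajectory
    X = sliding_windows(foot_height, window=120, stride=10)
    print(X.shape)                                # (49, 120): many cases from one instance
```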