View and Illumination Invariant Object Classification Based on 3D Color Histogram Using Convolutional Neural Networks
Object classification is an important step in visual recognition and semantic analysis of visual content. In this paper, we propose a method for the classification of objects that is invariant to illumination color, illumination direction, and viewpoint, based on a 3D color histogram. The 3D color histogram of an image is represented as a 2D image to capture the color composition while preserving the neighborhood information of color bins, providing the visual cues necessary for the classification of objects. The ability of a convolutional neural network (CNN) to learn invariant visual patterns is exploited for object classification. The efficacy of the proposed method is demonstrated on the Amsterdam Library of Object Images (ALOI) dataset, captured under various illumination conditions and angles of view.
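The unfolding of a 3D color histogram into a 2D image can be sketched as follows. This is a minimal illustration, not the paper's exact construction: the bin count and the tiling of blue-channel slices side by side are assumptions chosen so that neighboring color bins stay spatially adjacent.

```python
import numpy as np

def histogram_image(rgb, bins=4):
    """Sketch: 3D color histogram of an RGB image, unfolded into a 2D image.

    Each blue-channel slice of the 3D histogram becomes one tile of the
    output, so neighboring color bins remain spatially adjacent. Bin count
    and tiling scheme are illustrative, not the paper's parameters.
    """
    pixels = rgb.reshape(-1, 3)
    # Quantize each 8-bit channel into `bins` levels.
    idx = np.clip(pixels // (256 // bins), 0, bins - 1).astype(int)
    hist = np.zeros((bins, bins, bins))
    np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    hist /= hist.sum()  # normalize to a distribution
    # Tile the B-slices side by side into a (bins, bins*bins) 2D image.
    return np.hstack([hist[:, :, b] for b in range(bins)])

img = (np.random.rand(32, 32, 3) * 255).astype(np.uint8)
h2d = histogram_image(img, bins=4)
print(h2d.shape)  # (4, 16)
```

The resulting 2D array can then be fed to a CNN like any grayscale image.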
Computer Vision-based Accident Detection in Traffic Surveillance
Computer vision-based accident detection through video surveillance has become a beneficial but daunting task. In this paper, a neoteric framework for the detection of road accidents is proposed. The proposed framework capitalizes on Mask R-CNN for accurate object detection, followed by an efficient centroid-based object tracking algorithm for surveillance footage. The probability of an accident is determined from speed and trajectory anomalies in a vehicle after an overlap with other vehicles. The proposed framework provides a robust method to achieve a high detection rate and a low false alarm rate on general road-traffic CCTV surveillance footage. The framework was evaluated under diverse conditions such as broad daylight, low visibility, rain, hail, and snow using the proposed dataset. It was found effective and paves the way for the development of general-purpose, real-time vehicular accident detection algorithms.
Comment: Accepted in 10th ICCCNT 201
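The overlap check and speed-anomaly cue described above can be sketched with two small helpers. This is an illustrative stand-in, not the paper's scoring function: the IoU threshold-free overlap test, the anomaly window, and the factor are assumptions.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2); > 0 means overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda t: (t[2] - t[0]) * (t[3] - t[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def speed_anomaly(track, window=3, factor=2.0):
    """Flag a track whose latest centroid displacement exceeds `factor`
    times its recent average -- a stand-in for the paper's anomaly score."""
    steps = [np.linalg.norm(track[i + 1] - track[i]) for i in range(len(track) - 1)]
    if len(steps) < window + 1:
        return False
    recent, latest = np.mean(steps[-window - 1:-1]), steps[-1]
    return latest > factor * recent

# A vehicle moving steadily, then jumping abruptly after an overlap.
track = [np.array([0., 0.]), np.array([5., 0.]), np.array([10., 0.]),
         np.array([15., 0.]), np.array([35., 0.])]
print(bool(speed_anomaly(track)))          # True
print(iou((0, 0, 10, 10), (5, 5, 15, 15)) > 0)  # True
```

In the full framework, a frame would be flagged only when an overlap and an anomaly co-occur for the same tracked vehicle.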
Bag of Deep Features for Instructor Activity Recognition in Lecture Room
This paper has been presented at the 25th International Conference on MultiMedia Modeling (MMM 2019). This research aims to explore contextual visual information in the lecture room to assist an instructor in articulating the effectiveness of the delivered lecture. The objective is to enable a self-evaluation mechanism for the instructor to improve lecture productivity by understanding their activities. A teacher's effectiveness has a remarkable impact on uplifting students' performance, helping them succeed academically and professionally. Therefore, the process of lecture evaluation can significantly contribute to improving academic quality and governance. In this paper, we propose a vision-based framework to recognize the activities of the instructor for self-evaluation of the delivered lectures. The proposed approach uses motion templates of instructor activities and describes them through a Bag-of-Deep-Features (BoDF) representation. Deep spatio-temporal features extracted from motion templates are utilized to compile a visual vocabulary. The visual vocabulary for instructor activity recognition is quantized to optimize the learning model. A Support Vector Machine classifier is used to generate the model and predict the instructor activities. We evaluated the proposed scheme on a self-captured lecture room dataset, IAVID-1. Eight instructor activities: pointing towards the student, pointing towards the board or screen, idle, interacting, sitting, walking, using a mobile phone, and using a laptop, are recognized with 85.41% accuracy.
As a result, the proposed framework enables instructor activity recognition without human intervention.
Sergio A. Velastin has received funding from the Universidad Carlos III de Madrid; the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 600371; el Ministerio de Economía, Industria y Competitividad (COFUND2014-51509); el Ministerio de Educación, Cultura y Deporte (CEI-15-17); and Banco Santander.
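The BoDF encoding step (deep features quantized against a visual vocabulary) can be sketched as follows. This is a hedged illustration: the vocabulary here is random for self-containedness, whereas in the paper it would be learned (e.g. by k-means) from training features, and the sizes are arbitrary.

```python
import numpy as np

def bodf_encode(features, vocabulary):
    """Sketch of the Bag-of-Deep-Features step: each deep feature vector is
    assigned to its nearest visual word, and the template is represented by
    the normalized histogram of word counts."""
    # Pairwise distances: (n_features, n_words)
    d = np.linalg.norm(features[:, None, :] - vocabulary[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
vocab = rng.normal(size=(16, 64))        # 16 visual words, 64-d deep features
feats = rng.normal(size=(200, 64))       # features from one motion template
code = bodf_encode(feats, vocab)
print(code.shape, round(code.sum(), 6))  # (16,) 1.0
```

The fixed-length histogram `code` is what would be passed to the SVM classifier.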
Human activity learning for assistive robotics using a classifier ensemble
Assistive robots in ambient assisted living environments can be equipped with learning capabilities to effectively learn and execute human activities. This paper proposes a human activity learning (HAL) system for application in assistive robotics. An RGB-depth sensor is used to acquire information on human activities, and a set of statistical, spatial, and temporal features encoding key aspects of human activities is extracted from the acquired information. Redundant features are removed, and the relevant features are used in the HAL model. An ensemble of three individual classifiers—support vector machines (SVMs), k-nearest neighbour, and random forest—is employed to learn the activities. The performance of the proposed system is improved when compared with the performance of other methods using a single classifier. The approach is evaluated on an experimental dataset created for this work and also on a benchmark dataset—the Cornell Activity Dataset (CAD-60). Experimental results show that the overall performance achieved by the proposed system is comparable to the state of the art and has the potential to benefit applications in assistive robots by reducing the time spent learning activities.
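The combination of the three classifiers can be sketched with a simple majority vote. This is a minimal stand-in for the ensemble described above; the actual fusion rule in the paper may be weighted differently.

```python
import numpy as np

def majority_vote(predictions):
    """Combine per-classifier label predictions by majority vote -- a minimal
    stand-in for the SVM / k-NN / random-forest ensemble."""
    predictions = np.asarray(predictions)  # (n_classifiers, n_samples)
    n_classes = predictions.max() + 1
    # Count votes for each class, per sample.
    votes = np.apply_along_axis(
        lambda col: np.bincount(col, minlength=n_classes), 0, predictions)
    return votes.argmax(axis=0)

# Three classifiers disagree on the second sample; the majority label wins.
pred = [[0, 1, 2],   # SVM
        [0, 1, 2],   # k-NN
        [0, 2, 2]]   # random forest
print(majority_vote(pred).tolist())  # [0, 1, 2]
```

A voting ensemble of this kind tends to outperform any single member when the members' errors are not strongly correlated, which is the motivation stated above.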
Human behavioral analysis using evolutionary algorithms and deep learning
Human behavior analysis refers to the use of machine learning techniques and computer vision to recognize and classify human behavior.
One may classify human behavior into gestures, events, actions, and activities based on the duration for which the subject's motion is analyzed.
Depending on the region of interest, this can be further classified into facial expression, hand gesture, or upper/full-body action.
In the last decade, recognizing actions in videos has gained a lot of interest in the computer vision research community due to its
applications in ambient assisted living, health monitoring, video analytics, sports analysis, robotics, and automatic video surveillance.
EFFICIENT APPROACHES FOR HUMAN ACTION RECOGNITION USING DEEP LEARNING
The objective of this research work is to address some of the issues affecting vision-based human action recognition. The subject's gait characteristics, appearance, execution speed, capturing conditions, and modality of the observations affect the spatial
and temporal visual information captured in an observation used for action recognition.
To address these distortions in the subject's visual information, deep learning approaches
are used to learn discriminative features for human action recognition.
The thesis begins with an action recognition approach that is unaffected by visual
factors due to the use of visual markers for obtaining accurate motion information.
The motion characteristics of human actions are considered in computing a new motion capture (MOCAP) action representation, which is in turn used by a stacked
autoencoder for action recognition. This representation is computed from the skeletal
information in the MOCAP observation after it is normalized by the corresponding subject's t-pose (i.e., reference pose). This work addresses the inconsistency in the speed and motion of limbs across observations through the subject-independent representation of actions and the tolerance of the stacked autoencoder to noise/distortions in the input representation.
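The t-pose normalization step can be sketched as follows. This is one plausible reading, expressing every frame's joint positions relative to the reference pose and scaling by per-joint distances; the thesis' exact normalization may differ.

```python
import numpy as np

def normalize_by_tpose(frames, tpose):
    """Sketch of subject-independent MOCAP normalization: joint positions in
    every frame are expressed relative to the subject's reference (t-)pose,
    removing body-size differences. The per-joint scaling is an assumption."""
    scale = np.linalg.norm(tpose, axis=1, keepdims=True)  # joint distance from origin
    scale[scale == 0] = 1.0                               # avoid division by zero
    return (frames - tpose) / scale  # displacement from reference pose, scaled

tpose = np.array([[0., 0., 0.], [0., 1., 0.], [0., 2., 0.]])  # 3 joints
frames = np.stack([tpose, tpose + [0.5, 0., 0.]])             # 2 frames
out = normalize_by_tpose(frames, tpose)
print(out.shape)  # (2, 3, 3)
```

After this step, a frame identical to the reference pose maps to all zeros, so the representation depends on motion rather than on body proportions.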
As obtaining accurate motion information by tracking visual markers may not be
feasible in many scenarios, the next approach considers motion information captured
by a depth camera to recognize the fall action. A new temporal template capturing the subject's
pose over a given period of time is used as input to a convolutional neural network
(CNN) for the detection of the fall action. The illumination invariance of depth information and the ability of the CNN to learn the local patterns associated with each action
minimize the impact of the subject's gait characteristics and execution speed on the overall
performance.
Since the existing and earlier temporal templates cannot assign higher significance to motion in the beginning and middle frames of the observation, we propose new
motion history images emphasizing motion in these temporal regions. The convolutional neural network (ConvNet) features extracted from these motion history images,
computed from depth and RGB video streams, are used to recognize the human actions.
By considering multi-modal features, the illumination invariance of depth and the precise subject pose information from RGB video are utilized for action recognition.
Finally, evidence across classifiers using different temporal templates is combined for
efficient recognition of human actions irrespective of the location of their key poses in
the temporal regions.
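A weighted motion history image of the kind described above can be sketched as follows. The thresholding and the Gaussian weight profile peaking at the middle frames are illustrative assumptions; the thesis defines its own emphasis functions.

```python
import numpy as np

def motion_history_image(frames, weights):
    """Sketch of a weighted motion history image: per-frame motion masks are
    stamped with a temporal weight. A profile peaking at the first or middle
    frames emphasizes motion in those temporal regions."""
    mhi = np.zeros(frames.shape[1:], dtype=float)
    for t in range(1, len(frames)):
        motion = np.abs(frames[t].astype(float) - frames[t - 1]) > 10
        mhi = np.where(motion, weights[t], mhi)  # stamp the frame's weight
    return mhi / weights.max()

T = 10
frames = (np.random.rand(T, 8, 8) * 255).astype(np.uint8)
mid_weights = np.exp(-((np.arange(T) - T / 2) ** 2) / 8.0)  # peak at middle
mhi = motion_history_image(frames, mid_weights)
print(mhi.shape)  # (8, 8)
```

With the classic exponential-decay weights, this reduces to the standard MHI that favors the most recent frames; swapping in a middle-peaked profile is what shifts the emphasis.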
Even though the earlier approaches are real-time and have high performance, the
sensitivity of temporal templates to the angle of view limits their application to observations captured in an unconstrained environment. So, we propose a view-independent
approach to action recognition for videos captured by a regular digital camera using
a convolutional neural network. The action bank representation of videos, containing
similar local patterns for videos of the same actions, is given as input to the CNN to
learn the linear patterns associated with each class for action recognition. Since the
initial weights of a CNN affect its performance after training with a back-propagation
algorithm (BPA), we combine evidence across multiple CNN classifiers to minimize
the impact of the solution being stuck in a local minimum. We consider the outputs
of the binary-coded classifier as the evidence values associated with the predictions,
thereby assigning high confidence (≈ 1) to accurate predictions and low confidence (≈
0) to incorrect predictions. As a result, combining evidence across classifiers leads to
selecting predictions with high confidence, thereby improving the
overall performance.
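The evidence-combination step can be sketched as a sum rule over classifier output vectors. This is a minimal illustration of the idea that confident predictions (near 1) dominate the fused decision; the thesis may use other fusion rules as well.

```python
import numpy as np

def combine_evidence(outputs):
    """Sum-rule fusion sketch: each classifier's output vector is treated as
    evidence (confident predictions near 1, others near 0), so summing
    across classifiers favors labels predicted with high confidence."""
    return int(np.sum(outputs, axis=0).argmax(axis=-1))

# Three CNNs: the second is unsure, the others agree confidently on class 2.
outs = np.array([[0.05, 0.05, 0.90],
                 [0.40, 0.35, 0.25],
                 [0.02, 0.08, 0.90]])
print(combine_evidence(outs))  # 2
```

The unsure classifier contributes little evidence to any class, so the two confident classifiers decide the label, which is exactly the behavior the paragraph above describes.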
Since the effectiveness of the above technique depends on the complementary information obtained from the implicit diversity of the CNN classifiers, we propose an
approach to initialize the weights using genetic algorithms (GA) that can optimize
the convolutional neural network after training with the back-propagation algorithm. The
convolution masks in the CNN architecture are considered as the GA-string, whose fitness
is computed as the accuracy of the CNN classifier after training with the back-propagation
algorithm for a fixed number of epochs. As a result, a CNN training algorithm
that combines the global and local search capabilities of the GA and back-propagation
algorithm, respectively, is proposed to identify the initial weights needed to achieve better
performance. A near-ideal performance is achieved when evidence across the classifiers (of candidate solutions) is combined using fusion rules for action recognition, due
to the high mean and low standard deviation of the CNN classifiers in comparison to
random weight initialization.
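The GA loop over convolution masks can be sketched as follows. This is heavily hedged: the fitness function here is a toy stand-in for "train the CNN for a few epochs and return its accuracy", and the selection, crossover, and mutation operators are illustrative choices, not the thesis' exact ones.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(masks):
    """Stand-in for 'train the CNN briefly and return its accuracy';
    a toy score (peaked at 0.5) keeps the loop runnable."""
    return -np.mean((masks - 0.5) ** 2)

def evolve(pop, generations=20, mut=0.05):
    """Minimal GA over flattened convolution masks (the 'GA-string'):
    truncation selection, uniform crossover, Gaussian mutation."""
    for _ in range(generations):
        scores = np.array([fitness(p) for p in pop])
        parents = pop[scores.argsort()[-len(pop) // 2:]]  # keep the fitter half
        kids = []
        for _ in range(len(pop) - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            mask = rng.random(a.shape) < 0.5              # uniform crossover
            kids.append(np.where(mask, a, b) + rng.normal(0, mut, a.shape))
        pop = np.concatenate([parents, np.array(kids)])
    return pop[np.argmax([fitness(p) for p in pop])]

pop = rng.random((12, 9))  # 12 candidate solutions, one 3x3 mask each
best = evolve(pop)
print(best.shape)  # (9,)
```

In the full scheme, `best` would seed the CNN's convolution masks before back-propagation refines them, combining the GA's global search with gradient descent's local search.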
In summary, this thesis proposes new methods for human action recognition that use domain-specific action representations as input to deep learning models for action
detection. A MOCAP action representation generated from the characteristics of the recognized actions is used by a stacked autoencoder to recognize human actions. The new
temporal template of depth video, capturing the subject's pose over a given time period,
is used to detect the fall event and to recognize human actions by the CNN from the local
patterns associated with each action. The convolutional neural network (ConvNet)
features extracted from the RGB and depth temporal templates, emphasizing motion
in the beginning and middle frames of video observations, are used for human action
recognition. Finally, a view-independent action recognition model using action bank
features is optimized by a) increasing the complementary information across multiple CNN classifiers through unique weight initialization and b) combining the global
and local search capabilities of the GA and back-propagation algorithm, respectively, to
identify the initial weights in order to achieve better performance.
Classification of human actions using pose-based features and stacked auto encoder
In this paper, we propose a method for the classification of human actions using pose-based features. We demonstrate that statistical information on the key movements of actions can be utilized in designing an efficient input representation using fuzzy membership functions. The ability of the stacked autoencoder to learn the underlying features of the input data is exploited to recognize human actions. The efficacy of the proposed approach is demonstrated on the CMU MOCAP and Berkeley MHAD datasets.
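One common way to turn a pose statistic into a soft feature with a fuzzy membership function is a triangular membership, sketched below. The breakpoints and the "joint displacement" interpretation are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def triangular_membership(x, a, b, c):
    """Triangular fuzzy membership: 0 at a and c, 1 at b. One plausible way
    to convert a pose statistic (e.g. joint displacement) into a soft
    feature; breakpoints here are illustrative."""
    x = np.asarray(x, dtype=float)
    left = np.clip((x - a) / (b - a), 0, 1)   # rising edge
    right = np.clip((c - x) / (c - b), 0, 1)  # falling edge
    return np.minimum(left, right)

# Soft "medium displacement" feature over a range of joint displacements.
d = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
print(triangular_membership(d, 0.0, 0.5, 1.0).tolist())  # [0.0, 0.5, 1.0, 0.5, 0.0]
```

Stacking several such memberships ("low", "medium", "high") per statistic yields a fixed-length soft feature vector suitable as input to a stacked autoencoder.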
Human action recognition using genetic algorithms and convolutional neural networks
In this paper, an approach for human action recognition using genetic algorithms (GA) and deep convolutional neural networks (CNN) is proposed. We demonstrate that initializing the weights of a CNN classifier based on solutions generated by genetic algorithms minimizes the classification error. A gradient descent algorithm is used to train the CNN classifiers (to find a local minimum) during fitness evaluations of GA chromosomes. The global search capabilities of genetic algorithms and the local search ability of the gradient descent algorithm are exploited to find a solution that is closer to the global optimum. We show that combining the evidence of classifiers generated using genetic algorithms helps to improve the performance. We demonstrate the efficacy of the proposed classification system for human action recognition on the UCF50 dataset.
Human Action Recognition Based on Recognition of Linear Patterns in Action Bank Features Using Convolutional Neural Networks
In this paper, we propose a deep convolutional network architecture for recognizing human actions in videos using action bank features. Action bank features, computed against a predefined set of videos known as an action bank, contain linear patterns representing the similarity of the video to the action bank videos. Due to the independence of the patterns across action bank features, a convolutional neural network with linear masks is considered to capture the local patterns associated with each action. The knowledge gained through training is used to assign an action label to videos during testing. Experiments conducted on the UCF50 dataset demonstrate the effectiveness of the proposed approach in capturing and recognizing these linear local patterns.
One-shot periodic activity recognition using convolutional neural networks
Activities capture vital facts for the semantic analysis of human behavior. In this paper, we propose a method for recognizing human activities based on periodic actions from a single instance using convolutional neural networks (CNN). The height of the foot above the ground is used as the feature to discriminate human locomotion activities. The periodic nature of the actions in these activities is exploited to generate the training cases from a single instance using a sliding window. The capability of a convolutional neural network to learn local visual patterns is exploited for human activity recognition. Experiments on the Carnegie Mellon University (CMU) MOCAP dataset demonstrate the effectiveness of the proposed approach.
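The sliding-window generation of training cases from a single periodic instance can be sketched as follows. The sinusoidal stand-in signal, window width, and stride are illustrative; the paper derives the foot-height signal from MOCAP data.

```python
import numpy as np

def sliding_windows(signal, width, stride=1):
    """Sketch of one-shot training-case generation: a periodic signal
    (e.g. foot height over time) from a single instance is cut into
    overlapping windows, each becoming one training example."""
    return np.stack([signal[i:i + width]
                     for i in range(0, len(signal) - width + 1, stride)])

t = np.linspace(0, 4 * np.pi, 80)
foot_height = np.abs(np.sin(t))      # stand-in for the foot-above-ground signal
cases = sliding_windows(foot_height, width=20, stride=5)
print(cases.shape)  # (13, 20)
```

Because the activity is periodic, each window captures roughly the same cycle at a different phase, which is what lets a single instance yield enough training cases for the CNN.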