Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos
Single modality action recognition on RGB or depth sequences has been
extensively explored recently. It is generally accepted that each of these two
modalities has different strengths and limitations for the task of action
recognition. Therefore, analysis of the RGB+D videos can help us to better
study the complementary properties of these two types of modalities and achieve
higher levels of performance. In this paper, we propose a new deep
autoencoder-based shared-specific feature factorization network to separate input
multimodal signals into a hierarchy of components. Further, based on the
structure of the features, a structured sparsity learning machine is proposed
which utilizes mixed norms to apply regularization within components and group
selection between them for better classification performance. Our experimental
results show the effectiveness of our cross-modality feature analysis framework
by achieving state-of-the-art accuracy for action classification on five
challenging benchmark datasets.
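The "mixed norms" mentioned above are commonly realized as an l2,1 (group-lasso) penalty: an l2 norm within each component group, summed l1-style across groups, so that whole groups are driven to zero and group selection emerges. A minimal sketch; the grouping and weight values are illustrative, not the paper's actual factorized features:

```python
import math

def l21_penalty(weights, groups):
    """Mixed-norm (l2,1) penalty: l2 within each group, l1 across groups.

    weights: flat list of floats; groups: list of index lists, one per component.
    Shrinking the sum of per-group l2 norms drives entire groups to zero,
    which performs selection between components.
    """
    return sum(math.sqrt(sum(weights[i] ** 2 for i in g)) for g in groups)

# Two components: the second is entirely zero, so it contributes nothing.
w = [3.0, 4.0, 0.0, 0.0]
penalty = l21_penalty(w, groups=[[0, 1], [2, 3]])
# → 5.0 (first group has norm sqrt(9 + 16) = 5; second group has norm 0)
```

Minimizing a classification loss plus this penalty zeroes out entire component groups, which is the "group selection between them" behavior the abstract describes.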
Biologically inspired model simulating visual pathways and cerebellum function in human - Achieving visuomotor coordination and high precision movement with learning ability
Interdisciplinary research between information science and neuroscience has
attracted intense interest in recent years. In this paper, based on recent
biological findings, we propose a new model to mimic visual information
processing, motor planning, and control in the central and peripheral nervous
systems of humans. The main steps of the model are as follows: 1) Simulating
the "where" pathway: the Selective Search method is applied to simulate the
function of the human dorsal visual pathway in localizing object candidates;
2) Simulating the "what" pathway: a Convolutional Deep Belief Network is
applied to simulate the hierarchical structure and function of the human
ventral visual pathway for object recognition; 3) Simulating the motor
planning process: the habitual motion planning process in humans is simulated,
and motor commands are generated from the combination of control signals from
past experiences; 4) Simulating precise movement control: calibrated control
signals, which mimic the cerebellum's adjustment of movement, are generated
and updated from the calibration of movement errors in past experiences and
sent to the movement model to achieve high precision. The proposed framework mimics
structures and functions of human recognition, visuomotor coordination and
precise motor control. Experiments on object localization, recognition and
movement control demonstrate that the newly proposed model can not only
accomplish visuomotor coordination tasks but also achieve high-precision
movement with learning ability. Meanwhile, the results also validate the
introduced mechanisms. Furthermore, the proposed model could be generalized
and applied to other systems, such as mechanical and electrical systems in
robotics, to achieve fast response and high-precision movement with learning
ability.
Comment: 12 pages, 13 figures
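Step 4 above, calibrated control signals updated from past movement errors, can be caricatured as an iterative error-calibration loop. This is only a toy scalar analogue; the plant gain and learning rate are invented for illustration:

```python
def calibrated_reach(target, habitual_command, plant_gain=0.8,
                     learning_rate=0.5, trials=20):
    """Toy cerebellum-style calibration: a correction term is updated from
    the movement error of each past trial and added to the habitual command.

    plant_gain and learning_rate are illustrative, not from the paper.
    """
    correction = 0.0
    outcome = plant_gain * habitual_command
    for _ in range(trials):
        outcome = plant_gain * (habitual_command + correction)  # executed movement
        error = target - outcome                                # observed error
        correction += learning_rate * error                     # calibrate from error
    return outcome

final = calibrated_reach(target=1.0, habitual_command=1.0)
# converges near the target despite the miscalibrated habitual command
```

The habitual command alone undershoots (gain 0.8); the accumulated correction, learned purely from past errors, closes the gap, mirroring the cerebellar adjustment the model simulates.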
A Survey on Content-Aware Video Analysis for Sports
Sports data analysis is becoming increasingly large-scale, diversified, and
shared, but difficulty persists in rapidly accessing the most crucial
information. Previous surveys have focused on the methodologies of sports video
analysis from the spatiotemporal viewpoint instead of a content-based
viewpoint, and few of these studies have considered semantics. This study
develops a deeper interpretation of content-aware sports video analysis by
examining the insight offered by research into the structure of content under
different scenarios. On the basis of this insight, we provide an overview of
the themes particularly relevant to the research on content-aware systems for
broadcast sports. Specifically, we focus on the video content analysis
techniques applied in sportscasts over the past decade from the perspectives of
fundamentals and general review, a content hierarchical model, and trends and
challenges. Content-aware analysis methods are discussed with respect to
object-, event-, and context-oriented groups. In each group, the gap between
sensation and content excitement must be bridged using proper strategies. In
this regard, a content-aware approach is required to determine user demands.
Finally, the paper summarizes the future trends and challenges for sports video
analysis. We believe that our findings can advance the field of research on
content-aware video analysis for broadcast sports.
Comment: Accepted for publication in IEEE Transactions on Circuits and Systems
for Video Technology (TCSVT)
A Probabilistic Semi-Supervised Approach to Multi-Task Human Activity Modeling
Human behavior is a continuous stochastic spatio-temporal process which is
governed by semantic actions and affordances as well as latent factors.
Therefore, video-based human activity modeling is concerned with a number of
tasks such as inferring current and future semantic labels, predicting future
continuous observations as well as imagining possible future label and feature
sequences. In this paper we present a semi-supervised probabilistic deep latent
variable model that can represent both discrete labels and continuous
observations as well as latent dynamics over time. This allows the model to
solve several tasks at once without explicit fine-tuning. We focus here on the
tasks of action classification, detection, prediction and anticipation as well
as motion prediction and synthesis based on 3D human activity data recorded
with Kinect. We further extend the model to capture hierarchical label
structure and to model the dependencies between multiple entities, such as a
human and objects. Our experiments demonstrate that our principled approach to
human activity modeling can be used to detect current and anticipate future
semantic labels and to predict and synthesize future label and feature
sequences. When comparing our model to state-of-the-art approaches that are
specifically designed for individual tasks, e.g., action classification, we
find that our probabilistic formulation outperforms or is comparable to these
task-specific models.
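The paper's model is a deep latent variable model; the underlying idea of letting abundant unlabeled data refine a probabilistic classifier alongside a few labels can be illustrated with a far simpler semi-supervised EM sketch. Two unit-variance 1-D Gaussian classes and all data values below are made up:

```python
import math

def semi_supervised_means(labeled, unlabeled, iters=50):
    """Toy semi-supervised EM for two unit-variance 1-D Gaussian classes.

    labeled: list of (x, label) with label in {0, 1}; unlabeled: list of x.
    Unlabeled points contribute to the class means through soft
    responsibilities, so unlabeled data sharpen the labeled estimates.
    """
    mu = [sum(x for x, y in labeled if y == k)
          / max(1, sum(1 for _, y in labeled if y == k)) for k in (0, 1)]
    for _ in range(iters):
        # E-step: soft responsibility of class 1 for each unlabeled point.
        resp = []
        for x in unlabeled:
            p = [math.exp(-0.5 * (x - mu[k]) ** 2) for k in (0, 1)]
            resp.append(p[1] / (p[0] + p[1]))
        # M-step: re-estimate means from labeled (hard) + unlabeled (soft) data.
        for k in (0, 1):
            w = [r if k == 1 else 1 - r for r in resp]
            num = (sum(x for x, y in labeled if y == k)
                   + sum(wi * x for wi, x in zip(w, unlabeled)))
            den = sum(1 for _, y in labeled if y == k) + sum(w)
            mu[k] = num / den
    return mu

mu0, mu1 = semi_supervised_means([(-2.0, 0), (2.0, 1)],
                                 [-1.8, -2.2, 1.9, 2.1])
# the four unlabeled points refine the two single-example class means
```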
Energy-based Models for Video Anomaly Detection
Automated detection of abnormalities in data has been an active research area
in recent years because of its diverse practical applications, including video
surveillance, industrial damage detection, and network intrusion detection.
However, building an effective anomaly detection system is a non-trivial task,
since it requires tackling the challenging issues of scarce annotated data,
the inability to define anomalous objects explicitly, and the expensive cost
of feature engineering. Unlike existing approaches, which only partially solve
these problems, we develop a unique framework that copes with all of them
simultaneously. Instead of handling the ambiguous definition of anomalous
objects, we propose to work with regular patterns, for which unlabeled data
are abundant and usually easy to collect in practice. This allows our system
to be trained in a completely unsupervised manner and liberates us from the
need for costly data annotation. By learning a generative model that captures
the normality distribution of the data, we can isolate abnormal data points,
which receive low normality scores (high abnormality scores). Moreover, by
leveraging the power of generative networks, i.e., energy-based models, we are
also able to learn feature representations automatically rather than relying
on the hand-crafted features that have dominated anomaly detection research
for decades. We demonstrate our proposal on the specific application of video
anomaly detection, and the experimental results indicate that our method
performs better than baselines and is comparable with state-of-the-art methods
on many benchmark video anomaly detection datasets.
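In an energy-based normality model such as a restricted Boltzmann machine, the abnormality score described above is typically the free energy of an input: data that fit the learned normality distribution receive low free energy, anomalies receive high. A sketch of the scoring step only; training is omitted, and the tiny weights below are hand-set for illustration:

```python
import math

def free_energy(v, W, b, c):
    """RBM free energy F(v) = -b.v - sum_j log(1 + exp(c_j + sum_i W_ij v_i)).

    On a model trained on regular patterns, a high free energy means a low
    normality score, i.e. a high abnormality score.
    """
    visible_term = -sum(bi * vi for bi, vi in zip(b, v))
    hidden_term = -sum(
        math.log1p(math.exp(cj + sum(W[i][j] * v[i] for i in range(len(v)))))
        for j, cj in enumerate(c))
    return visible_term + hidden_term

# Illustrative hand-set weights favoring the "all-ones" regular pattern.
W = [[2.0], [2.0]]
b = [1.0, 1.0]
c = [-1.0]
normal, anomaly = [1, 1], [0, 0]
assert free_energy(normal, W, b, c) < free_energy(anomaly, W, b, c)
```

Thresholding the free energy then separates regular from anomalous inputs without any anomaly labels, which is what makes the fully unsupervised training regime above possible.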
Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos
Deep ConvNets have been shown to be effective for the task of human pose
estimation from single images. However, several challenging issues arise in the
video-based case such as self-occlusion, motion blur, and uncommon poses with
few or no examples in training data sets. Temporal information can provide
additional cues about the location of body joints and help to alleviate these
issues. In this paper, we propose a deep structured model to estimate a
sequence of human poses in unconstrained videos. This model can be efficiently
trained in an end-to-end manner and is capable of representing appearance of
body joints and their spatio-temporal relationships simultaneously. Domain
knowledge about the human body is explicitly incorporated into the network
providing effective priors to regularize the skeletal structure and to enforce
temporal consistency. The proposed end-to-end architecture is evaluated on two
widely used benchmarks (Penn Action dataset and JHMDB dataset) for video-based
pose estimation. Our approach significantly outperforms the existing
state-of-the-art methods.
Comment: Preliminary version to appear in CVPR 2017
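One common way to realize a temporal-consistency prior of the kind described above, shown here outside any network, is Viterbi-style decoding over per-frame candidate joint locations with a pairwise jump penalty. The candidates and scores below are invented for illustration:

```python
def smooth_pose_track(unary, jump_cost=1.0):
    """Viterbi-style decoding of one joint's candidate index over time.

    unary[t][k]: per-frame confidence (higher is better) of candidate k.
    A pairwise penalty jump_cost * |k - k'| between consecutive frames
    enforces temporal consistency, echoing the temporal priors that the
    model above builds into the network end-to-end.
    """
    T, K = len(unary), len(unary[0])
    score = list(unary[0])
    back = []
    for t in range(1, T):
        ptr, new = [], []
        for k in range(K):
            best_prev = max(range(K),
                            key=lambda j: score[j] - jump_cost * abs(k - j))
            ptr.append(best_prev)
            new.append(unary[t][k] + score[best_prev]
                       - jump_cost * abs(k - best_prev))
        score, back = new, back + [ptr]
    # Backtrack the highest-scoring sequence of candidate indices.
    k = max(range(K), key=lambda j: score[j])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return path[::-1]

# The middle frame's slightly stronger candidate 2 is a spurious detection;
# temporal smoothing keeps the track on candidate 0 in every frame.
track = smooth_pose_track([[5, 0, 0], [4, 1, 4.5], [5, 0, 0]])
# → [0, 0, 0]
```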
Deep Temporal Sigmoid Belief Networks for Sequence Modeling
Deep dynamic generative models are developed to learn sequential dependencies
in time-series data. The multi-layered model is designed by constructing a
hierarchy of temporal sigmoid belief networks (TSBNs), defined as a sequential
stack of sigmoid belief networks (SBNs). Each SBN has a contextual hidden
state, inherited from the previous SBNs in the sequence, and is used to
regulate its hidden bias. Scalable learning and inference algorithms are
derived by introducing a recognition model that yields fast sampling from the
variational posterior. This recognition model is trained jointly with the
generative model, by maximizing its variational lower bound on the
log-likelihood. Experimental results on bouncing balls, polyphonic music,
motion capture, and text streams show that the proposed approach achieves
state-of-the-art predictive performance, and has the capacity to synthesize
various sequences.
Comment: To appear in NIPS 2015
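The generative rollout of a temporal sigmoid belief network follows directly from its definition: each binary hidden unit at time t is sampled from a Bernoulli whose probability is a sigmoid of an affine function of the previous step's state, so the previous state regulates the current hidden bias. A single-layer toy with hand-set weights, not the paper's multi-layer model:

```python
import math
import random

def sample_tsbn(T, W, b, seed=0):
    """Roll out a toy temporal sigmoid belief network for T steps.

    Each binary hidden unit i at time t is Bernoulli(sigmoid(W[i].h_{t-1} + b[i])),
    so the previous step's state shifts the current hidden bias.
    W and b here are illustrative.
    """
    rng = random.Random(seed)
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    H = len(b)
    h = [0] * H  # initial state: all units off
    states = []
    for _ in range(T):
        h = [1 if rng.random() < sigmoid(sum(W[i][j] * h[j] for j in range(H)) + b[i])
             else 0
             for i in range(H)]
        states.append(h)
    return states

# Mutually inhibiting pair of units, rolled out for five steps.
states = sample_tsbn(T=5, W=[[2.0, -2.0], [-2.0, 2.0]], b=[0.0, 0.0])
```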
Leveraging Machine Learning and Big Data for Smart Buildings: A Comprehensive Survey
Future buildings will offer their residents new possibilities for convenience,
comfort, and efficiency. The way people live will change as technology becomes
woven into their lives and information processing is fully integrated into
their daily activities and objects. The expectation for future smart buildings
is to make the residents' experience as easy and comfortable as possible. The
massive streaming data generated and
captured by smart building appliances and devices contains valuable information
that needs to be mined to facilitate timely actions and better decision making.
Machine learning and big data analytics will undoubtedly play a critical role
to enable the delivery of such smart services. In this paper, we survey the
area of smart buildings with a special focus on the role of machine learning
and big data analytics techniques. This survey also reviews the current trends
and challenges in the development of smart building services.
Efficient CNN Implementation for Eye-Gaze Estimation on Low-Power/Low-Quality Consumer Imaging Systems
Accurate and efficient eye gaze estimation is important for emerging consumer
electronic systems such as driver monitoring systems and novel user interfaces.
Such systems are required to operate reliably in difficult, unconstrained
environments with low power consumption and at minimal cost. In this paper, a
new hardware-friendly convolutional neural network model with minimal
computational requirements is introduced and assessed for efficient
appearance-based gaze estimation. The model is tested and compared against
existing appearance-based CNN approaches, achieving better eye-gaze accuracy
with significantly lower computational requirements. A brief, updated
literature review is also provided.
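One standard route to a hardware-friendly CNN of the kind described (not necessarily the architecture used here) is the depthwise-separable convolution, whose multiply-accumulate (MAC) count falls by roughly a factor of 1/c_out + 1/k² relative to a standard convolution. A quick cost comparison with an illustrative layer shape:

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulate count of a standard k x k convolution layer
    (stride 1, 'same' padding) on an h x w feature map."""
    return h * w * c_in * c_out * k * k

def depthwise_separable_macs(h, w, c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    return h * w * c_in * k * k + h * w * c_in * c_out

# Illustrative shape: the exact reduction factor is 1/c_out + 1/k^2.
std = conv_macs(32, 32, 64, 128, 3)
sep = depthwise_separable_macs(32, 32, 64, 128, 3)
ratio = sep / std  # ≈ 0.119 for these numbers: roughly 8x fewer MACs
```

The same factorization also shrinks the parameter count, which is why it appears so often in low-power vision models.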
Distributed Machine Learning in Materials that Couple Sensing, Actuation, Computation and Communication
This paper reviews machine learning applications and approaches to detection,
classification and control of intelligent materials and structures with
embedded distributed computation elements. The purpose of this survey is to
identify desired tasks to be performed in each type of material or structure
(e.g., damage detection in composites), identify and compare common approaches
to learning such tasks, and investigate models and training paradigms used.
Machine learning approaches and common temporal features used in the domains of
structural health monitoring, morphable aircraft, wearable computing and
robotic skins are explored. As the ultimate goal of this research is to
incorporate the approaches described in this survey into a robotic material
paradigm, the potential for adapting the computational models used in these
applications, and corresponding training algorithms, to an amorphous network of
computing nodes is considered. Distributed versions of support vector machines,
graphical models and mixture models developed in the field of wireless sensor
networks are reviewed. Potential areas of investigation, including possible
architectures for incorporating machine learning into robotic nodes, training
approaches, and the possibility of using deep learning approaches for automatic
feature extraction, are discussed.
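A basic building block of the distributed learners reviewed above is consensus (gossip) averaging, in which each computing node repeatedly nudges its local parameter toward its neighbors' values until the network agrees. A toy sketch on a hypothetical four-node ring; the topology and step size are invented:

```python
def consensus_average(node_params, neighbors, rounds=50, step=0.3):
    """Toy gossip/consensus averaging over a network of computing nodes.

    node_params: each node's local scalar parameter.
    neighbors[i]: indices of node i's neighbors. Each round, every node
    moves toward the mean of its neighbors; on a connected graph all
    nodes converge to the global average without any central coordinator.
    """
    x = list(node_params)
    for _ in range(rounds):
        x = [xi + step * sum(x[j] - xi for j in neighbors[i]) / len(neighbors[i])
             for i, xi in enumerate(x)]
    return x

# Ring of four nodes; every node converges toward the global mean, 2.5.
vals = consensus_average([1.0, 2.0, 3.0, 4.0],
                         neighbors=[[1, 3], [0, 2], [1, 3], [0, 2]])
```

The same averaging primitive underlies the distributed SVM and mixture-model variants from the wireless-sensor-network literature that the survey reviews, with model parameters in place of scalars.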