Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos
Single modality action recognition on RGB or depth sequences has been
extensively explored recently. It is generally accepted that each of these two
modalities has different strengths and limitations for the task of action
recognition. Therefore, analysis of the RGB+D videos can help us to better
study the complementary properties of these two types of modalities and achieve
higher levels of performance. In this paper, we propose a new deep
autoencoder-based shared-specific feature factorization network to separate input
multimodal signals into a hierarchy of components. Further, based on the
structure of the features, a structured sparsity learning machine is proposed
which utilizes mixed norms to apply regularization within components and group
selection between them for better classification performance. Our experimental
results show the effectiveness of our cross-modality feature analysis framework
by achieving state-of-the-art accuracy for action classification on five
challenging benchmark datasets.
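The "mixed norms" mentioned above are commonly realized as an l2,1 (group-lasso) penalty: an l2 norm within each component group, summed l1-style across groups, so that whole groups are driven to zero and group selection emerges. A minimal sketch; the grouping and weight values are illustrative, not the paper's actual factorized features:

```python
import math

def l21_penalty(weights, groups):
    """Mixed-norm (l2,1) penalty: l2 within each group, l1 across groups.

    weights: flat list of floats; groups: list of index lists, one per component.
    Shrinking the sum of per-group l2 norms drives entire groups to zero,
    which performs selection between components.
    """
    return sum(math.sqrt(sum(weights[i] ** 2 for i in g)) for g in groups)

# Two components: the second is entirely zero, so it contributes nothing.
w = [3.0, 4.0, 0.0, 0.0]
penalty = l21_penalty(w, groups=[[0, 1], [2, 3]])
# → 5.0 (first group has norm sqrt(9 + 16) = 5; second group has norm 0)
```

Minimizing a classification loss plus this penalty zeroes out entire component groups, which is the "group selection between them" behavior the abstract describes.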
Biologically inspired model simulating visual pathways and cerebellum function in human - Achieving visuomotor coordination and high precision movement with learning ability
Interdisciplinary research between information science and neuroscience has
attracted intense interest in recent years. In this paper, based on recent
biological findings, we propose a new model to mimic visual information
processing, motor planning, and control in the central and peripheral nervous
systems of humans. The main steps of the model are as follows: 1) Simulating
the "where" pathway: the Selective Search method is applied to simulate the
function of the human dorsal visual pathway in localizing object candidates;
2) Simulating the "what" pathway: a Convolutional Deep Belief Network is
applied to simulate the hierarchical structure and function of the human
ventral visual pathway for object recognition; 3) Simulating the motor
planning process: the habitual motion planning process in humans is simulated,
and motor commands are generated from the combination of control signals from
past experiences; 4) Simulating precise movement control: calibrated control
signals, which mimic the cerebellum's adjustment of movement, are generated
and updated from the calibration of movement errors in past experiences and
sent to the movement model to achieve high precision. The proposed framework mimics
structures and functions of human recognition, visuomotor coordination and
precise motor control. Experiments on object localization, recognition and
movement control demonstrate that the newly proposed model can not only
accomplish visuomotor coordination tasks but also achieve high-precision
movement with learning ability. Meanwhile, the results also validate the
introduced mechanisms. Furthermore, the proposed model could be generalized
and applied to other systems, such as mechanical and electrical systems in
robotics, to achieve fast response and high-precision movement with learning
ability.
Comment: 12 pages, 13 figures
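Step 4 above, calibrated control signals updated from past movement errors, can be caricatured as an iterative error-calibration loop. This is only a toy scalar analogue; the plant gain and learning rate are invented for illustration:

```python
def calibrated_reach(target, habitual_command, plant_gain=0.8,
                     learning_rate=0.5, trials=20):
    """Toy cerebellum-style calibration: a correction term is updated from
    the movement error of each past trial and added to the habitual command.

    plant_gain and learning_rate are illustrative, not from the paper.
    """
    correction = 0.0
    outcome = plant_gain * habitual_command
    for _ in range(trials):
        outcome = plant_gain * (habitual_command + correction)  # executed movement
        error = target - outcome                                # observed error
        correction += learning_rate * error                     # calibrate from error
    return outcome

final = calibrated_reach(target=1.0, habitual_command=1.0)
# converges near the target despite the miscalibrated habitual command
```

The habitual command alone undershoots (gain 0.8); the accumulated correction, learned purely from past errors, closes the gap, mirroring the cerebellar adjustment the model simulates.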
A Survey on Content-Aware Video Analysis for Sports
Sports data analysis is becoming increasingly large-scale, diversified, and
shared, but difficulty persists in rapidly accessing the most crucial
information. Previous surveys have focused on the methodologies of sports video
analysis from the spatiotemporal viewpoint instead of a content-based
viewpoint, and few of these studies have considered semantics. This study
develops a deeper interpretation of content-aware sports video analysis by
examining the insight offered by research into the structure of content under
different scenarios. On the basis of this insight, we provide an overview of
the themes particularly relevant to the research on content-aware systems for
broadcast sports. Specifically, we focus on the video content analysis
techniques applied in sportscasts over the past decade from the perspectives of
fundamentals and general review, a content hierarchical model, and trends and
challenges. Content-aware analysis methods are discussed with respect to
object-, event-, and context-oriented groups. In each group, the gap between
sensation and content excitement must be bridged using proper strategies. In
this regard, a content-aware approach is required to determine user demands.
Finally, the paper summarizes the future trends and challenges for sports video
analysis. We believe that our findings can advance the field of research on
content-aware video analysis for broadcast sports.
Comment: Accepted for publication in IEEE Transactions on Circuits and Systems
for Video Technology (TCSVT)
A Probabilistic Semi-Supervised Approach to Multi-Task Human Activity Modeling
Human behavior is a continuous stochastic spatio-temporal process which is
governed by semantic actions and affordances as well as latent factors.
Therefore, video-based human activity modeling is concerned with a number of
tasks such as inferring current and future semantic labels, predicting future
continuous observations as well as imagining possible future label and feature
sequences. In this paper we present a semi-supervised probabilistic deep latent
variable model that can represent both discrete labels and continuous
observations as well as latent dynamics over time. This allows the model to
solve several tasks at once without explicit fine-tuning. We focus here on the
tasks of action classification, detection, prediction and anticipation as well
as motion prediction and synthesis based on 3D human activity data recorded
with Kinect. We further extend the model to capture hierarchical label
structure and to model the dependencies between multiple entities, such as a
human and objects. Our experiments demonstrate that our principled approach to
human activity modeling can be used to detect current and anticipate future
semantic labels and to predict and synthesize future label and feature
sequences. When comparing our model to state-of-the-art approaches that are
specifically designed for individual tasks, e.g., action classification, we
find that our probabilistic formulation outperforms or is comparable to these
task-specific models.
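The paper's model is a deep latent variable model; the underlying idea of letting abundant unlabeled data refine a probabilistic classifier alongside a few labels can be illustrated with a far simpler semi-supervised EM sketch. Two unit-variance 1-D Gaussian classes and all data values below are made up:

```python
import math

def semi_supervised_means(labeled, unlabeled, iters=50):
    """Toy semi-supervised EM for two unit-variance 1-D Gaussian classes.

    labeled: list of (x, label) with label in {0, 1}; unlabeled: list of x.
    Unlabeled points contribute to the class means through soft
    responsibilities, so unlabeled data sharpen the labeled estimates.
    """
    mu = [sum(x for x, y in labeled if y == k)
          / max(1, sum(1 for _, y in labeled if y == k)) for k in (0, 1)]
    for _ in range(iters):
        # E-step: soft responsibility of class 1 for each unlabeled point.
        resp = []
        for x in unlabeled:
            p = [math.exp(-0.5 * (x - mu[k]) ** 2) for k in (0, 1)]
            resp.append(p[1] / (p[0] + p[1]))
        # M-step: re-estimate means from labeled (hard) + unlabeled (soft) data.
        for k in (0, 1):
            w = [r if k == 1 else 1 - r for r in resp]
            num = (sum(x for x, y in labeled if y == k)
                   + sum(wi * x for wi, x in zip(w, unlabeled)))
            den = sum(1 for _, y in labeled if y == k) + sum(w)
            mu[k] = num / den
    return mu

mu0, mu1 = semi_supervised_means([(-2.0, 0), (2.0, 1)],
                                 [-1.8, -2.2, 1.9, 2.1])
# the four unlabeled points refine the two single-example class means
```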
Energy-based Models for Video Anomaly Detection
Automated detection of abnormalities in data has been an active research area
in recent years because of its diverse practical applications, including video
surveillance, industrial damage detection, and network intrusion detection.
However, building an effective anomaly detection system is a non-trivial task,
since it requires tackling the challenging issues of scarce annotated data,
the inability to define anomalous objects explicitly, and the expensive cost
of feature engineering. Unlike existing approaches, which only partially solve
these problems, we develop a unique framework that copes with all of them
simultaneously. Instead of handling the ambiguous definition of anomalous
objects, we propose to work with regular patterns, for which unlabeled data
are abundant and usually easy to collect in practice. This allows our system
to be trained in a completely unsupervised manner and liberates us from the
need for costly data annotation. By learning a generative model that captures
the normality distribution of the data, we can isolate abnormal data points,
which receive low normality scores (high abnormality scores). Moreover, by
leveraging the power of generative networks, i.e., energy-based models, we are
also able to learn feature representations automatically rather than relying
on the hand-crafted features that have dominated anomaly detection research
for decades. We demonstrate our proposal on the specific application of video
anomaly detection, and the experimental results indicate that our method
performs better than baselines and is comparable with state-of-the-art methods
on many benchmark video anomaly detection datasets.
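In an energy-based normality model such as a restricted Boltzmann machine, the abnormality score described above is typically the free energy of an input: data that fit the learned normality distribution receive low free energy, anomalies receive high. A sketch of the scoring step only; training is omitted, and the tiny weights below are hand-set for illustration:

```python
import math

def free_energy(v, W, b, c):
    """RBM free energy F(v) = -b.v - sum_j log(1 + exp(c_j + sum_i W_ij v_i)).

    On a model trained on regular patterns, a high free energy means a low
    normality score, i.e. a high abnormality score.
    """
    visible_term = -sum(bi * vi for bi, vi in zip(b, v))
    hidden_term = -sum(
        math.log1p(math.exp(cj + sum(W[i][j] * v[i] for i in range(len(v)))))
        for j, cj in enumerate(c))
    return visible_term + hidden_term

# Illustrative hand-set weights favoring the "all-ones" regular pattern.
W = [[2.0], [2.0]]
b = [1.0, 1.0]
c = [-1.0]
normal, anomaly = [1, 1], [0, 0]
assert free_energy(normal, W, b, c) < free_energy(anomaly, W, b, c)
```

Thresholding the free energy then separates regular from anomalous inputs without any anomaly labels, which is what makes the fully unsupervised training regime above possible.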
Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos
Deep ConvNets have been shown to be effective for the task of human pose
estimation from single images. However, several challenging issues arise in the
video-based case such as self-occlusion, motion blur, and uncommon poses with
few or no examples in training data sets. Temporal information can provide
additional cues about the location of body joints and help to alleviate these
issues. In this paper, we propose a deep structured model to estimate a
sequence of human poses in unconstrained videos. This model can be efficiently
trained in an end-to-end manner and is capable of representing appearance of
body joints and their spatio-temporal relationships simultaneously. Domain
knowledge about the human body is explicitly incorporated into the network
providing effective priors to regularize the skeletal structure and to enforce
temporal consistency. The proposed end-to-end architecture is evaluated on two
widely used benchmarks (Penn Action dataset and JHMDB dataset) for video-based
pose estimation. Our approach significantly outperforms the existing
state-of-the-art methods.
Comment: Preliminary version to appear in CVPR 2017
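One common way to realize a temporal-consistency prior of the kind described above, shown here outside any network, is Viterbi-style decoding over per-frame candidate joint locations with a pairwise jump penalty. The candidates and scores below are invented for illustration:

```python
def smooth_pose_track(unary, jump_cost=1.0):
    """Viterbi-style decoding of one joint's candidate index over time.

    unary[t][k]: per-frame confidence (higher is better) of candidate k.
    A pairwise penalty jump_cost * |k - k'| between consecutive frames
    enforces temporal consistency, echoing the temporal priors that the
    model above builds into the network end-to-end.
    """
    T, K = len(unary), len(unary[0])
    score = list(unary[0])
    back = []
    for t in range(1, T):
        ptr, new = [], []
        for k in range(K):
            best_prev = max(range(K),
                            key=lambda j: score[j] - jump_cost * abs(k - j))
            ptr.append(best_prev)
            new.append(unary[t][k] + score[best_prev]
                       - jump_cost * abs(k - best_prev))
        score, back = new, back + [ptr]
    # Backtrack the highest-scoring sequence of candidate indices.
    k = max(range(K), key=lambda j: score[j])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return path[::-1]

# The middle frame's slightly stronger candidate 2 is a spurious detection;
# temporal smoothing keeps the track on candidate 0 in every frame.
track = smooth_pose_track([[5, 0, 0], [4, 1, 4.5], [5, 0, 0]])
# → [0, 0, 0]
```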
Deep Temporal Sigmoid Belief Networks for Sequence Modeling
Deep dynamic generative models are developed to learn sequential dependencies
in time-series data. The multi-layered model is designed by constructing a
hierarchy of temporal sigmoid belief networks (TSBNs), defined as a sequential
stack of sigmoid belief networks (SBNs). Each SBN has a contextual hidden
state, inherited from the previous SBNs in the sequence, and is used to
regulate its hidden bias. Scalable learning and inference algorithms are
derived by introducing a recognition model that yields fast sampling from the
variational posterior. This recognition model is trained jointly with the
generative model, by maximizing its variational lower bound on the
log-likelihood. Experimental results on bouncing balls, polyphonic music,
motion capture, and text streams show that the proposed approach achieves
state-of-the-art predictive performance, and has the capacity to synthesize
various sequences.
Comment: To appear in NIPS 2015
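The generative rollout of a temporal sigmoid belief network follows directly from its definition: each binary hidden unit at time t is sampled from a Bernoulli whose probability is a sigmoid of an affine function of the previous step's state, so the previous state regulates the current hidden bias. A single-layer toy with hand-set weights, not the paper's multi-layer model:

```python
import math
import random

def sample_tsbn(T, W, b, seed=0):
    """Roll out a toy temporal sigmoid belief network for T steps.

    Each binary hidden unit i at time t is Bernoulli(sigmoid(W[i].h_{t-1} + b[i])),
    so the previous step's state shifts the current hidden bias.
    W and b here are illustrative.
    """
    rng = random.Random(seed)
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    H = len(b)
    h = [0] * H  # initial state: all units off
    states = []
    for _ in range(T):
        h = [1 if rng.random() < sigmoid(sum(W[i][j] * h[j] for j in range(H)) + b[i])
             else 0
             for i in range(H)]
        states.append(h)
    return states

# Mutually inhibiting pair of units, rolled out for five steps.
states = sample_tsbn(T=5, W=[[2.0, -2.0], [-2.0, 2.0]], b=[0.0, 0.0])
```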
Leveraging Machine Learning and Big Data for Smart Buildings: A Comprehensive Survey
Future buildings will offer their residents new possibilities for convenience,
comfort, and efficiency. The way people live will change as technology becomes
woven into their lives and information processing is fully integrated into
their daily activities and objects. The expectation for future smart buildings
is to make the residents' experience as easy and comfortable as possible. The
massive streaming data generated and
captured by smart building appliances and devices contains valuable information
that needs to be mined to facilitate timely actions and better decision making.
Machine learning and big data analytics will undoubtedly play a critical role
to enable the delivery of such smart services. In this paper, we survey the
area of smart buildings with a special focus on the role of machine learning
and big data analytics techniques. This survey also reviews the current trends
and challenges in the development of smart building services.
Efficient CNN Implementation for Eye-Gaze Estimation on Low-Power/Low-Quality Consumer Imaging Systems
Accurate and efficient eye gaze estimation is important for emerging consumer
electronic systems such as driver monitoring systems and novel user interfaces.
Such systems are required to operate reliably in difficult, unconstrained
environments with low power consumption and at minimal cost. In this paper, a
new hardware-friendly convolutional neural network model with minimal
computational requirements is introduced and assessed for efficient
appearance-based gaze estimation. The model is tested and compared against
existing appearance-based CNN approaches, achieving better eye-gaze accuracy
with significantly lower computational requirements. A brief, updated
literature review is also provided.
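One standard route to a hardware-friendly CNN of the kind described (not necessarily the architecture used here) is the depthwise-separable convolution, whose multiply-accumulate (MAC) count falls by roughly a factor of 1/c_out + 1/k² relative to a standard convolution. A quick cost comparison with an illustrative layer shape:

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulate count of a standard k x k convolution layer
    (stride 1, 'same' padding) on an h x w feature map."""
    return h * w * c_in * c_out * k * k

def depthwise_separable_macs(h, w, c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    return h * w * c_in * k * k + h * w * c_in * c_out

# Illustrative shape: the exact reduction factor is 1/c_out + 1/k^2.
std = conv_macs(32, 32, 64, 128, 3)
sep = depthwise_separable_macs(32, 32, 64, 128, 3)
ratio = sep / std  # ≈ 0.119 for these numbers: roughly 8x fewer MACs
```

The same factorization also shrinks the parameter count, which is why it appears so often in low-power vision models.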
Distributed Machine Learning in Materials that Couple Sensing, Actuation, Computation and Communication
This paper reviews machine learning applications and approaches to detection,
classification and control of intelligent materials and structures with
embedded distributed computation elements. The purpose of this survey is to
identify desired tasks to be performed in each type of material or structure
(e.g., damage detection in composites), identify and compare common approaches
to learning such tasks, and investigate models and training paradigms used.
Machine learning approaches and common temporal features used in the domains of
structural health monitoring, morphable aircraft, wearable computing and
robotic skins are explored. As the ultimate goal of this research is to
incorporate the approaches described in this survey into a robotic material
paradigm, the potential for adapting the computational models used in these
applications, and corresponding training algorithms, to an amorphous network of
computing nodes is considered. Distributed versions of support vector machines,
graphical models and mixture models developed in the field of wireless sensor
networks are reviewed. Potential areas of investigation, including possible
architectures for incorporating machine learning into robotic nodes, training
approaches, and the possibility of using deep learning approaches for automatic
feature extraction, are discussed.
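A basic building block of the distributed learners reviewed above is consensus (gossip) averaging, in which each computing node repeatedly nudges its local parameter toward its neighbors' values until the network agrees. A toy sketch on a hypothetical four-node ring; the topology and step size are invented:

```python
def consensus_average(node_params, neighbors, rounds=50, step=0.3):
    """Toy gossip/consensus averaging over a network of computing nodes.

    node_params: each node's local scalar parameter.
    neighbors[i]: indices of node i's neighbors. Each round, every node
    moves toward the mean of its neighbors; on a connected graph all
    nodes converge to the global average without any central coordinator.
    """
    x = list(node_params)
    for _ in range(rounds):
        x = [xi + step * sum(x[j] - xi for j in neighbors[i]) / len(neighbors[i])
             for i, xi in enumerate(x)]
    return x

# Ring of four nodes; every node converges toward the global mean, 2.5.
vals = consensus_average([1.0, 2.0, 3.0, 4.0],
                         neighbors=[[1, 3], [0, 2], [1, 3], [0, 2]])
```

The same averaging primitive underlies the distributed SVM and mixture-model variants from the wireless-sensor-network literature that the survey reviews, with model parameters in place of scalars.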