
    Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos

    Single modality action recognition on RGB or depth sequences has been extensively explored recently. It is generally accepted that each of these two modalities has different strengths and limitations for the task of action recognition. Therefore, analysis of RGB+D videos can help us better study the complementary properties of the two modalities and achieve higher levels of performance. In this paper, we propose a new deep autoencoder-based shared-specific feature factorization network to separate input multimodal signals into a hierarchy of components. Further, based on the structure of the features, a structured sparsity learning machine is proposed which utilizes mixed norms to apply regularization within components and group selection between them for better classification performance. Our experimental results show the effectiveness of our cross-modality feature analysis framework by achieving state-of-the-art accuracy for action classification on five challenging benchmark datasets.
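
    The structured sparsity machine described above rests on a mixed norm: an L2 norm within each factorized component and an L1 sum across components (the L2,1 norm), so that whole components can be kept or discarded during classifier training. Below is a minimal numpy sketch of that regularizer; the grouping, squared loss, and variable names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def l21_penalty(W, groups):
    """L2,1 mixed norm: L2 within each feature group, L1 across groups.

    W      : (n_features, n_classes) classifier weight matrix
    groups : list of index arrays, one per factorized component
    """
    # The within-group L2 ties features of one component together;
    # summing the group norms (an L1 across groups) drives whole
    # components to zero, i.e. performs group selection.
    return sum(np.linalg.norm(W[g, :]) for g in groups)

def objective(W, X, Y, groups, lam=0.1):
    """Illustrative squared-loss classifier with the group-sparse term."""
    residual = X @ W - Y                      # (n_samples, n_classes)
    return 0.5 * np.sum(residual ** 2) + lam * l21_penalty(W, groups)
```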

    Biologically inspired model simulating visual pathways and cerebellum function in human - Achieving visuomotor coordination and high precision movement with learning ability

    In recent years, interdisciplinary research between information science and neuroscience has become a hotspot. In this paper, based on recent biological findings, we propose a new model to mimic visual information processing, motor planning, and control in the human central and peripheral nervous systems. The main steps of the model are as follows: 1) Simulating the "where" pathway: the Selective Search method is applied to simulate the function of the human dorsal visual pathway in localizing object candidates; 2) Simulating the "what" pathway: a Convolutional Deep Belief Network is applied to simulate the hierarchical structure and function of the human ventral visual pathway for object recognition; 3) Simulating the motor planning process: the habitual motion planning process in humans is simulated, and motor commands are generated from the combination of control signals from past experiences; 4) Simulating precise movement control: calibrated control signals, which mimic the cerebellum's adjustment of movement in humans, are generated and updated from the calibration of movement errors in past experiences, and sent to the movement model to achieve high precision. The proposed framework mimics the structures and functions of human recognition, visuomotor coordination, and precise motor control. Experiments on object localization, recognition, and movement control demonstrate that the proposed model can not only accomplish visuomotor coordination tasks but also achieve high-precision movement with learning ability. Meanwhile, the results also prove the validity of the introduced mechanisms. Furthermore, the proposed model could be generalized and applied to other systems, such as mechanical and electrical systems in robotics, to achieve fast-response, high-precision movement with learning ability. Comment: 12 pages, 13 figures.
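
    Step 4 amounts to correcting a habitual motor command with an error signal accumulated over past trials, the way the cerebellum calibrates movement. Here is a minimal sketch of that idea under stated assumptions: the plant function, gain, and trial loop are hypothetical stand-ins, not the paper's movement model.

```python
import numpy as np

def calibrated_control(habitual_command, plant, target, trials=50, gain=0.5):
    """Iteratively refine a motor command from past movement errors.

    habitual_command : initial command from past experience (step 3)
    plant            : function mapping a command to a realized position
    target           : desired end position
    gain             : how strongly each observed error updates the command
    """
    command = np.asarray(habitual_command, dtype=float)
    correction = np.zeros_like(command)     # cerebellum-like calibration signal
    for _ in range(trials):
        reached = plant(command + correction)
        error = target - reached            # movement error on this trial
        correction += gain * error          # update calibration from the error
    return command + correction

# Toy usage: a plant with an unknown constant bias the calibration must cancel.
plant = lambda u: u + np.array([0.3, -0.2])
final = calibrated_control(np.zeros(2), plant, target=np.array([1.0, 1.0]))
```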

    A Survey on Content-Aware Video Analysis for Sports

    Sports data analysis is becoming increasingly large-scale, diversified, and shared, but difficulty persists in rapidly accessing the most crucial information. Previous surveys have focused on the methodologies of sports video analysis from the spatiotemporal viewpoint instead of a content-based viewpoint, and few of these studies have considered semantics. This study develops a deeper interpretation of content-aware sports video analysis by examining the insight offered by research into the structure of content under different scenarios. On the basis of this insight, we provide an overview of the themes particularly relevant to the research on content-aware systems for broadcast sports. Specifically, we focus on the video content analysis techniques applied in sportscasts over the past decade from the perspectives of fundamentals and general review, a content hierarchical model, and trends and challenges. Content-aware analysis methods are discussed with respect to object-, event-, and context-oriented groups. In each group, the gap between sensation and content excitement must be bridged using proper strategies. In this regard, a content-aware approach is required to determine user demands. Finally, the paper summarizes the future trends and challenges for sports video analysis. We believe that our findings can advance the field of research on content-aware video analysis for broadcast sports. Comment: Accepted for publication in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT).

    A Probabilistic Semi-Supervised Approach to Multi-Task Human Activity Modeling

    Human behavior is a continuous stochastic spatio-temporal process that is governed by semantic actions and affordances as well as latent factors. Therefore, video-based human activity modeling is concerned with a number of tasks, such as inferring current and future semantic labels, predicting future continuous observations, and imagining possible future label and feature sequences. In this paper we present a semi-supervised probabilistic deep latent variable model that can represent both discrete labels and continuous observations as well as latent dynamics over time. This allows the model to solve several tasks at once without explicit fine-tuning. We focus here on the tasks of action classification, detection, prediction, and anticipation as well as motion prediction and synthesis based on 3D human activity data recorded with Kinect. We further extend the model to capture hierarchical label structure and to model the dependencies between multiple entities, such as a human and objects. Our experiments demonstrate that our principled approach to human activity modeling can be used to detect current and anticipate future semantic labels and to predict and synthesize future label and feature sequences. When comparing our model to state-of-the-art approaches, which are specifically designed for, e.g., action classification, we find that our probabilistic formulation outperforms or is comparable to these task-specific models.
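
    For orientation, the objective such semi-supervised deep latent variable models typically maximize combines an evidence lower bound (ELBO) for labeled sequences with one that marginalizes the discrete label for unlabeled sequences, in the spirit of Kingma et al.'s M2 model. This paper's factorization with latent temporal dynamics is more elaborate, so treat the following as a simplified sketch:

```latex
% Labeled sequences (x, y): the label is observed
\mathcal{L}(x,y) = \mathbb{E}_{q_\phi(z \mid x, y)}\!\left[\log p_\theta(x \mid y, z)\right]
  - \mathrm{KL}\!\left(q_\phi(z \mid x, y)\,\|\,p(z)\right) + \log p_\theta(y)

% Unlabeled sequences: the discrete label is marginalized out,
% with an entropy term over the inferred label distribution
\mathcal{U}(x) = \sum_{y} q_\phi(y \mid x)\,\mathcal{L}(x,y)
  + \mathcal{H}\!\left(q_\phi(y \mid x)\right)
```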

    Energy-based Models for Video Anomaly Detection

    Automated detection of abnormalities in data has been an active research area in recent years because of its diverse practical applications, including video surveillance, industrial damage detection, and network intrusion detection. However, building an effective anomaly detection system is a non-trivial task, since it requires tackling the challenging issues of scarce annotated data, the inability to define anomalous objects explicitly, and the expensive cost of feature engineering. Unlike existing approaches, which only partially solve these problems, we develop a unique framework that copes with all of them simultaneously. Instead of handling the ambiguous definition of anomalous objects, we propose to work with regular patterns, whose unlabeled data are abundant and usually easy to collect in practice. This allows our system to be trained in a completely unsupervised procedure and liberates us from the need for costly data annotation. By learning a generative model that captures the normality distribution in the data, we can isolate abnormal data points, which receive low normality scores (high abnormality scores). Moreover, by leveraging the power of generative networks, i.e., energy-based models, we are also able to learn feature representations automatically rather than relying on the hand-crafted features that have dominated anomaly detection research for decades. We demonstrate our proposal on the specific application of video anomaly detection, and the experimental results indicate that our method performs better than baselines and is comparable with state-of-the-art methods on many benchmark video anomaly detection datasets.
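
    One concrete instance of an energy-based normality score is the free energy of a restricted Boltzmann machine trained only on regular patterns: inputs far from the learned normality distribution receive high free energy, i.e. high abnormality scores. A minimal numpy sketch, with random parameters standing in for a trained model (the paper's actual architecture may differ):

```python
import numpy as np

def rbm_free_energy(v, W, b_vis, b_hid):
    """Free energy F(v) of a binary RBM; low F means 'normal' under the model.

    v : (n_samples, n_visible) inputs, e.g. flattened video patches
    F(v) = -v.b_vis - sum_j softplus(b_hid_j + (v W)_j)
    """
    hidden_term = np.logaddexp(0.0, v @ W + b_hid).sum(axis=1)  # softplus
    return -(v @ b_vis) - hidden_term

def abnormality_scores(v, W, b_vis, b_hid):
    """Higher score = more anomalous (lower normality)."""
    return rbm_free_energy(v, W, b_vis, b_hid)

# Toy usage with random weights; in practice W, b_* come from unsupervised
# training on abundant regular (normal) footage, and scores are thresholded.
rng = np.random.default_rng(0)
W, b_vis, b_hid = rng.normal(size=(64, 32)), np.zeros(64), np.zeros(32)
scores = abnormality_scores(rng.random((10, 64)), W, b_vis, b_hid)
```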

    Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos

    Deep ConvNets have been shown to be effective for the task of human pose estimation from single images. However, several challenging issues arise in the video-based case, such as self-occlusion, motion blur, and uncommon poses with few or no examples in training datasets. Temporal information can provide additional cues about the location of body joints and help to alleviate these issues. In this paper, we propose a deep structured model to estimate a sequence of human poses in unconstrained videos. This model can be efficiently trained in an end-to-end manner and is capable of representing the appearance of body joints and their spatio-temporal relationships simultaneously. Domain knowledge about the human body is explicitly incorporated into the network, providing effective priors to regularize the skeletal structure and to enforce temporal consistency. The proposed end-to-end architecture is evaluated on two widely used benchmarks (Penn Action dataset and JHMDB dataset) for video-based pose estimation. Our approach significantly outperforms the existing state-of-the-art methods. Comment: Preliminary version to appear in CVPR 2017.
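
    The temporal-consistency prior mentioned above can be illustrated by a simple smoothness penalty on predicted joint locations across consecutive frames. The sketch below shows only that one term; the paper's model enforces consistency through structured spatio-temporal message passing rather than this bare penalty.

```python
import numpy as np

def temporal_consistency_loss(joints):
    """Penalize large frame-to-frame jumps in predicted joint positions.

    joints : (T, J, 2) array of (x, y) positions for J joints over T frames
    """
    # Squared displacement between consecutive frames, summed over joints
    # and time; encourages smooth trajectories for each joint.
    diffs = joints[1:] - joints[:-1]          # (T-1, J, 2)
    return np.sum(diffs ** 2)

# Toy usage: a jittery track incurs a higher penalty than a smooth one.
t = np.linspace(0, 1, 20)
smooth = np.stack([np.stack([t, t], axis=1)] * 14, axis=1)  # (20, 14, 2)
jitter = smooth + np.random.default_rng(1).normal(0, 0.05, smooth.shape)
assert temporal_consistency_loss(jitter) > temporal_consistency_loss(smooth)
```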

    Deep Temporal Sigmoid Belief Networks for Sequence Modeling

    Deep dynamic generative models are developed to learn sequential dependencies in time-series data. The multi-layered model is designed by constructing a hierarchy of temporal sigmoid belief networks (TSBNs), defined as a sequential stack of sigmoid belief networks (SBNs). Each SBN has a contextual hidden state, inherited from the previous SBNs in the sequence, which is used to regulate its hidden bias. Scalable learning and inference algorithms are derived by introducing a recognition model that yields fast sampling from the variational posterior. This recognition model is trained jointly with the generative model, by maximizing its variational lower bound on the log-likelihood. Experimental results on bouncing balls, polyphonic music, motion capture, and text streams show that the proposed approach achieves state-of-the-art predictive performance, and has the capacity to synthesize various sequences. Comment: To appear in NIPS 2015.
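
    The core recurrence described above, where each step's hidden bias is regulated by the previous time step, can be sketched as ancestral sampling through time. A minimal single-layer sketch follows; the weight names and sizes are illustrative assumptions, and the paper stacks several such layers into a deep hierarchy.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def tsbn_sample(T, n_hid, n_vis, W_hh, W_hv, W_vh, b_h, b_v):
    """Ancestral sampling from a single-layer temporal sigmoid belief net.

    h_t depends on h_{t-1} and v_{t-1} (the 'contextual hidden state'
    regulating the hidden bias); v_t is then emitted from h_t.
    """
    h, v, frames = np.zeros(n_hid), np.zeros(n_vis), []
    for _ in range(T):
        # Hidden bias is shifted by the previous hidden state and frame.
        h = (rng.random(n_hid) < sigmoid(W_hh @ h + W_vh @ v + b_h)).astype(float)
        v = (rng.random(n_vis) < sigmoid(W_hv @ h + b_v)).astype(float)
        frames.append(v)
    return np.stack(frames)

# Toy usage with random parameters; real weights come from variational training.
n_hid, n_vis = 16, 8
sample = tsbn_sample(10, n_hid, n_vis,
                     rng.normal(size=(n_hid, n_hid)),
                     rng.normal(size=(n_vis, n_hid)),
                     rng.normal(size=(n_hid, n_vis)),
                     np.zeros(n_hid), np.zeros(n_vis))
```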

    Leveraging Machine Learning and Big Data for Smart Buildings: A Comprehensive Survey

    Future buildings will offer new convenience, comfort, and efficiency possibilities to their residents. The way people live will change as technology weaves into people's lives and information processing becomes fully integrated into their daily living activities and objects. The future expectation of smart buildings is to make the residents' experience as easy and comfortable as possible. The massive streaming data generated and captured by smart building appliances and devices contains valuable information that needs to be mined to facilitate timely actions and better decision making. Machine learning and big data analytics will undoubtedly play a critical role in enabling the delivery of such smart services. In this paper, we survey the area of smart buildings with a special focus on the role of techniques from machine learning and big data analytics. This survey also reviews the current trends and challenges faced in the development of smart building services.

    Efficient CNN Implementation for Eye-Gaze Estimation on Low-Power/Low-Quality Consumer Imaging Systems

    Accurate and efficient eye-gaze estimation is important for emerging consumer electronic systems such as driver monitoring systems and novel user interfaces. Such systems are required to operate reliably in difficult, unconstrained environments with low power consumption and at minimal cost. In this paper, a new hardware-friendly convolutional neural network model with minimal computational requirements is introduced and assessed for efficient appearance-based gaze estimation. The model is tested and compared against existing appearance-based CNN approaches, achieving better eye-gaze accuracy with significantly lower computational requirements. A brief updated literature review is also provided.
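
    For a sense of what "hardware-friendly with minimal computational requirements" can look like, here is a tiny appearance-based gaze regressor built from depthwise-separable convolutions, a standard trick for cutting multiply-accumulate counts. The layer sizes, input resolution, and class name are illustrative assumptions, not the paper's architecture. A PyTorch sketch:

```python
import torch
import torch.nn as nn

class TinyGazeNet(nn.Module):
    """Illustrative low-compute CNN: eye crop in, (yaw, pitch) gaze out."""

    def __init__(self):
        super().__init__()

        def dw_sep(c_in, c_out):
            # Depthwise conv + 1x1 pointwise conv: far fewer MACs than a
            # full conv, which is what keeps the model hardware friendly.
            return nn.Sequential(
                nn.Conv2d(c_in, c_in, 3, stride=2, padding=1, groups=c_in),
                nn.Conv2d(c_in, c_out, 1),
                nn.ReLU(inplace=True),
            )

        self.features = nn.Sequential(
            dw_sep(1, 8),     # grayscale eye crop -> 8 channels
            dw_sep(8, 16),
            dw_sep(16, 32),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)   # regress gaze angles (yaw, pitch)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

# Toy usage: a batch of 36x60 grayscale eye crops.
model = TinyGazeNet()
gaze = model(torch.randn(4, 1, 36, 60))   # -> (4, 2)
```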

    Distributed Machine Learning in Materials that Couple Sensing, Actuation, Computation and Communication

    This paper reviews machine learning applications and approaches to the detection, classification, and control of intelligent materials and structures with embedded distributed computation elements. The purpose of this survey is to identify the desired tasks to be performed in each type of material or structure (e.g., damage detection in composites), to identify and compare common approaches to learning such tasks, and to investigate the models and training paradigms used. Machine learning approaches and common temporal features used in the domains of structural health monitoring, morphable aircraft, wearable computing, and robotic skins are explored. As the ultimate goal of this research is to incorporate the approaches described in this survey into a robotic material paradigm, the potential for adapting the computational models used in these applications, and the corresponding training algorithms, to an amorphous network of computing nodes is considered. Distributed versions of support vector machines, graphical models, and mixture models developed in the field of wireless sensor networks are reviewed. Potential areas of investigation, including possible architectures for incorporating machine learning into robotic nodes, training approaches, and the possibility of using deep learning for automatic feature extraction, are discussed.
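
    As a taste of the distributed schemes such surveys cover, consensus averaging lets an amorphous network of nodes agree on a shared model (e.g., locally trained SVM weights) using only neighbor-to-neighbor exchanges. The following is a minimal sketch under stated assumptions; the ring topology, step size, and function names are hypothetical, not taken from any reviewed paper.

```python
import numpy as np

def gossip_average(local_params, neighbors, rounds=100, step=0.3):
    """Synchronous gossip: each node nudges its parameters toward the
    average of its neighbors; on a connected graph all nodes converge
    to the global mean of the initial local parameters.

    local_params : (n_nodes, dim) locally trained parameter vectors
    neighbors    : list of neighbor-index lists, one per node
    """
    params = np.array(local_params, dtype=float)
    for _ in range(rounds):
        new = params.copy()
        for i, nbrs in enumerate(neighbors):
            # Only local communication: node i sees its neighbors' params.
            new[i] += step * (params[nbrs].mean(axis=0) - params[i])
        params = new
    return params

# Toy usage: 6 nodes on a ring, each with its own locally fit weights.
n = 6
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
local = np.random.default_rng(2).normal(size=(n, 4))
consensus = gossip_average(local, ring)   # rows approach local.mean(axis=0)
```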