
    Incorporating Boltzmann Machine Priors for Semantic Labeling in Images and Videos

    Semantic labeling is the task of assigning category labels to regions in an image. For example, a scene may consist of regions corresponding to categories such as sky, water, and ground, or parts of a face such as eyes, nose, and mouth. Semantic labeling is an important mid-level vision task for grouping and organizing image regions into coherent parts. Labeling these regions allows us to better understand the scene itself as well as properties of the objects in the scene, such as their parts, location, and interaction within the scene. Typical approaches to this task include the conditional random field (CRF), which is well suited to modeling local interactions among adjacent image regions. However, the CRF is limited in dealing with complex, global (long-range) interactions between regions in an image, and between frames in a video. This thesis presents approaches to modeling long-range interactions within images and videos for use in semantic labeling. To model these long-range interactions, we incorporate priors based on the restricted Boltzmann machine (RBM). The RBM is a generative model that has demonstrated the ability to learn the shape of an object, and the conditional RBM (CRBM) is a temporal extension that can learn the motion of an object. Although the CRF is a good baseline labeler, we show how the RBM and CRBM can be added to the architecture to model both the global object shape within an image and the temporal dependencies of the object across previous frames in a video. We demonstrate the labeling performance of our models on the parts of complex face images from the Labeled Faces in the Wild database (for images) and the YouTube Faces Database (for videos). Our hybrid models produce results that are both quantitatively and qualitatively better than the baseline CRF alone for both images and videos.
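    To make the RBM shape prior concrete, here is a minimal sketch, not the thesis's implementation: a binary RBM over flattened part-label masks, trained with one-step contrastive divergence (CD-1). The mask size, hidden-unit count, and learning rate are illustrative assumptions; in the thesis's hybrid, the RBM's energy would be combined with the CRF's local unary and pairwise terms.

```python
import numpy as np

rng = np.random.default_rng(0)

class RBM:
    """Minimal binary RBM used as a global shape prior over label masks."""
    def __init__(self, n_visible, n_hidden, lr=0.01):
        self.W = rng.normal(0, 0.01, (n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def hidden_probs(self, v):
        return self._sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return self._sigmoid(h @ self.W.T + self.b)

    def cd1_update(self, v0):
        """One step of contrastive divergence on a batch of flattened masks."""
        h0 = self.hidden_probs(v0)
        h_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = self.visible_probs(h_sample)
        h1 = self.hidden_probs(v1)
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b += self.lr * (v0 - v1).mean(axis=0)
        self.c += self.lr * (h0 - h1).mean(axis=0)

# Train on 32x32 binary part masks (e.g., "hair" regions); the reconstruction
# v -> h -> v then acts as a global shape term alongside the CRF's local energies.
masks = (rng.random((100, 32 * 32)) < 0.3).astype(float)  # placeholder data
rbm = RBM(n_visible=32 * 32, n_hidden=64)
for epoch in range(5):
    rbm.cd1_update(masks)
shape_score = rbm.visible_probs(rbm.hidden_probs(masks[:1]))  # prior marginal
```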

    Towards Deeper Understanding in Neuroimaging

    Neuroimaging is a growing domain of research, and advances in machine learning have tremendous potential to expand understanding in neuroscience and improve public health. Deep neural networks have recently and rapidly achieved historic success in numerous domains, and as a consequence have completely redefined the landscape of automated learners, promising significant advances in numerous fields of research. Despite these advances and their advantages over traditional machine learning methods, deep neural networks have yet to permeate neuroscience studies significantly, particularly as a tool for discovery. This dissertation presents well-established and novel tools for unsupervised learning which aid in feature discovery, with relevant applications to neuroimaging. Through the works within, this dissertation presents strong evidence that deep learning is a viable and important tool for neuroimaging studies.
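    As one concrete instance of the kind of unsupervised feature-discovery tool the dissertation refers to, here is a minimal denoising-autoencoder sketch in PyTorch. The layer sizes, noise level, and flattened-volume input format are assumptions for illustration, not the dissertation's models.

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    """Tiny denoising autoencoder: learns features from unlabeled scans."""
    def __init__(self, n_voxels=4096, n_latent=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_voxels, 512), nn.ReLU(),
                                 nn.Linear(512, n_latent))
        self.dec = nn.Sequential(nn.Linear(n_latent, 512), nn.ReLU(),
                                 nn.Linear(512, n_voxels))

    def forward(self, x):
        noisy = x + 0.1 * torch.randn_like(x)  # corrupt, then reconstruct
        return self.dec(self.enc(noisy))

model = DenoisingAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 4096)  # placeholder for flattened, masked brain volumes
for step in range(100):
    loss = nn.functional.mse_loss(model(x), x)
    opt.zero_grad(); loss.backward(); opt.step()
features = model.enc(x)  # learned representations for downstream analysis
```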

    Action recognition based on efficient deep feature learning in the spatio-temporal domain

    Hand-crafted feature functions are usually designed based on the domain knowledge of a presumably controlled environment, and they often fail to generalize because the statistics of real-world data cannot always be modeled correctly. Data-driven feature learning methods, on the other hand, have emerged as an alternative that often generalizes better in uncontrolled environments. We present a simple yet robust 2D convolutional neural network, extended to a concatenated 3D network, that learns to extract features from the spatio-temporal domain of raw video data. The resulting network model is used for content-based recognition of videos. Relying on a 2D convolutional neural network allows us to exploit a pretrained network as a descriptor that yielded the best results on the large and challenging ILSVRC-2014 dataset. Experimental results on commonly used benchmark video datasets demonstrate that our results are state-of-the-art in terms of accuracy and computational time, without requiring any preprocessing (e.g., optic flow) or a priori knowledge of the data capture (e.g., camera motion estimation), which makes our approach more general and flexible than others. Our implementation is made available.
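    A minimal PyTorch sketch of the general idea, a pretrained 2D CNN applied per frame whose feature maps are stacked along time and passed through a 3D convolutional stage, is shown below. The specific backbone (torchvision's VGG16), the frozen-descriptor choice, and the classification head are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class SpatioTemporalNet(nn.Module):
    """Per-frame 2D features -> temporal stack -> 3D conv -> class scores."""
    def __init__(self, n_classes=10):
        super().__init__()
        # Pretrained 2D descriptor (downloads ImageNet weights on first use).
        self.backbone2d = vgg16(weights="IMAGENET1K_V1").features
        self.conv3d = nn.Conv3d(512, 64, kernel_size=3, padding=1)
        self.head = nn.Sequential(nn.AdaptiveAvgPool3d(1), nn.Flatten(),
                                  nn.Linear(64, n_classes))

    def forward(self, video):                # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        frames = video.flatten(0, 1)         # (B*T, 3, H, W)
        with torch.no_grad():                # frozen 2D descriptor
            feats = self.backbone2d(frames)  # (B*T, 512, h, w)
        feats = feats.unflatten(0, (b, t)).transpose(1, 2)  # (B, 512, T, h, w)
        return self.head(torch.relu(self.conv3d(feats)))

scores = SpatioTemporalNet()(torch.rand(2, 8, 3, 112, 112))  # (2, n_classes)
```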

    Exploring the landscapes of "computing": digital, neuromorphic, unconventional -- and beyond

    The acceleration race of digital computing technologies seems to be steering toward impasses -- technological, economical, and environmental -- a condition that has spurred research efforts in alternative, "neuromorphic" (brain-like) computing technologies. Furthermore, for decades the idea of exploiting nonlinear physical phenomena "directly" for non-digital computing has been explored under names like "unconventional computing", "natural computing", "physical computing", or "in-materio computing". This has been taking place in niches which are small compared to other sectors of computer science. In this paper I stake out the grounds of how a general concept of "computing" can be developed which comprises digital, neuromorphic, unconventional, and possible future "computing" paradigms. The main contribution of this paper is a wide-scope survey of existing formal conceptualizations of "computing". The survey inspects approaches rooted in three different kinds of background mathematics: discrete-symbolic formalisms, probabilistic modeling, and dynamical-systems oriented views. It turns out that different choices of background mathematics lead to decisively different understandings of what "computing" is. Across all of this diversity, a unifying coordinate system for theorizing about "computing" can be distilled. Within these coordinates I locate anchor points for a foundational formal theory of a future computing-engineering discipline that includes, but will reach beyond, digital and neuromorphic computing. (An extended and carefully revised version of this manuscript was published in March 2021 as "Toward a generalized theory comprising digital, neuromorphic, and unconventional computing" in the open-access journal Neuromorphic Computing and Engineering.)

    PeopleNet: A Novel People Counting Framework for Head-Mounted Moving Camera Videos

    Traditional crowd counting techniques (optical flow or feature matching) have been superseded by deep learning (DL) models because they lack automatic feature extraction and yield low-precision outcomes. Most of these models have been tested on surveillance-scene crowd datasets captured by stationary shooting equipment. Counting people in videos shot with a head-mounted moving camera is very challenging, mainly because the temporal information of the moving crowd is mixed with the induced camera motion. This study proposes a transfer-learning-based PeopleNet model to tackle this significant problem. To this end, we make significant changes to the standard VGG16 model, disabling its top convolutional blocks and replacing its standard fully connected layers with new fully connected and dense layers. The strong transfer-learning capability of the VGG16 network provides PeopleNet with high-quality density maps, resulting in highly accurate crowd estimates. The performance of the proposed model has been tested on a self-generated image database prepared from moving-camera video clips, as there is no public benchmark dataset for this task. The proposed framework gives promising results on various crowd categories such as dense, sparse, and average. To ensure versatility, we performed self- and cross-evaluation on various crowd counting models and datasets, which demonstrates the broad applicability of the PeopleNet model.
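    A hedged sketch of the kind of modification described, a VGG16 backbone truncated before its top convolutional blocks and finished with new dense layers that regress a crowd density map, is given below. The truncation point, layer sizes, and output resolution are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class PeopleNetSketch(nn.Module):
    """Truncated VGG16 + new dense layers regressing a crowd density map."""
    def __init__(self, out_hw=(28, 28)):
        super().__init__()
        full = vgg16(weights="IMAGENET1K_V1").features
        self.backbone = full[:17]             # keep blocks 1-3, drop top blocks
        for p in self.backbone.parameters():  # transfer learning: freeze backbone
            p.requires_grad = False
        self.out_hw = out_hw
        self.regressor = nn.Sequential(
            nn.AdaptiveAvgPool2d(7), nn.Flatten(),
            nn.Linear(256 * 7 * 7, 1024), nn.ReLU(),
            nn.Linear(1024, out_hw[0] * out_hw[1]), nn.ReLU())

    def forward(self, x):                     # x: (B, 3, H, W)
        d = self.regressor(self.backbone(x))
        return d.view(-1, 1, *self.out_hw)    # non-negative density map

density = PeopleNetSketch()(torch.rand(1, 3, 224, 224))
count = density.sum().item()                  # head count = integral of density
```

    The final sum reflects the standard density-map convention: each person contributes unit mass to the map, so integrating the map yields the crowd estimate.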

    MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction

    In this work we propose a novel model-based deep convolutional autoencoder that addresses the highly challenging problem of reconstructing a 3D human face from a single in-the-wild color image. To this end, we combine a convolutional encoder network with an expert-designed generative model that serves as decoder. The core innovation is our new differentiable parametric decoder that encapsulates image formation analytically based on a generative model. Our decoder takes as input a code vector with exactly defined semantic meaning that encodes detailed face pose, shape, expression, skin reflectance and scene illumination. Due to this new way of combining CNN-based with model-based face reconstruction, the CNN-based encoder learns to extract semantically meaningful parameters from a single monocular input image. For the first time, a CNN encoder and an expert-designed generative model can be trained end-to-end in an unsupervised manner, which renders training on very large (unlabeled) real world data feasible. The obtained reconstructions compare favorably to current state-of-the-art approaches in terms of quality and richness of representation. (International Conference on Computer Vision (ICCV) 2017, Oral; 13 pages.)
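    The core loop, an encoder regressing semantic parameters that an analytic, differentiable decoder maps back to an image-space quantity, can be sketched as follows. For brevity this toy decoder projects a linear (PCA-style) shape model to 2D landmarks rather than rendering a full face with reflectance and illumination as MoFA does; all basis tensors, dimensions, and the landmark-based loss are assumptions for illustration.

```python
import torch
import torch.nn as nn

N_PTS, N_ID, N_EXP = 68, 80, 29          # landmark count, basis sizes (assumed)
mean_shape = torch.randn(N_PTS, 3)       # stand-ins for a real morphable model
id_basis = torch.randn(N_ID, N_PTS, 3)
exp_basis = torch.randn(N_EXP, N_PTS, 3)

encoder = nn.Sequential(                  # CNN encoder -> semantic code vector
    nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=4), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, N_ID + N_EXP + 3))      # identity, expression, (scale, tx, ty)

def decode(code):
    """Analytic, differentiable decoder: code -> projected 2D landmarks."""
    alpha, delta, pose = code[:, :N_ID], code[:, N_ID:N_ID + N_EXP], code[:, -3:]
    shape = (mean_shape
             + torch.einsum('bi,ipc->bpc', alpha, id_basis)
             + torch.einsum('be,epc->bpc', delta, exp_basis))
    s, t = pose[:, :1].unsqueeze(-1), pose[:, 1:].unsqueeze(1)
    return s * shape[..., :2] + t         # weak-perspective projection

opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)
img = torch.rand(4, 3, 128, 128)          # unlabeled in-the-wild crops
target = torch.randn(4, N_PTS, 2)         # e.g., off-the-shelf landmark detector
loss = ((decode(encoder(img)) - target) ** 2).mean()  # self-supervised loss
opt.zero_grad(); loss.backward(); opt.step()
```

    Because the decoder is a fixed analytic function, gradients flow through it into the encoder alone, which is what makes the end-to-end unsupervised training described above possible.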

    Bidirectional Learning in Recurrent Neural Networks Using Equilibrium Propagation

    Neurobiologically plausible learning algorithms for recurrent neural networks that can perform supervised learning are a neglected area of study. Equilibrium propagation is a recent synthesis of several ideas in biological and artificial neural network research that uses a continuous-time, energy-based neural model with a local learning rule. However, despite dealing with recurrent networks, equilibrium propagation has only been applied to discriminative categorization tasks. This thesis generalizes equilibrium propagation to bidirectional learning with asymmetric weights. By simultaneously learning the discriminative as well as generative transformations for a set of data points and their corresponding category labels, bidirectional equilibrium propagation utilizes recurrence and weight asymmetry to share related but non-identical representations within the network. Experiments on an artificial dataset demonstrate the ability to learn both transformations, as well as the ability of asymmetric-weight networks to generalize their discriminative training to the untrained generative task.
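    A minimal numpy sketch of the underlying equilibrium-propagation procedure (the standard symmetric-weight, discriminative form of Scellier and Bengio, not the thesis's bidirectional asymmetric extension) is given below: a free relaxation phase, a weakly clamped phase, and a purely local contrastive weight update. Layer sizes, step counts, and the nudging strength are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = lambda s: np.clip(s, 0, 1)          # hard-sigmoid activation

n_in, n_hid, n_out = 4, 16, 2
W1 = rng.normal(0, 0.1, (n_in, n_hid))    # symmetric weights (classic EqProp)
W2 = rng.normal(0, 0.1, (n_hid, n_out))
eps, beta, lr, relax_steps = 0.5, 1.0, 0.05, 30

def relax(x, y=None):
    """Gradient descent on the energy; y weakly clamps the output if given."""
    h, o = np.zeros(n_hid), np.zeros(n_out)
    for _ in range(relax_steps):
        dh = -h + rho(x) @ W1 + rho(o) @ W2.T
        do = -o + rho(h) @ W2
        if y is not None:
            do += beta * (y - o)          # weak clamping toward the target
        h, o = h + eps * dh, o + eps * do
    return h, o

x, y = rng.random(n_in), np.array([1.0, 0.0])
h0, o0 = relax(x)                         # free phase: settle to equilibrium
h1, o1 = relax(x, y)                      # nudged phase: output pulled to y
# Local, contrastive weight update (no backpropagated error signals):
W1 += lr / beta * (np.outer(rho(x), rho(h1)) - np.outer(rho(x), rho(h0)))
W2 += lr / beta * (np.outer(rho(h1), rho(o1)) - np.outer(rho(h0), rho(o0)))
```

    The thesis's bidirectional variant would additionally maintain separate forward and backward weight matrices and train a generative (label-to-data) pass with the same two-phase recipe.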