
    Deep Recurrent Learning for Efficient Image Recognition Using Small Data

    Recognition is a fundamental yet open and challenging problem in computer vision. It involves the detection and interpretation of complex shapes of objects or persons from previous encounters or knowledge. Biological systems are considered the most powerful, robust, and generalized recognition models. The recent success of learning-based mathematical models known as artificial neural networks, especially deep neural networks, has propelled researchers to use such architectures to develop bio-inspired computational recognition models. However, the computational complexity of these models grows with the difficulty of the recognition problem, and, more importantly, these models require a large amount of data for successful learning. Additionally, feedforward hierarchical models do not exploit another important biological learning paradigm, known as recurrency, which is ubiquitous in the biological visual system and has been shown to be crucial for recognition. Consequently, this work aims to develop novel, biologically relevant deep recurrent learning models for robust recognition using limited training data. First, we design an efficient deep simultaneous recurrent network (DSRN) architecture for solving several challenging image recognition tasks. The use of simultaneous recurrency in the proposed model improves recognition performance and offers reduced computational complexity compared to existing hierarchical deep learning models. Moreover, the DSRN architecture inherently learns meaningful representations of the data during training, which is essential for superior recognition performance. However, probabilistic models such as deep generative models are particularly adept at learning representations directly from unlabeled input data. Accordingly, we show the generality of the proposed deep simultaneous recurrency concept by developing a probabilistic deep simultaneous recurrent belief network (DSRBN) architecture, which learns the underlying representation of the data more efficiently than state-of-the-art generative models. Finally, we propose a deep recurrent learning framework for solving the image recognition task using small data. We incorporate Bayesian statistics into the DSRBN generative model to obtain a deep recurrent generative Bayesian model that addresses the challenge of learning from a small amount of data. Our findings suggest that the proposed deep recurrent Bayesian framework achieves better image recognition performance than state-of-the-art models in a small-data learning scenario. In conclusion, this dissertation proposes novel deep recurrent learning pipelines that not only achieve improved image recognition performance from limited training data but also require significantly fewer training parameters.
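    To make the idea of simultaneous recurrency concrete, the sketch below shows a recurrent layer that iterates its state update over a single, static input until the state approximately settles, rather than unrolling over a time sequence. This is an illustrative PyTorch sketch under assumed settings (layer sizes, iteration cap, tolerance, MNIST-like input shape); it is not the dissertation's actual DSRN or DSRBN.

        # Minimal sketch of a "simultaneous recurrent" layer: the recurrent update is
        # iterated over one static input until the hidden state (approximately) settles,
        # instead of being unrolled over a sequence. All sizes, the iteration cap, and
        # the tolerance are illustrative guesses, not the dissertation's DSRN settings.
        import torch
        import torch.nn as nn

        class SimultaneousRecurrentLayer(nn.Module):
            def __init__(self, in_dim, hidden_dim, max_iters=10, tol=1e-3):
                super().__init__()
                self.input_proj = nn.Linear(in_dim, hidden_dim)     # feedforward drive
                self.recurrent = nn.Linear(hidden_dim, hidden_dim)  # recurrent/lateral path
                self.max_iters = max_iters
                self.tol = tol

            def forward(self, x):
                drive = self.input_proj(x)          # constant drive from the static input
                h = torch.zeros_like(drive)         # start from a zero state
                for _ in range(self.max_iters):
                    h_new = torch.tanh(drive + self.recurrent(h))
                    if (h_new - h).norm() < self.tol:   # stop once the state settles
                        h = h_new
                        break
                    h = h_new
                return h

        class SmallDataRecognizer(nn.Module):
            """Tiny classifier head on top of the relaxed recurrent state (illustrative only)."""
            def __init__(self, in_dim=784, hidden_dim=128, num_classes=10):
                super().__init__()
                self.body = SimultaneousRecurrentLayer(in_dim, hidden_dim)
                self.head = nn.Linear(hidden_dim, num_classes)

            def forward(self, x):
                return self.head(self.body(x.flatten(1)))

        if __name__ == "__main__":
            model = SmallDataRecognizer()
            logits = model(torch.randn(4, 1, 28, 28))   # small batch of MNIST-like images
            print(logits.shape)                         # torch.Size([4, 10])

    Because the recurrent weights are reused at every relaxation step, the parameter count stays that of a single layer, which is one way such a model can remain compact relative to a deep feedforward stack.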

    Learning and Reasoning for Robot Sequential Decision Making under Uncertainty

    Robots frequently face complex tasks that require more than one action, where sequential decision-making (SDM) capabilities become necessary. The key contribution of this work is a robot SDM framework, called LCORPP, that supports the simultaneous capabilities of supervised learning for passive state estimation, automated reasoning with declarative human knowledge, and planning under uncertainty toward achieving long-term goals. In particular, we use a hybrid reasoning paradigm to refine the state estimator and to provide informative priors for the probabilistic planner. In experiments, a mobile robot is tasked with estimating human intentions using their motion trajectories, declarative contextual knowledge, and human-robot interaction (dialog-based and motion-based). Results suggest that, in both efficiency and accuracy, our framework performs better than its no-learning and no-reasoning counterparts in an office environment. Comment: In proceedings of the 34th AAAI Conference on Artificial Intelligence, 2020
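    As a rough illustration of how the three capabilities fit together, the sketch below chains a stand-in learned intent estimator, a hand-written contextual rule that reweights its output into an informative prior, and a trivial stand-in for planning under uncertainty. The intent labels, the rule, and the threshold are invented for this example; LCORPP itself uses a trained classifier over motion trajectories, automated reasoning over declarative knowledge, and a probabilistic planner.

        # Rough sketch of the learning -> reasoning -> planning pipeline described above.
        # The intent labels, the contextual rule, and the threshold are invented for
        # illustration and are not taken from the LCORPP paper.
        import numpy as np

        INTENTS = ["wants_help", "passing_by", "waiting"]

        def learned_state_estimator(trajectory_features):
            """Stand-in for a supervised classifier over human motion trajectories."""
            # Fixed pseudo-probabilities here; in practice this would be a trained model.
            return np.array([0.5, 0.3, 0.2])

        def reason_with_context(likelihoods, near_office_door=True):
            """Declarative-knowledge step: a contextual rule reweights the learned
            estimate into an informative prior (a crude stand-in for logical reasoning)."""
            prior = likelihoods.copy()
            if near_office_door:
                prior[INTENTS.index("wants_help")] *= 2.0  # rule: people near the door often need help
            return prior / prior.sum()

        def plan_under_uncertainty(belief):
            """Stand-in for a probabilistic planner: gather information (dialog) when
            uncertain, commit to assisting only when the belief is confident."""
            if belief.max() < 0.7:
                return "ask_clarifying_question"
            return "assist" if INTENTS[int(belief.argmax())] == "wants_help" else "continue_patrol"

        belief = reason_with_context(learned_state_estimator(trajectory_features=None))
        print(belief, plan_under_uncertainty(belief))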

    Deeply Learning the Messages in Message Passing Inference

    Deep structured output learning shows great promise in tasks like semantic image segmentation. We propose a new, efficient deep structured model learning scheme, in which we show how deep Convolutional Neural Networks (CNNs) can be used to estimate the messages in message passing inference for structured prediction with Conditional Random Fields (CRFs). With such CNN message estimators, we obviate the need to learn or evaluate potential functions for message calculation. This confers significant efficiency for learning, since otherwise, when performing structured learning for a CRF with CNN potentials, expensive inference must be undertaken for every stochastic gradient iteration. The network output dimension for message estimation is the same as the number of classes, in contrast to the network output for general CNN potential functions in CRFs, which is exponential in the order of the potentials. Hence CNN message learning has fewer network parameters and is more scalable to cases in which a large number of classes are involved. We apply our method to semantic image segmentation on the PASCAL VOC 2012 dataset. We achieve an intersection-over-union score of 73.4 on its test set, which is the best reported result for methods using the VOC training images alone. This impressive performance demonstrates the effectiveness and usefulness of our CNN message learning method. Comment: 11 pages. Appearing in Proc. The Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS), 2015, Montreal, Canada
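    The scaling argument, that message outputs grow linearly with the number of classes rather than with the size of a potential table, can be seen in a small sketch. The network below is a toy stand-in, not the paper's architecture: the tiny backbone, a 4-connected grid of neighbours, and a single message-passing step are assumptions made purely for illustration.

        # Illustrative sketch (not the paper's code) of the dimensionality point above:
        # a CNN "message estimator" outputs one value per class for each pixel and each
        # neighbour direction, so its output grows linearly with the number of classes,
        # whereas a pairwise potential table would grow with NUM_CLASSES ** 2.
        import torch
        import torch.nn as nn

        NUM_CLASSES = 21      # PASCAL VOC: 20 object classes + background
        NUM_NEIGHBOURS = 4    # 4-connected grid CRF, chosen here for simplicity

        class MessageEstimator(nn.Module):
            def __init__(self, in_channels=3):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
                )
                # Per-pixel unary scores plus one per-class message per neighbour direction.
                self.unary = nn.Conv2d(64, NUM_CLASSES, 1)
                self.messages = nn.Conv2d(64, NUM_NEIGHBOURS * NUM_CLASSES, 1)

            def forward(self, image):
                f = self.features(image)
                unary = self.unary(f)                               # (B, C, H, W)
                msgs = self.messages(f)                             # (B, 4*C, H, W)
                b, _, h, w = msgs.shape
                msgs = msgs.view(b, NUM_NEIGHBOURS, NUM_CLASSES, h, w)
                # One message-passing "step": class scores = unary + sum of incoming messages.
                return unary + msgs.sum(dim=1)

        model = MessageEstimator()
        scores = model(torch.randn(1, 3, 128, 128))
        print(scores.shape)    # torch.Size([1, 21, 128, 128]) -> per-pixel class scores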

    Deep Learning Techniques for Music Generation -- A Survey

    This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis:
    - Objective: What musical content is to be generated (e.g., melody, polyphony, accompaniment, or counterpoint)? For what destination and for what use: to be performed by humans (a musical score) or by a machine (an audio file)?
    - Representation: What concepts are to be manipulated (e.g., waveform, spectrogram, note, chord, meter, and beat)? What format is to be used (e.g., MIDI, piano roll, or text)? How will the representation be encoded (e.g., scalar, one-hot, or many-hot)?
    - Architecture: What type(s) of deep neural network are to be used (e.g., feedforward network, recurrent network, autoencoder, or generative adversarial network)?
    - Challenge: What are the limitations and open challenges (e.g., variability, interactivity, and creativity)?
    - Strategy: How do we model and control the process of generation (e.g., single-step feedforward, iterative feedforward, sampling, or input manipulation)?
    For each dimension, we conduct a comparative analysis of various models and techniques, and we propose a tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based systems for music generation selected from the relevant literature. These systems are described and used to exemplify the various choices of objective, representation, architecture, challenge, and strategy (the representation dimension is illustrated with a small encoding example below). The last section includes some discussion and prospects. Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
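    As a concrete instance of the representation dimension, the snippet below encodes a short monophonic melody as a one-hot piano roll. The pitch range, time resolution, and melody are arbitrary choices made for this example and are not taken from the survey.

        # Toy illustration of one representation choice: a short monophonic melody encoded
        # as a one-hot "piano roll" over time steps. Note range, time resolution, and the
        # melody itself are arbitrary choices for this example.
        import numpy as np

        LOW_PITCH, HIGH_PITCH = 60, 72             # MIDI C4..C5, one column per semitone
        melody = [60, 62, 64, 65, 67, 67, 65, 64]  # one MIDI pitch per time step

        def to_piano_roll(pitches, low=LOW_PITCH, high=HIGH_PITCH):
            """Return a (time_steps, pitch_range) one-hot matrix: one active pitch per step."""
            roll = np.zeros((len(pitches), high - low + 1), dtype=np.int8)
            for t, p in enumerate(pitches):
                roll[t, p - low] = 1
            return roll

        roll = to_piano_roll(melody)
        print(roll.shape)   # (8, 13) -> 8 time steps, 13 possible pitches
        print(roll[0])      # one-hot row for the first note (C4)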