
    Multiple Texture Boltzmann Machines

    We assess the generative power of the mPoT model of [10] with tiled-convolutional weight sharing as a model for visual textures by specifically training on this task, evaluating model performance on texture synthesis and inpainting tasks using quantitative metrics. We also analyze the relative importance of the mean and covariance parts of the mPoT model by comparing its performance to those of its subcomponents, tiled-convolutional versions of the PoT/FoE and Gaussian-Bernoulli restricted Boltzmann machine (GB-RBM). Our results suggest that while state-of-the-art or better performance can be achieved using the mPoT, similar performance can be achieved with the mean-only model. We then develop a model for multiple textures based on the GB-RBM, using a shared set of weights but texture-specific hidden unit biases. We show that the multiple texture model performs comparably to individually trained texture models.
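
The weight-sharing idea in the last two sentences can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes unit-variance visible units, a dense (non-tiled) weight matrix, and two hypothetical texture names, and it only shows how one shared matrix `W` combined with a per-texture hidden bias vector gives texture-dependent hidden activations.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, hidden_bias):
    """P(h_j = 1 | v) for a Gaussian-Bernoulli RBM.

    Assumes unit-variance visible units, so the conditional reduces to
    sigmoid(b_j + sum_i v_i * W[i][j]). W[i][j] couples visible i to hidden j.
    """
    return [
        sigmoid(hidden_bias[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(hidden_bias))
    ]


rng = random.Random(0)
n_visible, n_hidden = 6, 4

# One weight matrix shared by every texture.
W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]

# One hidden bias vector per texture (names and values are hypothetical).
texture_biases = {
    "brick": [0.5] * n_hidden,
    "grass": [-0.5] * n_hidden,
}

# The same visible patch yields different hidden activations per texture.
v = [rng.gauss(0.0, 1.0) for _ in range(n_visible)]
probs = {name: hidden_probs(v, W, b) for name, b in texture_biases.items()}
```

Only the bias vectors differ between textures here, so the number of texture-specific parameters grows with the number of hidden units rather than with the size of `W`.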

    A Theoretical Analysis of Deep Neural Networks for Texture Classification

    We investigate the use of Deep Neural Networks for the classification of image datasets where texture features are important for generating class-conditional discriminative representations. To this end, we first derive the size of the feature space for some standard textural features extracted from the input dataset and then use the theory of Vapnik-Chervonenkis dimension to show that hand-crafted feature extraction creates low-dimensional representations which help in reducing the overall excess error rate. As a corollary to this analysis, we derive for the first time upper bounds on the VC dimension of Convolutional Neural Networks as well as Dropout and Dropconnect networks, and the relation between the excess error rates of Dropout and Dropconnect networks. The concept of intrinsic dimension is used to validate the intuition that texture-based datasets are inherently higher dimensional than handwritten digits or other object recognition datasets and hence more difficult for neural networks to shatter. We then derive the mean distance from the centroid to the nearest and farthest sampling points in an n-dimensional manifold and show that the Relative Contrast of the sample data vanishes as the dimensionality of the underlying vector space tends to infinity. Comment: Accepted in International Joint Conference on Neural Networks, IJCNN 201
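
The vanishing of relative contrast mentioned at the end of this abstract can be observed numerically. The sketch below is an illustration under stated assumptions, not the paper's derivation: it samples points uniformly from the unit hypercube (the paper works on a manifold) and takes relative contrast to be (d_max - d_min) / d_min, with distances measured from the sample centroid, which is a common definition that may differ from the one the authors use.

```python
import math
import random


def relative_contrast(n_dims, n_points=200, seed=0):
    """Relative contrast of distances from the centroid to the sample points.

    Assumption: points are drawn uniformly from [0, 1]^n_dims, and relative
    contrast is defined as (d_max - d_min) / d_min.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    centroid = [sum(p[d] for p in points) / n_points for d in range(n_dims)]
    dists = [math.dist(p, centroid) for p in points]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


# The contrast should shrink as the dimension grows.
for n_dims in (2, 10, 100, 1000):
    print(n_dims, round(relative_contrast(n_dims), 3))
```

In low dimensions some point usually lies very close to the centroid, so the ratio is large; in high dimensions all distances concentrate around the same value and the ratio shrinks toward zero.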

    Audio-based music classification with a pretrained convolutional network

    Recently the ‘Million Song Dataset’, containing audio features and metadata for one million songs, was made available. In this paper, we build a convolutional network that is then trained to perform artist recognition, genre recognition and key detection. The network is tailored to summarize the audio features over musically significant timescales. It is infeasible to train the network on all available data in a supervised fashion, so we use unsupervised pretraining to be able to harness the entire dataset: we train a convolutional deep belief network on all data, and then use the learnt parameters to initialize a convolutional multilayer perceptron with the same architecture. The MLP is then trained on a labeled subset of the data for each task. We also train the same MLP with randomly initialized weights. We find that our convolutional approach improves accuracy for the genre recognition and artist recognition tasks. Unsupervised pretraining improves convergence speed in all cases. For artist recognition it improves accuracy as well.