    Encoding of phonology in a recurrent neural model of grounded speech

    We study the representation and encoding of phonemes in a recurrent neural network model of grounded speech. We use a model which processes images and their spoken descriptions, and projects the visual and auditory representations into the same semantic space. We perform a number of analyses on how information about individual phonemes is encoded in the MFCC features extracted from the speech signal, and in the activations of the layers of the model. Via experiments with phoneme decoding and phoneme discrimination we show that phoneme representations are most salient in the lower layers of the model, where low-level signals are processed at a fine-grained level, although a large amount of phonological information is retained at the top recurrent layer. We further find that the attention mechanism following the top recurrent layer significantly attenuates the encoding of phonology and makes the utterance embeddings much more invariant to synonymy. Moreover, a hierarchical clustering of phoneme representations learned by the network shows an organizational structure of phonemes similar to those proposed in linguistics. Comment: Accepted at CoNLL 2017
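    A minimal sketch of the kind of phoneme-decoding probe described above: a linear classifier is fit on frame-level representations (MFCC features versus layer activations) and decoding accuracy is compared across representations. The array names, dimensions, and synthetic data below are hypothetical stand-ins, not the paper's actual features.

```python
# Hypothetical phoneme-decoding probe: fit a linear classifier on frame-level
# representations and compare decoding accuracy. Synthetic arrays stand in for
# the real MFCC features and recurrent-layer activations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_frames, n_phonemes = 2000, 40

# Stand-ins for per-frame features: 13-dim MFCCs vs. 512-dim recurrent activations.
representations = {
    "mfcc": rng.normal(size=(n_frames, 13)),
    "recurrent_layer_1": rng.normal(size=(n_frames, 512)),
}
labels = rng.integers(0, n_phonemes, size=n_frames)  # frame-aligned phoneme labels

for name, feats in representations.items():
    X_tr, X_te, y_tr, y_te = train_test_split(feats, labels, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"{name}: phoneme decoding accuracy = {probe.score(X_te, y_te):.3f}")
```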

    Music Genre Classification With Neural Networks: An Examination Of Several Impactful Variables

    There have been several attempts to classify music with content-based machine learning approaches. Most of these projects followed a similar procedure with a Deep Belief Network. In this project, we examined the performance of convolutional neural networks (CNNs) and recurrent neural networks (RNNs), as well as other components of a classification architecture, such as the choice of dataset, pre-processing techniques, and the sample size. Under a controlled environment, we found that the most successful architecture was a Mel-spectrogram combined with a CNN. Although our results fell behind state-of-the-art performance, we outperformed other music classification studies that use a CNN by a large margin. By performing binary classification, we also discovered individuality across genres that caused inconsistent performance.
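    A rough sketch of the best-performing pipeline identified above (a Mel-spectrogram fed to a CNN), assuming 30-second clips and ten genres; the layer sizes, hyperparameters, and file name are illustrative assumptions, not the study's exact architecture.

```python
# Illustrative Mel-spectrogram + CNN genre classifier (assumed hyperparameters).
import torch
import torch.nn as nn
import librosa
import numpy as np

def mel_spectrogram(path, sr=22050, n_mels=128):
    """Load an audio clip and convert it to a log-scaled Mel-spectrogram."""
    y, sr = librosa.load(path, sr=sr, duration=30.0)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

class GenreCNN(nn.Module):
    """Small CNN over the (1, n_mels, time) spectrogram treated as an image."""
    def __init__(self, n_genres=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_genres)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Usage (assuming a local clip and 10 genres, as in GTZAN-style datasets):
# spec = torch.tensor(mel_spectrogram("clip.wav")).unsqueeze(0).unsqueeze(0)
# logits = GenreCNN()(spec)
```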

    A hybrid deep learning approach for texture analysis

    Texture classification is a problem with various applications such as remote sensing and forest species recognition. Solutions tend to be custom-fit to the dataset used but fail to generalize. A Convolutional Neural Network (CNN) in combination with a Support Vector Machine (SVM) forms a robust pairing of a powerful invariant feature extractor and an accurate classifier. The fusion of the classifiers yields stable classification across different datasets and a slight improvement over state-of-the-art methods. The classifiers are fused using a confusion matrix after each is trained independently on the same training set and then tested. Statistical information about each classifier is fed into a confusion matrix that generates two confidence measures used to build two binary classifiers. Each binary classifier can activate or deactivate a classifier at test time based on the confidence measure obtained from the confusion matrix. The method obtained results approaching the state of the art, with a difference of less than 1% in classification success rates. Moreover, the method maintained this success rate across different datasets, while other methods failed to achieve similar stability. Two datasets were used in this research, Brodatz and Kylberg, on which the method achieved 98.17% and 99.70% respectively, compared with 98.9% and 99.64% for conventional methods in the literature.
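    A hypothetical sketch of the confusion-matrix fusion idea: each branch is trained independently, a validation confusion matrix yields a per-class confidence (per-class precision here), and at test time only the branch with the higher confidence for its own prediction stays active. Generic scikit-learn classifiers and synthetic data stand in for the paper's CNN feature extractor, SVM, and texture datasets.

```python
# Confusion-matrix fusion sketch with two stand-in classifiers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=64, n_informative=32, n_classes=5)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

branches = {"svm": SVC(), "mlp": MLPClassifier(max_iter=500)}
confidence = {}
for name, clf in branches.items():
    clf.fit(X_tr, y_tr)
    cm = confusion_matrix(y_val, clf.predict(X_val))
    # Per-class precision derived from the confusion-matrix columns.
    confidence[name] = np.diag(cm) / np.maximum(cm.sum(axis=0), 1)

def fused_predict(x):
    """Keep the prediction of the branch most confident about its own predicted class."""
    preds = {name: int(clf.predict(x.reshape(1, -1))[0]) for name, clf in branches.items()}
    best = max(preds, key=lambda name: confidence[name][preds[name]])
    return preds[best]

print(fused_predict(X_val[0]))
```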

    Improving Classification in Single and Multi-View Images

    Image classification is a sub-field of computer vision that focuses on identifying objects within digital images. In order to improve image classification we must address the following areas of improvement: 1) single- and multi-view data quality, using data pre-processing techniques; 2) enhancing deep feature learning to extract alternative representations of the data; 3) improving the decision or prediction of labels. This dissertation presents a series of four published papers that explore different improvements to image classification. In our first paper, we explore the Siamese network architecture to create a Convolutional Neural Network based similarity metric. We learn the priority features that differentiate two given input images. The proposed metric achieves a state-of-the-art Fβ measure. In our second paper, we explore multi-view data classification. We investigate the application of Generative Adversarial Networks (GANs) to multi-view image classification and few-shot learning. Experimental results show that our method outperforms state-of-the-art research. In our third paper, we take on the challenge of improving the ResNet backbone model. For this task, we focus on improving channel attention mechanisms. We utilize Discrete Wavelet Transform compression to address the channel representation problem. Experimental results on ImageNet show that our method outperforms the baseline SENet-34 and the SOTA FcaNet-34 at no extra computational cost. In our fourth paper, we investigate further the potential of orthogonalization of filters for the extraction of diverse information for channel attention. We show that using only random constant orthogonal filters is sufficient to achieve good channel attention. We test our proposed method on the ImageNet, Places365, and Birds datasets for image classification, on MS-COCO for object detection, and on instance segmentation tasks. Our method outperforms FcaNet and WaveNet and achieves state-of-the-art results.
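    An illustrative PyTorch sketch of an SE-style channel-attention block in the spirit of the fourth paper: the usual global-average-pool squeeze is replaced by fixed, non-trainable, randomly initialized orthogonal spatial filters (one per channel). This is an interpretation for illustration only, not the dissertation's exact module.

```python
# SE-style channel attention with a fixed orthogonal "squeeze": each channel is
# compressed to a scalar by a constant orthogonal spatial filter instead of
# global average pooling. Assumes channels <= H * W so one row exists per channel.
import torch
import torch.nn as nn

class OrthoChannelAttention(nn.Module):
    def __init__(self, channels, spatial_size, reduction=16):
        super().__init__()
        h, w = spatial_size
        # Fixed (non-trainable) orthogonal filters, one per channel.
        q, _ = torch.linalg.qr(torch.randn(h * w, h * w))
        self.register_buffer("filters", q[:channels].reshape(channels, h, w))
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                                 # x: (B, C, H, W)
        squeezed = (x * self.filters).sum(dim=(2, 3))     # per-channel descriptor
        weights = self.excite(squeezed)                   # (B, C) attention weights
        return x * weights.unsqueeze(-1).unsqueeze(-1)

# Usage: attn = OrthoChannelAttention(64, (8, 8)); y = attn(torch.randn(2, 64, 8, 8))
```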

    Elastic pre-stack seismic inversion through Discrete Cosine Transform reparameterization and Convolutional Neural Networks

    We develop a pre-stack inversion algorithm that combines a Discrete Cosine Transform (DCT) reparameterization of the data and model spaces with a Convolutional Neural Network (CNN). The CNN is trained to predict the mapping between the DCT-transformed seismic data and the DCT-transformed 2-D elastic model. A convolutional forward modeling based on the full Zoeppritz equations constitutes the link between the elastic properties and the seismic data. The direct sequential co-simulation algorithm with joint probability distribution is used to generate the training and validation datasets under the assumption of a stationary non-parametric prior and a Gaussian variogram model for the elastic properties. The DCT is an orthogonal transformation that is here used as an additional feature extraction technique that reduces the number of unknown parameters in the inversion and the dimensionality of the input and output of the network. The DCT reparameterization also acts as a regularization operator in the model space and allows for the preservation of the lateral and vertical continuity of the elastic properties in the recovered solution. We also implement a Monte Carlo simulation strategy that propagates the uncertainties related to both noise contamination and network approximation onto the estimated elastic model. We focus on synthetic inversions on a realistic subsurface model that mimics a real gas-saturated reservoir hosted in a turbiditic sequence. We compare the outcomes of the implemented algorithm with those provided by a popular linear inversion approach, and we also assess the robustness of the CNN inversion to errors in the estimated source wavelet and to erroneous assumptions about the noise statistics. Our tests confirm the applicability of the proposed approach, opening the possibility to estimate the subsurface elastic parameters and the associated uncertainties in near real-time while satisfactorily preserving the assumed spatial variability and the statistical properties of the elastic parameters.
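    A minimal sketch of the DCT reparameterization step described above: a 2-D property model is transformed with a type-II DCT, only the low-order coefficients are retained (reducing the number of unknowns and smoothing the solution), and the model is recovered with the inverse transform. The CNN mapping between DCT-transformed data and DCT-transformed models is not reproduced here; the grid sizes, truncation, and synthetic section are assumptions.

```python
# DCT compress / reconstruct sketch for a 2-D elastic property section.
import numpy as np
from scipy.fft import dctn, idctn

def dct_compress(model_2d, keep=(20, 10)):
    """Forward DCT and truncation to the first keep[0] x keep[1] coefficients."""
    coeffs = dctn(model_2d, norm="ortho")
    return coeffs[:keep[0], :keep[1]]

def dct_reconstruct(truncated, shape):
    """Zero-pad the retained coefficients and apply the inverse DCT."""
    full = np.zeros(shape)
    full[:truncated.shape[0], :truncated.shape[1]] = truncated
    return idctn(full, norm="ortho")

# Example on a synthetic 2-D section (stand-in for e.g. a P-wave velocity model):
vp = np.cumsum(np.random.randn(200, 100), axis=0)   # smooth-ish vertical trend
vp_dct = dct_compress(vp)                            # 200*100 unknowns -> 20*10
vp_rec = dct_reconstruct(vp_dct, vp.shape)           # laterally/vertically smooth model
print(vp_dct.shape, np.linalg.norm(vp - vp_rec) / np.linalg.norm(vp))
```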