
    A Genetic Programming Approach to Designing Convolutional Neural Network Architectures

    The convolutional neural network (CNN), a deep learning model, has been highly successful in a variety of computer vision tasks. However, designing CNN architectures still requires expert knowledge and extensive trial and error. In this paper, we attempt to automatically construct CNN architectures for an image classification task based on Cartesian genetic programming (CGP). In our method, we adopt highly functional modules, such as convolutional blocks and tensor concatenation, as the node functions in CGP. The CNN structure and connectivity represented by the CGP encoding are optimized to maximize validation accuracy. To evaluate the proposed method, we constructed a CNN architecture for image classification on the CIFAR-10 dataset. The experimental results show that the proposed method can automatically find a CNN architecture that is competitive with state-of-the-art models.

    Comment: This is the revised version of the GECCO 2017 paper. The code of our method is available at https://github.com/sg-nm/cgp-cn
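A minimal sketch of how a CGP genotype can be decoded into a network, assuming a simplified encoding: each node stores a function name and the indices of its inputs, index 0 is the input image, nodes may only connect to earlier indices, and an output gene points at one node. Only nodes reachable backwards from the output are "active" and become layers of the CNN. The `Node` class, the `active_nodes` helper, and the function names (`conv_block`, `concat`, `max_pool`) are hypothetical and not taken from the authors' code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    func: str          # hypothetical module name, e.g. "conv_block" or "concat"
    inputs: List[int]  # 0 = network input; i >= 1 = output of node i (1-based)


def active_nodes(genotype: List[Node], output_source: int) -> List[int]:
    """Return the sorted 1-based indices of nodes that feed the output node."""
    active = set()
    stack = [output_source]
    while stack:
        idx = stack.pop()
        if idx == 0 or idx in active:  # reached the image input, or already visited
            continue
        active.add(idx)
        stack.extend(genotype[idx - 1].inputs)  # walk backwards along connections
    return sorted(active)


genotype = [
    Node("conv_block", [0]),  # node 1
    Node("conv_block", [1]),  # node 2
    Node("max_pool", [0]),    # node 3
    Node("concat", [1, 2]),   # node 4
    Node("conv_block", [3]),  # node 5
]
print(active_nodes(genotype, 4))  # [1, 2, 4]
```

With the output gene pointing at node 4, nodes 3 and 5 are inactive: they stay in the genotype, so a later mutation can reconnect them without changing the network that is currently evaluated.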

    Recurrent DNNs and its Ensembles on the TIMIT Phone Recognition Task

    In this paper, we investigate recurrent deep neural networks (DNNs) in combination with regularization techniques such as dropout, zoneout, and a regularization post-layer. As a benchmark, we chose the TIMIT phone recognition task because of its popularity and broad availability in the community. It also simulates a low-resource scenario, which is relevant to minor languages. We also prefer the phone recognition task because it is much more sensitive to acoustic model quality than a large-vocabulary continuous speech recognition task. In recent years, recurrent DNNs have pushed down error rates in automatic speech recognition, but no clear winner has emerged among the proposed architectures. Dropout was the regularization technique used in most cases, while its combination with other regularization techniques and with model ensembles was left unexplored. In our experiments, an ensemble of recurrent DNNs performed best, achieving an average phone error rate (PER) of 14.84 % over 10 experiments (minimum 14.69 %) on the core test set, which, to our knowledge, is slightly lower than the best PER published to date. Finally, unlike most papers, we publish open-source scripts to make the results easy to replicate and to support further development.

    Comment: Submitted to SPECOM 2018, 20th International Conference on Speech and Compute
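Zoneout, one of the regularizers named above, randomly keeps each hidden unit at its previous value instead of updating it. The following is a minimal plain-Python illustration, assuming list-valued hidden states and a single zoneout rate. `ensemble_posteriors` assumes the ensemble is combined by averaging frame-level phone posteriors; that is a common choice, but the abstract does not state how the models are combined.

```python
import random
from typing import List, Optional, Sequence


def zoneout(h_prev: Sequence[float], h_new: Sequence[float], rate: float,
            training: bool, rng: Optional[random.Random] = None) -> List[float]:
    """Keep each unit's previous value with probability `rate`, else take the new value."""
    if not training:
        # Inference: replace the random keep-mask by its expectation.
        return [rate * p + (1.0 - rate) * n for p, n in zip(h_prev, h_new)]
    if rng is None:
        rng = random.Random()
    return [p if rng.random() < rate else n for p, n in zip(h_prev, h_new)]


def ensemble_posteriors(posteriors: Sequence[Sequence[float]]) -> List[float]:
    """Average one frame's per-phone posteriors over all ensemble members (assumed rule)."""
    n = len(posteriors)
    return [sum(col) / n for col in zip(*posteriors)]


print(zoneout([1.0, 2.0], [3.0, 4.0], rate=0.5, training=False))  # [2.0, 3.0]
```

Unlike dropout, which zeroes activations, zoneout copies the previous state forward, so gradient and state information still flow through time for the units that are "zoned out".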

    How can sustainable public transport be improved? A traffic sign recognition approach using convolutional neural network

    Sustainable public transport is an important factor in boosting urban economic development and an important part of building a low-carbon, environmentally friendly society. The application of driverless technology to public transport gives new impetus to its sustainable development. Road traffic sign recognition is a key technology for driverless public transport, so it is particularly important to adopt innovative algorithms that improve recognition accuracy. This paper therefore proposes a convolutional neural network (CNN) based on k-means to improve the accuracy of traffic sign recognition, together with a sparse maximum CNN that identifies difficult traffic signs through hierarchical classification. In the coarse classification stage, the k-means CNN extracts features and an improved support vector machine (SVM) performs the classification. In the fine classification stage, the sparse maximum CNN performs the classification. The results show that the algorithm improves traffic sign recognition accuracy and can be applied in driverless technology, which will also bring new breakthroughs for the sustainable development of public transport.
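The coarse-to-fine idea can be sketched with simple stand-ins. This is a hypothetical illustration, not the paper's method: nearest-centroid rules replace the k-means CNN, the improved SVM, and the sparse maximum CNN; k-means groups 2-D class mean vectors into coarse super-classes; and the sign names and coordinates are invented.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[Point]:
    """Lloyd's k-means; centroids start at the first k points for determinism."""
    if not 0 < k <= len(points):
        raise ValueError("need 0 < k <= number of points")
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to its cluster mean; keep it unchanged if the cluster is empty.
        centroids = [
            (sum(q[0] for q in cl) / len(cl), sum(q[1] for q in cl) / len(cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return centroids


def build_hierarchy(class_means: Dict[str, Point], k: int):
    """Group classes into k coarse super-classes by clustering their mean vectors."""
    names = list(class_means)
    centroids = kmeans([class_means[n] for n in names], k)
    groups: Dict[int, List[str]] = {j: [] for j in range(k)}
    for n in names:
        j = min(range(k), key=lambda c: math.dist(class_means[n], centroids[c]))
        groups[j].append(n)
    return centroids, groups


def predict(x: Point, class_means: Dict[str, Point], centroids: List[Point],
            groups: Dict[int, List[str]]) -> str:
    # Coarse stage: nearest non-empty super-class centroid.
    g = min((c for c in groups if groups[c]), key=lambda c: math.dist(x, centroids[c]))
    # Fine stage: nearest class mean, searched only inside the chosen super-class.
    return min(groups[g], key=lambda n: math.dist(x, class_means[n]))


signs = {"stop": (0.0, 0.0), "yield": (0.5, 0.2), "speed_30": (10.0, 10.0), "speed_50": (10.5, 9.8)}
centroids, groups = build_hierarchy(signs, k=2)
print(groups)                                          # {0: ['stop', 'yield'], 1: ['speed_30', 'speed_50']}
print(predict((10.4, 9.9), signs, centroids, groups))  # speed_50
```

The fine stage only has to separate visually similar classes (here, the two speed limits), which is the motivation for spending a stronger classifier on that stage.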

    Sequence to sequence learning and its speech applications

    Recurrent neural networks (RNNs), which have attractive properties for modelling sequences, have been dominant in the speech field in recent decades. Convolutional neural networks (CNNs) have been shown to be an alternative for modelling sequences because of their capacity to reduce spectral variations and model spectral correlations in acoustic features for automatic speech recognition (ASR). Recent work suggests that complex numbers could serve as a richer feature representation than the spectrum, which may benefit speech-related tasks. In this thesis, we first cover the basic concepts of machine learning and the building blocks of deep learning, and discuss popular methods capable of sequence-to-sequence modelling, especially convolutional neural networks, which are well known as a class of feed-forward networks. We then present two research contributions on sequence-to-sequence modelling for speech. First, we introduce a new approach to speech recognition with convolutional neural networks that achieves results comparable to its recurrent neural network counterparts. Second, we present a new model that takes advantage of representations in the complex domain and define complex convolutions, complex batch normalization, and complex weight initialization strategies. The new model achieves state-of-the-art results on speech spectrum prediction in a convolutional recurrent setting.
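A complex convolution can be built from real ones because convolution is bilinear: for a complex kernel W = A + iB and input h = x + iy, W * h = (A * x - B * y) + i(B * x + A * y). The sketch below applies this to 1-D "valid" cross-correlation on Python lists and checks the result against Python's built-in complex arithmetic. The function names are hypothetical, and the layer defined in the thesis may differ in dimensionality, padding, and parameterization.

```python
from typing import List, Sequence, Tuple


def conv1d_valid(x: Sequence[float], w: Sequence[float]) -> List[float]:
    """Real 1-D 'valid' cross-correlation (what deep-learning libraries call convolution)."""
    n, k = len(x), len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(n - k + 1)]


def complex_conv1d(x_re: Sequence[float], x_im: Sequence[float],
                   w_re: Sequence[float], w_im: Sequence[float]) -> Tuple[List[float], List[float]]:
    """(A + iB) * (x + iy) = (A*x - B*y) + i(B*x + A*y), using four real convolutions."""
    real = [a - b for a, b in zip(conv1d_valid(x_re, w_re), conv1d_valid(x_im, w_im))]
    imag = [a + b for a, b in zip(conv1d_valid(x_re, w_im), conv1d_valid(x_im, w_re))]
    return real, imag


# Check against direct complex arithmetic.
x = [1 + 2j, 3 - 1j, 0.5 + 0j, -2 + 1j]
w = [0.5 - 1j, 2 + 0.5j]
re, im = complex_conv1d([z.real for z in x], [z.imag for z in x],
                        [z.real for z in w], [z.imag for z in w])
direct = [sum(w[j] * x[i + j] for j in range(len(w))) for i in range(len(x) - len(w) + 1)]
assert all(abs(complex(r, i) - d) < 1e-12 for r, i, d in zip(re, im, direct))
```

Keeping real and imaginary parts as separate real tensors lets a complex layer reuse ordinary real-valued convolution kernels, at the cost of four real convolutions per complex one.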