209 research outputs found
A Genetic Programming Approach to Designing Convolutional Neural Network Architectures
The convolutional neural network (CNN), which is one of the deep learning
models, has seen much success in a variety of computer vision tasks. However,
designing CNN architectures still requires expert knowledge and a lot of trial
and error. In this paper, we attempt to automatically construct CNN
architectures for an image classification task based on Cartesian genetic
programming (CGP). In our method, we adopt highly functional modules, such as
convolutional blocks and tensor concatenation, as the node functions in CGP.
The CNN structure and connectivity represented by the CGP encoding method are
optimized to maximize the validation accuracy. To evaluate the proposed method,
we constructed a CNN architecture for the image classification task with the
CIFAR-10 dataset. The experimental result shows that the proposed method can be
used to automatically find the competitive CNN architecture compared with
state-of-the-art models.Comment: This is the revised version of the GECCO 2017 paper. The code of our
method is available at https://github.com/sg-nm/cgp-cn
Recurrent DNNs and its Ensembles on the TIMIT Phone Recognition Task
In this paper, we have investigated recurrent deep neural networks (DNNs) in
combination with regularization techniques as dropout, zoneout, and
regularization post-layer. As a benchmark, we chose the TIMIT phone recognition
task due to its popularity and broad availability in the community. It also
simulates a low-resource scenario that is helpful in minor languages. Also, we
prefer the phone recognition task because it is much more sensitive to an
acoustic model quality than a large vocabulary continuous speech recognition
task. In recent years, recurrent DNNs pushed the error rates in automatic
speech recognition down. But, there was no clear winner in proposed
architectures. The dropout was used as the regularization technique in most
cases, but combination with other regularization techniques together with model
ensembles was omitted. However, just an ensemble of recurrent DNNs performed
best and achieved an average phone error rate from 10 experiments 14.84 %
(minimum 14.69 %) on core test set that is slightly lower then the
best-published PER to date, according to our knowledge. Finally, in contrast of
the most papers, we published the open-source scripts to easily replicate the
results and to help continue the development.Comment: Submitted to SPECOM 2018, 20th International Conference on Speech and
Compute
How can sustainable public transport be improved? A traffic sign recognition approach using convolutional neural network
Sustainable public transport is an important factor to boost urban economic development, and it is also an important part of building a low-carbon environmental society. The application of driverless technology in public transport injects new impetus into its sustainable development. Road traffic sign recognition is the key technology of driverless public transport. It is particularly important to adopt innovative algorithms to optimize the accuracy of traffic sign recognition and build sustainable public transport. Therefore, this paper proposes a convolutional neural network (CNN) based on k-means to optimize the accuracy of traffic sign recognition, and it proposes a sparse maximum CNN to identify difficult traffic signs through hierarchical classification. In the rough classification stage, k-means CNN is used to extract features, and improved support vector machine (SVM) is used for classification. Then, in the fine classification stage, sparse maximum CNN is used for classification. The research results show that the algorithm improves the accuracy of traffic sign recognition more comprehensively and effectively, and it can be effectively applied in unmanned driving technology, which will also bring new breakthroughs for the sustainable development of public transport
Sequence to sequence learning and its speech applications
Recurrent Neural Networks (RNNs), which has the attractive properties of modelling sequences, has been dominant in speech field in the recent decades. Convolutional Neural Networks (CNNs) has been shown as an alternative to model sequences because of its capacity of reducing spectral variations and modeling spectral correlations in acoustic features for automatic speech recognition (ASR). Recent work suggests that complex numbers could be used as a richer feature representation than spectrum which may benefit the speech related tasks.
In the thesis, we first cover the basic concepts in machine learning, building blocks of deep learning and discuss the popular methods that are capable of doing sequence-to-sequence modelling, specially convolutional neural networks, which is famous as a class of feed-forward nets. We then present two research work related to sequence-to-sequence modelling on speech. We introduce a new approach to address speech recognition with convolutional neural networks which shows the comparable results with their recurrent neural networks counterpart. In addition, we present a new model taking advantage of the representation in the complex domain and define complex convolutions, complex batch-normalization, complex weight initialization strategies. The new model results in state-of-the-art of speech spectrum prediction in a convolutional recurrent setting.Les réseaux neuronaux récurrents (RNN) ont été dominants dans le domaine de la parole au cours des dernières décennies, étant donné leurs propriétés attrayantes de modélisation de séquence. Les réseaux neuronaux convolutionnels (CNN) ont
été présentés comme une alternative pour la modélisation de séquences en raison de leur capacité à réduire les variations spectrales et à modéliser les corrélations spectrales dans les caractéristiques acoustiques pour la reconnaissance automatique de la parole (ASR). Des travaux récents suggèrent que les nombres complexes pourraient être utilisés comme une représentation de caractéristique plus riche que le spectre et qui pouvaient donc être bénéfique pour les tâches liées à la parole. Dans la thèse, nous abordons d’abord les concepts de base de l’apprentissage automatique, les blocs de construction de l’apprentissage profond et discutons des méthodes populaires capables de faire des modélisations séquentielles, en particulier des réseaux de neurones convolutionnels, célèbres en tant que réseaux feedfoward. Nous présentons ensuite deux travaux de recherche liés à la modélisation séquence-séquence sur la parole. Premierement, nous introduisons une nouvelle approche pour adresser la reconnaissance de la parole avec des réseaux de neurones
convolutionnels qui montre des performances comparables avec leur homologue des réseaux neuronaux récurrents. Deuxièmement, nous présentons un nouveau mo- dèle, tirant parti de la représentation dans le domaine complexe, et définissons des circonvolutions complexes, des stratégies complexes de normalisation par lots et d’initialisation de poids complexes. Le modèle a atteint l’état de l’art de la tâche de prédiction du spectre de la parole dans un cadre récurrent convolutionnel
- …