A Minimal Architecture for General Cognition
A minimalistic cognitive architecture called MANIC is presented. The MANIC
architecture requires only three function approximating models, and one state
machine. Even with so few major components, it is theoretically sufficient to
achieve functional equivalence with all other cognitive architectures, and can
be practically trained. Instead of seeking to transfer architectural
inspiration from biology into artificial intelligence, MANIC seeks to minimize
novelty and follow the most well-established constructs that have evolved
within various sub-fields of data science. From this perspective, MANIC offers
an alternate approach to a long-standing objective of artificial intelligence.
This paper provides a theoretical analysis of the MANIC architecture.
Comment: 8 pages, 8 figures, conference, Proceedings of the 2015 International
Joint Conference on Neural Network
GERİ-YAYILMALI ÖĞRENME ALGORİTMASINDAKİ ÖĞRENME PARAMETRELERİNİN GENETİK
In this study, the learning parameters of the back-propagation learning algorithm used to train a feedforward neural network are determined using genetic algorithms. The learning parameters are known as the learning and momentum coefficients. They govern properties such as increasing the learning speed of the network, damping oscillations that may arise during learning, and escaping local minima. Selecting these parameters appropriately is therefore very important for training the network more effectively. To determine the learning parameters with a genetic algorithm, a four-layer feedforward network was designed. The three learning and three momentum coefficients of the designed network are encoded in a genetic chromosome. The aim of the study is to select the most suitable chromosome. Specially defined two-dimensional regression problems were used to test the proposed method. The tests showed that the proposed method is more effective than the conventional learning algorithm with fixed parameters.
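As a rough illustration of how one chromosome of three learning and three momentum coefficients could drive training, the sketch below applies a standard gradient-descent-with-momentum update to each weight layer. The chromosome layout, the function name `momentum_step`, and the toy values are assumptions made for illustration; the abstract does not specify the encoding or the genetic operators, and none are shown here.

```python
import numpy as np

def momentum_step(weights, grads, velocities, chromosome):
    """Apply one momentum update per weight layer.

    Assumed (hypothetical) chromosome layout for a four-layer network with
    three weight layers: [eta_1, eta_2, eta_3, alpha_1, alpha_2, alpha_3],
    where eta_k is a learning coefficient and alpha_k a momentum coefficient.
    """
    n = len(weights)
    etas, alphas = chromosome[:n], chromosome[n:]
    new_weights, new_velocities = [], []
    for w, g, v, eta, alpha in zip(weights, grads, velocities, etas, alphas):
        v_next = alpha * v - eta * g      # momentum term plus scaled gradient
        new_velocities.append(v_next)
        new_weights.append(w + v_next)    # move the weights along the velocity
    return new_weights, new_velocities

# Toy usage: three layers of two weights each, with unit gradients.
weights = [np.ones(2) for _ in range(3)]
grads = [np.ones(2) for _ in range(3)]
velocities = [np.zeros(2) for _ in range(3)]
chromosome = [0.1, 0.2, 0.3, 0.9, 0.9, 0.9]

w1, v1 = momentum_step(weights, grads, velocities, chromosome)
w2, v2 = momentum_step(w1, grads, v1, chromosome)
```

A genetic algorithm would evaluate each candidate chromosome by training with it and using the resulting error as the fitness value.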
Gauss-newton Based Learning For Fully Recurrent Neural Networks
The thesis discusses a novel off-line and on-line learning approach for Fully Recurrent Neural Networks (FRNNs). The most popular algorithm for training FRNNs, the Real Time Recurrent Learning (RTRL) algorithm, employs gradient descent to find the optimum weight vectors of the recurrent neural network. Within the framework of the research presented, a new off-line and on-line variation of RTRL is introduced that is based on the Gauss-Newton method. The method is an approximate Newton's method tailored to the specific optimization problem (non-linear least squares), and it aims to speed up FRNN training. The new approach stands as a robust and effective compromise between the original gradient-based RTRL (low computational complexity, slow convergence) and Newton-based variants of RTRL (high computational complexity, fast convergence). By gathering information over time in order to form Gauss-Newton search vectors, the new learning algorithm, GN-RTRL, is capable of converging faster to a better-quality solution than the original algorithm. Experimental results reflect these qualities of GN-RTRL, and they also indicate that GN-RTRL may in practice have a lower computational cost than the original RTRL.
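To make the Gauss-Newton idea concrete, here is a minimal sketch of the plain Gauss-Newton iteration on a toy non-linear least-squares curve fit. It is not GN-RTRL: the exponential model, the data, and the function names are assumptions chosen only to show how each step solves a linearised least-squares problem built from the Jacobian of the residuals.

```python
import numpy as np

def gauss_newton(residual, jacobian, theta0, n_iter=20):
    """Plain Gauss-Newton: repeatedly solve J @ step ~= -r in the least-squares sense."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        r = residual(theta)
        J = jacobian(theta)
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        theta = theta + step
    return theta

# Toy problem (hypothetical): recover a and b in y = a * exp(b * x) from noise-free data.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * x)

def residual(theta):
    a, b = theta
    return a * np.exp(b * x) - y

def jacobian(theta):
    a, b = theta
    e = np.exp(b * x)
    # Columns are the partial derivatives of the residual with respect to a and b.
    return np.column_stack([e, a * x * e])

theta_hat = gauss_newton(residual, jacobian, [1.0, -1.0])
```

Compared with gradient descent, each step uses curvature information from the product of the Jacobian with its transpose, without forming the full Hessian that a Newton method would need.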
An Improved Bees Algorithm for Training Deep Recurrent Networks for Sentiment Classification
Recurrent neural networks (RNNs) are powerful tools for learning information from
temporal sequences. Designing an optimum deep RNN is difficult due to configuration and training
issues, such as vanishing and exploding gradients. In this paper, a novel metaheuristic optimisation
approach is proposed for training deep RNNs for the sentiment classification task. The approach
employs an enhanced Ternary Bees Algorithm (BA-3+), which operates for large dataset classification
problems by considering only three individual solutions in each iteration. BA-3+ combines the
collaborative search of three bees to find the optimal set of trainable parameters of the proposed deep
recurrent learning architecture. Local learning with exploitative search utilises the greedy selection
strategy. Stochastic gradient descent (SGD) learning with singular value decomposition (SVD) aims to
handle vanishing and exploding gradients of the decision parameters with the stabilisation strategy
of SVD. Global learning with explorative search achieves faster convergence without getting trapped
at local optima to find the optimal set of trainable parameters of the proposed deep recurrent learning
architecture. BA-3+ has been tested on the sentiment classification task to classify symmetric and
asymmetric distribution of the datasets from different domains, including Twitter, product reviews,
and movie reviews. Comparative results have been obtained for advanced deep language models and
Differential Evolution (DE) and Particle Swarm Optimization (PSO) algorithms. BA-3+ converged
to the global minimum faster than the DE and PSO algorithms, and it outperformed the SGD, DE,
and PSO algorithms for the Turkish and English datasets. The accuracy and F1 measure
improved by at least 30–40% over the standard SGD algorithm for all classification
datasets. Accuracy rates in the RNN model trained with BA-3+ ranged from 80% to 90%, while the
RNN trained with SGD was able to achieve between 50% and 60% for most datasets. The performance
of the RNN model with BA-3+ was as good as that of the Tree-LSTM and Recursive Neural Tensor Network
(RNTN) language models, which achieved accuracy results of up to 90% for some datasets. The
improved accuracy and convergence results show that BA-3+ is an efficient, stable algorithm for the
complex classification task, and that it can handle the vanishing and exploding gradients problem of
deep RNNs.
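One common way to use SVD for stabilisation is to keep the singular values of a recurrent weight matrix close to 1, so that repeated multiplication neither shrinks nor blows up the back-propagated signal. The sketch below shows that idea only; the clipping bounds, the function name `clip_singular_values`, and the random matrix are assumptions, and the abstract does not state that BA-3+ uses exactly this scheme.

```python
import numpy as np

def clip_singular_values(W, low=0.9, high=1.1):
    """Rebuild W with its singular values clipped to [low, high] (assumed bounds)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.clip(s, low, high)) @ Vt

# Hypothetical recurrent weight matrix with a wide spread of singular values.
rng = np.random.default_rng(0)
W = rng.normal(scale=2.0, size=(4, 4))
W_stable = clip_singular_values(W)

# Singular values after clipping lie inside the chosen band.
s_after = np.linalg.svd(W_stable, compute_uv=False)
```

In a training loop, such a projection would typically be applied to the recurrent weights after an SGD update.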
A Study of Learning Issues in Feedforward Neural Networks
When training a feedforward neural network with stochastic gradient descent, the network may fail to learn a batch of patterns correctly, which causes its predictions to fail in the areas adjacent to those patterns. This problem has usually been addressed by directly adding complexity to the network, normally by increasing the number of learning layers, which makes it heavier to run on the workstation. In this paper, the properties of the patterns and their effect on the network are analysed, and two main reasons why the patterns are not learned correctly are distinguished: the disappearance of the Jacobian gradient in the processing layers of the network and the opposite direction of the gradients of those patterns. A simplified experiment has been carried out on a simple neural network, and the errors appearing during and after training have been monitored. The data obtained suggest that the initial hypothesis about the causes is correct. Finally, some corrections to the network are proposed with the aim of solving those training issues and offering a sufficiently accurate prediction while increasing the complexity of the network as little as possible. The authors were supported by the government of the Basque Country through the research grant ELKARTEK KK-2021/00014 BASQNET (Estudio de nuevas técnicas de inteligencia artificial basadas en Deep Learning dirigidas a la optimización de procesos industriales).
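The second cause named above, gradients that point in opposite directions, can be illustrated with a deliberately contrived example: two patterns whose per-pattern gradients cancel, so the summed batch gradient gives the optimiser no direction to move in. The linear model, the squared-error loss, and the pattern values are assumptions for illustration and are not taken from the paper's experiment.

```python
import numpy as np

def per_pattern_gradient(w, x, t):
    """Gradient of 0.5 * (w @ x - t)**2 with respect to w for a linear unit."""
    return (w @ x - t) * x

w = np.zeros(2)

# Two hypothetical patterns with the same input but opposite targets.
x_a, t_a = np.array([1.0, 2.0]), 1.0
x_b, t_b = np.array([1.0, 2.0]), -1.0

g_a = per_pattern_gradient(w, x_a, t_a)
g_b = per_pattern_gradient(w, x_b, t_b)

# Cosine similarity of -1 means the two gradients are exactly opposed.
cosine = g_a @ g_b / (np.linalg.norm(g_a) * np.linalg.norm(g_b))

# The batch gradient is the sum of the per-pattern gradients and vanishes here.
batch_gradient = g_a + g_b
```

Monitoring the cosine similarity between per-pattern gradients is one simple way to spot such conflicts during training.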