7 research outputs found
Parallel growing and training of neural networks using output parallelism
In order to find an appropriate architecture for a large-scale real-world application automatically and efficiently, a natural method is to divide the original problem into a set of sub-problems. In this paper, we propose a simple neural network task decomposition method based on output parallelism. Using this method, a problem can be divided flexibly into several sub-problems, each of which is composed of the whole input vector and a fraction of the output vector. Each module (one per sub-problem) is responsible for producing its fraction of the output vector of the original problem, so the hidden structures serving the original problem's output units are decoupled. These modules can be grown and trained in parallel on parallel processing elements. Incorporated with a constructive learning algorithm, our method requires neither excessive computation nor any prior knowledge concerning decomposition. The feasibility of output parallelism is analyzed and proved, and several benchmark problems are implemented to test the validity of the method. The results show that it can reduce computation time, increase learning speed, and improve generalization accuracy for both classification and regression problems.
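The decomposition scheme is straightforward to express in code. The sketch below is not the paper's implementation: a fixed scikit-learn MLPRegressor stands in for the constructively grown module, a thread pool stands in for parallel processing elements, and the module size and slice counts are illustrative assumptions. It shows the core idea: partition the output dimensions into disjoint slices, train one module per slice on the full input, and concatenate the predictions.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from sklearn.neural_network import MLPRegressor

def train_module(X, y_slice):
    # One module per sub-problem: the whole input vector plus a fraction
    # of the output vector (a fixed MLP stands in for the paper's
    # constructively grown module).
    y = y_slice.ravel() if y_slice.shape[1] == 1 else y_slice
    return MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000).fit(X, y)

def fit_output_parallel(X, Y, n_modules):
    # Partition the output dimensions into disjoint slices, one per module,
    # and train the modules concurrently.
    slices = np.array_split(np.arange(Y.shape[1]), n_modules)
    with ThreadPoolExecutor() as pool:
        modules = list(pool.map(lambda s: train_module(X, Y[:, s]), slices))
    return modules, slices

def predict_output_parallel(modules, slices, X):
    # Each module produces its fraction of the output vector; concatenate.
    Y_hat = np.empty((X.shape[0], sum(len(s) for s in slices)))
    for model, s in zip(modules, slices):
        Y_hat[:, s] = model.predict(X).reshape(X.shape[0], -1)
    return Y_hat
```

Because the modules share no hidden units, each one can be trained on a separate processing element with no communication beyond the final concatenation.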
Flexibility and accuracy enhancement techniques for neural networks
Master's thesis (Master of Engineering)
Constructive neural networks: generalisation, convergence and architectures
Feedforward neural networks trained via supervised learning have proven to be successful in the field of pattern recognition. The most important feature of a pattern recognition technique is its ability to successfully classify future data. This is known as generalisation. A more practical aspect of pattern recognition methods is how quickly they can be trained and how reliably a good solution is found. Feedforward neural networks have been shown to provide good generalisation on a variety of problems. A number of training techniques also exist that provide fast convergence.
Two problems often addressed within the field of feedforward neural networks are how to improve the generalisation and convergence of these pattern recognition techniques. These two problems are addressed in this thesis through the framework of constructive neural network algorithms. Constructive neural networks are a type of feedforward neural network in which the network architecture is built during the training process. The type of architecture built can affect both generalisation and convergence speed.
Convergence speed and reliability are important properties of feedforward neural networks. These properties are studied by examining different training algorithms and the effect of using a constructive process. A new gradient-based training algorithm, SARPROP, is introduced. This algorithm addresses the problems of poor convergence speed and reliability when using a gradient-based training method. SARPROP is shown to increase both convergence speed and the chance of convergence to a good solution. This is achieved through the combination of gradient-based and simulated annealing methods.
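As a rough illustration of the combination described here, the sketch below pairs an RPROP-style per-weight step adaptation with an exponentially annealed noise term. It is a sketch of the idea only: the actual SARPROP update rules, schedules, and constants differ, and all names and values below are assumptions.

```python
import numpy as np

def annealed_rprop_step(w, grad, prev_grad, step, epoch, rng,
                        eta_plus=1.2, eta_minus=0.5,
                        step_min=1e-6, step_max=1.0, temp=0.05):
    # RPROP part: per-weight step sizes grow while the gradient sign is
    # stable and shrink when it flips.
    agree = grad * prev_grad
    step = np.where(agree > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(agree < 0, np.maximum(step * eta_minus, step_min), step)
    # Simulated-annealing part: additive noise whose magnitude decays
    # exponentially with the epoch, helping early escapes from poor minima.
    noise = rng.standard_normal(w.shape) * step * 2.0 ** (-temp * epoch)
    return w - np.sign(grad) * step + noise, step, grad
```

Here step is the per-weight step-size array (initialised to, say, 0.1), and the returned gradient is fed back in as prev_grad on the next epoch.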
The convergence properties of various constructive algorithms are examined through a series of empirical studies. The results of these studies demonstrate that the cascade architecture allows faster, more reliable convergence using a gradient-based method than a single-layer architecture with a comparable number of weights. It is shown that constructive algorithms that bias the search direction of the gradient-based training algorithm for the newly added hidden neurons produce smaller networks and more rapid convergence. A constructive algorithm using search-direction biasing is shown to converge to solutions using networks that are unreliable and inefficient to train with a non-constructive gradient-based algorithm. The technique of weight freezing is shown to result in larger architectures than those obtained from training the whole network.
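For readers unfamiliar with the cascade architecture, the following minimal sketch shows its defining property: every newly added hidden unit receives connections from the inputs and from all previously installed hidden units, so fan-in and depth grow with each addition. Candidate training, search-direction biasing, and the freezing policy discussed above are omitted; the class is our illustration, not the thesis's code.

```python
import numpy as np

class CascadeNet:
    """Minimal cascade network: each hidden unit sees the inputs plus
    the outputs of every earlier hidden unit (training loop omitted)."""

    def __init__(self, n_in, n_out, rng=np.random.default_rng(0)):
        self.n_in, self.rng = n_in, rng
        self.hidden = []                       # one weight vector per unit
        self.w_out = np.zeros((n_in, n_out))   # output layer grows per unit

    def _features(self, x):
        acts = np.asarray(x, dtype=float)
        for w in self.hidden:                  # cascade: each unit's fan-in
            acts = np.append(acts, np.tanh(w @ acts))  # covers all earlier activations
        return acts

    def add_unit(self):
        fan_in = self.n_in + len(self.hidden)  # fan-in and depth grow per unit
        self.hidden.append(self.rng.normal(0.0, 0.1, fan_in))
        self.w_out = np.vstack([self.w_out, np.zeros(self.w_out.shape[1])])

    def forward(self, x):
        return self._features(x) @ self.w_out
```

The growing fan-in and depth visible in add_unit are exactly the attributes that become problematic for VLSI implementation, as discussed at the end of this abstract.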
Improving the generalisation ability of constructive neural networks is an important area of investigation. A series of empirical studies is performed to examine the effect of regularisation on generalisation in constructive cascade algorithms. It is found that the combination of early stopping and regularisation results in better generalisation than the use of early stopping alone. A cubic regularisation term that greatly penalises large weights is shown to be beneficial for generalisation in cascade networks. An adaptive method of setting the regularisation magnitude in constructive networks is introduced and is shown to produce generalisation results similar to those obtained with a fixed, user-optimised regularisation setting. This adaptive method also often results in the construction of smaller networks for more complex problems.
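As a concrete reading of the cubic term: the regularised loss adds lam * sum(|w|^3) to the error, whose gradient grows quadratically with weight magnitude and therefore punishes large weights far harder than a quadratic decay term does. The sketch below makes this explicit; the adaptive rule shown is an illustrative guess at the kind of scheme meant, not the thesis's actual method.

```python
import numpy as np

def cubic_penalty(w, lam):
    # Penalty lam * sum(|w|^3); its gradient 3*lam*w^2*sign(w) grows
    # quadratically with |w|, so large weights are punished heavily.
    return lam * np.sum(np.abs(w) ** 3), 3.0 * lam * w ** 2 * np.sign(w)

def adaptive_lam(train_error, raw_penalty, ratio=0.01):
    # Illustrative adaptive rule (an assumption, not the thesis's scheme):
    # keep the penalty near a fixed fraction of the current training error.
    return ratio * train_error / max(raw_penalty, 1e-12)
```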
The insights obtained from the SARPROP algorithm and from the convergence and generalisation empirical studies are used to create a new constructive cascade algorithm, acasper. This algorithm is extensively benchmarked and is shown to obtain good generalisation results in comparison to a number of well-respected and successful neural network algorithms. A technique of incorporating the validation data into the training set after network construction is introduced and is shown to generally result in similar or improved generalisation.
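The validation-incorporation technique can be sketched as a two-phase fit. In the runnable example below, scikit-learn's MLPClassifier stands in for a constructive cascade (an assumption made for self-containment): phase one builds/selects the model with early stopping, phase two fixes the architecture and retrains on training plus validation data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = np.random.randn(400, 8), np.random.randint(0, 2, 400)   # toy data
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2,
                                            random_state=0)

# Phase 1: build/select the network with early stopping. sklearn holds
# out its own validation fraction internally; in the thesis a cascade
# network is grown while monitoring a validation set.
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                    early_stopping=True, validation_fraction=0.2,
                    random_state=0)
net.fit(X_tr, y_tr)

# Phase 2: the architecture is now fixed, so the validation data plays no
# further selection role -- fold it into the training set and retrain.
net.set_params(early_stopping=False, warm_start=True)
net.fit(np.vstack([X_tr, X_val]), np.concatenate([y_tr, y_val]))
```

The rationale is that once construction has stopped, the validation examples carry information the final weights would otherwise never see.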
The difficulties of implementing a cascade architecture in VLSI are described, and results are given on the effect of the cascade architecture on such attributes as weight growth, fan-in, network depth, and propagation delay. Two variants of the cascade architecture are proposed. These new architectures are shown to produce similar generalisation results to the cascade architecture, while also addressing the problems of VLSI implementation of cascade networks.