Search CORE

8,372 research outputs found

Parallel growing and training of neural networks using output parallelism

Author: Guan SU
Li SC
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2002
Field of study

In order to find an appropriate architecture for a large-scale real-world application automatically and efficiently, a natural method is to divide the original problem into a set of sub-problems. In this paper, we propose a simple neural network task decomposition method based on output parallelism. By using this method, a problem can be divided flexibly into several sub-problems as chosen, each of which is composed of the whole input vector and a fraction of the output vector. Each module (for one sub-problem) is responsible for producing a fraction of the output vector of the original problem. The hidden structure for the original problem’s output units are decoupled. These modules can be grown and trained in parallel on parallel processing elements. Incorporated with a constructive learning algorithm, our method does not require excessive computation and any prior knowledge concerning decomposition. The feasibility of output parallelism is analyzed and proved. Some benchmarks are implemented to test the validity of this method. Their results show that this method can reduce computational time, increase learning speed and improve generalization accuracy for both classification and regression problems

Brunel University Research Archive

Task decomposition using pattern distributor

Author: Bao C
Guan SU
Neo T
Publication venue: Freund & Pettman
Publication date: 01/01/2004
Field of study

In this paper, we propose a new task decomposition method for multilayered feedforward neural networks, namely Task Decomposition with Pattern Distributor in order to shorten the training time and improve the generalization accuracy of a network under training. This new method uses the combination of modules (small-size feedforward network) in parallel and series, to produce the overall solution for a complex problem. Based on a “divide-and-conquer” technique, the original problem is decomposed into several simpler sub-problems by a pattern distributor module in the network, where each sub-problem is composed of the whole input vector and a fraction of the output vector of the original problem. These sub-problems are then solved by the corresponding groups of modules, where each group of modules is connected in series with the pattern distributor module and the modules in each group are connected in parallel. The design details and implementation of this new method are introduced in this paper. Several benchmark classification problems are used to test this new method. The analysis and experimental results show that this new method could reduce training time and improve generalization accuracy

Directory of Open Access Journals

Brunel University Research Archive

Reduced pattern training based on task decomposition using pattern distributor

Author: Bao C
Guan SU
Neo T
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/11/2007
Field of study

Task Decomposition with Pattern Distributor (PD) is a new task decomposition method for multilayered feedforward neural networks. Pattern distributor network is proposed that implements this new task decomposition method. We propose a theoretical model to analyze the performance of pattern distributor network. A method named Reduced Pattern Training is also introduced, aiming to improve the performance of pattern distribution. Our analysis and the experimental results show that reduced pattern training improves the performance of pattern distributor network significantly. The distributor module’s classification accuracy dominates the whole network’s performance. Two combination methods, namely Cross-talk based combination and Genetic Algorithm based combination, are presented to find suitable grouping for the distributor module. Experimental results show that this new method can reduce training time and improve network generalization accuracy when compared to a conventional method such as constructive backpropagation or a task decomposition method such as Output Parallelism

Brunel University Research Archive

Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

Author: A Arnold
A Faradjian
B Hess
C Schütte
G Wilson
JA Anderson
JC Phillips
KJ Bowers
KJ Bowers
L Verlet
M Eleftheriou
M Shirts
MJ Abraham
P Eastman
R Yokota
S Pronk
S Páll
U Essmann
W Humphrey
WM Brown
Y Andoh
Y Sugita
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through the use of parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighborsearching, and we discuss the present and future challenges we see for exascale simulation - in particular a very fine-grained task parallelism. We also discuss the software management, code peer review and continuous integration testing required for a project of this complexity.Comment: EASC 2014 conference proceedin

arXiv.org e-Print Archive

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Multi-learner based recursive supervised training

Author: Guan SU
Iyer L
Ramanathan K
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

In this paper, we propose the Multi-Learner Based Recursive Supervised Training (MLRT) algorithm which uses the existing framework of recursive task decomposition, by training the entire dataset, picking out the best learnt patterns, and then repeating the process with the remaining patterns. Instead of having a single learner to classify all datasets during each recursion, an appropriate learner is chosen from a set of three learners, based on the subset of data being trained, thereby avoiding the time overhead associated with the genetic algorithm learner utilized in previous approaches. In this way MLRT seeks to identify the inherent characteristics of the dataset, and utilize it to train the data accurately and efficiently. We observed that empirically, MLRT performs considerably well as compared to RPHP and other systems on benchmark data with 11% improvement in accuracy on the SPAM dataset and comparable performances on the VOWEL and the TWO-SPIRAL problems. In addition, for most datasets, the time taken by MLRT is considerably lower than the other systems with comparable accuracy. Two heuristic versions, MLRT-2 and MLRT-3 are also introduced to improve the efficiency in the system, and to make it more scalable for future updates. The performance in these versions is similar to the original MLRT system

Brunel University Research Archive