Comparing the efficiency of batch pattern training of a multilayer perceptron on a parallel computer and on a computational cluster
The development of a parallel method for batch pattern training of a multilayer perceptron with the back propagation training algorithm, and the research of its efficiency on a general-purpose parallel computer and on computational clusters, are presented in this paper. The model of a multilayer perceptron and the usual sequential batch pattern training method are described theoretically. An algorithmic description of the parallel version of the batch pattern training method is presented. The parallelization efficiency of the developed method is investigated by progressively increasing the dimension of the parallelized problem.
The results of the experimental research show that (I) the computational cluster with an Infiniband interconnect achieves better parallelization efficiency than the general-purpose parallel computer with a ccNUMA architecture, due to lower communication overhead, and (II) the parallelization efficiency of the method is high enough for its appropriate use on the general-purpose parallel computers and clusters available within modern computational grids.
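The batch-pattern parallelism described above can be sketched in a few lines: each worker computes the gradient over its share of the training patterns, and the per-worker gradients are summed, which is the step an allreduce would perform on a cluster. The sketch below (plain NumPy, with an illustrative one-hidden-layer perceptron, not the authors' code) checks that the summed gradient equals the sequential batch gradient.

```python
import numpy as np

# Illustrative batch-pattern parallel training step: each of P workers processes
# an equal share of the patterns and computes local weight gradients; summing the
# per-worker gradients (the allreduce step on a real cluster) reproduces exactly
# the sequential batch gradient, so the parallel method converges identically.

rng = np.random.default_rng(0)

def mlp_grads(W1, W2, X, Y):
    """Gradients of the squared error for a 1-hidden-layer MLP with tanh units."""
    H = np.tanh(X @ W1)          # hidden activations
    out = H @ W2                 # linear output layer
    err = out - Y                # batch error
    gW2 = H.T @ err
    gW1 = X.T @ ((err @ W2.T) * (1.0 - H**2))
    return gW1, gW2

X = rng.normal(size=(64, 5)); Y = rng.normal(size=(64, 2))
W1 = rng.normal(size=(5, 8)); W2 = rng.normal(size=(8, 2))

# Sequential batch gradient over all 64 patterns.
g1_seq, g2_seq = mlp_grads(W1, W2, X, Y)

# "Parallel" version: 4 workers, 16 patterns each, then sum (the allreduce step).
P = 4
parts = [mlp_grads(W1, W2, Xp, Yp) for Xp, Yp in zip(np.split(X, P), np.split(Y, P))]
g1_par = sum(p[0] for p in parts); g2_par = sum(p[1] for p in parts)

assert np.allclose(g1_seq, g1_par) and np.allclose(g2_seq, g2_par)
```

Because the batch gradient is a sum over patterns, the only communication needed per epoch is this single reduction, which is why the efficiency depends so strongly on the interconnect.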
High performance Latent Dirichlet Allocation for text mining
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Latent Dirichlet Allocation (LDA), a generative probabilistic model, is a three-level Bayesian model. LDA computes the latent topic structure of the data and extracts the significant information of documents. However, traditional LDA has several limitations in practical applications. LDA cannot be used directly for classification because it is an unsupervised learning model; it needs to be embedded into appropriate classification algorithms. As a generative model, LDA may also generate latent topics in categories to which the target documents do not belong, introducing deviations in computation and reducing classification accuracy. The number of topics in LDA greatly influences the learning of the model parameters, noise samples in the training data affect the final text classification result, and the quality of LDA-based classifiers depends to a great extent on the quality of the training samples. Although parallel LDA algorithms have been proposed to deal with huge amounts of data, balancing computing loads in a computer cluster poses another challenge. This thesis presents a text classification method which combines the LDA model and the Support Vector Machine (SVM) classification algorithm for improved classification accuracy while reducing the dimensionality of the datasets. Based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN), the algorithm automatically optimizes the number of topics to be selected, which reduces the number of iterations in computation. Furthermore, this thesis presents a noise reduction scheme for processing noisy data; even when the noise ratio in the training data set is large, the scheme consistently produces a high level of classification accuracy.
Finally, the thesis parallelizes LDA using the MapReduce model, the de facto computing standard for supporting data-intensive applications. A genetic-algorithm-based load balancing algorithm is designed to balance the workloads among computers in a heterogeneous MapReduce cluster whose computers have a variety of computing resources in terms of CPU speed, memory space and hard disk space.
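The count aggregation at the heart of a MapReduce pass over LDA data can be illustrated with a toy sketch (the documents, topic assignments, and function names below are invented for illustration, not taken from the thesis): each mapper emits partial topic-word counts for its corpus shard, and the reducer sums them into the global count table that the next estimation pass reads.

```python
from collections import Counter

# Illustrative MapReduce-style aggregation for parallel LDA: mappers hold corpus
# shards and emit partial (topic, word) counts from their local topic assignments;
# the reduce step sums the partials into the global topic-word count table.

docs = [["cell", "gene", "gene"], ["ball", "goal"], ["gene", "cell"], ["goal", "ball", "ball"]]
topics = [[0, 0, 0], [1, 1], [0, 0], [1, 1, 1]]   # toy per-word topic assignments

def map_shard(doc_shard, topic_shard):
    """Mapper: partial (topic, word) -> count table for one corpus shard."""
    c = Counter()
    for words, zs in zip(doc_shard, topic_shard):
        for w, z in zip(words, zs):
            c[(z, w)] += 1
    return c

def reduce_counts(partials):
    """Reducer: sum the partial tables into the global topic-word counts."""
    total = Counter()
    for p in partials:
        total.update(p)
    return total

# Two mappers, two documents each.
partials = [map_shard(docs[:2], topics[:2]), map_shard(docs[2:], topics[2:])]
global_counts = reduce_counts(partials)

assert global_counts == map_shard(docs, topics)   # same table as one sequential pass
assert global_counts[(0, "gene")] == 3
```

Because the counts are additive, the shards can be processed on any number of mappers without changing the result, which is what makes the load balancing rather than the correctness the hard part.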
Computing resources sensitive parallelization of neural networks for large scale diabetes data modelling, diagnosis and prediction
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Diabetes has become one of the most severe diseases due to an increasing number of diabetes patients globally. A large amount of digital data on diabetes has been collected through various channels. How to utilize these data sets to help doctors make decisions on the diagnosis, treatment and prediction of diabetic patients poses many challenges to the research community. The thesis investigates mathematical models, with a focus on neural networks, for large-scale diabetes data modelling and analysis by utilizing modern computing technologies such as grid computing and cloud computing. These technologies provide users with an inexpensive way to access extensive computing resources over the Internet for solving data- and computationally intensive problems. This thesis evaluates the performance of seven representative machine learning techniques in the classification of diabetes data; the results show that the neural network produces the best classification accuracy but incurs high overhead in data training. As a result, the thesis develops MRNN, a parallel neural network model based on the MapReduce programming model, which has become an enabling technology in support of data-intensive applications in the clouds.
By partitioning the diabetes data set into a number of equally sized data blocks, the training workload is distributed among a number of computing nodes for speedup in data training. MRNN is first evaluated in small-scale experimental environments using 12 mappers, and subsequently in large-scale simulated environments using up to 1000 mappers. Both the experimental and simulation results show the effectiveness of MRNN in classification and its high scalability in data training.
MapReduce does not have a sophisticated job scheduling scheme for heterogeneous computing environments in which the computing nodes may have varied computing capabilities. For this purpose, this thesis develops a load balancing scheme based on genetic algorithms with the aim of balancing the training workload among heterogeneous computing nodes: the nodes with more computing capacity receive more MapReduce jobs for execution. Divisible load theory is employed to guide the evolutionary process of the genetic algorithm towards fast convergence. The proposed load balancing scheme is evaluated in large-scale simulated MapReduce environments with varied levels of heterogeneity, using different sizes of data sets. All the results show that the genetic-algorithm-based load balancing scheme significantly reduces the makespan of job execution in comparison with the time consumed without load balancing. This work is funded by the EPSRC and the China Market Association.
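A minimal version of such a genetic-algorithm balancer can be sketched as follows (the task sizes, node speeds, and GA parameters are illustrative assumptions, and divisible load theory is not modelled): a chromosome assigns each map task to a node, fitness is the makespan, and elitist selection guarantees that the best assignment ever seen survives.

```python
import random

# Toy genetic-algorithm load balancer for a heterogeneous cluster: a chromosome
# maps each task to a node; fitness is the makespan, i.e. the largest per-node
# sum of task_size / node_speed; elitist truncation selection keeps the best half
# each generation, so the best-found makespan never increases.

random.seed(42)
task_sizes = [8, 3, 7, 2, 5, 9, 4, 6]           # work units per map task
node_speeds = [1.0, 2.0, 4.0]                   # heterogeneous node capacities

def makespan(assign):
    loads = [0.0] * len(node_speeds)
    for t, n in zip(task_sizes, assign):
        loads[n] += t / node_speeds[n]
    return max(loads)

def evolve(pop, generations=200):
    for _ in range(generations):
        pop.sort(key=makespan)
        pop = pop[: len(pop) // 2]              # elitist truncation selection
        children = []
        while len(pop) + len(children) < 20:
            a, b = random.sample(pop, 2)
            cut = random.randrange(1, len(task_sizes))
            child = a[:cut] + b[cut:]           # one-point crossover
            i = random.randrange(len(child))    # point mutation: reassign one task
            child = child[:i] + [random.randrange(len(node_speeds))] + child[i + 1:]
            children.append(child)
        pop += children
    return min(pop, key=makespan)

round_robin = [i % len(node_speeds) for i in range(len(task_sizes))]
pop = [round_robin] + [[random.randrange(len(node_speeds)) for _ in task_sizes] for _ in range(19)]
best = evolve(pop)

assert makespan(best) <= makespan(round_robin)  # elitism guarantees no regression
```

Seeding the population with the naive round-robin schedule makes the comparison fair: the GA result can only match or improve it, mirroring the thesis finding that the balanced schedule reduces the makespan.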
Wrapper algorithms and their performance assessment on high-dimensional molecular data
Prediction problems on high-dimensional molecular data, e.g. the classification of microarray samples into normal and cancer tissues, are complex and ill-posed since the number of variables usually exceeds the number of observations by orders of magnitude. Recent research in the area has produced a variety of new statistical models to handle these new biological datasets. In practice, however, these models are always applied in combination with preprocessing and variable selection methods as well as model selection, which is mostly performed by cross-validation. Varma and Simon (2006) used the term 'wrapper algorithm' for this integration of preprocessing and model selection into the construction of statistical models. Additionally, they proposed nested cross-validation (NCV) as a way of estimating their prediction error, which has since become the gold standard.

In the first part, this thesis provides further theoretical and empirical justification for the use of NCV in the context of wrapper algorithms. Moreover, a computationally less intensive alternative to NCV is proposed, which can be motivated in a decision-theoretic framework. The new method can be interpreted as a smoothed variant of NCV and, in contrast to NCV, guarantees intuitive bounds for the estimate of the prediction error.

The second part focuses on the ranking of wrapper algorithms. Cross-study validation is proposed as an alternative concept to the repetition of separate within-study validations when several similar prediction problems are available. The concept is demonstrated using six different wrapper algorithms for survival prediction on censored data on a selection of eight breast cancer datasets. Additionally, a parametric bootstrap approach for simulating realistic data from such related prediction problems is described and subsequently applied to illustrate the concept of cross-study validation for the ranking of wrapper algorithms.

Finally, the last part addresses computational aspects of the analyses and simulations performed in the thesis. The preprocessing before the analysis as well as the evaluation of the prediction models requires large computing resources. Parallel computing approaches are illustrated on cluster, cloud and high-performance computing resources using the R programming language. The use of heterogeneous hardware and the processing of large datasets are covered, as well as the implementation of the R package survHD for the analysis and evaluation of high-dimensional wrapper algorithms for survival prediction
from censored data.
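The role of nested cross-validation for wrapper algorithms can be illustrated with a small sketch (a toy nearest-centroid classifier with a variance filter, written in NumPy; this is not survHD and not the thesis code): the variable selection and the choice of the tuning parameter are repeated inside every outer fold, so the outer error estimate covers the whole wrapper rather than just the final model fit.

```python
import numpy as np

# Sketch of a wrapper algorithm evaluated by nested cross-validation (NCV).
# The "wrapper" bundles variable selection (top-k variance filter) and model
# selection (choice of k by inner CV) with a toy nearest-centroid classifier.
# Re-running the selection steps inside each outer fold avoids the optimistic
# bias that selecting k on the full data would introduce.

rng = np.random.default_rng(1)
n, p = 60, 200
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[y == 1, :5] += 1.5                     # only the first 5 variables are informative

def fit_predict(Xtr, ytr, Xte, k):
    keep = np.argsort(Xtr.var(axis=0))[-k:]          # filter: top-k variance features
    c0 = Xtr[ytr == 0][:, keep].mean(axis=0)
    c1 = Xtr[ytr == 1][:, keep].mean(axis=0)
    d0 = ((Xte[:, keep] - c0) ** 2).sum(axis=1)
    d1 = ((Xte[:, keep] - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)                     # nearest-centroid rule

def cv_error(X, y, k, folds=5):
    idx = np.arange(len(y)); errs = []
    for f in range(folds):
        te = idx[f::folds]; tr = np.setdiff1d(idx, te)
        errs.append(np.mean(fit_predict(X[tr], y[tr], X[te], k) != y[te]))
    return np.mean(errs)

# Outer loop: honest error of the WHOLE wrapper, k re-chosen per outer fold.
ks = [5, 20, 100]
idx = np.arange(n); outer_errs = []
for f in range(5):
    te = idx[f::5]; tr = np.setdiff1d(idx, te)
    best_k = min(ks, key=lambda k: cv_error(X[tr], y[tr], k))   # inner model selection
    outer_errs.append(np.mean(fit_predict(X[tr], y[tr], X[te], best_k) != y[te]))
ncv_error = float(np.mean(outer_errs))

assert 0.0 <= ncv_error <= 1.0
```

The inner cross-validation sees only the outer training portion, so no test observation ever influences the filter or the choice of k, which is exactly the property NCV is designed to guarantee.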
Parallel Asynchronous Matrix Multiplication for a Distributed Pipelined Neural Network
Machine learning is an approach to devising algorithms that compute an output without a given rule set, based instead on a self-learning concept. This approach is of great importance for several fields of application in science and industry where traditional programming methods are not sufficient. In neural networks, a popular subclass of machine learning algorithms, previous experience is commonly used to train the network so that it produces good outputs for newly introduced inputs. By increasing the size of the network, more complex problems can be solved, which in turn rely on huge amounts of training data. Increasing the complexity also leads to higher computational demand and storage requirements, and to the need for parallelization.
Several parallelization approaches for neural networks have already been considered. Most approaches use special-purpose hardware, whilst other work focuses on standard hardware. Often these approaches target the problem by parallelizing over the training data. In this work a new parallelization method named poadSGD is proposed for the parallelization of fully-connected, large-scale feedforward networks on a compute cluster with standard hardware. poadSGD is based on the stochastic gradient descent algorithm. A block-wise distribution of the network's layers to groups of processes and a pipelining scheme for batches of the training samples are used. The network is updated asynchronously without interrupting ongoing computations on subsequent batches; a one-sided communication scheme is used for this task. A main algorithmic part of the batch-wise pipelined version consists of matrix multiplications which occur in a special distributed setup, where each matrix is held by a different process group.
GASPI, a parallel programming model from the field of Partitioned Global Address Space (PGAS) models, is introduced and compared to other models from this class. As it relies mainly on one-sided and asynchronous communication, it is a perfect candidate for the asynchronous update task in the poadSGD algorithm. Therefore, the matrix multiplication is also implemented based on GASPI. In order to efficiently handle the synchronizations arising within the process groups and achieve a good workload distribution, a two-dimensional block-cyclic data distribution is applied to the matrices. Based on this distribution, the multiplication algorithm iterates diagonally over the sub-blocks of the resulting matrix and computes the sub-blocks in subgroups of the processes. The sub-blocks are computed by sharing the workload between the process groups and communicating mostly in pairs or in subgroups. The pairwise communication is set up to be overlapped by other ongoing computations. The implementation poses a special challenge, since the asynchronous communication routines must be handled with care as to which processor is working with which data at what point in time, in order to prevent an unintentional dual use of data.
The theoretical analysis shows the matrix multiplication to be superior to a naive implementation when the dimension of the sub-blocks of the matrices exceeds 382. The performance achieved in the test runs fell short of what the theoretical analysis predicted. The algorithm was executed on up to 512 cores and for matrices up to a size of 131,072 x 131,072.
The implementation using the GASPI API was found not to be straightforward, but to provide good potential for overlapping communication with computation whenever the data dependencies of an application allow for it. The matrix multiplication was successfully implemented and can be used within a future implementation of the poadSGD method. The poadSGD method seems very promising, especially as nowadays, with larger amounts of data and the increased complexity of applications, approaches to the parallelization of neural networks are of increasing interest.
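The two-dimensional block-cyclic layout can be sketched as follows (the grid and block sizes are illustrative assumptions, and the communication itself is only indicated in comments): block (i, j) of a matrix is owned by process (i mod Pr, j mod Pc) of a Pr x Pc grid, and a blocked multiplication over this layout reproduces the direct product.

```python
import numpy as np

# Sketch of the 2D block-cyclic distribution used for the distributed matrix
# multiplication: block (i, j) is owned by process (i mod Pr, j mod Pc), which
# spreads the sub-blocks of every matrix evenly over the whole process grid.

Pr, Pc, bs = 2, 2, 4                      # 2x2 process grid, 4x4 blocks
A = np.arange(16 * 16, dtype=float).reshape(16, 16)
B = np.ones((16, 16))

def owner(i, j):
    """Block-cyclic mapping: block row i, block column j -> grid coordinates."""
    return (i % Pr, j % Pc)

def blocked_matmul(A, B):
    nb = A.shape[0] // bs
    C = np.zeros_like(A)
    for i in range(nb):                   # loop over blocks of the result
        for j in range(nb):
            # In the distributed version, process owner(i, j) would fetch the
            # needed A-row and B-column blocks from their owners (one-sided
            # reads) and accumulate the partial products locally.
            for k in range(nb):
                C[i*bs:(i+1)*bs, j*bs:(j+1)*bs] += (
                    A[i*bs:(i+1)*bs, k*bs:(k+1)*bs] @ B[k*bs:(k+1)*bs, j*bs:(j+1)*bs])
    return C

C = blocked_matmul(A, B)
assert np.allclose(C, A @ B)              # blocked result matches the direct product
assert owner(0, 0) == (0, 0) and owner(3, 2) == (1, 0)
```

The cyclic assignment is what balances the load: consecutive block rows and columns land on different grid rows and columns, so no process ends up with a contiguous, idle-prone stripe of the matrix.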
HPCCP/CAS Workshop Proceedings 1998
This publication is a collection of extended abstracts of presentations given at the HPCCP/CAS (High Performance Computing and Communications Program/Computational Aerosciences Project) Workshop held on August 24-26, 1998, at NASA Ames Research Center, Moffett Field, California. The objective of the Workshop was to bring together the aerospace high performance computing community, consisting of airframe and propulsion companies, independent software vendors, university researchers, and government scientists and engineers. The Workshop was sponsored by the HPCCP Office at NASA Ames Research Center. The Workshop consisted of over 40 presentations, including an overview of NASA's High Performance Computing and Communications Program and the Computational Aerosciences Project; ten sessions of papers representative of the high performance computing research conducted within the Program by the aerospace industry, academia, NASA, and other government laboratories; two panel sessions; and a special presentation by Mr. James Bailey.