
Initializing Neural Networks Using Restricted Boltzmann Machines

Abstract

This thesis presents an approach to initializing the parameters of a discriminative feedforward neural network (FFN) model using the trained parameters of a generative classification restricted Boltzmann machine (cRBM) model. The ultimate goal of FFN training is to obtain a network capable of making correct inferences on data not used in training. Selecting the FFN initialization is a critical step: different initializations yield trained networks with different parameters and abilities. Random selection is one simple initialization method, but unlike pretraining methods it extracts no information from the training data, and the subsequent optimization does not guarantee that relevant parameters will result. Pretraining methods instead train generative models such as RBMs, which set the model parameters by learning the structure of the training data from clusters of points discovered in the data. The proposed method uses a cRBM that incorporates class information during pretraining, yielding a complete set of non-random FFN parameter initializations. Eliminating random initializations is one advantage of this approach over previous pretraining methods. The approach also uniquely alters the hidden layer bias parameters, compensating for differences between the cRBM and FFN structures when adapting the cRBM parameters to the FFN. This alteration is shown to provide meaningful parameters to the network by evaluating the network before training. Depending on the number of pretraining epochs and on the relative influence of generative and discriminative methods in hybrid pretraining, the hidden layer bias adjustment allows initialized, untrained models to achieve a lower error range than corresponding models without the bias adjustment. Training FFNs with all parameters pretrained can reduce the standard deviation of network errors relative to that of randomly initialized networks. One disadvantage of this pretraining approach, as with many pretraining methods, is the need for two training phases, compared to the single backpropagation phase used for randomly initialized networks.
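The thesis body specifies the exact parameter mapping; the minimal NumPy sketch below only illustrates the general idea described in the abstract: copy the trained cRBM weights and biases into an FFN, then adjust the hidden biases to compensate for the class-unit input that the FFN hidden layer no longer receives. The function name, parameter layout, and the prior-weighted bias adjustment are illustrative assumptions, not the thesis's actual formulation.

```python
import numpy as np

def init_ffn_from_crbm(W, b_h, U, d, class_prior=None):
    """Map trained cRBM parameters onto an FFN (illustrative sketch).

    W           : (n_visible, n_hidden) visible-to-hidden weights
                  -> used as FFN input-to-hidden weights
    b_h         : (n_hidden,) cRBM hidden biases
    U           : (n_classes, n_hidden) class-to-hidden weights
                  -> transposed to serve as FFN hidden-to-output weights
    d           : (n_classes,) class biases -> FFN output biases
    class_prior : optional (n_classes,) label prior used by the
                  hypothetical hidden-bias adjustment below
    """
    n_classes = U.shape[0]
    if class_prior is None:
        class_prior = np.full(n_classes, 1.0 / n_classes)

    # Hypothetical bias adjustment: in the cRBM, each hidden unit also
    # receives input from the one-hot class units, which an FFN hidden
    # layer lacks. One way to compensate is to fold the expected class
    # contribution, E_y[U[y]] under the label prior, into the hidden
    # bias. The thesis defines its own adjustment; this is a stand-in.
    b_h_adjusted = b_h + class_prior @ U

    return {
        "W_in": W,              # input -> hidden weights
        "b_hidden": b_h_adjusted,
        "W_out": U.T,           # hidden -> output weights
        "b_out": d,
    }
```

Under these assumptions, the returned dictionary supplies every FFN parameter, so no weight or bias needs a random draw before backpropagation begins.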
