Stacking-based Deep Neural Network: Deep Analytic Network on Convolutional Spectral Histogram Features
Stacking-based deep neural network (S-DNN), in general, denotes a network that resembles a deep neural network (DNN) in terms of its very deep, feedforward architecture. The typical S-DNN aggregates a variable number of individually learnable modules in series to assemble a DNN-like alternative for the targeted object recognition tasks. This work likewise devises an S-DNN instantiation,
dubbed deep analytic network (DAN), on top of the spectral histogram (SH)
features. The DAN learning principle relies on ridge regression together with some key DNN constituents, specifically the rectified linear unit, fine-tuning, and normalization. The DAN aptitude is scrutinized on three repositories of varying domains, namely FERET (faces), MNIST (handwritten digits), and CIFAR10 (natural objects). The empirical results show that DAN improves on the SH baseline performance once the stack is sufficiently deep.
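The core mechanism described here, analytically training each stacked layer by ridge regression and passing its rectified, normalized output to the next layer, can be illustrated with a minimal sketch. The function names (ridge_layer, dan_forward, train_dan), the L2 normalization, and the use of one-hot targets at every layer are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def ridge_layer(X, Y, lam=1.0):
    """Closed-form ridge regression: W = (X^T X + lam*I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def dan_forward(H, W):
    """One DAN-style layer: linear projection, ReLU, then L2 normalization."""
    A = np.maximum(H @ W, 0.0)                     # rectified linear unit
    return A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-8)

def train_dan(X, Y_onehot, depth=3, lam=1.0):
    """Stack analytically trained layers on top of precomputed SH features."""
    layers, H = [], X
    for _ in range(depth):
        W = ridge_layer(H, Y_onehot, lam)          # no backpropagation needed
        layers.append(W)
        H = dan_forward(H, W)                      # feed output to next layer
    return layers, H
```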
Improving large vocabulary continuous speech recognition by combining GMM-based and reservoir-based acoustic modeling
In earlier work we have shown that good phoneme recognition is possible with a so-called reservoir, a special type of recurrent neural network. In this paper, different architectures based on Reservoir Computing (RC) for large vocabulary continuous speech recognition are investigated. Besides experiments with HMM hybrids, it is shown that an RC-HMM tandem can achieve the same recognition accuracy as a classical HMM, which is a promising result for such a fairly new paradigm. It is also demonstrated that a state-level combination of the scores of the tandem and the baseline HMM leads to a significant improvement over the baseline. A word error rate reduction on the order of 20% relative is possible.
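A reservoir in this sense is a recurrent network whose weights stay fixed and random; only a linear readout from its states is trained. Below is a minimal sketch of that idea. The reservoir size, spectral radius, input scaling, and ridge-regression readout are illustrative assumptions, not the acoustic-model configuration used in the paper.

```python
import numpy as np

def run_reservoir(inputs, n_res=500, rho=0.9, seed=0):
    """Drive a fixed random reservoir with an input sequence; collect its states."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.1, 0.1, (n_res, inputs.shape[1]))
    W = rng.standard_normal((n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # set spectral radius
    x, states = np.zeros(n_res), []
    for u in inputs:                                  # one acoustic frame at a time
        x = np.tanh(W_in @ u + W @ x)
        states.append(x.copy())
    return np.array(states)

def train_readout(states, targets, lam=1e-2):
    """Linear readout fitted by ridge regression -- the only trained part."""
    d = states.shape[1]
    return np.linalg.solve(states.T @ states + lam * np.eye(d), states.T @ targets)
```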
Predicting Parameters in Deep Learning
We demonstrate that there is significant redundancy in the parameterization
of several deep learning models. Given only a few weight values for each
feature it is possible to accurately predict the remaining values. Moreover, we
show that not only can the parameter values be predicted, but many of them need
not be learned at all. We train several different architectures by learning
only a small number of weights and predicting the rest. In the best case we are
able to predict more than 95% of the weights of a network without any drop in
accuracy.
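The redundancy described here can be illustrated by reconstructing a weight matrix from a small subset of its entries. The sketch below keeps a few "anchor" columns and predicts every other column as a linear combination of them via least squares; this is only a simplified low-rank illustration of the idea, not the predictor used in the paper, and the function name and anchor count are assumptions.

```python
import numpy as np

def predict_weights(W_full, n_anchor=16, seed=0):
    """Reconstruct a weight matrix from a small set of 'anchor' columns.

    The anchor columns play the role of the few stored weights; every other
    column is predicted as a linear combination of them via least squares.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(W_full.shape[1], size=n_anchor, replace=False)
    anchors = W_full[:, idx]                          # the weights we keep
    coeffs, *_ = np.linalg.lstsq(anchors, W_full, rcond=None)
    W_pred = anchors @ coeffs                         # predicted full matrix
    return W_pred, np.linalg.norm(W_pred - W_full) / np.linalg.norm(W_full)
```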
Stacking-Based Deep Neural Network: Deep Analytic Network for Pattern Classification
Stacking-based deep neural network (S-DNN) aggregates a plurality of basic learning modules, one after another, to synthesize a deep neural network (DNN) alternative for pattern classification. Contrary to the DNNs trained end to end by backpropagation (BP), each S-DNN layer, i.e., a self-learnable module, is trained separately and independently, without BP intervention.
In this paper, a ridge regression-based S-DNN, dubbed deep analytic network
(DAN), along with its kernelization (K-DAN), are devised for multilayer feature
re-learning from the pre-extracted baseline features and the structured
features. Our theoretical formulation demonstrates that DAN/K-DAN re-learn by
perturbing the intra/inter-class variations, apart from diminishing the
prediction errors. We scrutinize the DAN/K-DAN performance for pattern
classification on datasets of varying domains - faces, handwritten digits,
generic objects, to name a few. Unlike the typical BP-optimized DNNs, which are trained on gigantic datasets with GPUs, DAN/K-DAN are trainable using only a CPU, even on small-scale training sets. Our experimental results show that DAN/K-DAN outperform the present S-DNNs and also the BP-trained DNNs, including the multilayer perceptron, deep belief network, etc., without data augmentation applied.
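The kernelized variant (K-DAN) replaces the per-layer linear ridge solve with its dual, kernel form. A minimal sketch of kernel ridge regression is given below; the RBF kernel, the gamma and lambda values, and the function names are illustrative assumptions rather than the paper's exact K-DAN construction.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * d2)

def kernel_ridge_fit(X, Y, lam=1.0, gamma=0.1):
    """Dual-form ridge regression: alpha = (K + lam*I)^-1 Y."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), Y)

def kernel_ridge_predict(X_train, X_test, alpha, gamma=0.1):
    """Scores for new samples from the fitted dual coefficients."""
    return rbf_kernel(X_test, X_train, gamma) @ alpha
```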
On Distributed Deep Network for Processing Large-Scale Sets of Complex Data
Recent work in unsupervised feature learning and deep learning has shown that the ability to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with a large number of parameters using distributed CPU cores. We have developed the Bagging-Down SGD algorithm to address the distribution problems. Bagging-Down SGD introduces a parameter server on top of several model replicas and separates parameter updating from training computation to accelerate the whole system. We have successfully used our system to train a distributed deep network and achieve state-of-the-art performance on MNIST, the handwritten digit dataset. We show that these techniques dramatically accelerate the training of this kind of distributed deep network.
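The abstract does not spell out Bagging-Down SGD in detail, so the sketch below only illustrates the general pattern it describes: model replicas compute gradients on their own data shards while a separate parameter server holds the parameters and applies the updates. The class and function names, the gradient averaging, and the least-squares model are assumptions for illustration.

```python
import numpy as np

class ParameterServer:
    """Holds the shared parameters and applies updates sent by replicas."""
    def __init__(self, dim, lr=0.01):
        self.w, self.lr = np.zeros(dim), lr

    def push(self, grads):
        # Average the replica gradients and take a single SGD step.
        self.w -= self.lr * np.mean(grads, axis=0)

    def pull(self):
        return self.w.copy()

def replica_gradient(w, X_shard, y_shard):
    """Each replica computes a gradient on its own shard (least-squares model)."""
    err = X_shard @ w - y_shard
    return X_shard.T @ err / len(y_shard)

def train(shards, dim, steps=100):
    ps = ParameterServer(dim)
    for _ in range(steps):
        w = ps.pull()                                    # replicas fetch parameters
        grads = [replica_gradient(w, X, y) for X, y in shards]
        ps.push(grads)                                   # server applies the update
    return ps.pull()
```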