Wavelet Neural Networks: A Practical Guide
Wavelet networks (WNs) are a new class of networks that have been used with great success in a wide range of applications. However, a generally accepted framework for applying WNs is missing from the literature. In this study, we present a complete statistical model identification framework for applying WNs in various applications. The following subjects are thoroughly examined: the structure of a WN, training methods, initialization algorithms, variable significance and variable selection algorithms, model selection methods, and finally methods to construct confidence and prediction intervals. In addition, the complexity of each algorithm is discussed. The proposed framework was tested on two simulated cases, on a chaotic time series described by the Mackey-Glass equation, and on three real datasets: daily temperatures in Berlin, daily wind speeds in New York, and breast cancer classification. Our results show that the proposed algorithms produce stable and robust results, indicating that the proposed framework can be applied in various applications.
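The basic structure the abstract refers to — hidden "wavelon" units applying dilated and translated versions of a mother wavelet, combined linearly at the output — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Mexican-hat mother wavelet and the product-over-input-dimensions form are common choices assumed here, and all names are illustrative.

```python
import numpy as np

def mexican_hat(t):
    # A common mother wavelet, psi(t) = (1 - t^2) exp(-t^2 / 2);
    # the framework itself is not tied to this particular choice.
    return (1.0 - t**2) * np.exp(-t**2 / 2.0)

def wavelet_network(x, translations, dilations, weights, bias):
    """Forward pass of a single-output wavelet network (sketch).

    x            : (n_features,) input vector
    translations : (n_units, n_features) wavelon centers
    dilations    : (n_units, n_features) wavelon scales
    weights      : (n_units,) linear output weights
    """
    # Each hidden wavelon applies the product of 1-D wavelets
    # over its dilated and translated input dimensions.
    t = (x - translations) / dilations        # (n_units, n_features)
    hidden = np.prod(mexican_hat(t), axis=1)  # (n_units,)
    return weights @ hidden + bias

rng = np.random.default_rng(0)
x = rng.normal(size=3)
y = wavelet_network(x,
                    translations=rng.normal(size=(5, 3)),
                    dilations=np.ones((5, 3)),
                    weights=rng.normal(size=5),
                    bias=0.1)
```

Training, initialization, and the model/variable selection steps the abstract lists would all operate on the translation, dilation, and weight parameters above.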
BayesNAS: A Bayesian Approach for Neural Architecture Search
One-Shot Neural Architecture Search (NAS) is a promising method to
significantly reduce search time without any separate training. It can be
treated as a Network Compression problem on the architecture parameters from an
over-parameterized network. However, there are two issues associated with most
one-shot NAS methods. First, dependencies between a node and its predecessors
and successors are often disregarded, which results in improper treatment of
zero operations. Second, pruning architecture parameters based on their
magnitude is questionable. In this paper, we employ the classic Bayesian
learning approach to alleviate these two issues by modeling architecture
parameters using hierarchical automatic relevance determination (HARD) priors.
Unlike other NAS methods, we train the over-parameterized network for only one
epoch then update the architecture. Impressively, this enabled us to find the
architecture on CIFAR-10 within only 0.2 GPU days using a single GPU.
Competitive performance can be also achieved by transferring to ImageNet. As a
byproduct, our approach can be applied directly to compress convolutional
neural networks by enforcing structural sparsity which achieves extremely
sparse networks without accuracy deterioration.
Comment: International Conference on Machine Learning 201
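To illustrate the contrast with magnitude-based pruning, here is a toy automatic-relevance-determination (ARD) fixed-point update on a vector of parameters. This is only a sketch of the general ARD idea: BayesNAS's hierarchical ARD priors over dependent architecture parameters are considerably more involved, and the model below (independent Gaussian priors with an assumed observation-noise precision) is our own simplification.

```python
import numpy as np

def ard_prune(w_obs, noise_precision=100.0, n_iter=50, threshold=1e3):
    """Illustrative ARD pruning on observed parameter values.

    Toy model (a stand-in for hierarchical ARD priors):
        w_i ~ N(0, 1/alpha_i),   w_obs_i | w_i ~ N(w_i, 1/beta).
    The fixed-point update alpha_i <- 1 / (m_i^2 + v_i), with m_i, v_i
    the posterior mean and variance, drives alpha_i to infinity for
    parameters the evidence deems irrelevant; those are pruned.
    """
    beta = noise_precision
    alpha = np.ones_like(w_obs)
    for _ in range(n_iter):
        v = 1.0 / (alpha + beta)   # posterior variance of each w_i
        m = beta * v * w_obs       # posterior mean of each w_i
        alpha = 1.0 / (m**2 + v)   # evidence-style fixed-point update
    keep = alpha < threshold       # prune parameters with huge precision
    return keep, alpha

w = np.array([1.5, -0.8, 0.02, 0.01, 2.1])
keep, alpha = ard_prune(w)
```

Unlike a plain magnitude threshold, the decision here comes from the inferred prior precisions, which also account for the assumed observation noise.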
Toward Optimal Run Racing: Application to Deep Learning Calibration
This paper aims at one-shot learning of deep neural nets, where a highly
parallel setting is considered to address the algorithm calibration problem -
selecting the best neural architecture and learning hyper-parameter values
depending on the dataset at hand. The notoriously expensive calibration problem
is optimally reduced by detecting and early stopping non-optimal runs. The
theoretical contribution regards the optimality guarantees within the multiple
hypothesis testing framework. Experiments on the CIFAR-10, PTB and Wiki
benchmarks demonstrate the relevance of the approach with a principled and
consistent improvement on the state of the art with no extra hyper-parameter
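The idea of detecting and early-stopping non-optimal runs can be sketched with a simple racing loop over parallel candidates. Note that this uses a Hoeffding confidence bound rather than the paper's multiple-hypothesis-testing machinery, so it illustrates the mechanism, not the paper's optimality guarantees; the score range and confidence level are assumptions.

```python
import numpy as np

def hoeffding_race(scores, delta=0.05, value_range=1.0):
    """Simplified racing over parallel runs (illustrative sketch only).

    scores : (n_candidates, n_steps) per-step validation scores,
             assumed bounded in an interval of width `value_range`.
    Returns a boolean mask of candidates still alive at the end.
    """
    n_cand, n_steps = scores.shape
    alive = np.ones(n_cand, dtype=bool)
    for t in range(1, n_steps + 1):
        means = scores[:, :t].mean(axis=1)
        # Hoeffding confidence radius after t observations,
        # with a union bound over candidates and steps.
        eps = value_range * np.sqrt(
            np.log(2.0 * n_cand * n_steps / delta) / (2.0 * t))
        best_lower = np.max(np.where(alive, means - eps, -np.inf))
        # Stop runs whose upper bound cannot reach the best lower bound.
        alive &= (means + eps) >= best_lower
    return alive

# Deterministic toy scores: one strong run and two weaker ones.
scores = np.tile(np.array([[0.9], [0.5], [0.2]]), (1, 200))
alive = hoeffding_race(scores)
```

Weaker runs are eliminated as soon as their confidence intervals separate from the leader's, so compute is concentrated on the promising configurations.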
Photometric redshifts for Quasars in multi-band Surveys
MLPQNA stands for Multi-Layer Perceptron with Quasi-Newton Algorithm; it
is a machine learning method that can be used to cope with regression and
classification problems on complex and massive data sets. In this paper we give
the formal description of the method and present the results of its application
to the evaluation of photometric redshifts for quasars. The data set used for
the experiment was obtained by merging four different surveys (SDSS, GALEX,
UKIDSS and WISE), thus covering a wide range of wavelengths from the UV to the
mid-infrared. The method is able i) to achieve a very high accuracy; ii) to
drastically reduce the number of outliers and catastrophic objects; iii) to
discriminate among parameters (or features) on the basis of their significance,
so that the number of features used for training and analysis can be optimized
in order to reduce both the computational demands and the effects of
degeneracy. The best experiment, which makes use of a selected combination of
parameters drawn from the four surveys, leads, in terms of DeltaZnorm (i.e.
(zspec-zphot)/(1+zspec)), to an average of DeltaZnorm = 0.004, a standard
deviation sigma = 0.069 and a Median Absolute Deviation MAD = 0.02 over the
whole redshift range (i.e. zspec <= 3.6), defined by the 4-survey cross-matched
spectroscopic sample. The fraction of catastrophic outliers, i.e. of objects
with photo-z deviating more than 2sigma from the spectroscopic value is < 3%,
leading to a sigma = 0.035 after their removal, over the same redshift range.
The method is made available to the community through the DAMEWARE web
application.
Comment: 38 pages, Submitted to ApJ in February 2013; Accepted by ApJ in May
201
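The quality statistics quoted above follow directly from the stated definition DeltaZnorm = (zspec - zphot)/(1 + zspec). A short sketch of how they might be computed; the abstract does not spell out the MAD centering or outlier conventions, so the choices below are assumptions.

```python
import numpy as np

def photoz_metrics(z_spec, z_phot):
    """Photo-z quality statistics from DeltaZnorm = (zspec - zphot)/(1 + zspec)."""
    dz = (z_spec - z_phot) / (1.0 + z_spec)
    bias = dz.mean()                   # average DeltaZnorm
    sigma = dz.std()                   # standard deviation of DeltaZnorm
    # MAD convention assumed here: median of |DeltaZnorm| (uncentered);
    # the abstract does not specify whether the MAD is centered.
    mad = np.median(np.abs(dz))
    # Catastrophic outliers: |DeltaZnorm| deviating more than 2*sigma.
    frac_outliers = np.mean(np.abs(dz) > 2.0 * sigma)
    return bias, sigma, mad, frac_outliers

# Toy spectroscopic/photometric redshifts, for illustration only.
z_spec = np.array([1.0, 2.0, 0.5, 3.0])
z_phot = np.array([1.05, 1.9, 0.55, 3.2])
bias, sigma, mad, frac = photoz_metrics(z_spec, z_phot)
```

On the real cross-matched sample the abstract reports bias = 0.004, sigma = 0.069, and MAD = 0.02 over zspec <= 3.6, with the post-removal sigma = 0.035 obtained by recomputing the standard deviation after dropping the flagged outliers.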