On building ensembles of stacked denoising auto-encoding classifiers and their further improvement
Aggregating diverse learners and training deep architectures are the two principal avenues towards increasing the expressive capabilities of neural networks; their combinations therefore merit attention. In this contribution, we study how to apply two conventional diversity methods (bagging and label switching) to a general deep machine, the stacked denoising auto-encoding classifier, in order to solve a number of appropriately selected image recognition problems. The main conclusion of our work is that binarizing multi-class problems is the key to obtaining benefit from these diversity methods. Additionally, we verify that adding other kinds of performance improvement procedures, such as pre-emphasizing training samples and elastic distortion mechanisms, further increases the quality of the results. In particular, an appropriate combination of all the above methods leads us to a new absolute record in classifying MNIST handwritten digits. These facts reveal that there are clear opportunities for designing more powerful classifiers by combining different improvement techniques. (C) 2017 Elsevier B.V. All rights reserved.
This work has been partly supported by research grants CASI-CAM-CM (S2013/ICE-2845, Madrid Community) and Macro-ADOBE (TEC2015-67719, MINECO-FEDER EU), as well as by the research network DAMA (TIN2015-70308-REDT, MINECO)
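As an illustration of the binarization idea in this abstract, the following is a minimal sketch (not the paper's actual setup, which uses stacked denoising auto-encoding classifiers) of combining one-vs-one binarization with bagging; scikit-learn's default decision-tree bagging stands in for the deep base learner:

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import BaggingClassifier
from sklearn.multiclass import OneVsOneClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One bagged ensemble (of decision trees, the sklearn default base
# estimator) is trained per two-class dichotomy of the 10-class problem.
clf = OneVsOneClassifier(BaggingClassifier(n_estimators=10, random_state=0))
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"test accuracy: {acc:.3f}")
```

The point of the sketch is structural: diversity (bagging) is applied inside each binary sub-problem rather than directly to the multi-class task.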
Pre-emphasizing Binarized Ensembles to Improve Classification Performance
14th International Work-Conference on Artificial Neural Networks, IWANN 2017
Machine ensembles are learning architectures that offer high expressive capacity and, consequently, remarkable performance, due to their high number of trainable parameters. In this paper, we explore whether binarization techniques are effective at improving standard diversification methods, and whether a simple additional trick, weighting the training examples, makes it possible to obtain better results. Experimental results on three selected classification problems show that binarization allows standard direct diversification methods (bagging, in particular) to achieve better results, with even more significant performance improvements when pre-emphasizing the training samples. Some research avenues that this finding opens are mentioned in the conclusions.
This work has been partly supported by research grants CASI-CAM-CM (S2013/ICE-2845, DGUI-CM and FEDER) and Macro-ADOBE (TEC2015-67719-P, MINECO)
Designing mixture of deep experts
Mixture of Experts (MoE) is a classical architecture for ensembles where each
member is specialised in a given part of the input space or its expertise area.
Working in this manner, we aim to specialise the experts on smaller problems,
solving the original problem through some type of divide and conquer approach.
The goal of our research is first to reproduce the work of Collobert et
al. [1] (2002), and then to extend it by using neural networks as experts
on different datasets. Specialised representations will be learned over different
aspects of the problem, and the results of the different members will be merged
according to their specific expertise. This expertise can then be learned itself by a
given network acting as a gating function.
The MoE architecture is composed of N expert networks combined via a gating
network that partitions the input space accordingly. It follows a
divide-and-conquer strategy supervised by the gating network: using a
specialised cost function, the experts specialise in their own sub-spaces.
Exploiting the discriminative power of the experts in this way works much
better than simple clustering. The gating network must learn how to assign
examples to the different specialists.
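A minimal forward-pass sketch of the gated combination described above, assuming linear experts and a softmax gating network (toy dimensions and untrained random weights, for shape illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# N experts, each a linear map; a gating network assigns per-example
# mixture weights over the experts.
n_experts, d_in, d_out = 3, 4, 2
W_experts = rng.normal(size=(n_experts, d_in, d_out))  # expert parameters
W_gate = rng.normal(size=(d_in, n_experts))            # gating parameters

x = rng.normal(size=(5, d_in))                         # batch of 5 examples
gate = softmax(x @ W_gate)                             # (5, n_experts)
expert_out = np.einsum('bi,eio->beo', x, W_experts)    # (5, n_experts, d_out)
y = np.einsum('be,beo->bo', gate, expert_out)          # gated combination
print(y.shape)  # (5, 2)
```

Training would backpropagate a specialised cost through both the experts and the gate, so that the gate learns the assignment of examples to specialists.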
Such models show promise for building larger networks that are still cheap to
compute at test time, and more parallelizable at training time. We were able to
reproduce the authors' work and implemented a multi-class gater to classify
images.
We know that neural networks perform best with large amounts of data. However,
some of our experiments require us to divide the dataset and train multiple
neural networks. We observe that, in data-deprived conditions, our MoE models
are almost on par with ensembles trained on the complete data.
Keywords: Machine Learning, Multi-Layer Perceptrons, Mixture of Experts, Support
Vector Machines, Divide and Conquer, Stochastic Gradient Descent, Optimization
Assessment of Cross-train Machine Learning Techniques for QoT-Estimation in agnostic Optical Networks
With the evolution of 5G technology, high-definition video, virtual reality, and the internet of things (IoT), the demand for high-capacity optical networks has been increasing dramatically. To support this capacity demand, low-margin optical networks are attracting operator interest. To address this techno-economic interest, planning tools with higher accuracy and accurate models for quality-of-transmission estimation (QoT-E) are needed. However, considering the heterogeneity of state-of-the-art optical networks, it is challenging to develop such an accurate planning tool and low-margin QoT-E models using traditional analytical approaches. Fortunately, data-driven machine-learning (ML) cognition provides a promising path. This paper reports the use of cross-trained ML-based learning methods to predict the QoT of an un-established lightpath (LP) in an agnostic network, based on data retrieved from the already established LPs of an in-service network. This advance prediction of the QoT of an un-established LP in an agnostic network is a key enabler not only for the optimal planning of that network, but also for automatically deploying LPs with a minimum margin in a reliable manner. The QoT metric of the LPs is defined by the generalized signal-to-noise ratio (GSNR), which includes the effects of both amplified spontaneous emission (ASE) noise and non-linear interference (NLI) accumulation. The real field data is mimicked using the reliable and well-tested network simulation tool GNPy. Using the generated synthetic data set, supervised ML techniques such as a wide deep neural network, a deep neural network, a multi-layer perceptron regressor, a boosted tree regressor, a decision tree regressor, and a random forest regressor are applied, demonstrating GSNR prediction of an un-established LP in an agnostic network with a maximum error of 0.40 dB.
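A hedged sketch of the regression setup described above. The feature columns (span count, total length, launch power) and the GSNR label function below are invented stand-ins for GNPy-generated data, used only to show the shape of the pipeline with one of the listed models (a random forest regressor):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for lightpath features; the GSNR label here is an
# arbitrary toy function plus noise, not a physical model.
n = 2000
spans = rng.integers(1, 20, size=n)
length = spans * rng.uniform(60, 100, size=n)      # km
power = rng.uniform(-2, 2, size=n)                 # dBm
X = np.column_stack([spans, length, power])
gsnr = 35 - 0.01 * length - 0.5 * np.abs(power) + rng.normal(0, 0.2, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, gsnr, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, model.predict(X_te))
print(f"MAE on held-out lightpaths: {mae:.2f} dB")
```

In the cross-trained setting of the paper, the training rows would come from an in-service network and the held-out rows from the agnostic network under planning.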
Understanding and improving error-correcting output coding
Error-correcting output coding (ECOC) is a method for converting a k-class supervised learning problem into a large number L of two-class supervised learning problems and then combining the results of these L evaluations. Previous research has shown that ECOC can dramatically improve the classification accuracy of supervised learning algorithms that learn to classify data points into one of k ≥ 2 classes. An investigation of why the ECOC technique works, particularly when employed with decision tree learning algorithms, is presented. It is shown that the ECOC method is a compact form of voting among multiple hypotheses. The success of the voting depends on the errors committed by each of the L learned binary functions being substantially uncorrelated. By employing the statistical notions of bias and variance, the generalization errors of ECOC are decomposed into bias and variance errors. Like any voting method, ECOC reduces variance errors. However, unlike homogeneous voting, which simply combines multiple runs of the same learning algorithm, ECOC can also reduce bias errors. It is shown that the bias errors in the individual functions are uncorrelated, and that this results from the non-local behavior of the learning algorithm in splitting the feature space. ECOC is also extended to provide class probability information. The problem of computing these class probabilities can be formulated as an over-constrained system of linear equations, which is solved with least squares methods. The accuracy of the posterior probabilities is demonstrated on overlapping classes and on a simple reject-option task.
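The ECOC scheme described above can be sketched as follows, assuming a random binary code matrix and decision-tree dichotomizers (an illustrative construction, not the paper's exact one): each class gets an L-bit codeword, one binary learner is trained per bit column, and decoding picks the class whose codeword is nearest in Hamming distance to the predicted bit vector.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

k, L = 10, 15                             # k classes, L binary dichotomies
code = rng.integers(0, 2, size=(k, L))    # code matrix: one codeword per class

# Train one binary learner per column of the code matrix.
learners = []
for j in range(L):
    bits = code[y_tr, j]                  # relabel examples by their class's j-th bit
    learners.append(DecisionTreeClassifier(random_state=0).fit(X_tr, bits))

# Decode: predict L bits, pick the class with the nearest codeword (Hamming).
pred_bits = np.column_stack([c.predict(X_te) for c in learners])
hamming = np.abs(pred_bits[:, None, :] - code[None, :, :]).sum(axis=2)
y_pred = hamming.argmin(axis=1)
acc = (y_pred == y_te).mean()
print(f"ECOC accuracy: {acc:.3f}")
```

The voting interpretation in the abstract corresponds to the Hamming decoding step: each of the L binary functions casts a vote on which codewords remain plausible.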
Diversity in deep learning via auto-encoding
The design of general deep learners has remained a challenge for decades. The current century is seeing the emergence of several new and effective procedures for it. These procedures include representational methods, which deserve special attention because they not only make it possible to build powerful machines, but also extract relevant high-level features from the observations. Expansive denoising auto-encoders are (elements of) one of the families of deep representational machines.
On the other hand, ensembles are a solidly established alternative for obtaining high-performance solutions to empirical (sample-based) inference problems. They rely on introducing diversity into a group of learners. Obviously, this is a principle that can also be applied to deep neural networks; but, surprisingly, there are very few studies exploring this possibility.
This doctoral dissertation investigates whether conventional diversification techniques (including binarization in the case of multi-class databases) improve the performance of classifiers based on expansive denoising auto-encoders. Both "Bagging" and "Switching" are used, together with one-versus-one and error-correcting output code binarization schemes, over two basic types of architectures: T, which has a common auto-encoding unit, and G, which also diversifies that representational element. The experimental results confirm that, if binarization is included, the combination of diversity and depth leads to better performance, especially with the T architectures.
To complete the exploration of possible improvements, the application of flexible forms of pre-emphasis is also analyzed. Such forms provide performance improvements on their own, but the improvements are very important when pre-emphasis is combined with diversification, especially if different pre-emphasis parameters are applied to different dichotomies in multi-class problems. A conventional elastic distortion makes it possible to reach record results.
These results are not only relevant in themselves, but also open up a series of promising research lines, which are presented in the final chapter of this thesis.
Designing general deep learners has remained a challenge for decades. The present century is seeing the emergence of several new, effective procedures for it. Among them, representational methods merit particular attention, because they not only serve to build powerful machines, but also extract relevant high-level features of the observations. Expansive denoising auto-encoders are (elements of) one such family of representational deep machines.
On the other hand, ensembles are a well-established alternative for obtaining high-performance solutions to empirical (sample-based) inference problems. They are based on the principle of introducing diversity into a number of different learners. Obviously, this principle can also be applied to deep neural networks, but, surprisingly, there are very few studies exploring this possibility.
In this doctoral dissertation, we investigate whether conventional diversification techniques (including binarization for multiclass databases) further improve the performance of expansive denoising auto-encoder based classifiers. Both "Bagging" and "Switching" are used, as well as one-versus-one and error-correcting output code binarization schemes, with two basic types of architectures: T, which has a common auto-encoding unit, and G, which also diversifies that representational element. The experimental results confirm that, if binarization is included, combining diversity and depth offers significant performance advantages, especially with T architectures.
To complete the exploration of improving denoising auto-encoding based classifiers, the application of sufficiently flexible pre-emphasis functions is also analyzed. This kind of pre-emphasis provides performance advantages by itself, but the advantages become very important when pre-emphasis is combined with diversification, especially if different emphasis parameters are applied to different dichotomies in multiclass problems. Adding a conventional elastic distortion yields record results.
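A conventional elastic distortion of the kind mentioned (in the style commonly used for handwritten-digit augmentation) can be sketched as follows; `alpha` and `sigma` are illustrative parameter choices, not the dissertation's settings:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_distort(image, alpha=8.0, sigma=3.0, seed=0):
    """Elastic distortion: smooth a random per-pixel displacement field
    with a Gaussian filter, then resample the image along it."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    dx = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.vstack([(ys + dy).ravel(), (xs + dx).ravel()])
    return map_coordinates(image, coords, order=1, mode="reflect").reshape(h, w)

img = np.zeros((28, 28))
img[10:18, 10:18] = 1.0        # toy 28x28 "digit"
warped = elastic_distort(img)
print(warped.shape)  # (28, 28)
```

Applying a fresh distortion per epoch effectively enlarges the training set with plausible handwriting variations.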
These results are not only relevant by themselves, but they also open a series of promising research avenues, which are presented in the final chapter of this thesis.
Official Doctoral Programme in Multimedia and Communications. Chair: Antonio Artés Rodríguez. Secretary: Sancho Salcedo Sanz. Examiner: Pedro Antonio Gutiérrez Peñ