The Generalized Group Lasso
International Joint Conference on Neural Networks (IJCNN), held in 2015 in Killarney, Ireland. In this paper the Generalized Lasso model of R. Tibshirani is extended to consider multidimensional features (or groups of features) à la Group Lasso, by substituting the ℓ1 norm of the regularizer with the ℓ2,1 norm. The resulting model, called the Generalized Group Lasso (GenGL), contains as particular cases the already known Group Lasso and Group Fused Lasso (GFL), but also new models such as the Graph-Guided Group Fused Lasso or trend filtering for multidimensional features. We show how to solve them efficiently by combining FISTA iterations with the proximal operator of the corresponding regularizer, which we compute using a dual formulation; a sketch of this scheme follows this entry. Moreover, GenGL makes it possible to introduce a new approach to Group Total Variation, the regularizer of GFL, which results in much faster training than previous methods.

With partial support from Spain's grants TIN2013-42351-P and S2013/ICE-2845 CASI-CAM-CM, and from the UAM–ADIC Chair for Data Science and Machine Learning. The authors gratefully acknowledge the use of the facilities of Centro de Computación Científica (CCC) at UAM.
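As an illustration of the FISTA-plus-proximal-operator scheme the abstract describes, the following minimal numpy sketch solves the simplest particular case of GenGL, a plain Group Lasso (identity penalty matrix). It is not the paper's GenGL solver; the function names and the step-size choice are ours.

    import numpy as np

    def prox_group_l21(v, groups, t):
        # Proximal operator of t * sum_g ||v_g||_2 (group soft-thresholding).
        out = v.copy()
        for g in groups:
            norm = np.linalg.norm(v[g])
            if norm > 0:
                out[g] = max(0.0, 1.0 - t / norm) * v[g]
        return out

    def fista_group_lasso(X, y, groups, lam, n_iter=500):
        # Minimize 0.5 * ||X w - y||^2 + lam * sum_g ||w_g||_2 with FISTA.
        L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the gradient
        w = np.zeros(X.shape[1]); z = w.copy(); s = 1.0
        for _ in range(n_iter):
            grad = X.T @ (X @ z - y)
            w_new = prox_group_l21(z - grad / L, groups, lam / L)
            s_new = (1.0 + np.sqrt(1.0 + 4.0 * s * s)) / 2.0
            z = w_new + ((s - 1.0) / s_new) * (w_new - w)   # Nesterov momentum
            w, s = w_new, s_new
        return w

Here groups is a list of index arrays, e.g. [np.array([0, 1]), np.array([2, 3, 4])]; whole groups are zeroed out together, which is exactly the sparsity pattern the ℓ2,1 norm induces.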
Enforcing Group Structure through the Group Fused Lasso
We introduce the Group Total Variation (GTV) regularizer, a modification of Total Variation that uses the ℓ2,1 norm instead of the ℓ1 one to deal with multidimensional features. When used as the only regularizer, GTV can be applied jointly with iterative convex optimization algorithms such as FISTA. This requires computing its proximal operator, which we derive using a dual formulation. GTV can also be combined with a Group Lasso (GL) regularizer, leading to what we call the Group Fused Lasso (GFL), whose proximal operator can now be computed by combining the GTV and GL proximals through the proximal Dykstra algorithm; a sketch of this combination follows this entry. We illustrate how to apply GFL in strongly structured but ill-posed regression problems, as well as the use of GTV to denoise colour images.

Acknowledgements: With partial support from Spain's grant TIN2010-21575-C02-01 and the UAM–ADIC Chair for Machine Learning.
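The proximal Dykstra step the abstract mentions can be written generically: given the prox of GL (group soft-thresholding, as sketched under the GenGL entry above) and the prox of GTV, it computes the prox of their sum. A minimal numpy sketch, with prox1 and prox2 as placeholders for those two operators and an iteration count chosen by us:

    import numpy as np

    def dykstra_prox(y, prox1, prox2, n_iter=50):
        # Proximal Dykstra: prox of f1 + f2 from the individual proxes.
        x = y.copy()
        p = np.zeros_like(y)
        q = np.zeros_like(y)
        for _ in range(n_iter):
            u = prox1(x + p)      # prox step on f1, with correction term p
            p = x + p - u
            x = prox2(u + q)      # prox step on f2, with correction term q
            q = u + q - x
        return x

Unlike plain alternating proxes, the correction terms p and q make the iteration converge to the exact prox of the sum, which is what GFL needs inside a FISTA loop.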
Convex Formulation for Kernel PCA and its Use in Semi-Supervised Learning
In this brief, kernel principal component analysis (KPCA) is reinterpreted as the solution to a convex optimization problem. In fact, there is one constrained convex problem for each principal component, with the constraints guaranteeing that the principal component is indeed a solution and not a mere saddle point. Although these insights do not imply any algorithmic improvement, they can be used to further understand the method, formulate possible extensions, and properly address them. As an example, a new convex optimization problem for semisupervised classification is proposed, which seems particularly well suited whenever the number of known labels is small. Our formulation resembles a least squares support vector machine problem with a regularization parameter multiplied by a negative sign, combined with a variational principle for KPCA. Our primal optimization principle for semisupervised learning is solved in terms of the Lagrange multipliers. Numerical experiments in several classification tasks illustrate the performance of the proposed model in problems with only a few labeled data.

The authors thank the following organizations. EU: the research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC AdG A-DATADRIVE-B (290923); this paper reflects only the authors' views, and the Union is not liable for any use that may be made of the contained information. Research Council KUL: GOA/10/09 MaNet, CoE PFV/10/002 (OPTEC), BIL12/11T; PhD/Postdoc grants. Flemish Government: FWO G.0377.12 (Structured systems), G.088114N (Tensor based data similarity), PhD/Postdoc grants; IWT SBO POM (100031), PhD/Postdoc grants. iMinds Medical Information Technologies SBO 2014. Belgian Federal Science Policy Office: IUAP P7/19 (DYSCO, Dynamical systems, control and optimization, 2012-2017).
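For reference, the computation that the brief reinterprets: classical KPCA diagonalizes the doubly centered kernel matrix, one eigenvector per principal component. A minimal numpy sketch (not the convex formulation itself, which the brief derives; the function name is ours):

    import numpy as np

    def kpca_projections(K, n_components):
        # Classical KPCA on a precomputed kernel matrix K (n x n).
        n = K.shape[0]
        J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
        Kc = J @ K @ J                            # doubly centered kernel
        vals, vecs = np.linalg.eigh(Kc)           # ascending eigenvalues
        vals, vecs = vals[::-1], vecs[:, ::-1]    # sort in descending order
        # Scores of the training points on the leading components.
        return vecs[:, :n_components] * np.sqrt(np.maximum(vals[:n_components], 0.0))

In the brief's view, each of these eigenvector problems becomes a constrained convex program whose constraints rule out the saddle points of the underlying variational principle.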
Convex Multi-Task Learning with Neural Networks
Multi-Task Learning aims at improving the learning process by solving different tasks simultaneously. Approaches to Multi-Task Learning can be categorized as feature-learning, regularization-based, and combination strategies. Feature-learning approaches are more natural for deep models, while regularization-based ones are usually designed for shallow models, although examples of both exist for shallow and deep models. The combination approach, however, has been tested on shallow models exclusively. Here we propose a Multi-Task combination approach for Neural Networks, describe the training procedure, test it on four different multi-task image datasets, and show improvements in performance over other strategies.

The authors acknowledge financial support from the European Regional Development Fund and the Spanish State Research Agency of the Ministry of Economy, Industry, and Competitiveness under the projects TIN2016-76406-P (AEI/FEDER, UE) and PID2019-106827GB-I00. They also thank the UAM–ADIC Chair for Data Science and Machine Learning and gratefully acknowledge the use of the facilities of Centro de Computación Científica (CCC) at UAM.
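As context for the combination approach described above, the following PyTorch sketch shows the standard hard-sharing neural MTL architecture (shared trunk, one head per task) that such strategies build on. It is a generic baseline, not the paper's combination model; class and parameter names are ours.

    import torch
    import torch.nn as nn

    class HardSharingMTL(nn.Module):
        # A shared trunk learns common features; one linear head per task.
        def __init__(self, in_dim, hidden, task_dims):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Linear(in_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU())
            self.heads = nn.ModuleList(nn.Linear(hidden, d) for d in task_dims)

        def forward(self, x):
            h = self.trunk(x)
            return [head(h) for head in self.heads]

Training minimizes a (possibly weighted) sum of the per-task losses; combination strategies additionally couple task-specific and common parameters on top of this structure.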
Group Fused Lasso
We introduce the Group Total Variation (GTV) regularizer, a modification of Total Variation that uses the ℓ2,1 norm instead of the ℓ1 one to deal with multidimensional features. When used as the only regularizer, GTV can be applied jointly with iterative convex optimization algorithms such as FISTA. This requires computing its proximal operator, which we derive using a dual formulation; a sketch of that dual computation follows this entry. GTV can also be combined with a Group Lasso (GL) regularizer, leading to what we call the Group Fused Lasso (GFL), whose proximal operator can now be computed by combining the GTV and GL proximals through the Dykstra algorithm. We illustrate how to apply GFL in strongly structured but ill-posed regression problems, as well as the use of GTV to denoise colour images.

With partial support from Spain's grant TIN2010-21575-C02-01 and the UAM–ADIC Chair for Machine Learning. The first author is supported by the FPU–MEC grant AP2008-00167.
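The dual derivation of the GTV proximal operator can be made concrete with a projected-gradient sketch: the dual variables live on the signal differences, constrained to ℓ2 balls of radius λ, and the prox is recovered from the dual optimum. A minimal numpy version for a 1-D signal with d-dimensional features (shape (T, d)); the solver choice and iteration count are ours:

    import numpy as np

    def prox_gtv(y, lam, n_iter=200):
        # Prox of lam * sum_t ||x_{t+1} - x_t||_2 at y (shape (T, d)),
        # computed via projected gradient on the dual problem.
        T, d = y.shape
        D = np.zeros((T - 1, T))              # finite-difference operator
        D[np.arange(T - 1), np.arange(T - 1)] = -1.0
        D[np.arange(T - 1), np.arange(1, T)] = 1.0
        w = np.zeros((T - 1, d))              # dual variables, one row per jump
        for _ in range(n_iter):
            grad = D @ (D.T @ w - y)
            w -= grad / 4.0                   # step 1/||D||^2, since ||D||^2 <= 4
            norms = np.linalg.norm(w, axis=1, keepdims=True)
            w *= np.minimum(1.0, lam / np.maximum(norms, 1e-12))  # project rows
        return y - D.T @ w                    # primal solution from the dual

Rows of the output become exactly equal on stretches where the corresponding dual rows lie strictly inside their balls, which is the piecewise-constant behaviour GTV enforces.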
Structure Learning in Deep Multi-Task Models
Multi-Task Learning (MTL) aims at improving the learning process by solving different tasks simultaneously. Two general approaches for neural MTL are hard and soft information sharing during training. Here we propose two new approaches to neural MTL. The first one uses a common model to enforce a soft sharing learning of the tasks considered. The second one adds a graph Laplacian term to a hard sharing neural model, with the goal of detecting existing but a priori unknown task relations; a sketch of such a term follows this entry. We test both approaches on real and synthetic datasets and show that either one can improve on other MTL neural models.

The authors acknowledge support from the European Regional Development Fund and the Spanish State Research Agency of the Ministry of Economy, Industry, and Competitiveness under the project PID2019-106827GB-I00. They also thank the UAM–ADIC Chair for Data Science and Machine Learning and gratefully acknowledge the use of the facilities of Centro de Computación Científica (CCC) at UAM.
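A graph Laplacian term over task parameters typically penalizes disagreement between the weights of related tasks. The PyTorch sketch below shows this generic form, with a fixed task-affinity matrix A for illustration; in the paper's setting the task relations are a priori unknown, so A itself would be learned or inferred. Function and variable names are ours.

    import torch

    def laplacian_penalty(W, A):
        # W: (n_tasks, p) stacked task-head weights; A: (n_tasks, n_tasks)
        # symmetric task-affinity matrix with non-negative entries.
        L = torch.diag(A.sum(dim=1)) - A      # graph Laplacian of the task graph
        # trace(W^T L W) = 0.5 * sum_ij A_ij * ||w_i - w_j||^2
        return torch.trace(W.t() @ L @ W)

Adding this penalty to the usual sum of task losses pulls the heads of strongly connected tasks towards each other while leaving unrelated tasks decoupled.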
Sparse methods for wind energy prediction
International Joint Conference on Neural Networks (IJCNN), held in 2012 in Brisbane, QLD, Australia. In this work we analyze and apply to the prediction of wind energy some of the best known regularized linear regression algorithms, such as Ordinary Least Squares, Ridge Regression and, particularly, Lasso, Group Lasso and Elastic-Net, which also seek to impose a certain degree of sparseness on the final models. To achieve this goal, some of them introduce a non-differentiable regularization term that requires special techniques to solve the corresponding optimization problem that yields the final model. Proximal algorithms have been introduced recently precisely to handle this kind of optimization problem, and so we briefly review how to apply them in regularized linear regression. Moreover, the proximal method FISTA is used when applying the non-differentiable models to the problem of predicting the global wind energy production in Spain, using as inputs numerical weather forecasts for the entire Iberian peninsula. Our results show how some of the studied sparsity-inducing models are able to produce a coherent selection of features, attaining performance similar to a baseline model that uses expert information while making use of fewer data features; a small sketch of this kind of comparison follows this entry.

The authors acknowledge partial support from grant TIN2010-21575-C02-01 of the TIN Subprogram from Spain's MICINN and from the Cátedra UAM–IIC en Modelado y Predicción. The first author is also supported by the FPU–MEC grant AP2008-00167. We also thank Red Eléctrica de España, Spain's TSO, for providing historic wind energy data.
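A minimal scikit-learn sketch of the sparsity comparison the abstract discusses: fit Lasso and Elastic-Net and count the selected features. Penalty values and variable names (X holds the numerical weather features, y the energy production) are illustrative, not the paper's settings.

    import numpy as np
    from sklearn.linear_model import Lasso, ElasticNet

    def sparsity_report(X, y):
        models = [("Lasso", Lasso(alpha=0.1)),
                  ("Elastic-Net", ElasticNet(alpha=0.1, l1_ratio=0.5))]
        for name, model in models:
            model.fit(X, y)
            n_used = np.count_nonzero(model.coef_)   # surviving features
            print(f"{name}: {n_used} of {X.shape[1]} features selected")

Both estimators are solved internally by coordinate descent; the FISTA-based proximal route reviewed in the paper handles the same non-differentiable penalties and extends naturally to group penalties.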
Sparse Linear Wind Farm Energy Forecast
In this work we apply sparse linear regression methods to forecast wind farm energy production using numerical weather prediction (NWP) features over several pressure levels, a problem where pattern dimension can become very large. We place sparse regression in the context of proximal optimization, which we briefly review, and we show how sparse methods outperform other models while at the same time shedding light on the most relevant NWP features and on their predictive structure; a sketch of this kind of inspection follows this entry.

With partial support from grant TIN2010-21575-C02-01 of Spain's Ministerio de Economía y Competitividad and the UAM–ADIC Chair for Machine Learning in Modelling and Prediction. The first author is supported by the FPU–MEC grant AP2008-00167. We thank our colleague Álvaro Barbero for the software used in this work.
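Once a sparse linear model is fitted, the predictive structure over pressure levels can be read off the coefficients. A small numpy sketch, assuming the feature vector is laid out level-major (all variables of level 0 first, then level 1, and so on); the layout and the function name are our assumptions:

    import numpy as np

    def level_relevance(coef, n_levels, n_per_level):
        # Aggregate absolute weights per pressure level; larger totals mean
        # the level contributes more to the sparse forecast model.
        return np.abs(coef.reshape(n_levels, n_per_level)).sum(axis=1)

Plotting this vector against the pressure levels shows at a glance which layers of the atmosphere the sparse model considers informative.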
Faster SVM training via conjugate SMO
We propose an improved version of the SMO algorithm for training classification and regression SVMs, based on a Conjugate Descent procedure. This new approach only involves a modest increase in the computational cost of each iteration but, in turn, usually results in a substantial decrease in the number of iterations required to converge to a given precision. Besides, we prove convergence of the iterates of this new Conjugate SMO, as well as a linear rate when the kernel matrix is positive definite. We have implemented Conjugate SMO within the LIBSVM library and show experimentally that it is faster for many hyper-parameter configurations, being often a better option than second order SMO when performing a grid search for SVM tuning; a sketch of the plain SMO update it improves on follows this entry.
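For reference, the two-variable pair update at the heart of SMO (here in Platt's simplified form for classification, with random second-index selection) looks as follows. This is the plain baseline, not the paper's conjugate variant; the heuristics and tolerances are the textbook ones.

    import numpy as np

    def simplified_smo(K, y, C=1.0, tol=1e-3, max_passes=5, seed=0):
        # K: precomputed kernel matrix (n x n); y: labels in {-1, +1}.
        rng = np.random.default_rng(seed)
        n = len(y)
        alpha, b = np.zeros(n), 0.0
        f = lambda i: (alpha * y) @ K[:, i] + b        # decision value at point i
        passes = 0
        while passes < max_passes:
            changed = 0
            for i in range(n):
                Ei = f(i) - y[i]
                if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                    j = (i + 1 + rng.integers(n - 1)) % n   # random j != i
                    Ej = f(j) - y[j]
                    ai, aj = alpha[i], alpha[j]
                    if y[i] != y[j]:
                        L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                    else:
                        L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                    eta = K[i, i] + K[j, j] - 2.0 * K[i, j] # curvature along the pair
                    if L == H or eta <= 0:
                        continue
                    alpha[j] = np.clip(aj + y[j] * (Ei - Ej) / eta, L, H)
                    if abs(alpha[j] - aj) < 1e-5:
                        continue
                    alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])  # keep sum y*alpha fixed
                    b1 = b - Ei - y[i] * (alpha[i] - ai) * K[i, i] - y[j] * (alpha[j] - aj) * K[i, j]
                    b2 = b - Ej - y[i] * (alpha[i] - ai) * K[i, j] - y[j] * (alpha[j] - aj) * K[j, j]
                    b = b1 if 0 < alpha[i] < C else (b2 if 0 < alpha[j] < C else (b1 + b2) / 2.0)
                    changed += 1
            passes = passes + 1 if changed == 0 else 0
        return alpha, b

Conjugate SMO keeps this two-variable working-set structure but reuses information from previous update directions, which is what yields the reduction in iteration counts the paper reports.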
Functional diffusion maps
Nowadays many real-world datasets can be considered functional, in the sense that the processes which generate them are continuous. A fundamental property of this type of data is that in theory it belongs to an infinite-dimensional space. Although in practice we usually receive finite observations, they are still high-dimensional, and hence dimensionality reduction methods are crucial. In this vein, the main state-of-the-art method for functional data analysis is Functional PCA. Nevertheless, this classic technique assumes that the data lie in a linear manifold, and hence it can have problems when this hypothesis is not fulfilled. In this research, attention is placed on a non-linear manifold learning method: Diffusion Maps. The article explains how to extend this multivariate method to functional data and compares its behavior against Functional PCA over different simulated and real examples.
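A minimal numpy sketch of the multivariate Diffusion Maps computation the article extends: curves are assumed discretized on a common grid, so squared L2 distances between functions reduce to vector distances (for non-uniform grids one would add quadrature weights). The bandwidth, normalization choice, and names are ours.

    import numpy as np

    def diffusion_maps(X, eps, n_components=2, t=1):
        # X: (n, m) curves sampled on a common grid; eps: kernel bandwidth.
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise sq. distances
        W = np.exp(-d2 / eps)                                 # Gaussian affinities
        q = W.sum(axis=1)
        W = W / np.outer(q, q)          # alpha = 1 normalization (removes density bias)
        d = W.sum(axis=1)
        S = W / np.sqrt(np.outer(d, d)) # symmetric conjugate of the Markov matrix
        vals, vecs = np.linalg.eigh(S)
        vals, vecs = vals[::-1], vecs[:, ::-1]                # descending order
        psi = vecs / vecs[:, [0]]       # right eigenvectors of the Markov matrix
        return psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t

The embedding coordinates are the leading non-trivial eigenvectors scaled by their eigenvalues raised to the diffusion time t, so Euclidean distances in the embedding approximate diffusion distances between curves.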