Elastic-Net Regularization in Learning Theory
Within the framework of statistical learning theory we analyze in detail the
so-called elastic-net regularization scheme proposed by Zou and Hastie for the
selection of groups of correlated variables. To investigate the statistical
properties of this scheme, and in particular its consistency properties, we
set up a suitable mathematical framework. Our setting is random-design
regression where we allow the response variable to be vector-valued and we
consider prediction functions that are linear combinations of elements ({\em
features}) in an infinite-dimensional dictionary. Under the assumption that the
regression function admits a sparse representation on the dictionary, we prove
that there exists a particular ``{\em elastic-net representation}'' of the
regression function such that, as the number of data points increases, the elastic-net
estimator is consistent not only for prediction but also for variable/feature
selection. Our results include finite-sample bounds and an adaptive scheme to
select the regularization parameter. Moreover, using convex analysis tools, we
derive an iterative thresholding algorithm for computing the elastic-net
solution which is different from the optimization procedure originally proposed
by Zou and Hastie.
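The iterative thresholding algorithm for the elastic-net solution is not spelled out in the abstract. As a rough illustration of the general idea only, the following minimal sketch implements a standard ISTA-style iteration for the finite-dimensional elastic-net objective (1/n)||y - Xb||^2 + lam1*||b||_1 + lam2*||b||^2; it is a generic proximal scheme, not necessarily the authors' exact procedure.

import numpy as np

def soft_threshold(v, t):
    # Component-wise soft-thresholding operator.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def elastic_net_ista(X, y, lam1, lam2, n_iter=500):
    # Iterative thresholding for (1/n)||y - Xb||^2 + lam1*||b||_1 + lam2*||b||^2.
    n, d = X.shape
    step = n / (2.0 * np.linalg.norm(X, 2) ** 2)  # 1/L for the smooth part
    b = np.zeros(d)
    for _ in range(n_iter):
        grad = (2.0 / n) * X.T @ (X @ b - y)
        # Proximal step: soft-threshold for the l1 term, then shrink for the ridge term.
        b = soft_threshold(b - step * grad, step * lam1) / (1.0 + 2.0 * step * lam2)
    return b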
The Social Sustainability of the Infrastructures: A Case Study in the Liguria Region
One of the indicators that measures the economic development of a territory is its infrastructural endowment (roads, railways, etc.). Roads, railways, and airports are essential for creating the conditions under which productive activities can be established or grow, and for generating wider benefits. However, infrastructure can also have strong impacts on the environment and on the living conditions of the population, and may therefore meet with contestation and opposition. When deciding whether or not to build new infrastructure, it is therefore essential to assess social sustainability in parallel with economic and environmental sustainability, on the basis of an evaluation that takes into account the various ways in which the work relates to the population, also with a view to identifying the most satisfactory design solution. Alongside the adopted methodology, suitable assessment criteria must be identified that can capture the various impacts generated by the infrastructure, not only economic and environmental but also social, and these criteria must be attributed a relative importance (or weight) consistent with a correct balance of the three dimensions of sustainability. This contribution deals with the identification of criteria for assessing the social sustainability of infrastructure projects, taking as reference the 24 infrastructure projects in the planning or construction phase in the Liguria Region that fall under Regional Law n. 39/2007 on the "Regional Strategic Intervention Programs" (P.R.I.S.), which provides guarantees for citizens affected by the infrastructure. In this research work, the selection of criteria is performed by involving local stakeholders as well as the subjects and institutions that operate within the decision-making process of a public work (designers, technicians from public administrations). The selected criteria are then weighted through the pairwise comparison method used in Thomas Saaty's multi-criteria technique, the Analytic Hierarchy Process (AHP). The goal is to identify criteria useful for assessing social sustainability, together with the weights attributed by the various parties involved in the decision-making process and by citizens directly or indirectly affected by the infrastructure.
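For readers unfamiliar with AHP, the pairwise comparison step works roughly as follows: judgments comparing criteria two at a time are collected in a reciprocal matrix, and the criteria weights are obtained from its principal eigenvector. The sketch below uses a purely hypothetical 3x3 judgment matrix (illustrative numbers, not data from the study).

import numpy as np

def ahp_weights(pairwise):
    # Derive criteria weights from a reciprocal pairwise-comparison matrix
    # via its principal eigenvector (Saaty's eigenvector method).
    vals, vecs = np.linalg.eig(pairwise)
    k = np.argmax(vals.real)            # index of the principal eigenvalue
    w = np.abs(vecs[:, k].real)
    return w / w.sum()                  # normalize weights to sum to 1

# Hypothetical Saaty-scale judgments comparing economic, environmental
# and social criteria; the entries are illustrative only.
A = np.array([[1.0, 3.0, 2.0],
              [1/3, 1.0, 0.5],
              [0.5, 2.0, 1.0]])
print(ahp_weights(A))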
Statistical and Computational Trade-Offs in Kernel K-Means
We investigate the efficiency of k-means in terms of both statistical and computational requirements. More precisely, we study a Nystrom approach to kernel k-means. We analyze the statistical properties of the proposed method and show that it achieves the same accuracy as exact kernel k-means with only a fraction of the computations. Indeed, we prove under basic assumptions that sampling √n Nystrom landmarks greatly reduces computational costs without incurring any loss of accuracy. To the best of our knowledge, this is the first result of this kind for unsupervised learning.
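A minimal sketch of the Nystrom idea, assuming uniform landmark sampling and a Gaussian kernel (the paper's precise sampling scheme and constants may differ): embed the data through m landmarks and run ordinary k-means on the embedding.

import numpy as np
from sklearn.cluster import KMeans

def gaussian_kernel(A, B, sigma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def nystrom_kmeans(X, k, m, sigma=1.0, seed=0):
    # Approximate kernel k-means: embed points via m Nystrom landmarks
    # (m ~ sqrt(n) in the paper), then cluster the embedding.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)   # uniform landmark sampling
    Knm = gaussian_kernel(X, X[idx], sigma)           # n x m cross-kernel
    Kmm = gaussian_kernel(X[idx], X[idx], sigma)      # m x m landmark kernel
    U, s, _ = np.linalg.svd(Kmm)
    # Nystrom feature map: Z = K_nm K_mm^{-1/2}, so Z Z^T approximates K.
    Z = Knm @ U @ np.diag(1.0 / np.sqrt(np.maximum(s, 1e-12))) @ U.T
    return KMeans(n_clusters=k, n_init=10).fit_predict(Z)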
Learning with SGD and Random Features
Sketching and stochastic gradient methods are arguably the most common techniques for deriving efficient large-scale learning algorithms. In this paper, we investigate their application in the context of nonparametric statistical learning. More precisely, we study the estimator defined by stochastic gradient descent with mini-batches and random features. The latter can be seen as a form of nonlinear sketching and can be used to define approximate kernel methods. The considered estimator is not explicitly penalized/constrained and regularization is implicit. Indeed, our study highlights how different parameters, such as the number of features, the number of iterations, the step-size and the mini-batch size, control the learning properties of the solutions. We do this by deriving optimal finite-sample bounds under standard assumptions. The obtained results are corroborated and illustrated by numerical experiments.
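As a concrete, simplified instance of such an estimator, the sketch below combines random Fourier features (for the Gaussian kernel) with mini-batch SGD on the square loss; the feature distribution, step-size and constants are illustrative assumptions, not the paper's exact setting.

import numpy as np

def sgd_random_features(X, y, n_features=200, n_iter=1000, batch=32,
                        step=0.5, sigma=1.0, seed=0):
    # Mini-batch SGD on the square loss over random Fourier features.
    # There is no explicit penalty: the number of features, the number
    # of iterations, the step-size and the mini-batch size act as the
    # (implicit) regularization parameters.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)
    phi = lambda A: np.sqrt(2.0 / n_features) * np.cos(A @ W + b)
    w = np.zeros(n_features)
    for _ in range(n_iter):
        idx = rng.integers(0, n, size=batch)        # sample a mini-batch
        Z = phi(X[idx])
        w -= step * Z.T @ (Z @ w - y[idx]) / batch  # square-loss gradient step
    return lambda Xt: phi(Xt) @ w                   # fitted predictor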
Adaptive Kernel Methods Using the Balancing Principle
The choice of the regularization parameter is a fundamental problem in supervised learning, since the performance of most algorithms crucially depends on the choice of one or more such parameters. In particular, a main theoretical issue concerns the amount of prior knowledge about the problem needed to suitably choose the regularization parameter and obtain learning rates. In this paper we present a strategy, the balancing principle, for choosing the regularization parameter without knowledge of the regularity of the target function. Such a choice adaptively achieves the best error rate. Our main result applies to regularization algorithms in reproducing kernel Hilbert spaces with the square loss, though we also study how a similar principle can be used in other situations. As a straightforward corollary, we can immediately derive adaptive parameter choices for various recently studied kernel methods. Numerical experiments with the proposed parameter choice rules are also presented.
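Schematically, a Lepskii-type balancing principle compares estimators computed along a grid of regularization parameters and selects the largest parameter whose estimator stays within a multiple of a variance bound of all less-regularized ones. The sketch below uses a placeholder variance proxy 1/(sqrt(n)*lam); the actual bound and constants depend on the analysis in the paper.

import numpy as np

def balancing_choice(estimators, lambdas, n, C=4.0):
    # estimators[i]: prediction vector fitted with lambdas[i];
    # lambdas is an increasing geometric grid.
    var = lambda lam: 1.0 / (np.sqrt(n) * lam)  # placeholder variance proxy
    best = 0
    for j in range(1, len(lambdas)):
        # Keep enlarging lambda while the new estimator stays close to
        # every rougher one, relative to the rougher one's variance bound.
        ok = all(np.linalg.norm(estimators[j] - estimators[i])
                 <= C * var(lambdas[i]) for i in range(j))
        if not ok:
            break
        best = j
    return lambdas[best]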
Convergence of the forward-backward algorithm: beyond the worst-case with the help of geometry
We provide a comprehensive study of the convergence of the forward-backward algorithm under suitable geometric conditions, such as conditioning or Łojasiewicz properties. These geometric notions are usually local in nature, and may fail to describe the fine geometry of objective functions relevant in inverse problems and signal processing, which have a nice behaviour on manifolds or on sets that are open with respect to a weak topology. Motivated by this observation, we revisit those geometric notions over arbitrary sets. In turn, this allows us to present several new results as well as to collect, in a unified view, a variety of results scattered in the literature. Our contributions include the analysis of infinite-dimensional convex minimization problems, showing the first Łojasiewicz inequality for a quadratic function associated with a compact operator, and the derivation of new linear rates for problems arising from inverse problems with low-complexity priors. Our approach allows us to establish unexpected connections between geometry and a priori conditions in inverse problems, such as source conditions or restricted isometry properties.
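For reference, the forward-backward (proximal gradient) iteration analyzed here alternates a gradient (forward) step on the smooth term with a proximal (backward) step on the nonsmooth term. Below is a minimal sketch, instantiated on an l1-regularized least-squares problem as an example of a low-complexity prior; the problem data are synthetic and purely illustrative.

import numpy as np

def forward_backward(grad_f, prox_g, x0, step, n_iter=1000):
    # Forward-backward splitting for min_x f(x) + g(x).
    x = x0.copy()
    for _ in range(n_iter):
        x = prox_g(x - step * grad_f(x), step)  # gradient step, then prox step
    return x

# Example: min (1/2)||Ax - y||^2 + lam*||x||_1 with a sparse ground truth.
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 100))
y = A @ (rng.normal(size=100) * (rng.random(100) < 0.05))
lam = 0.1
grad_f = lambda x: A.T @ (A @ x - y)
prox_g = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)
step = 1.0 / np.linalg.norm(A, 2) ** 2          # 1/L for the smooth part
x_hat = forward_backward(grad_f, prox_g, np.zeros(100), step)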
On Fast Leverage Score Sampling and Optimal Learning
Leverage score sampling provides an appealing way to perform approximate computations with large matrices. Indeed, it allows one to derive faithful approximations with a complexity adapted to the problem at hand. Yet, performing leverage score sampling is a challenge in its own right, since it requires further approximations. In this paper, we study the problem of leverage score sampling for positive definite matrices defined by a kernel. Our contribution is twofold: first, we provide a novel algorithm for leverage score sampling; second, we exploit the proposed method in statistical learning by deriving a novel solver for kernel ridge regression. Our main technical contribution is showing that the proposed algorithms are currently the most efficient and accurate for these problems.
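For concreteness, the ridge leverage scores of a kernel matrix K are l_i(lam) = (K (K + lam*n*I)^{-1})_{ii}, and sampling columns proportionally to these scores yields the kind of faithful approximation mentioned above. The sketch below computes them exactly in O(n^3), which is precisely the cost the paper's fast algorithm avoids; it is for illustration only.

import numpy as np

def ridge_leverage_scores(K, lam):
    # Exact ridge leverage scores: l_i = (K (K + lam*n*I)^{-1})_{ii}.
    n = K.shape[0]
    return np.diag(K @ np.linalg.inv(K + lam * n * np.eye(n)))

def sample_by_leverage(K, lam, m, seed=0):
    # Sample m column indices with probability proportional to leverage.
    rng = np.random.default_rng(seed)
    l = ridge_leverage_scores(K, lam)
    return rng.choice(K.shape[0], size=m, replace=True, p=l / l.sum())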
Speeding-up Object Detection Training for Robotics with FALKON
The latest deep learning methods for object detection provide remarkable performance, but have limits when used in robotic applications. One of the most relevant issues is the long training time, which is due to the large size and imbalance of the associated training sets, characterized by few positive and a large number of negative examples (i.e. background). Proposed approaches are based either on end-to-end learning by back-propagation [22] or on kernel methods trained with Hard Negatives Mining on top of deep features [8]. These solutions are effective, but prohibitively slow for on-line applications. In this paper we propose a novel pipeline for object detection that overcomes this problem and provides comparable performance, with a 60x training speedup. Our pipeline combines (i) the Region Proposal Network and the deep feature extractor from [22], to efficiently select candidate RoIs and encode them into powerful representations, with (ii) the FALKON [23] algorithm, a novel kernel-based method that allows fast training on large-scale problems (millions of points). We address the size and imbalance of the training data by exploiting the stochastic subsampling intrinsic to the method and a novel, fast, bootstrapping approach. We assess the effectiveness of the approach on a standard computer vision dataset (PASCAL VOC 2007 [5]) and demonstrate its applicability to a real robotic scenario with the iCubWorld Transformations dataset [18].
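The bootstrapping idea can be sketched schematically as follows: at each round, retrain on the current set of hard negatives, rescore a random chunk of the large negative set, and keep the most confusing examples. Function names and constants here are illustrative assumptions, not the paper's code; in the real pipeline the classifier trained at each round is FALKON.

import numpy as np

def bootstrap_hard_negatives(train_fn, positives, negatives, n_rounds=3,
                             chunk=1000, keep=100, seed=0):
    # train_fn(positives, negs) stands in for training the kernel-based
    # classifier and returns a scoring function over examples.
    rng = np.random.default_rng(seed)
    hard = negatives[rng.choice(len(negatives), size=keep, replace=False)]
    model = train_fn(positives, hard)
    for _ in range(n_rounds):
        cand = negatives[rng.choice(len(negatives), size=chunk, replace=False)]
        scores = model(cand)                        # confidence of being an object
        hardest = cand[np.argsort(scores)[-keep:]]  # most confusing negatives
        hard = np.concatenate([hard, hardest])
        model = train_fn(positives, hard)           # retrain on the enlarged set
    return model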
Manifold Structured Prediction
Structured prediction provides a general framework to deal with supervised problems where the outputs have semantically rich structure. While classical approaches consider finite, albeit potentially huge, output spaces, in this paper we discuss how structured prediction can be extended to a continuous scenario. Specifically, we study a structured prediction approach to manifold-valued regression. We characterize a class of problems for which the considered approach is statistically consistent, and we study how geometric optimization can be used to compute the corresponding estimator. Promising experimental results on both simulated and real data complete our study.
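As a toy instance of the geometric optimization step, consider outputs on the unit sphere: computing the estimator then amounts to minimizing a weighted sum of squared geodesic distances, which can be done with Riemannian gradient steps. The sketch below assumes nonnegative weights and is purely illustrative of the idea, not the paper's algorithm.

import numpy as np

def sphere_frechet(points, weights, n_iter=100, step=0.5):
    # Weighted Frechet mean on the unit sphere via Riemannian gradient
    # steps with a projection retraction.
    y = points[np.argmax(weights)].copy()      # warm start at heaviest point
    for _ in range(n_iter):
        g = np.zeros_like(y)
        for p, w in zip(points, weights):
            c = np.clip(y @ p, -1.0, 1.0)
            theta = np.arccos(c)               # geodesic distance d(y, p)
            if theta > 1e-9:
                v = p - c * y                  # tangent direction toward p
                g += w * theta * v / np.linalg.norm(v)   # w * log_y(p)
        y = y + step * g
        y /= np.linalg.norm(y)                 # retract back to the sphere
    return y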
Structured Prediction for CRISP Inverse Kinematics Learning With Misspecified Robot Models
With the recent advances in machine learning, problems that traditionally would require accurate modeling to be solved analytically can now be successfully approached with data-driven strategies. Among these, computing the inverse kinematics of a redundant robot arm poses a significant challenge due to the non-linear structure of the robot, the hard joint constraints, and the non-invertible kinematics map. Moreover, most learning algorithms consider a completely data-driven approach, while useful information on the structure of the robot is often available and should be positively exploited. In this work, we present a simple yet effective approach to learning the inverse kinematics. We introduce a structured prediction algorithm that combines a data-driven strategy with the model provided by a forward kinematics function – even when this function is misspecified – to accurately solve the problem. The proposed approach ensures that predicted joint configurations are well within the robot's constraints. We also provide statistical guarantees on the generalization properties of our estimator, as well as an empirical evaluation of its performance on trajectory reconstruction tasks.
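A schematic version of such an estimator, with hypothetical names (fk for the possibly misspecified forward kinematics model, train_q for training joint configurations, weights from some learned score): decode a prediction by minimizing a weighted data term plus a kinematics-consistency term, subject to the joint limits as hard bounds.

import numpy as np
from scipy.optimize import minimize

def ik_structured_predict(x_target, train_q, weights, fk, q_bounds):
    # Pick the joint configuration q (within bounds) minimizing a weighted
    # combination of distances to training configurations plus a forward
    # kinematics consistency term toward the target end-effector pose.
    def objective(q):
        data_term = sum(w * np.sum((q - qi) ** 2)
                        for qi, w in zip(train_q, weights))
        model_term = np.sum((fk(q) - x_target) ** 2)
        return data_term + model_term
    q0 = train_q[np.argmax(weights)]                # warm start
    res = minimize(objective, q0, bounds=q_bounds)  # joint limits as bounds
    return res.x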