475 research outputs found
Does generalization performance of regularization learning depend on ? A negative example
-regularization has been demonstrated to be an attractive technique in
machine learning and statistical modeling. It attempts to improve the
generalization (prediction) capability of a machine (model) through
appropriately shrinking its coefficients. The shape of a estimator
differs in varying choices of the regularization order . In particular,
leads to the LASSO estimate, while corresponds to the smooth
ridge regression. This makes the order a potential tuning parameter in
applications. To facilitate the use of -regularization, we intend to
seek for a modeling strategy where an elaborative selection on is
avoidable. In this spirit, we place our investigation within a general
framework of -regularized kernel learning under a sample dependent
hypothesis space (SDHS). For a designated class of kernel functions, we show
that all estimators for attain similar generalization
error bounds. These estimated bounds are almost optimal in the sense that up to
a logarithmic factor, the upper and lower bounds are asymptotically identical.
This finding tentatively reveals that, in some modeling contexts, the choice of
might not have a strong impact in terms of the generalization capability.
From this perspective, can be arbitrarily specified, or specified merely by
other no generalization criteria like smoothness, computational complexity,
sparsity, etc..Comment: 35 pages, 3 figure
In-network Sparsity-regularized Rank Minimization: Algorithms and Applications
Given a limited number of entries from the superposition of a low-rank matrix
plus the product of a known fat compression matrix times a sparse matrix,
recovery of the low-rank and sparse components is a fundamental task subsuming
compressed sensing, matrix completion, and principal components pursuit. This
paper develops algorithms for distributed sparsity-regularized rank
minimization over networks, when the nuclear- and -norm are used as
surrogates to the rank and nonzero entry counts of the sought matrices,
respectively. While nuclear-norm minimization has well-documented merits when
centralized processing is viable, non-separability of the singular-value sum
challenges its distributed minimization. To overcome this limitation, an
alternative characterization of the nuclear norm is adopted which leads to a
separable, yet non-convex cost minimized via the alternating-direction method
of multipliers. The novel distributed iterations entail reduced-complexity
per-node tasks, and affordable message passing among single-hop neighbors.
Interestingly, upon convergence the distributed (non-convex) estimator provably
attains the global optimum of its centralized counterpart, regardless of
initialization. Several application domains are outlined to highlight the
generality and impact of the proposed framework. These include unveiling
traffic anomalies in backbone networks, predicting networkwide path latencies,
and mapping the RF ambiance using wireless cognitive radios. Simulations with
synthetic and real network data corroborate the convergence of the novel
distributed algorithm, and its centralized performance guarantees.Comment: 30 pages, submitted for publication on the IEEE Trans. Signal Proces
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives
Part 2 of this monograph builds on the introduction to tensor networks and
their operations presented in Part 1. It focuses on tensor network models for
super-compressed higher-order representation of data/parameters and related
cost functions, while providing an outline of their applications in machine
learning and data analytics. A particular emphasis is on the tensor train (TT)
and Hierarchical Tucker (HT) decompositions, and their physically meaningful
interpretations which reflect the scalability of the tensor network approach.
Through a graphical approach, we also elucidate how, by virtue of the
underlying low-rank tensor approximations and sophisticated contractions of
core tensors, tensor networks have the ability to perform distributed
computations on otherwise prohibitively large volumes of data/parameters,
thereby alleviating or even eliminating the curse of dimensionality. The
usefulness of this concept is illustrated over a number of applied areas,
including generalized regression and classification (support tensor machines,
canonical correlation analysis, higher order partial least squares),
generalized eigenvalue decomposition, Riemannian optimization, and in the
optimization of deep neural networks. Part 1 and Part 2 of this work can be
used either as stand-alone separate texts, or indeed as a conjoint
comprehensive review of the exciting field of low-rank tensor networks and
tensor decompositions.Comment: 232 page
Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives
Part 2 of this monograph builds on the introduction to tensor networks and
their operations presented in Part 1. It focuses on tensor network models for
super-compressed higher-order representation of data/parameters and related
cost functions, while providing an outline of their applications in machine
learning and data analytics. A particular emphasis is on the tensor train (TT)
and Hierarchical Tucker (HT) decompositions, and their physically meaningful
interpretations which reflect the scalability of the tensor network approach.
Through a graphical approach, we also elucidate how, by virtue of the
underlying low-rank tensor approximations and sophisticated contractions of
core tensors, tensor networks have the ability to perform distributed
computations on otherwise prohibitively large volumes of data/parameters,
thereby alleviating or even eliminating the curse of dimensionality. The
usefulness of this concept is illustrated over a number of applied areas,
including generalized regression and classification (support tensor machines,
canonical correlation analysis, higher order partial least squares),
generalized eigenvalue decomposition, Riemannian optimization, and in the
optimization of deep neural networks. Part 1 and Part 2 of this work can be
used either as stand-alone separate texts, or indeed as a conjoint
comprehensive review of the exciting field of low-rank tensor networks and
tensor decompositions.Comment: 232 page
Local Rademacher Complexity-based Learning Guarantees for Multi-Task Learning
We show a Talagrand-type concentration inequality for Multi-Task Learning
(MTL), using which we establish sharp excess risk bounds for MTL in terms of
distribution- and data-dependent versions of the Local Rademacher Complexity
(LRC). We also give a new bound on the LRC for norm regularized as well as
strongly convex hypothesis classes, which applies not only to MTL but also to
the standard i.i.d. setting. Combining both results, one can now easily derive
fast-rate bounds on the excess risk for many prominent MTL methods,
including---as we demonstrate---Schatten-norm, group-norm, and
graph-regularized MTL. The derived bounds reflect a relationship akeen to a
conservation law of asymptotic convergence rates. This very relationship allows
for trading off slower rates w.r.t. the number of tasks for faster rates with
respect to the number of available samples per task, when compared to the rates
obtained via a traditional, global Rademacher analysis.Comment: In this version, some arguments and results (of the previous version)
have been corrected, or modifie
Sparse Iterative Learning Control with Application to a Wafer Stage: Achieving Performance, Resource Efficiency, and Task Flexibility
Trial-varying disturbances are a key concern in Iterative Learning Control
(ILC) and may lead to inefficient and expensive implementations and severe
performance deterioration. The aim of this paper is to develop a general
framework for optimization-based ILC that allows for enforcing additional
structure, including sparsity. The proposed method enforces sparsity in a
generalized setting through convex relaxations using norms. The
proposed ILC framework is applied to the optimization of sampling sequences for
resource efficient implementation, trial-varying disturbance attenuation, and
basis function selection. The framework has a large potential in control
applications such as mechatronics, as is confirmed through an application on a
wafer stage.Comment: 12 pages, 14 figure
HIGH DIMENSIONAL VARIABLE SELECTION VIA PENALIZED LIKELIHOOD FOR GENERALIZED LINEAR MODELS
Variable selection is fundamental to high dimensional statistical modeling. In this study, penalized likelihood methods are examined to simultaneously estimate parameters and select variables for generalized linear models. We focus on the variable selection and parameter estimation properties rather than the prediction properties of the estimators and are more interested in situations where the number of parameters diverges with the sample size. We prove the parameter estimation consistency of several widely used penalized likelihood estimators for generalized linear models. We define the relaxed sense and prove that it loosens the regularity and sparsity conditions of the parameter estimation and variable selection consistency. We propose a bootstrap method that can greatly improve the variable selection performances and reduce false discovery rates. We conduct simulation studies to compare the variable selection and parameter estimation properties of these penalized likelihood estimators for logistic models. We then illustrate our methods on gene expression data
- …