Search CORE

475 research outputs found

Does generalization performance of $l^q$ regularization learning depend on $q$? A negative example

Author: Fang Jian
Lin Shaobo
Xu Chen
Zeng Jingshan
Publication date: 24/07/2013

$l^q$-regularization has been demonstrated to be an attractive technique in machine learning and statistical modeling. It attempts to improve the generalization (prediction) capability of a machine (model) by appropriately shrinking its coefficients. The shape of an $l^q$ estimator differs with the choice of the regularization order $q$. In particular, $l^1$ leads to the LASSO estimate, while $l^2$ corresponds to smooth ridge regression. This makes the order $q$ a potential tuning parameter in applications. To facilitate the use of $l^q$-regularization, we seek a modeling strategy in which an elaborate selection of $q$ can be avoided. In this spirit, we place our investigation within a general framework of $l^q$-regularized kernel learning under a sample-dependent hypothesis space (SDHS). For a designated class of kernel functions, we show that all $l^q$ estimators for $0 < q < \infty$ attain similar generalization error bounds. These estimated bounds are almost optimal in the sense that, up to a logarithmic factor, the upper and lower bounds are asymptotically identical. This finding tentatively reveals that, in some modeling contexts, the choice of $q$ might not have a strong impact on the generalization capability. From this perspective, $q$ can be specified arbitrarily, or merely by criteria other than generalization, such as smoothness, computational complexity, and sparsity. Comment: 35 pages, 3 figures

arXiv.org e-Print Archive
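The LASSO-versus-ridge contrast in the abstract above can be seen in the simplest possible setting. The sketch below assumes a single coefficient with an orthonormal design, i.e. it minimizes $\tfrac12(z-b)^2 + \lambda|b|^q$ over $b$; this is not the paper's kernel/SDHS framework, and the function names are hypothetical. For $q=1$ the minimizer is soft-thresholding and for $q=2$ it is the uniform shrinkage $z/(1+2\lambda)$; a brute-force grid search handles any $q > 0$, including the non-convex range $0 < q < 1$.

```python
import math


def lq_shrink(z: float, lam: float, q: float, grid: int = 20001) -> float:
    """Approximately minimise 0.5*(z - b)**2 + lam*|b|**q by grid search.

    The minimiser lies between 0 and z, so only that interval is scanned.
    Works for any q > 0, including the non-convex range 0 < q < 1.
    """
    best_b, best_val = 0.0, 0.5 * z * z  # objective value at b = 0
    for k in range(1, grid):
        b = z * k / (grid - 1)
        val = 0.5 * (z - b) ** 2 + lam * abs(b) ** q
        if val < best_val:
            best_b, best_val = b, val
    return best_b


def soft_threshold(z: float, lam: float) -> float:
    """Closed-form minimiser for q = 1 (the LASSO case)."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def ridge_shrink(z: float, lam: float) -> float:
    """Closed-form minimiser for q = 2 (the ridge case)."""
    return z / (1.0 + 2.0 * lam)


if __name__ == "__main__":
    lam = 1.0
    for z in (0.5, 1.5, 3.0):
        row = [lq_shrink(z, lam, q) for q in (0.5, 1.0, 2.0)]
        print(z, [round(b, 3) for b in row])
```

For $|z| \le \lambda$ the $q=1$ solution is exactly zero, whereas the $q=2$ solution is shrunk but never exactly zero, which is the qualitative difference in estimator shape that the abstract refers to.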

In-network Sparsity-regularized Rank Minimization: Algorithms and Applications

Author: Giannakis Georgios B.
Mardani Morteza
Mateos Gonzalo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/03/2012

Given a limited number of entries from the superposition of a low-rank matrix plus the product of a known fat compression matrix times a sparse matrix, recovery of the low-rank and sparse components is a fundamental task subsuming compressed sensing, matrix completion, and principal components pursuit. This paper develops algorithms for distributed sparsity-regularized rank minimization over networks, when the nuclear norm and the $\ell_1$-norm are used as surrogates for the rank and the number of nonzero entries of the sought matrices, respectively. While nuclear-norm minimization has well-documented merits when centralized processing is viable, non-separability of the singular-value sum challenges its distributed minimization. To overcome this limitation, an alternative characterization of the nuclear norm is adopted, which leads to a separable, yet non-convex, cost minimized via the alternating-direction method of multipliers. The novel distributed iterations entail reduced-complexity per-node tasks and affordable message passing among single-hop neighbors. Interestingly, upon convergence the distributed (non-convex) estimator provably attains the global optimum of its centralized counterpart, regardless of initialization. Several application domains are outlined to highlight the generality and impact of the proposed framework. These include unveiling traffic anomalies in backbone networks, predicting network-wide path latencies, and mapping the RF ambiance using wireless cognitive radios. Simulations with synthetic and real network data corroborate the convergence of the novel distributed algorithm and its centralized performance guarantees. Comment: 30 pages, submitted for publication to IEEE Trans. Signal Process.

arXiv.org e-Print Archive

CiteSeerX

Crossref
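The step that makes the problem above separable is the standard variational form of the nuclear norm, $\|X\|_* = \min_{X = LR^\top} \tfrac12(\|L\|_F^2 + \|R\|_F^2)$, whose cost splits into per-row terms of $L$ and $R$. A minimal numerical check, assuming (for brevity) a rank-one $X = uv^\top$, for which $\|X\|_* = \|u\|_2\|v\|_2$ and the rank-one factorizations are $L = a\,u$, $R = v/a$. This illustrates only the identity, not the paper's ADMM iterations, and the helper names are hypothetical.

```python
import math
from typing import Sequence


def sq_norm(x: Sequence[float]) -> float:
    """Squared Euclidean norm."""
    return sum(t * t for t in x)


def factor_cost(u: Sequence[float], v: Sequence[float], a: float) -> float:
    """0.5*(||L||_F^2 + ||R||_F^2) for the factorisation L = a*u, R = v/a of X = u v^T."""
    return 0.5 * (a * a * sq_norm(u) + sq_norm(v) / (a * a))


def nuclear_norm_rank1(u: Sequence[float], v: Sequence[float]) -> float:
    """||u v^T||_* equals ||u||_2 * ||v||_2 (the single nonzero singular value)."""
    return math.sqrt(sq_norm(u)) * math.sqrt(sq_norm(v))


if __name__ == "__main__":
    u, v = [3.0, 4.0], [1.0, 2.0, 2.0]  # ||u|| = 5, ||v|| = 3
    target = nuclear_norm_rank1(u, v)   # 15.0
    # Every scaling a gives an upper bound (by the AM-GM inequality) ...
    for a in (0.25, 0.5, 1.0, 2.0):
        print(a, factor_cost(u, v, a), ">=", target)
    # ... and the balanced scaling a^2 = ||v|| / ||u|| attains it.
    a_star = math.sqrt(math.sqrt(sq_norm(v)) / math.sqrt(sq_norm(u)))
    print(factor_cost(u, v, a_star))
```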

Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives

Author: Cichocki A.
Lee N.
Mandic D.
Oseledets I. V.
Phan A-H.
Sugiyama M.
Zhao Q.
Publication venue: 'Now Publishers'
Publication date: 01/01/2017

Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and their physically meaningful interpretations which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated over a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and in the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts, or indeed as a conjoint comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions. Comment: 232 pages

arXiv.org e-Print Archive

Crossref
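In the tensor train (TT) format emphasized above, a $d$-way tensor is stored as cores $G_k$ of size $r_{k-1}\times n_k\times r_k$ with $r_0 = r_d = 1$, and each entry is a product of matrix slices, $X[i_1,\dots,i_d] = G_1[:,i_1,:]\,G_2[:,i_2,:]\cdots G_d[:,i_d,:]$. Below is a minimal pure-Python sketch of entry evaluation and of the storage count $\sum_k r_{k-1}n_k r_k$ versus $\prod_k n_k$. It is not taken from the monograph; the nested-list core layout and the function names are assumptions.

```python
import math
from typing import List, Sequence

# Hypothetical layout: core[a][i][b] holds G_k[a, i, b], shape (r_{k-1}, n_k, r_k).
Core = List[List[List[float]]]


def tt_entry(cores: Sequence[Core], index: Sequence[int]) -> float:
    """Evaluate one tensor entry as a left-to-right chain of slice products."""
    row = [1.0]  # running row vector; length r_0 = 1 before any core is applied
    for core, i in zip(cores, index):
        r_next = len(core[0][i])
        row = [sum(row[a] * core[a][i][b] for a in range(len(core)))
               for b in range(r_next)]
    return row[0]  # r_d = 1, so a scalar remains


def tt_storage(shape: Sequence[int], ranks: Sequence[int]) -> int:
    """Number of stored parameters: sum_k r_{k-1} * n_k * r_k."""
    return sum(ranks[k] * shape[k] * ranks[k + 1] for k in range(len(shape)))


def rank1_cores(vectors: Sequence[Sequence[float]]) -> List[Core]:
    """TT cores (all ranks 1) of the outer product v_1 o v_2 o ... o v_d."""
    return [[[[x] for x in vec]] for vec in vectors]


if __name__ == "__main__":
    cores = rank1_cores([[1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0]])
    print(tt_entry(cores, (1, 2, 0)))  # 2 * 5 * 6 = 60.0
    shape, ranks = [4] * 10, [1] + [3] * 9 + [1]
    print(tt_storage(shape, ranks), "vs", math.prod(shape))  # 312 vs 1048576
```

The storage comparison shows the point made in the abstract: with fixed TT ranks, parameters grow linearly in $d$ rather than exponentially.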

Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives

Author: Cichocki A.
Phan A-H.
Zhao Q.
Lee N.
Oseledets I. V.
Sugiyama M.
Mandic D.
Publication date: 01/01/2017

arXiv.org e-Print Archive

Crossref

FigShare

Local Rademacher Complexity-based Learning Guarantees for Multi-Task Learning

Author: Anagnostopoulos Georgios
Kloft Marius
Lei Yunwen
Mollaghasemi Mansooreh
Yousefi Niloofar
Publication date: 09/02/2017

We show a Talagrand-type concentration inequality for Multi-Task Learning (MTL), with which we establish sharp excess risk bounds for MTL in terms of distribution- and data-dependent versions of the Local Rademacher Complexity (LRC). We also give a new bound on the LRC for norm-regularized as well as strongly convex hypothesis classes, which applies not only to MTL but also to the standard i.i.d. setting. Combining both results, one can now easily derive fast-rate bounds on the excess risk for many prominent MTL methods, including, as we demonstrate, Schatten-norm, group-norm, and graph-regularized MTL. The derived bounds reflect a relationship akin to a conservation law of asymptotic convergence rates. This relationship allows trading off slower rates with respect to the number of tasks for faster rates with respect to the number of available samples per task, when compared to the rates obtained via a traditional, global Rademacher analysis. Comment: In this version, some arguments and results of the previous version have been corrected or modified.

arXiv.org e-Print Archive

University of Birmingham Research Portal

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)
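The bounds in the abstract above use the *local* Rademacher complexity, which restricts the hypothesis class to functions of small variance. As background only, the sketch below computes the ordinary (global) empirical Rademacher complexity $\hat{\mathfrak{R}}_S(\mathcal F) = \mathbb E_\sigma \sup_{f\in\mathcal F} \frac1n\sum_i \sigma_i f(x_i)$ for an assumed norm-regularized class $\{x\mapsto\langle w,x\rangle : \|w\|_2\le B\}$, where the supremum has the closed form $\frac{B}{n}\|\sum_i\sigma_i x_i\|_2$. It enumerates all $2^n$ sign vectors, so it is only practical for small $n$. It does not implement the paper's local or multi-task analysis, and the function names are hypothetical.

```python
import itertools
import math
from typing import Sequence


def empirical_rademacher_l2_ball(xs: Sequence[Sequence[float]], B: float) -> float:
    """Exact empirical Rademacher complexity of {x -> <w, x> : ||w||_2 <= B}.

    For this class sup_w (1/n) sum_i sigma_i <w, x_i> = (B/n) * ||sum_i sigma_i x_i||_2,
    so we average that quantity over all 2^n sign vectors (small n only).
    """
    n, dim = len(xs), len(xs[0])
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=n):
        s = [sum(sig * x[j] for sig, x in zip(signs, xs)) for j in range(dim)]
        total += math.sqrt(sum(c * c for c in s))
    return B * total / (2 ** n * n)


def jensen_bound(xs: Sequence[Sequence[float]], B: float) -> float:
    """Classical upper bound B * sqrt(sum_i ||x_i||^2) / n (via Jensen's inequality)."""
    return B * math.sqrt(sum(sum(c * c for c in x) for x in xs)) / len(xs)


if __name__ == "__main__":
    xs = [[1.0, 2.0], [-0.5, 0.3], [2.0, -1.0], [0.0, 1.5]]
    print(empirical_rademacher_l2_ball(xs, 1.0), "<=", jensen_bound(xs, 1.0))
```

Because the expectation is computed exactly, the result never exceeds the Jensen bound, which decays like $1/\sqrt{n}$ for bounded inputs; local analyses such as the one above aim to improve on that global rate.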

Sparse Iterative Learning Control with Application to a Wafer Stage: Achieving Performance, Resource Efficiency, and Task Flexibility

Author: Oomen Tom
Rojas Cristian R.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

Trial-varying disturbances are a key concern in Iterative Learning Control (ILC) and may lead to inefficient and expensive implementations and severe performance deterioration. The aim of this paper is to develop a general framework for optimization-based ILC that allows for enforcing additional structure, including sparsity. The proposed method enforces sparsity in a generalized setting through convex relaxations using $\ell_1$ norms. The proposed ILC framework is applied to the optimization of sampling sequences for resource-efficient implementation, trial-varying disturbance attenuation, and basis function selection. The framework has large potential in control applications such as mechatronics, as is confirmed through an application on a wafer stage. Comment: 12 pages, 14 figures

arXiv.org e-Print Archive

Repository TU/e

Pure OAI Repository
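The $\ell_1$ relaxation described above can be combined with a trial-to-trial learning update in a few lines. The sketch below is a swapped-in illustration, not the paper's algorithm: it assumes a known lifted SISO model $y = Jf + d$ with a lower-triangular Toeplitz $J$ built from an impulse response $h$, a trial-invariant disturbance $d$ (the paper is concerned with trial-varying ones), and a proximal-gradient (ISTA-style) update $f_{j+1} = \mathcal S_{\alpha\lambda}\big(f_j + \alpha J^\top e_j\big)$ with $e_j = r - y_j$ and $\mathcal S$ the soft-thresholding operator. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def lifted_matrix(h: Sequence[float], n: int) -> Matrix:
    """Lower-triangular Toeplitz J with J[t][k] = h[t - k] (causal SISO system)."""
    return [[h[t - k] if 0 <= t - k < len(h) else 0.0 for k in range(n)]
            for t in range(n)]


def matvec(A: Matrix, x: Sequence[float]) -> List[float]:
    """Compute A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A: Matrix, y: Sequence[float]) -> List[float]:
    """Compute A^T y."""
    return [sum(A[t][k] * y[t] for t in range(len(A))) for k in range(len(A[0]))]


def soft(x: Sequence[float], thr: float) -> List[float]:
    """Soft-thresholding, the proximal operator of thr * ||x||_1."""
    return [math.copysign(max(abs(v) - thr, 0.0), v) for v in x]


def sparse_ilc(h: Sequence[float], r: Sequence[float], d: Sequence[float],
               lam: float, step: float, trials: int) -> Tuple[List[float], List[float]]:
    """Run `trials` learning iterations; return the final input f and its tracking error."""
    J = lifted_matrix(h, len(r))
    f = [0.0] * len(r)

    def error(f_: Sequence[float]) -> List[float]:
        # Simulated trial: the plant is assumed to equal the model J plus disturbance d.
        y = [jf + dt for jf, dt in zip(matvec(J, f_), d)]
        return [rt - yt for rt, yt in zip(r, y)]

    for _ in range(trials):
        e = error(f)         # "measured" error of trial j
        g = rmatvec(J, e)    # J^T e_j, the negative gradient of 0.5*||e_j||^2
        f = soft([fk + step * gk for fk, gk in zip(f, g)], step * lam)
    return f, error(f)


if __name__ == "__main__":
    h = [1.0, 0.5]                                # ||J||_2 <= 1.5, so step <= 1/2.25
    r = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
    d = [0.0] * len(r)
    for lam in (0.0, 0.2):
        f, e = sparse_ilc(h, r, d, lam, step=0.4, trials=500)
        print(lam, sum(v != 0.0 for v in f), max(abs(v) for v in e))
```

With a step $\alpha \le 1/\|J\|_2^2$ the iterates converge to a minimizer of $\tfrac12\|r - d - Jf\|_2^2 + \lambda\|f\|_1$, so a larger $\lambda$ trades tracking error for a sparser input.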

HIGH DIMENSIONAL VARIABLE SELECTION VIA PENALIZED LIKELIHOOD FOR GENERALIZED LINEAR MODELS

Author: Qi Wenjing
Publication date: 14/01/2015

Variable selection is fundamental to high-dimensional statistical modeling. In this study, penalized likelihood methods are examined to simultaneously estimate parameters and select variables for generalized linear models. We focus on the variable selection and parameter estimation properties rather than the prediction properties of the estimators, and we are more interested in situations where the number of parameters diverges with the sample size. We prove the parameter estimation consistency of several widely used penalized likelihood estimators for generalized linear models. We define the relaxed sense and prove that it loosens the regularity and sparsity conditions of the parameter estimation and variable selection consistency. We propose a bootstrap method that can greatly improve variable selection performance and reduce false discovery rates. We conduct simulation studies to compare the variable selection and parameter estimation properties of these penalized likelihood estimators for logistic models. We then illustrate our methods on gene expression data.

D-Scholarship@Pitt
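A generic illustration of penalized-likelihood variable selection for a GLM, as described in the abstract above. The dissertation's particular estimators, its "relaxed sense", and its bootstrap procedure are not reproduced here; the sketch assumes an $\ell_1$ (LASSO) penalty on logistic regression without intercept, fitted by proximal gradient descent, and a plain bootstrap that reports how often each coefficient is selected across resamples. Function names and defaults are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def l1_logistic(X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
                step: float = 0.1, iters: int = 200) -> List[float]:
    """Proximal gradient for (1/n) * sum_i logloss_i(beta) + lam * ||beta||_1.

    No intercept, for brevity; features are assumed to be roughly unit scale
    so that the fixed step size is small enough for convergence.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            resid = sigmoid(sum(b * x for b, x in zip(beta, xi))) - yi
            for j in range(p):
                grad[j] += resid * xi[j] / n
        z = [b - step * g for b, g in zip(beta, grad)]                      # gradient step
        beta = [math.copysign(max(abs(v) - step * lam, 0.0), v) for v in z]  # L1 prox
    return beta


def bootstrap_selection_frequency(X: Sequence[Sequence[float]], y: Sequence[float],
                                  lam: float, n_boot: int = 10, seed: int = 0,
                                  **fit_kw) -> List[float]:
    """Fraction of bootstrap refits in which each coefficient is nonzero."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        beta = l1_logistic([X[i] for i in idx], [y[i] for i in idx], lam, **fit_kw)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(60)]
    y = [1.0 if row[0] > 0 else 0.0 for row in X]  # only feature 0 is informative
    print(bootstrap_selection_frequency(X, y, lam=0.1))
```

Keeping only variables selected in most resamples is one common way a bootstrap can reduce false discoveries; no claim is made that this matches the author's method.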