
    Does generalization performance of $\ell^q$ regularization learning depend on $q$? A negative example

    $\ell^q$-regularization has been demonstrated to be an attractive technique in machine learning and statistical modeling. It attempts to improve the generalization (prediction) capability of a machine (model) by appropriately shrinking its coefficients. The shape of an $\ell^q$ estimator differs with the choice of the regularization order $q$. In particular, $\ell^1$ leads to the LASSO estimate, while $\ell^2$ corresponds to smooth ridge regression. This makes the order $q$ a potential tuning parameter in applications. To facilitate the use of $\ell^q$-regularization, we seek a modeling strategy in which an elaborate selection of $q$ is avoidable. In this spirit, we place our investigation within a general framework of $\ell^q$-regularized kernel learning under a sample-dependent hypothesis space (SDHS). For a designated class of kernel functions, we show that all $\ell^q$ estimators for $0 < q < \infty$ attain similar generalization error bounds. These bounds are almost optimal in the sense that, up to a logarithmic factor, the upper and lower bounds are asymptotically identical. This finding tentatively reveals that, in some modeling contexts, the choice of $q$ might not have a strong impact on the generalization capability. From this perspective, $q$ can be specified arbitrarily, or specified merely by non-generalization criteria such as smoothness, computational complexity, and sparsity. (Comment: 35 pages, 3 figures)
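
    Below is a minimal numerical sketch of this setting, not the paper's SDHS construction or its error bounds: a LASSO ($q = 1$) and a ridge ($q = 2$) estimator are fitted over the sample-dependent dictionary $\{K(\cdot, x_i)\}$ on a toy regression problem to compare test errors. The kernel, regularization weights, and data below are illustrative assumptions.

```python
# Illustrative sketch: compare l^1 (LASSO) and l^2 (ridge) coefficient-regularized
# estimators in the sample-dependent hypothesis space f(x) = sum_i c_i K(x, x_i).
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(200, 1))
X_test = rng.uniform(-3, 3, size=(500, 1))
y_train = np.sin(X_train).ravel() + 0.1 * rng.standard_normal(200)
y_test = np.sin(X_test).ravel()

# Design matrices over the sample-dependent dictionary {K(., x_i)}.
K_train = rbf_kernel(X_train, X_train, gamma=1.0)
K_test = rbf_kernel(X_test, X_train, gamma=1.0)

for name, est in [("q = 1 (LASSO)", Lasso(alpha=1e-3, max_iter=100_000)),
                  ("q = 2 (ridge)", Ridge(alpha=1e-3))]:
    est.fit(K_train, y_train)
    mse = np.mean((est.predict(K_test) - y_test) ** 2)
    print(f"{name}: test MSE = {mse:.5f}")
```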

    In-network Sparsity-regularized Rank Minimization: Algorithms and Applications

    Given a limited number of entries from the superposition of a low-rank matrix plus the product of a known fat compression matrix times a sparse matrix, recovery of the low-rank and sparse components is a fundamental task subsuming compressed sensing, matrix completion, and principal components pursuit. This paper develops algorithms for distributed sparsity-regularized rank minimization over networks, when the nuclear- and $\ell_1$-norms are used as surrogates for the rank and the nonzero entry counts of the sought matrices, respectively. While nuclear-norm minimization has well-documented merits when centralized processing is viable, non-separability of the singular-value sum challenges its distributed minimization. To overcome this limitation, an alternative characterization of the nuclear norm is adopted, which leads to a separable, yet non-convex cost minimized via the alternating-direction method of multipliers. The novel distributed iterations entail reduced-complexity per-node tasks, and affordable message passing among single-hop neighbors. Interestingly, upon convergence the distributed (non-convex) estimator provably attains the global optimum of its centralized counterpart, regardless of initialization. Several application domains are outlined to highlight the generality and impact of the proposed framework. These include unveiling traffic anomalies in backbone networks, predicting network-wide path latencies, and mapping the RF ambiance using wireless cognitive radios. Simulations with synthetic and real network data corroborate the convergence of the novel distributed algorithm, and its centralized performance guarantees. (Comment: 30 pages, submitted for publication in the IEEE Trans. Signal Process.)
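
    The following is a simplified, centralized sketch of the separable nuclear-norm surrogate the abstract refers to, using the characterization $\|X\|_* = \min_{X = PQ^{\top}} \tfrac{1}{2}(\|P\|_F^2 + \|Q\|_F^2)$ together with plain alternating minimization; the paper's distributed ADMM iterations are not reproduced here, and all problem sizes and weights are illustrative assumptions.

```python
# Centralized sketch of the separable nuclear-norm surrogate: recover a low-rank
# component P Q^T and a sparse component S from Y = X + C S by alternating
# ridge-type factor updates and proximal-gradient (ISTA) steps for S.
import numpy as np

rng = np.random.default_rng(1)
L_dim, T_dim, r, m = 30, 40, 3, 60                 # matrix size, true rank, width of fat C
X_true = rng.standard_normal((L_dim, r)) @ rng.standard_normal((r, T_dim))
S_true = rng.standard_normal((m, T_dim)) * (rng.random((m, T_dim)) < 0.02)
C = rng.standard_normal((L_dim, m)) / np.sqrt(m)   # known fat compression matrix
Y = X_true + C @ S_true                            # observed superposition

lam_star, lam_1, rho = 1.0, 0.05, 5                # nuclear/l1 weights, rank upper bound
P = rng.standard_normal((L_dim, rho))
Q = rng.standard_normal((T_dim, rho))
S = np.zeros((m, T_dim))
step = 1.0 / np.linalg.norm(C, 2) ** 2             # ISTA step size for the S update

def soft(Z, t):
    """Entrywise soft-thresholding (proximal operator of the l1 norm)."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

for _ in range(300):
    R = Y - C @ S
    # Ridge-type closed-form updates for the low-rank factors.
    P = R @ Q @ np.linalg.inv(Q.T @ Q + lam_star * np.eye(rho))
    Q = R.T @ P @ np.linalg.inv(P.T @ P + lam_star * np.eye(rho))
    # One proximal-gradient (ISTA) step for the sparse component.
    G = C.T @ (C @ S + P @ Q.T - Y)
    S = soft(S - step * G, step * lam_1)

print("relative low-rank error:",
      np.linalg.norm(P @ Q.T - X_true) / np.linalg.norm(X_true))
print("nonzeros recovered in S:", int(np.sum(np.abs(S) > 1e-6)))
```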

    Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2: Applications and Future Perspectives

    Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and their physically meaningful interpretations which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated over a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher-order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts, or indeed as a conjoint comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions. (Comment: 232 pages)
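
    As a small illustration of the TT format discussed in the monograph, the sketch below implements the standard TT-SVD procedure (sequential truncated SVDs) in plain NumPy; the test tensor and truncation tolerance are illustrative choices, not taken from the text.

```python
# TT-SVD sketch: decompose an order-d tensor into d third-order cores by
# sequential truncated SVDs, then contract the cores back to check the error.
import numpy as np

def tt_svd(tensor, eps=1e-10):
    """Decompose `tensor` into TT cores G_k of shape (r_{k-1}, n_k, r_k)."""
    dims = tensor.shape
    d = len(dims)
    cores, rank = [], 1
    mat = tensor.reshape(rank * dims[0], -1)
    for k in range(d - 1):
        U, s, Vt = np.linalg.svd(mat, full_matrices=False)
        new_rank = max(1, int(np.sum(s > eps * s[0])))   # drop tiny singular values
        cores.append(U[:, :new_rank].reshape(rank, dims[k], new_rank))
        mat = (s[:new_rank, None] * Vt[:new_rank]).reshape(new_rank * dims[k + 1], -1)
        rank = new_rank
    cores.append(mat.reshape(rank, dims[-1], 1))
    return cores

def tt_to_full(cores):
    """Contract TT cores back into the full tensor (for checking accuracy)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.squeeze(axis=(0, -1))

# Low-TT-rank test tensor: f(i, j, k) = sin(i + j + k) has TT ranks <= 2.
grid = np.arange(8)
T = np.sin(grid[:, None, None] + grid[None, :, None] + grid[None, None, :])
cores = tt_svd(T)
print("TT ranks:", [c.shape[2] for c in cores[:-1]])
print("reconstruction error:", np.linalg.norm(tt_to_full(cores) - T))
```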

    Local Rademacher Complexity-based Learning Guarantees for Multi-Task Learning

    We show a Talagrand-type concentration inequality for Multi-Task Learning (MTL), and use it to establish sharp excess risk bounds for MTL in terms of distribution- and data-dependent versions of the Local Rademacher Complexity (LRC). We also give a new bound on the LRC for norm-regularized as well as strongly convex hypothesis classes, which applies not only to MTL but also to the standard i.i.d. setting. Combining both results, one can now easily derive fast-rate bounds on the excess risk for many prominent MTL methods, including, as we demonstrate, Schatten-norm, group-norm, and graph-regularized MTL. The derived bounds reflect a relationship akin to a conservation law of asymptotic convergence rates. This relationship allows for trading off slower rates with respect to the number of tasks for faster rates with respect to the number of available samples per task, when compared to the rates obtained via a traditional, global Rademacher analysis. (Comment: In this version, some arguments and results of the previous version have been corrected or modified.)
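
    As a quick numerical companion, the sketch below Monte-Carlo estimates the global empirical Rademacher complexity of an $\ell_2$-norm-bounded linear class, the quantity that a local Rademacher analysis sharpens by restricting the supremum to low-variance hypotheses; the class and data are illustrative assumptions, not the paper's MTL setting.

```python
# Monte-Carlo estimate of the (global) empirical Rademacher complexity of the
# linear class H_B = {x -> <w, x> : ||w||_2 <= B}.  For this class the supremum
# has the closed form (B/n) * ||sum_i sigma_i x_i||_2.
import numpy as np

rng = np.random.default_rng(0)
n, d, B = 200, 20, 1.0
X = rng.standard_normal((n, d))

def empirical_rademacher(X, B, n_mc=2000, rng=rng):
    n = X.shape[0]
    vals = []
    for _ in range(n_mc):
        sigma = rng.choice([-1.0, 1.0], size=n)      # Rademacher signs
        vals.append(B * np.linalg.norm(sigma @ X) / n)
    return np.mean(vals)

print("R_hat(H_B) ~", empirical_rademacher(X, B))
# Sanity check: this is bounded by B * sqrt(trace(X^T X)) / n, roughly B * sqrt(d / n).
```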

    Sparse Iterative Learning Control with Application to a Wafer Stage: Achieving Performance, Resource Efficiency, and Task Flexibility

    Trial-varying disturbances are a key concern in Iterative Learning Control (ILC) and may lead to inefficient and expensive implementations and severe performance deterioration. The aim of this paper is to develop a general framework for optimization-based ILC that allows for enforcing additional structure, including sparsity. The proposed method enforces sparsity in a generalized setting through convex relaxations using $\ell_1$ norms. The proposed ILC framework is applied to the optimization of sampling sequences for resource-efficient implementation, trial-varying disturbance attenuation, and basis function selection. The framework has large potential in control applications such as mechatronics, as is confirmed through an application on a wafer stage. (Comment: 12 pages, 14 figures)
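
    A minimal sketch of the idea on a toy lifted (FIR) system follows: each trial's input update minimizes the predicted tracking error plus an $\ell_1$ penalty that keeps the command sparse, solved here with plain proximal-gradient (ISTA) steps rather than the paper's solver; the system, reference, and weights are illustrative assumptions.

```python
# Sparsity-enforcing optimization-based ILC on a toy lifted system: at each trial
# the next input is argmin 0.5*||e - J (f_next - f)||^2 + lam*||f_next||_1.
import numpy as np

rng = np.random.default_rng(0)
N = 50                                           # samples per trial
h = 0.8 ** np.arange(N)                          # impulse response of a toy system
J = np.array([[h[i - k] if i >= k else 0.0 for k in range(N)] for i in range(N)])
r = np.zeros(N); r[10:20] = 1.0                  # reference with a short active window

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_ilc_update(f, e, J, lam, n_iter=500):
    """Next-trial input via ISTA on 0.5*||e - J (f_next - f)||^2 + lam*||f_next||_1."""
    step = 1.0 / np.linalg.norm(J, 2) ** 2
    f_new = f.copy()
    target = e + J @ f                            # residual is target - J f_new
    for _ in range(n_iter):
        grad = J.T @ (J @ f_new - target)
        f_new = soft(f_new - step * grad, step * lam)
    return f_new

f = np.zeros(N)
for trial in range(8):
    e = r - J @ f - 0.01 * rng.standard_normal(N)   # trial-varying disturbance
    f = sparse_ilc_update(f, e, J, lam=1e-3)
    print(f"trial {trial}: ||e||_2 = {np.linalg.norm(e):.4f}, "
          f"nonzeros in f = {int(np.sum(np.abs(f) > 1e-6))}")
```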

    High Dimensional Variable Selection via Penalized Likelihood for Generalized Linear Models

    Variable selection is fundamental to high-dimensional statistical modeling. In this study, penalized likelihood methods are examined to simultaneously estimate parameters and select variables for generalized linear models. We focus on the variable selection and parameter estimation properties rather than the prediction properties of the estimators, and are mainly interested in situations where the number of parameters diverges with the sample size. We prove the parameter estimation consistency of several widely used penalized likelihood estimators for generalized linear models. We define a relaxed sense of consistency and prove that it loosens the regularity and sparsity conditions required for parameter estimation and variable selection consistency. We propose a bootstrap method that can greatly improve variable selection performance and reduce false discovery rates. We conduct simulation studies to compare the variable selection and parameter estimation properties of these penalized likelihood estimators for logistic models. We then illustrate our methods on gene expression data.
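
    A hypothetical end-to-end sketch of this kind of procedure is given below: $\ell_1$-penalized logistic regression for variable selection, with bootstrap resampling used to estimate selection frequencies that can be thresholded to reduce false discoveries; the synthetic data and the 0.8 threshold are illustrative assumptions, not the study's settings.

```python
# l1-penalized logistic regression with bootstrap selection frequencies:
# refit on bootstrap resamples and keep variables that are selected stably.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p, n_boot = 300, 50, 100
beta_true = np.zeros(p); beta_true[:5] = 1.5          # only the first 5 variables matter
X = rng.standard_normal((n, p))
y = (rng.random(n) < 1 / (1 + np.exp(-X @ beta_true))).astype(int)

select_counts = np.zeros(p)
for _ in range(n_boot):
    idx = rng.integers(0, n, size=n)                  # bootstrap resample
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    model.fit(X[idx], y[idx])
    select_counts += (np.abs(model.coef_.ravel()) > 1e-8)

freq = select_counts / n_boot
selected = np.where(freq >= 0.8)[0]                   # keep stably selected variables
print("selection frequencies of true variables:", np.round(freq[:5], 2))
print("variables selected at threshold 0.8:", selected)
```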