2,435 research outputs found

    Representation Learning: A Review and New Perspectives

    The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, auto-encoders, manifold learning, and deep networks. This motivates longer-term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation and manifold learning.
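
    As a concrete illustration of unsupervised feature learning in the spirit of the auto-encoders reviewed here (a minimal NumPy sketch, not code from the paper; the layer sizes and learning rate are arbitrary assumptions):

        import numpy as np

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 20))                  # toy data: 500 samples, 20 features
        d_in, d_hid, lr = 20, 5, 0.01                   # assumed sizes and learning rate

        W1 = rng.normal(scale=0.1, size=(d_in, d_hid))  # encoder weights
        W2 = rng.normal(scale=0.1, size=(d_hid, d_in))  # decoder weights

        for epoch in range(200):
            H = np.tanh(X @ W1)                         # hidden code: the learned representation
            X_hat = H @ W2                              # reconstruction
            err = X_hat - X
            grad_W2 = H.T @ err / len(X)                # gradient of the squared reconstruction error (up to a constant)
            grad_H = (err @ W2.T) * (1 - H ** 2)        # backprop through tanh
            grad_W1 = X.T @ grad_H / len(X)
            W1 -= lr * grad_W1
            W2 -= lr * grad_W2

        features = np.tanh(X @ W1)                      # representation handed to a downstream learner

    The reconstruction objective is only one of the training criteria covered by the review; the sketch is meant to show how a representation can be learned without labels.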

    Regression with Linear Factored Functions

    Many applications that use empirically estimated functions face a curse of dimensionality, because the integrals over most function classes must be approximated by sampling. This paper introduces a novel regression algorithm that learns linear factored functions (LFF). This class of functions has structural properties that allow certain integrals to be solved analytically and point-wise products to be calculated. Applications like belief propagation and reinforcement learning can exploit these properties to break the curse and speed up computation. We derive a regularized greedy optimization scheme that learns factored basis functions during training. The novel regression algorithm performs competitively with Gaussian processes on benchmark tasks, and the learned LFF functions are very compact, with 4-9 factored basis functions on average.
    Comment: Under review as a conference paper at ECML/PKDD 201
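
    The factored structure can be sketched as follows (an illustration of the function class only, not the authors' regularized greedy training scheme; the Gaussian-bump factors and the integration range are assumptions): because f is a weighted sum of products of one-dimensional basis functions, a D-dimensional integral collapses into products of cheap one-dimensional integrals.

        import numpy as np
        from scipy.integrate import quad

        # Linear factored function f(x) = sum_k w_k * prod_d phi_kd(x_d)
        D, K = 4, 3                                    # input dimension, number of factored basis functions
        rng = np.random.default_rng(1)
        w = rng.normal(size=K)                         # linear weights
        centers = rng.uniform(-1, 1, size=(K, D))      # one 1-D bump centre per (k, d)

        def phi(k, d, x):
            return np.exp(-(x - centers[k, d]) ** 2)   # one-dimensional factor

        def f(x):
            return sum(w[k] * np.prod([phi(k, d, x[d]) for d in range(D)]) for k in range(K))

        # Integral of f over [-2, 2]^D: D*K one-dimensional quadratures
        # instead of one D-dimensional quadrature.
        integral = sum(
            w[k] * np.prod([quad(lambda x, k=k, d=d: phi(k, d, x), -2, 2)[0] for d in range(D)])
            for k in range(K)
        )
        print(integral)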

    Transfer learning through greedy subset selection

    We study the binary transfer learning problem, focusing on how to select sources from a large pool and how to combine them to yield good performance on a target task. In particular, we consider the transfer learning setting where one does not have direct access to the source data, but rather employs the source hypotheses trained from them. Building on the literature on the best subset selection problem, we propose an efficient algorithm that selects relevant source hypotheses and feature dimensions simultaneously. On three computer vision datasets, we achieve state-of-the-art results, substantially outperforming transfer learning and popular feature selection baselines in a small-sample setting. We also theoretically prove that, under reasonable assumptions on the source hypotheses, our algorithm can learn effectively from few examples.
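
    A minimal sketch of greedy subset selection over source hypotheses (an illustration of the best-subset idea the paper builds on, not the exact algorithm; the least-squares scoring and the toy data are assumptions):

        import numpy as np

        def greedy_source_selection(source_preds, y, k):
            """Greedily pick k source hypotheses whose least-squares combination
            best fits the target labels.  source_preds: (n_samples, n_sources)
            matrix of each source hypothesis evaluated on the target sample."""
            selected, remaining = [], list(range(source_preds.shape[1]))
            for _ in range(k):
                best_j, best_err = None, np.inf
                for j in remaining:
                    cols = source_preds[:, selected + [j]]
                    coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
                    err = np.mean((cols @ coef - y) ** 2)
                    if err < best_err:
                        best_j, best_err = j, err
                selected.append(best_j)
                remaining.remove(best_j)
            return selected

        # toy usage: random "source hypothesis" outputs and binary target labels
        rng = np.random.default_rng(2)
        preds = rng.normal(size=(100, 30))
        y = np.sign(preds[:, 3] + 0.5 * preds[:, 17] + 0.1 * rng.normal(size=100))
        print(greedy_source_selection(preds, y, k=2))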

    Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives

    Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and their physically meaningful interpretations, which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated over a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts, or indeed as a conjoint comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions.
    Comment: 232 pages
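
    As an illustration of the tensor-train format emphasized above (a generic TT-SVD sketch under assumed shapes and ranks, not code from the monograph), a higher-order tensor can be compressed into a chain of small 3-way cores by sequential truncated SVDs:

        import numpy as np

        def tt_svd(tensor, max_rank):
            """Decompose `tensor` into tensor-train cores G_1, ..., G_d
            with TT-ranks capped at `max_rank`, via sequential truncated SVDs."""
            dims, cores, r_prev = tensor.shape, [], 1
            mat = tensor.reshape(r_prev * dims[0], -1)
            for k in range(len(dims) - 1):
                U, S, Vt = np.linalg.svd(mat, full_matrices=False)
                r = min(max_rank, len(S))
                cores.append(U[:, :r].reshape(r_prev, dims[k], r))      # k-th 3-way core
                mat = (np.diag(S[:r]) @ Vt[:r]).reshape(r * dims[k + 1], -1)
                r_prev = r
            cores.append(mat.reshape(r_prev, dims[-1], 1))              # last core
            return cores

        # toy usage: compress a random 4-way tensor and inspect the core shapes
        rng = np.random.default_rng(3)
        T = rng.normal(size=(4, 5, 6, 7))
        print([c.shape for c in tt_svd(T, max_rank=3)])

    Storage drops from the product of all mode sizes to a sum of small core sizes, which is the sense in which the format alleviates the curse of dimensionality.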

    Improved Multi-Task Learning Based on Local Rademacher Analysis

    Considering a single prediction task at a time is the most common paradigm in machine learning practice. This methodology, however, ignores potentially relevant information that might be available in other related tasks in the same domain. This becomes even more critical when the lack of a sufficient amount of data for an individual prediction task leads to deteriorated generalization performance. In such cases, learning multiple related tasks together can offer better performance by allowing tasks to leverage information from each other. Multi-Task Learning (MTL) is a machine learning framework that learns multiple related tasks simultaneously to overcome the data scarcity limitations of Single Task Learning (STL), and therefore results in improved performance.

    Although MTL has been actively investigated by the machine learning community, only a few studies examine the theoretical justification of this learning framework, and their focus is on providing learning guarantees in the form of generalization error bounds. The study of generalization bounds is considered an important problem in machine learning and, more specifically, in statistical learning theory. This importance is twofold: (1) generalization bounds provide an upper-tail confidence interval for the true risk of a learning algorithm, which cannot be computed exactly because of its dependence on the unknown distribution P from which the data are drawn; (2) such bounds can also be employed as model selection tools, leading to more accurate learning models. Generalization error bounds are typically expressed in terms of the empirical risk of the learning hypothesis along with a complexity measure of that hypothesis.

    Although different complexity measures can be used to derive error bounds, Rademacher complexity has received considerable attention in recent years because it can lead to tighter error bounds than other complexity measures. However, one shortcoming of the general notion of Rademacher complexity is that it provides a global complexity estimate of the learning hypothesis space, ignoring the fact that learning algorithms, by design, select functions from a more favorable subset of this space and therefore yield better-performing models than the worst case. To overcome this limitation, a more nuanced notion, the so-called local Rademacher complexity, has been considered; it leads to sharper learning bounds and, compared to its global counterpart, guarantees faster convergence rates in terms of the number of samples. Moreover, since locally derived bounds are expected to be tighter than globally derived ones, they can motivate better (more accurate) model selection algorithms.

    While previous MTL studies provide generalization bounds based on other complexity measures, in this dissertation we prove excess risk bounds for several popular kernel-based MTL hypothesis spaces based on the Local Rademacher Complexity (LRC) of those hypotheses. We show that these local bounds have faster convergence rates than the previous Global Rademacher Complexity (GRC)-based bounds. We then use our LRC-based MTL bounds to design a new kernel-based MTL model that enjoys strong learning guarantees, and we develop an optimization algorithm to solve the new MTL formulation. Finally, we run simulations on experimental data that compare our MTL model to classical Multi-Task Multiple Kernel Learning (MT-MKL) models designed based on the GRC. Since local Rademacher complexities are expected to be tighter than global ones, our new model is also expected to perform better than the GRC-based models.
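
    For intuition about the quantity these bounds are built on, the sketch below gives a Monte-Carlo estimate of the empirical Rademacher complexity of a finite hypothesis set (a generic illustration; the dissertation works with kernel-based MTL hypothesis spaces and their local complexities, which are not reproduced here):

        import numpy as np

        def empirical_rademacher(hyp_outputs, n_draws=2000, rng=None):
            """Monte-Carlo estimate of the empirical Rademacher complexity.
            hyp_outputs: (n_hypotheses, n_samples) matrix; row h holds the
            outputs of hypothesis h on the fixed training sample."""
            rng = rng or np.random.default_rng(0)
            n = hyp_outputs.shape[1]
            total = 0.0
            for _ in range(n_draws):
                sigma = rng.choice([-1.0, 1.0], size=n)       # random sign vector
                total += np.max(hyp_outputs @ sigma) / n      # sup over the hypothesis set
            return total / n_draws

        # toy usage: a richer hypothesis set has higher complexity, hence a looser bound
        rng = np.random.default_rng(4)
        print(empirical_rademacher(rng.normal(size=(5, 100))),
              empirical_rademacher(rng.normal(size=(500, 100))))

    The local variant restricts the supremum to hypotheses whose empirical risk (or variance) falls below a threshold, which is what yields the sharper bounds and faster rates discussed above.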

    Scalable Greedy Algorithms for Transfer Learning

    In this paper we consider the binary transfer learning problem, focusing on how to select and combine sources from a large pool to yield good performance on a target task. Constraining our scenario to the real world, we do not assume direct access to the source data, but rather employ the source hypotheses trained from them. We propose an efficient algorithm that selects relevant source hypotheses and feature dimensions simultaneously, building on the literature on the best subset selection problem. Our algorithm achieves state-of-the-art results on three computer vision datasets, substantially outperforming both transfer learning and popular feature selection baselines in a small-sample setting. We also present a randomized variant that achieves the same results with a computational cost independent of the number of source hypotheses and feature dimensions. Moreover, we theoretically prove that, under reasonable assumptions on the source hypotheses, our algorithm can learn effectively from few examples.
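
    A hedged sketch of how a randomized variant can decouple the per-step cost from the size of the source pool (an illustration only, with an assumed least-squares scoring rule and batch size; it is not the paper's actual procedure or its guarantees): at each greedy step, only a small random batch of candidate source hypotheses is scored.

        import numpy as np

        def randomized_greedy_selection(source_preds, y, k, batch=20, rng=None):
            """Greedy forward selection where each step scores only a random
            batch of candidates, so the per-step cost does not grow with the
            total number of source hypotheses."""
            rng = rng or np.random.default_rng(0)
            selected, pool = [], np.arange(source_preds.shape[1])
            for _ in range(k):
                candidates = rng.choice(pool, size=min(batch, len(pool)), replace=False)
                best_j, best_err = None, np.inf
                for j in candidates:
                    cols = source_preds[:, selected + [int(j)]]
                    coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
                    err = np.mean((cols @ coef - y) ** 2)
                    if err < best_err:
                        best_j, best_err = int(j), err
                selected.append(best_j)
                pool = pool[pool != best_j]
            return selected

        # toy usage with a large pool of source hypotheses
        rng = np.random.default_rng(5)
        preds = rng.normal(size=(80, 1000))
        y = np.sign(preds[:, 7] + 0.1 * rng.normal(size=80))
        print(randomized_greedy_selection(preds, y, k=2))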