18,667 research outputs found

    Model selection of polynomial kernel regression

    Full text link
    Polynomial kernel regression is one of the standard and state-of-the-art learning strategies. However, as is well known, the choices of the degree of polynomial kernel and the regularization parameter are still open in the realm of model selection. The first aim of this paper is to develop a strategy to select these parameters. On one hand, based on the worst-case learning rate analysis, we show that the regularization term in polynomial kernel regression is not necessary. In other words, the regularization parameter can decrease arbitrarily fast when the degree of the polynomial kernel is suitable tuned. On the other hand,taking account of the implementation of the algorithm, the regularization term is required. Summarily, the effect of the regularization term in polynomial kernel regression is only to circumvent the " ill-condition" of the kernel matrix. Based on this, the second purpose of this paper is to propose a new model selection strategy, and then design an efficient learning algorithm. Both theoretical and experimental analysis show that the new strategy outperforms the previous one. Theoretically, we prove that the new learning strategy is almost optimal if the regression function is smooth. Experimentally, it is shown that the new strategy can significantly reduce the computational burden without loss of generalization capability.Comment: 29 pages, 4 figure

    Improved Dropout for Shallow and Deep Learning

    Full text link
    Dropout has been witnessed with great success in training deep neural networks by independently zeroing out the outputs of neurons at random. It has also received a surge of interest for shallow learning, e.g., logistic regression. However, the independent sampling for dropout could be suboptimal for the sake of convergence. In this paper, we propose to use multinomial sampling for dropout, i.e., sampling features or neurons according to a multinomial distribution with different probabilities for different features/neurons. To exhibit the optimal dropout probabilities, we analyze the shallow learning with multinomial dropout and establish the risk bound for stochastic optimization. By minimizing a sampling dependent factor in the risk bound, we obtain a distribution-dependent dropout with sampling probabilities dependent on the second order statistics of the data distribution. To tackle the issue of evolving distribution of neurons in deep learning, we propose an efficient adaptive dropout (named \textbf{evolutional dropout}) that computes the sampling probabilities on-the-fly from a mini-batch of examples. Empirical studies on several benchmark datasets demonstrate that the proposed dropouts achieve not only much faster convergence and but also a smaller testing error than the standard dropout. For example, on the CIFAR-100 data, the evolutional dropout achieves relative improvements over 10\% on the prediction performance and over 50\% on the convergence speed compared to the standard dropout.Comment: In NIPS 201

    Sparse covariance estimation in heterogeneous samples

    Full text link
    Standard Gaussian graphical models (GGMs) implicitly assume that the conditional independence among variables is common to all observations in the sample. However, in practice, observations are usually collected form heterogeneous populations where such assumption is not satisfied, leading in turn to nonlinear relationships among variables. To tackle these problems we explore mixtures of GGMs; in particular, we consider both infinite mixture models of GGMs and infinite hidden Markov models with GGM emission distributions. Such models allow us to divide a heterogeneous population into homogenous groups, with each cluster having its own conditional independence structure. The main advantage of considering infinite mixtures is that they allow us easily to estimate the number of number of subpopulations in the sample. As an illustration, we study the trends in exchange rate fluctuations in the pre-Euro era. This example demonstrates that the models are very flexible while providing extremely interesting interesting insights into real-life applications
    • …
    corecore