8,232 research outputs found

    An Analysis of Active Learning With Uniform Feature Noise

    In active learning, the user sequentially chooses values for a feature $X$ and an oracle returns the corresponding label $Y$. In this paper, we consider the effect of feature noise in active learning, which could arise because $X$ itself is being measured, because it is corrupted in transmission to the oracle, or because the oracle returns the label of a noisy version of the query point. In statistics, feature noise is known as "errors in variables" and has been studied extensively in non-active settings. However, the effect of feature noise in active learning has not been studied before. We consider the well-known Berkson errors-in-variables model with additive uniform noise of width $\sigma$. Our simple but revealing setting is one-dimensional binary classification, where the goal is to learn a threshold (the point where the probability of a $+$ label crosses half). We deal with regression functions that are antisymmetric in a region of size $\sigma$ around the threshold and also satisfy Tsybakov's margin condition around the threshold. We prove minimax lower and upper bounds which demonstrate that when $\sigma$ is smaller than the minimax active/passive noiseless error derived in \cite{CN07}, noise has no effect on the rates and one achieves the same noiseless rates. For larger $\sigma$, the \textit{unflattening} of the regression function on convolution with uniform noise, along with its local antisymmetry around the threshold, together yield a behaviour where noise \textit{appears} to be beneficial. Our key result is that active learning can buy significant improvement over a passive strategy even in the presence of feature noise. Comment: 24 pages, 2 figures, published in the proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS), 201

    Fast Convergence Rate of Multiple Kernel Learning with Elastic-net Regularization

    We investigate the learning rate of multiple kernel learning (MKL) with elastic-net regularization, which consists of an $\ell_1$-regularizer for inducing sparsity and an $\ell_2$-regularizer for controlling smoothness. We focus on a sparse setting where the total number of kernels is large but the number of non-zero components of the ground truth is relatively small, and prove that elastic-net MKL achieves the minimax learning rate on the $\ell_2$-mixed-norm ball. Our bound is sharper than previously shown convergence rates, and has the property that the smoother the truth is, the faster the convergence rate is. Comment: 21 pages, 0 figur

    Fast learning rate of multiple kernel learning: Trade-off between sparsity and smoothness

    We investigate the learning rate of multiple kernel learning (MKL) with $\ell_1$ and elastic-net regularizations. The elastic-net regularization is a composition of an $\ell_1$-regularizer for inducing sparsity and an $\ell_2$-regularizer for controlling smoothness. We focus on a sparse setting where the total number of kernels is large, but the number of nonzero components of the ground truth is relatively small, and show convergence rates sharper than any previously shown learning rates for both $\ell_1$ and elastic-net regularizations. Our analysis reveals some relations between the choice of a regularization function and the performance. If the ground truth is smooth, we show a faster convergence rate for the elastic-net regularization under fewer conditions than for $\ell_1$-regularization; otherwise, a faster convergence rate for the $\ell_1$-regularization is shown. Comment: Published at http://dx.doi.org/10.1214/13-AOS1095 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org). arXiv admin note: text overlap with arXiv:1103.043
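
    For orientation, one common way to write an elastic-net penalty for MKL is sketched below. The notation is an assumption and is not taken from the abstract: $M$ candidate kernels, a component function $f_m$ in the reproducing kernel Hilbert space $\mathcal{H}_m$ of the $m$-th kernel, and regularization weights $\lambda_1, \lambda_2 \ge 0$. The first sum plays the role of the $\ell_1$ term (sparsity across kernels) and the second the $\ell_2$ term (smoothness).

```latex
% Assumed notation: f = \sum_{m=1}^{M} f_m with f_m \in \mathcal{H}_m.
\[
  R(f) \;=\; \lambda_1 \sum_{m=1}^{M} \| f_m \|_{\mathcal{H}_m}
        \;+\; \lambda_2 \sum_{m=1}^{M} \| f_m \|_{\mathcal{H}_m}^{2},
  \qquad \lambda_1, \lambda_2 \ge 0 .
\]
% Setting \lambda_2 = 0 leaves only the \ell_1-type term.
```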

    S2: An Efficient Graph Based Active Learning Algorithm with Application to Nonparametric Classification

    This paper investigates the problem of active learning for binary label prediction on a graph. We introduce a simple and label-efficient algorithm called S2 for this task. At each step, S2 selects the vertex to be labeled based on the structure of the graph and all previously gathered labels. Specifically, S2 queries for the label of the vertex that bisects the *shortest shortest* path between any pair of oppositely labeled vertices. We present a theoretical estimate of the number of queries S2 needs in terms of a novel parametrization of the complexity of binary functions on graphs. We also present experimental results demonstrating the performance of S2 on both real and synthetic data. While other graph-based active learning algorithms have shown promise in practice, our algorithm is the first with both good performance and theoretical guarantees. Finally, we demonstrate the implications of the S2 algorithm for the theory of nonparametric active learning. In particular, we show that S2 achieves near minimax optimal excess risk for an important class of nonparametric classification problems. Comment: A version of this paper appears in the Conference on Learning Theory (COLT) 201
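
    The query rule described in this abstract (bisect the shortest path joining two oppositely labeled vertices) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the adjacency-dict graph format, and the +1/-1 label encoding are assumptions made for illustration, and the sketch covers only a single query-selection step, not how the full algorithm proceeds when no such path exists.

```python
from collections import deque


def shortest_opposite_path(adj, labels):
    """Return the shortest path joining a +1 vertex to a -1 vertex, or None.

    adj    : dict mapping each vertex to a list of neighbours (assumed format)
    labels : dict mapping already-queried vertices to +1 or -1 (assumed encoding)
    """
    best = None
    for src, src_label in labels.items():
        if src_label != 1:
            continue
        # Breadth-first search from this +1 vertex to its nearest -1 vertex.
        parent = {src: None}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            if labels.get(v) == -1:
                # Walk parent pointers back to src to rebuild the path.
                path = []
                while v is not None:
                    path.append(v)
                    v = parent[v]
                path.reverse()
                if best is None or len(path) < len(best):
                    best = path
                break
            for w in adj[v]:
                if w not in parent:
                    parent[w] = v
                    queue.append(w)
    return best


def next_query(adj, labels):
    """Return the vertex bisecting the shortest opposite-label path.

    Returns None when no such path exists or when the path has no interior
    vertex (the two endpoints are adjacent).
    """
    path = shortest_opposite_path(adj, labels)
    if path is None or len(path) < 3:
        return None
    return path[len(path) // 2]


# Hypothetical example: a path graph 0 - 1 - 2 - 3 - 4 with its two ends labeled.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
first = next_query(chain, {0: 1, 4: -1})
second = next_query(chain, {0: 1, 4: -1, 2: 1})
```

    On the chain example, each query halves the stretch of unlabeled vertices between the nearest oppositely labeled pair, which is the bisection behaviour the abstract describes.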