
    Noise-adaptive Margin-based Active Learning and Lower Bounds under Tsybakov Noise Condition

    We present a simple noise-robust margin-based active learning algorithm to find homogeneous (passing through the origin) linear separators and analyze its error convergence when labels are corrupted by noise. We show that when the imposed noise satisfies the Tsybakov low noise condition (Mammen and Tsybakov 1999; Tsybakov 2004) the algorithm is able to adapt to the unknown level of noise and achieves the optimal statistical rate up to poly-logarithmic factors. We also derive lower bounds for margin-based active learning algorithms under the Tsybakov noise condition (TNC) for the membership query synthesis scenario (Angluin 1988). Our result implies lower bounds for the stream-based selective sampling scenario (Cohn 1990) under TNC for some fairly simple data distributions. Quite surprisingly, we show that the sample complexity cannot be improved even if the underlying data distribution is as simple as the uniform distribution on the unit ball. Our proof involves the construction of a well-separated hypothesis set on the d-dimensional unit ball along with carefully designed label distributions for the Tsybakov noise condition. Our analysis might provide insights for other forms of lower bounds as well. Comment: 16 pages, 2 figures. An abridged version to appear in the Thirtieth AAAI Conference on Artificial Intelligence (AAAI), held in Phoenix, AZ, USA in 2016.
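    For intuition, a minimal sketch of a generic margin-based active learning loop of the kind analyzed here is given below (Python/NumPy). The querying rule, the averaging-style update, and the geometric margin schedule are illustrative stand-ins, not the paper's noise-adaptive procedure; `sample_unlabeled` and `query_label` are hypothetical interfaces for the unlabeled pool and the (possibly noisy) label oracle.

```python
import numpy as np

def margin_based_active_learner(sample_unlabeled, query_label, d,
                                rounds=10, pool_size=500, b0=1.0, shrink=0.7):
    """Generic margin-based active learning loop for a homogeneous
    (through-the-origin) linear separator.  Illustrative sketch only."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)
    b = b0                                           # current margin-band width
    for _ in range(rounds):
        X = sample_unlabeled(pool_size)              # unlabeled pool, shape (n, d)
        in_band = np.abs(X @ w) <= b                 # query only near the boundary
        if in_band.any():
            Xq = X[in_band]
            yq = np.array([query_label(x) for x in Xq])  # labels may be noisy
            # Simple surrogate update: label-weighted average of in-band points,
            # projected back to the unit sphere (a stand-in for the paper's
            # constrained surrogate-loss minimization).
            w_new = (yq[:, None] * Xq).mean(axis=0)
            if np.linalg.norm(w_new) > 0:
                w = w_new / np.linalg.norm(w_new)
        b *= shrink                                  # geometrically shrink the band
    return w
```

    On synthetic data one would pass, for example, a sampler drawing points uniformly from the unit sphere and an oracle returning a noisy sign of the true separator's margin.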

    Learning by mirror averaging

    Given a finite collection of estimators or classifiers, we study the problem of model selection type aggregation, that is, we construct a new estimator or classifier, called the aggregate, which is nearly as good as the best among them with respect to a given risk criterion. We define our aggregate by a simple recursive procedure which solves an auxiliary stochastic linear programming problem related to the original nonlinear one and constitutes a special case of the mirror averaging algorithm. We show that the aggregate satisfies sharp oracle inequalities under some general assumptions. The results are applied to several problems including regression, classification and density estimation. Comment: Published at http://dx.doi.org/10.1214/07-AOS546 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
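    As a concrete illustration of the exponential-weights flavor of such aggregation, the sketch below computes convex weights from cumulative losses and averages the weight vectors over time. The temperature beta and the loss-matrix interface are assumptions; this is not claimed to reproduce the paper's exact recursion.

```python
import numpy as np

def mirror_averaging_weights(losses, beta=1.0):
    """Exponential-weights aggregation with Cesaro averaging of the weight
    vectors (a sketch in the spirit of mirror averaging).

    losses: array of shape (n, M); losses[t, j] is the loss of the j-th
    base estimator/classifier on the t-th observation.
    """
    n, M = losses.shape
    cum = np.zeros(M)                   # cumulative losses of the M experts
    avg_w = np.zeros(M)                 # running average of the weight vectors
    for t in range(n):
        logits = -beta * cum
        logits -= logits.max()          # numerical stability
        w = np.exp(logits)
        w /= w.sum()                    # exponential weights before seeing obs. t
        avg_w += (w - avg_w) / (t + 1)  # recursive (Cesaro) average
        cum += losses[t]                # update cumulative losses
    return avg_w                        # convex weights for the aggregate

# The aggregate is the convex combination f_hat(x) = sum_j avg_w[j] * f_j(x)
# of the base estimators f_1, ..., f_M.
```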

    Discussion of "2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization" by V. Koltchinskii

    Discussion of "2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization" by V. Koltchinskii [arXiv:0708.0083]. Comment: Published at http://dx.doi.org/10.1214/009053606000001064 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    An $\{l_1, l_2, l_{\infty}\}$-Regularization Approach to High-Dimensional Errors-in-variables Models

    Several new estimation methods have been recently proposed for the linear regression model with observation error in the design. Different assumptions on the data generating process have motivated different estimators and analyses. In particular, the literature considered (1) observation errors in the design uniformly bounded by some $\bar\delta$, and (2) zero mean independent observation errors. Under the first assumption, the rates of convergence of the proposed estimators depend explicitly on $\bar\delta$, while the second assumption has been applied when an estimator for the second moment of the observational error is available. This work proposes and studies two new estimators which, compared to other procedures for regression models with errors in the design, exploit an additional $l_{\infty}$-norm regularization. The first estimator is applicable when both (1) and (2) hold but does not require an estimator for the second moment of the observational error. The second estimator is applicable under (2) and requires an estimator for the second moment of the observation error. Importantly, we impose no assumption on the accuracy of this pilot estimator, in contrast to the previously known procedures. As in the recent proposals, we allow the number of covariates to be much larger than the sample size. We establish the rates of convergence of the estimators and compare them with the bounds obtained for related estimators in the literature. These comparisons provide interesting insights into the interplay of the assumptions and the achievable rates of convergence.
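    To make the role of the extra sup-norm regularizer concrete, here is a generic least-squares objective combining $l_1$ and $l_\infty$ penalties, written with the cvxpy modeling library. It is only a sketch of the regularization pattern, not the paper's errors-in-variables estimators; lam1 and lam_inf are assumed tuning parameters.

```python
import cvxpy as cp

def l1_linf_regularized_ls(Z, y, lam1=0.1, lam_inf=0.1):
    """Least squares with a combined l1 + l_infty penalty on the coefficients.
    Generic illustration only; Z is the (noisy) design, y the response."""
    n, p = Z.shape
    theta = cp.Variable(p)
    obj = (cp.sum_squares(Z @ theta - y) / n
           + lam1 * cp.norm(theta, 1)            # sparsity-inducing l1 part
           + lam_inf * cp.norm(theta, "inf"))    # additional sup-norm regularization
    cp.Problem(cp.Minimize(obj)).solve()
    return theta.value
```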

    Exponential convergence of testing error for stochastic gradient methods

    We consider binary classification problems with positive definite kernels and square loss, and study the convergence rates of stochastic gradient methods. We show that while the excess testing loss (squared loss) converges slowly to zero as the number of observations (and thus iterations) goes to infinity, the testing error (classification error) converges exponentially fast if low-noise conditions are assumed.
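    A single-pass sketch of stochastic gradient descent on the square loss in a reproducing kernel Hilbert space, the kind of iterate studied here, might look as follows; the constant step size and the Gaussian kernel are illustrative assumptions.

```python
import numpy as np

def kernel_sgd_square_loss(X, y, kernel, gamma=0.5):
    """One pass of SGD on the square loss in the RKHS of `kernel` (sketch).
    Labels y are in {-1, +1}; classification uses the sign of the iterate."""
    n = len(y)
    coef = np.zeros(n)            # f_t(x) = sum_j coef[j] * kernel(X[j], x)
    for t in range(n):
        # Evaluate the current iterate at the new point x_t.
        f_xt = sum(coef[j] * kernel(X[j], X[t]) for j in range(t))
        # Functional gradient step for the square loss: (f(x_t) - y_t) K(x_t, .)
        coef[t] = -gamma * (f_xt - y[t])
    predict = lambda x: np.sign(sum(c * kernel(xj, x) for c, xj in zip(coef, X)))
    return predict

# Example kernel: Gaussian RBF with unit bandwidth.
rbf = lambda a, b: np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2))
```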

    Estimation of high-dimensional low-rank matrices

    Suppose that we observe entries or, more generally, linear combinations of entries of an unknown $m\times T$ matrix $A$ corrupted by noise. We are particularly interested in the high-dimensional setting where the number $mT$ of unknown entries can be much larger than the sample size $N$. Motivated by several applications, we consider estimation of matrix $A$ under the assumption that it has small rank. This can be viewed as a dimension reduction or sparsity assumption. In order to shrink toward a low-rank representation, we investigate penalized least squares estimators with a Schatten-$p$ quasi-norm penalty term, $p\leq 1$. We study these estimators under two possible assumptions: a modified version of the restricted isometry condition and a uniform bound on the ratio "empirical norm induced by the sampling operator / Frobenius norm." The main results are stated as nonasymptotic upper bounds on the prediction risk and on the Schatten-$q$ risk of the estimators, where $q\in[p,2]$. The rates that we obtain for the prediction risk are of the form $rm/N$ (for $m=T$), up to logarithmic factors, where $r$ is the rank of $A$. The particular examples of multi-task learning and matrix completion are worked out in detail. The proofs are based on tools from the theory of empirical processes. As a by-product, we derive bounds for the $k$th entropy numbers of the quasi-convex Schatten class embeddings $S_p^M\hookrightarrow S_2^M$, $p<1$, which are of independent interest. Comment: Published at http://dx.doi.org/10.1214/10-AOS860 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
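    For the Schatten-1 (nuclear norm) special case and the matrix completion example, a textbook proximal-gradient sketch with singular-value soft-thresholding is shown below; the step size, penalty level, and iteration count are assumptions, and this is not the paper's exact estimator for general $p$.

```python
import numpy as np

def nuclear_norm_completion(Y, mask, lam=1.0, step=1.0, iters=200):
    """Proximal gradient descent for nuclear-norm penalized least squares on
    observed entries (the p = 1 case, specialized to matrix completion).

    Y    : observed matrix (values outside `mask` are ignored)
    mask : boolean matrix, True where an entry of A was observed
    """
    A = np.zeros_like(Y, dtype=float)
    for _ in range(iters):
        # Gradient of 0.5 * || mask * (A - Y) ||_F^2 over observed entries.
        grad = mask * (A - Y)
        Z = A - step * grad
        # Proximal step for lam * ||A||_* : soft-threshold the singular values.
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        s = np.maximum(s - step * lam, 0.0)
        A = (U * s) @ Vt
    return A
```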

    Gradient-free optimization of highly smooth functions: improved analysis and a new algorithm

    This work studies minimization problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth and possibly satisfies additional properties. We consider two kinds of zero-order projected gradient descent algorithms, which differ in the form of the gradient estimator. The first algorithm uses a gradient estimator based on randomization over the $\ell_2$ sphere due to Bach and Perchet (2016). We present an improved analysis of this algorithm on the class of highly smooth and strongly convex functions studied in the prior work, and we derive rates of convergence for two more general classes of non-convex functions. Namely, we consider highly smooth functions satisfying the Polyak-Łojasiewicz condition and the class of highly smooth functions with no additional property. The second algorithm is based on randomization over the $\ell_1$ sphere, and it extends to the highly smooth setting the algorithm that was recently proposed for Lipschitz convex functions in Akhavan et al. (2022). We show that, in the case of a noiseless oracle, this novel algorithm enjoys better bounds on bias and variance than the $\ell_2$ randomization and the commonly used Gaussian randomization algorithms, while in the noisy case both the $\ell_1$ and $\ell_2$ algorithms benefit from similar improved theoretical guarantees. The improvements are achieved thanks to a new proof technique based on Poincaré type inequalities for uniform distributions on the $\ell_1$ or $\ell_2$ spheres. The results are established under weak (almost adversarial) assumptions on the noise. Moreover, we provide minimax lower bounds proving optimality or near optimality of the obtained upper bounds in several cases.
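    The sketch below shows a basic zero-order projected gradient method with a two-point gradient estimator randomized over the $\ell_2$ sphere, omitting the higher-order smoothing kernel that the highly smooth analysis relies on; the step size, smoothing parameter, and projection radius are illustrative assumptions.

```python
import numpy as np

def zero_order_pgd_l2(f, x0, steps=1000, h=1e-2, lr=1e-1, radius=10.0, seed=0):
    """Zero-order projected gradient descent with l2-sphere randomization.
    `f` is the noisy zero-order oracle; this is a simplified two-point
    estimator, not the paper's kernel-smoothed estimator."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    d = x.size
    for _ in range(steps):
        zeta = rng.normal(size=d)
        zeta /= np.linalg.norm(zeta)                 # uniform on the l2 unit sphere
        # Two-point zero-order gradient estimate.
        g = (d / (2 * h)) * (f(x + h * zeta) - f(x - h * zeta)) * zeta
        x -= lr * g                                  # gradient step
        nrm = np.linalg.norm(x)                      # projection onto a Euclidean ball
        if nrm > radius:
            x *= radius / nrm
    return x
```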

    PAC-Bayesian bounds for sparse regression estimation with exponential weights

    We consider the sparse regression model where the number of parameters $p$ is larger than the sample size $n$. The difficulty when considering high-dimensional problems is to propose estimators achieving a good compromise between statistical and computational performance. The BIC estimator, for instance, performs well from the statistical point of view \cite{BTW07} but can only be computed for values of $p$ of at most a few tens. The Lasso estimator is the solution of a convex minimization problem, hence computable for large values of $p$. However, stringent conditions on the design are required to establish fast rates of convergence for this estimator. Dalalyan and Tsybakov \cite{arnak} propose a method achieving a good compromise between the statistical and computational aspects of the problem. Their estimator can be computed for reasonably large $p$ and satisfies nice statistical properties under weak assumptions on the design. However, \cite{arnak} proposes sparsity oracle inequalities in expectation for the empirical excess risk only. In this paper, we propose an aggregation procedure similar to that of \cite{arnak} but with improved statistical performance. Our main theoretical result is a sparsity oracle inequality in probability for the true excess risk for a version of the exponential weight estimator. We also propose an MCMC method to compute our estimator for reasonably large values of $p$. Comment: 19 pages.
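    As an illustration of computing an exponentially weighted aggregate by MCMC, the sketch below runs a random-walk Metropolis sampler whose target is proportional to exp(-RSS/temperature) times a heavy-tailed prior and returns the sample average; the prior, proposal, and tuning are assumptions and do not reproduce the paper's procedure.

```python
import numpy as np

def exp_weights_mcmc(X, y, temperature=1.0, prior_scale=1.0,
                     n_iter=5000, prop_std=0.1, seed=0):
    """Random-walk Metropolis sketch of an exponentially weighted aggregate
    for sparse regression: sample theta proportional to
    exp(-||y - X theta||^2 / temperature) * prior(theta)
    and average the samples.  Generic illustration only."""
    rng = np.random.default_rng(seed)
    n, p = X.shape

    def log_target(theta):
        rss = np.sum((y - X @ theta) ** 2)
        # Heavy-tailed, sparsity-friendly log-prior (an assumption).
        log_prior = -np.sum(np.log1p((theta / prior_scale) ** 2))
        return -rss / temperature + log_prior

    theta = np.zeros(p)
    lt = log_target(theta)
    running_mean = np.zeros(p)
    for t in range(1, n_iter + 1):
        prop = theta + prop_std * rng.normal(size=p)    # random-walk proposal
        lt_prop = log_target(prop)
        if np.log(rng.uniform()) < lt_prop - lt:        # Metropolis accept/reject
            theta, lt = prop, lt_prop
        running_mean += (theta - running_mean) / t      # sample-mean aggregate
    return running_mean
```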

    Efficient Active Learning Halfspaces with Tsybakov Noise: A Non-convex Optimization Approach

    We study the problem of computationally and label efficient PAC active learning of $d$-dimensional halfspaces with Tsybakov noise~\citep{tsybakov2004optimal} under structured unlabeled data distributions. Inspired by~\cite{diakonikolas2020learning}, we prove that any approximate first-order stationary point of a smooth nonconvex loss function yields a halfspace with a low excess error guarantee. In light of this structural result, we design a nonconvex optimization-based algorithm with a label complexity of $\tilde{O}\big(d\,(1/\epsilon)^{\frac{8-6\alpha}{3\alpha-1}}\big)$, where $\tilde{O}(\cdot)$ and $\tilde{\Theta}(\cdot)$ hide factors of the form $\mathrm{polylog}(d, 1/\epsilon, 1/\delta)$, under the assumption that the Tsybakov noise parameter $\alpha \in (\frac{1}{3}, 1]$. This narrows the gap between the label complexities of the previously known efficient passive or active algorithms~\citep{diakonikolas2020polynomial,zhang2021improved} and the information-theoretic lower bound in this setting. Comment: 29 pages.
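    To illustrate the "approximate stationary point of a smooth nonconvex loss" idea, the sketch below runs projected SGD on a sigmoid-type surrogate over the unit sphere; the surrogate, its scale, and the way labels are requested are illustrative assumptions rather than the paper's label-efficient algorithm.

```python
import numpy as np

def sigmoid_loss_halfspace(X, query_label, w0, sigma=0.5, lr=0.1, epochs=5, seed=0):
    """Projected SGD on the smooth nonconvex surrogate
    ell(w; x, y) = 1 / (1 + exp(y * (w @ x) / sigma)),
    constrained to the unit sphere.  Sketch only; `query_label` stands in
    for a (possibly Tsybakov-noisy) label oracle."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    w /= np.linalg.norm(w)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            x = X[i]
            y = query_label(x)                          # possibly noisy label in {-1, +1}
            m = np.clip(y * (w @ x) / sigma, -50, 50)   # scaled margin (clipped for safety)
            s = 1.0 / (1.0 + np.exp(m))                 # sigmoid loss value
            grad = -s * (1.0 - s) * y * x / sigma       # gradient of the sigmoid loss in w
            w -= lr * grad
            w /= np.linalg.norm(w)                      # project back onto the unit sphere
    return w
```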