
    Complexity of pattern classes and Lipschitz property

    Rademacher and Gaussian complexities are successfully used in learning theory to measure the capacity of the class of functions to be learned. One of the most important properties of these complexities is their Lipschitz property: composing a class of functions with a fixed Lipschitz function can increase its complexity by at most a factor of twice the Lipschitz constant. The proof of this property is non-trivial (in contrast to the other properties), and it is believed that the proof in the Gaussian case is conceptually more difficult than the one for the Rademacher case. In this paper we give a detailed proof of the Lipschitz property for the Rademacher case and generalize the same idea to an arbitrary complexity (including the Gaussian). We also discuss a related topic: the Rademacher complexity of the class consisting of all Lipschitz functions with a given Lipschitz constant. We show that this complexity is surprisingly low in the one-dimensional case; the question for higher dimensions remains open.
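    A minimal statement of the property in question, in standard notation (the specific symbols below are assumptions, not taken from the paper): for a sample S = (x_1, ..., x_n) and independent Rademacher signs \sigma_i \in \{-1, +1\}, the empirical Rademacher complexity of a class \mathcal{F} is

    \hat{\mathcal{R}}_S(\mathcal{F}) \;=\; \mathbb{E}_{\sigma}\Big[\sup_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i f(x_i)\Big],

    and the Lipschitz (contraction) property referred to above says that, for any fixed L-Lipschitz function \phi,

    \hat{\mathcal{R}}_S(\phi \circ \mathcal{F}) \;\le\; 2L\,\hat{\mathcal{R}}_S(\mathcal{F}), \quad \text{where } \phi \circ \mathcal{F} = \{\phi \circ f : f \in \mathcal{F}\}.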

    Supervised Learning with Similarity Functions

    We address the problem of general supervised learning when data can only be accessed through an (indefinite) similarity function between data points. Existing work on learning with indefinite kernels has concentrated solely on binary/multi-class classification problems. We propose a model that is generic enough to handle any supervised learning task and also subsumes the model previously proposed for classification. We give a "goodness" criterion for similarity functions w.r.t. a given supervised learning task and then adapt a well-known landmarking technique to provide efficient algorithms for supervised learning using "good" similarity functions. We demonstrate the effectiveness of our model on three important supervised learning problems: a) real-valued regression, b) ordinal regression and c) ranking, where we show that our method guarantees bounded generalization error. Furthermore, for the case of real-valued regression, we give a natural goodness definition that, when used in conjunction with a recent result in sparse vector recovery, guarantees a sparse predictor with bounded generalization error. Finally, we report results of our learning algorithms on regression and ordinal regression tasks using non-PSD similarity functions and demonstrate the effectiveness of our algorithms, especially that of the sparse landmark selection algorithm, which achieves significantly higher accuracies than the baseline methods while offering reduced computational costs.
    Comment: To appear in the proceedings of NIPS 2012, 30 pages.
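    The following is a rough sketch of the general landmarking idea mentioned in the abstract, not the paper's exact algorithm: each point is embedded by its similarities to a small set of randomly chosen landmarks, and a linear model is trained in that landmark space (here a Lasso regressor stands in for the sparse landmark selection step; the similarity function, landmark count and regularizer are illustrative assumptions).

import numpy as np
from sklearn.linear_model import Lasso

def landmark_features(X, landmarks, sim):
    """Map each row of X to its vector of similarities to the landmarks."""
    return np.array([[sim(x, l) for l in landmarks] for x in X])

def train_landmark_regressor(X, y, sim, n_landmarks=50, alpha=0.01, seed=None):
    """Pick random landmarks, embed X via the similarity function, fit a sparse linear model."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n_landmarks, len(X)), replace=False)
    landmarks = X[idx]
    Phi = landmark_features(X, landmarks, sim)
    model = Lasso(alpha=alpha).fit(Phi, y)   # sparse weights ~ landmark selection
    return landmarks, model

def predict(X, landmarks, model, sim):
    return model.predict(landmark_features(X, landmarks, sim))

if __name__ == "__main__":
    # Toy example with an indefinite (non-PSD) similarity: a tanh of the inner product.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
    sim = lambda a, b: np.tanh(a @ b)        # not a PSD kernel in general
    landmarks, model = train_landmark_regressor(X, y, sim, seed=0)
    print(predict(X[:5], landmarks, model, sim))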