10,636 research outputs found

    Sharp generalization error bounds for randomly-projected classifiers

    We derive sharp bounds on the generalization error of a generic linear classifier trained by empirical risk minimization on randomly projected data. We make no restrictive assumptions (such as sparsity or separability) on the data: Instead we use the fact that, in a classification setting, the question of interest is really ‘what is the effect of random projection on the predicted class labels?’ and we therefore derive the exact probability of ‘label flipping’ under Gaussian random projection in order to quantify this effect precisely in our bounds.
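
The label-flipping event described in this abstract can be illustrated empirically. The sketch below is not the paper's derivation or its exact formula. It is a Monte Carlo estimate under simplifying assumptions: a fixed weight vector `w` and a point `x` in the original space, a `k`-by-`d` matrix with i.i.d. standard Gaussian entries, and a "flip" defined as the sign of the inner product changing after both vectors are projected. The names `flip_rate`, `project`, and `dot` are hypothetical helpers.

```python
import random


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def project(R, v):
    """Apply the k-by-d matrix R (list of rows) to the vector v."""
    return [dot(row, v) for row in R]


def flip_rate(w, x, k, trials=2000, seed=0):
    """Estimate how often sign(<Rw, Rx>) differs from sign(<w, x>).

    R is redrawn in every trial with i.i.d. N(0, 1) entries.
    This is an empirical frequency, not a closed-form probability.
    """
    rng = random.Random(seed)
    d = len(w)
    original_positive = dot(w, x) > 0
    flips = 0
    for _ in range(trials):
        R = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
        projected_positive = dot(project(R, w), project(R, x)) > 0
        flips += projected_positive != original_positive
    return flips / trials


if __name__ == "__main__":
    w = [1.0, 2.0, 0.0, -1.0, 0.5]
    x = [1.0, -0.2, 0.3, 0.4, 0.0]
    for k in (1, 2, 4):
        print(k, flip_rate(w, x, k))
```

When `x` is parallel to `w` the projected inner product is a squared norm, so no flip can occur; for other angles the estimated rate depends on both the angle and the projection dimension `k`.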

    Random perturbation of low rank matrices: Improving classical bounds

    Matrix perturbation inequalities, such as Weyl's theorem (concerning the singular values) and the Davis-Kahan theorem (concerning the singular vectors), play essential roles in quantitative science; in particular, these bounds have found application in data analysis as well as related areas of engineering and computer science. In many situations, the perturbation is assumed to be random, and the original matrix has certain structural properties (such as having low rank). We show that, in this scenario, classical perturbation results, such as Weyl and Davis-Kahan, can be improved significantly. We believe many of our new bounds are close to optimal and also discuss some applications. Comment: 28 pages, 1 figure. Updated introduction and reference

    Building Confidential and Efficient Query Services in the Cloud with RASP Data Perturbation

    With the wide deployment of public cloud computing infrastructures, using clouds to host data query services has become an appealing solution because of its advantages in scalability and cost savings. However, some data may be so sensitive that the data owner does not want to move it to the cloud unless data confidentiality and query privacy are guaranteed. On the other hand, a secured query service should still provide efficient query processing and significantly reduce the in-house workload to fully realize the benefits of cloud computing. We propose the RASP data perturbation method to provide secure and efficient range query and kNN query services for protected data in the cloud. The RASP data perturbation method combines order-preserving encryption, dimensionality expansion, random noise injection, and random projection to provide strong resilience to attacks on the perturbed data and queries. It also preserves multidimensional ranges, which allows existing indexing techniques to be applied to speed up range query processing. The kNN-R algorithm is designed to work with the RASP range query algorithm to process the kNN queries. We have carefully analyzed the attacks on data and queries under a precisely defined threat model and realistic security assumptions. Extensive experiments have been conducted to show the advantages of this approach in efficiency and security. Comment: 18 pages, to appear in IEEE TKDE, accepted in December 201

    Bandit Online Optimization Over the Permutahedron

    The permutahedron is the convex polytope with vertex set consisting of the vectors $(\pi(1),\dots,\pi(n))$ for all permutations (bijections) $\pi$ over $\{1,\dots,n\}$. We study a bandit game in which, at each step $t$, an adversary chooses a hidden weight vector $s_t$, a player chooses a vertex $\pi_t$ of the permutahedron and suffers an observed loss of $\sum_{i=1}^n \pi(i) s_t(i)$. A previous algorithm CombBand of Cesa-Bianchi et al. (2009) guarantees a regret of $O(n\sqrt{T \log n})$ for a time horizon of $T$. Unfortunately, CombBand requires at each step an $n$-by-$n$ matrix permanent approximation to within improved accuracy as $T$ grows, resulting in a total running time that is superlinear in $T$, making it impractical for large time horizons. We provide an algorithm of regret $O(n^{3/2}\sqrt{T})$ with total time complexity $O(n^3 T)$. The ideas are a combination of CombBand and a recent algorithm by Ailon (2013) for online optimization over the permutahedron in the full information setting. The technical core is a bound on the variance of the Plackett-Luce noisy sorting process's "pseudo loss". The bound is obtained by establishing positive semi-definiteness of a family of 3-by-3 matrices generated from rational functions of exponentials of 3 parameters.
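
As a rough illustration of two ingredients named in this abstract, the sketch below samples a permutation from a Plackett-Luce model and evaluates the loss $\sum_{i=1}^n \pi(i) s_t(i)$ for one round. It is not the paper's algorithm: there is no regret-minimizing update and no pseudo-loss estimator. The parameter vector `theta`, the convention that the first item drawn receives rank 1, and the helper names are all assumptions made for the example.

```python
import math
import random


def sample_plackett_luce(theta, rng):
    """Draw an ordering of items 0..n-1 from a Plackett-Luce model.

    Items are picked one at a time without replacement, each with
    probability proportional to exp(theta[i]) among those remaining.
    """
    weights = [math.exp(t) for t in theta]
    remaining = list(range(len(theta)))
    order = []
    while remaining:
        total = sum(weights[i] for i in remaining)
        u = rng.random() * total
        acc = 0.0
        for idx, item in enumerate(remaining):
            acc += weights[item]
            if u < acc:
                break
        # If rounding prevents u < acc, idx is the last index, which is a valid pick.
        order.append(remaining.pop(idx))
    return order


def ranks_from_order(order):
    """Convert an ordering into pi, where pi[i] is the 1-based rank of item i."""
    pi = [0] * len(order)
    for position, item in enumerate(order, start=1):
        pi[item] = position
    return pi


def bandit_loss(pi, s):
    """Loss of one round: sum over i of pi(i) * s(i)."""
    return sum(p * w for p, w in zip(pi, s))


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.5, -1.0, 2.0, 0.0]   # hypothetical model parameters
    s_t = [0.2, 0.9, 0.1, 0.4]      # hypothetical hidden weight vector
    pi_t = ranks_from_order(sample_plackett_luce(theta, rng))
    print(pi_t, bandit_loss(pi_t, s_t))
```

In the bandit setting the player would observe only the scalar returned by `bandit_loss`, not the vector `s_t` itself.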