Sharp generalization error bounds for randomly-projected classifiers
We derive sharp bounds on the generalization error of a generic linear classifier trained by empirical risk minimization on randomly projected data. We make no restrictive assumptions (such as sparsity or separability) on the data: instead we use the fact that, in a classification setting, the question of interest is really "what is the effect of random projection on the predicted class labels?" and we therefore derive the exact probability of "label flipping" under Gaussian random projection in order to quantify this effect precisely in our bounds.
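The label-flipping effect can be illustrated with a small simulation. This is only a minimal sketch, assuming a projection matrix with i.i.d. N(0, 1) entries; the name `flip_rate` and its parameters are illustrative and are not taken from the paper. It estimates by Monte Carlo how often the sign of the inner product between a classifier normal `h` and a point `x` changes after projection to `k` dimensions.

```python
import numpy as np

def flip_rate(h, x, k, trials=2000, seed=0):
    """Monte Carlo estimate of P[sign(<Rh, Rx>) != sign(<h, x>)],
    where R is a k-by-d matrix with i.i.d. N(0, 1) entries (an assumption)."""
    rng = np.random.default_rng(seed)
    d = h.shape[0]
    original = np.sign(h @ x)          # label before projection
    flips = 0
    for _ in range(trials):
        R = rng.standard_normal((k, d))
        projected = np.sign((R @ h) @ (R @ x))  # label after projection
        flips += projected != original
    return flips / trials

# Example: two unit vectors at an angle of 60 degrees in 5 dimensions.
theta = np.pi / 3
h = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
x = np.array([np.cos(theta), np.sin(theta), 0.0, 0.0, 0.0])
rate_k1 = flip_rate(h, x, k=1)
rate_k50 = flip_rate(h, x, k=50)
```

For `k = 1` the flip probability is θ/π, where θ is the angle between `h` and `x`, so `rate_k1` should be close to 1/3 here, and the estimate should shrink as `k` grows.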
Random perturbation of low rank matrices: Improving classical bounds
Matrix perturbation inequalities, such as Weyl's theorem (concerning the
singular values) and the Davis-Kahan theorem (concerning the singular vectors),
play essential roles in quantitative science; in particular, these bounds have
found application in data analysis as well as related areas of engineering and
computer science.
In many situations, the perturbation is assumed to be random, and the
original matrix has certain structural properties (such as having low rank). We
show that, in this scenario, classical perturbation results, such as Weyl and
Davis-Kahan, can be improved significantly. We believe many of our new bounds
are close to optimal and also discuss some applications.
Comment: 28 pages, 1 figure. Updated introduction and reference
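A small numerical illustration of the setting, offered as a sketch rather than as the paper's method: the matrix size, the rank, the singular values, and the Gaussian noise model are all assumptions chosen for the example. It compares the classical Weyl bound, which limits every singular value shift by the spectral norm of the perturbation, with the shift actually observed in the leading singular values of a low-rank matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 200, 3  # matrix size and rank (illustrative choices)

# Rank-r matrix A with well-separated singular values 300, 200, 100.
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
V = np.linalg.qr(rng.standard_normal((n, r)))[0]
A = U @ np.diag([300.0, 200.0, 100.0]) @ V.T

# Random perturbation E with i.i.d. N(0, 1) entries.
E = rng.standard_normal((n, n))

s_A = np.linalg.svd(A, compute_uv=False)
s_AE = np.linalg.svd(A + E, compute_uv=False)
op_norm_E = np.linalg.norm(E, 2)        # spectral norm ||E||

# Weyl: |sigma_i(A + E) - sigma_i(A)| <= ||E|| for every i.
weyl_gap = np.max(np.abs(s_AE - s_A))
# Observed shift of the r leading singular values only.
top_shift = np.abs(s_AE[:r] - s_A[:r])

print(f"||E|| = {op_norm_E:.2f}, largest shift overall = {weyl_gap:.2f}")
print("shift of leading singular values:", np.round(top_shift, 2))
```

In runs of this kind the leading singular values typically move by far less than ||E||, which is the sort of gap between the worst-case bound and random perturbations that the abstract refers to.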
Building Confidential and Efficient Query Services in the Cloud with RASP Data Perturbation
With the wide deployment of public cloud computing infrastructures, using
clouds to host data query services has become an appealing solution because of
its advantages in scalability and cost savings. However, some data may be so
sensitive that the data owner does not want to move it to the cloud unless
data confidentiality and query privacy are guaranteed. On the other hand, a
secured query service should still provide efficient query processing and
significantly reduce the in-house workload to fully realize the benefits of
cloud computing. We propose the RASP data perturbation method to provide secure
and efficient range query and kNN query services for protected data in the
cloud. The RASP data perturbation method combines order preserving encryption,
dimensionality expansion, random noise injection, and random projection, to
provide strong resilience to attacks on the perturbed data and queries. It also
preserves multidimensional ranges, which allows existing indexing techniques to
be applied to speed up range query processing. The kNN-R algorithm is designed
to work with the RASP range query algorithm to process the kNN queries. We have
carefully analyzed the attacks on data and queries under a precisely defined
threat model and realistic security assumptions. Extensive experiments have
been conducted to show the advantages of this approach in efficiency and
security.
Comment: 18 pages, to appear in IEEE TKDE, accepted in December 201
Bandit Online Optimization Over the Permutahedron
The permutahedron is the convex polytope with vertex set consisting of the
vectors $(\pi(1),\dots,\pi(n))$ for all permutations (bijections) $\pi$ over
$\{1,\dots,n\}$. We study a bandit game in which, at each step $t$, an
adversary chooses a hidden weight vector $s_t$, a player chooses a
vertex $\pi_t$ of the permutahedron and suffers an observed loss of
$\sum_{i=1}^{n} \pi_t(i)\, s_t(i)$.
A previous algorithm CombBand of Cesa-Bianchi et al. (2009) guarantees a
regret of for a time horizon of $T$. Unfortunately,
CombBand requires at each step an $n$-by-$n$ matrix permanent approximation to
within improved accuracy as $T$ grows, resulting in a total running time that
is superlinear in $T$, making it impractical for large time horizons.
We provide an algorithm of regret with total time
complexity . The ideas are a combination of CombBand and a recent
algorithm by Ailon (2013) for online optimization over the permutahedron in the
full information setting. The technical core is a bound on the variance of the
Plackett-Luce noisy sorting process's "pseudo loss". The bound is obtained by
establishing positive semi-definiteness of a family of 3-by-3 matrices
generated from rational functions of exponentials of 3 parameters
…
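To make the objects in this abstract concrete, here is a minimal sketch of a permutahedron vertex and a linear loss. It assumes the standard definitions: the vertex for a permutation π is the vector (π(1), ..., π(n)), and the loss is its inner product with the weight vector. The helper names `vertex`, `loss`, and `best_vertex` are hypothetical, and the code shows only the offline problem with a known weight vector, not the bandit algorithm.

```python
import itertools
import numpy as np

def vertex(perm):
    """Permutahedron vertex for a permutation given as a 0-based tuple."""
    return np.array(perm) + 1  # entries pi(1), ..., pi(n) take values 1..n

def loss(perm, s):
    """Linear loss: inner product of the vertex with the weight vector s."""
    return float(vertex(perm) @ s)

def best_vertex(s):
    """Offline minimizer for a known s: give the smallest rank to the
    largest weight (rearrangement inequality)."""
    s = np.asarray(s)
    order = np.argsort(-s)              # indices from largest to smallest weight
    perm = np.empty(len(s), dtype=int)
    perm[order] = np.arange(len(s))     # 0-based ranks
    return tuple(int(p) for p in perm)

# Check against brute force over all n! vertices for a small example.
s = np.array([0.3, -1.0, 2.0, 0.5])
brute_force_min = min(loss(p, s) for p in itertools.permutations(range(len(s))))
sorted_min = loss(best_vertex(s), s)
```

With full knowledge of the weight vector the best vertex is found by sorting. In the bandit game the weight vector is hidden and only the scalar loss is observed, which is why the problem is harder.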