14,827 research outputs found
Spectral Norm of Random Kernel Matrices with Applications to Privacy
Kernel methods are an extremely popular set of techniques used for many
important machine learning and data analysis applications. In addition to
having good practical performances, these methods are supported by a
well-developed theory. Kernel methods use an implicit mapping of the input data
into a high dimensional feature space defined by a kernel function, i.e., a
function returning the inner product between the images of two data points in
the feature space. Central to any kernel method is the kernel matrix, which is
built by evaluating the kernel function on a given sample dataset.
In this paper, we initiate the study of non-asymptotic spectral theory of
random kernel matrices. These are n x n random matrices whose (i,j)th entry is
obtained by evaluating the kernel function on and , where
are a set of n independent random high-dimensional vectors. Our
main contribution is to obtain tight upper bounds on the spectral norm (largest
eigenvalue) of random kernel matrices constructed by commonly used kernel
functions based on polynomials and Gaussian radial basis.
As an application of these results, we provide lower bounds on the distortion
needed for releasing the coefficients of kernel ridge regression under
attribute privacy, a general privacy notion which captures a large class of
privacy definitions. Kernel ridge regression is standard method for performing
non-parametric regression that regularly outperforms traditional regression
approaches in various domains. Our privacy distortion lower bounds are the
first for any kernel technique, and our analysis assumes realistic scenarios
for the input, unlike all previous lower bounds for other release problems
which only hold under very restrictive input settings.Comment: 16 pages, 1 Figur
Differentially Private Release and Learning of Threshold Functions
We prove new upper and lower bounds on the sample complexity of differentially private algorithms for releasing approximate answers to
threshold functions. A threshold function over a totally ordered domain
evaluates to if , and evaluates to otherwise. We
give the first nontrivial lower bound for releasing thresholds with
differential privacy, showing that the task is impossible
over an infinite domain , and moreover requires sample complexity , which grows with the size of the domain. Inspired by the
techniques used to prove this lower bound, we give an algorithm for releasing
thresholds with samples. This improves the
previous best upper bound of (Beimel et al., RANDOM
'13).
Our sample complexity upper and lower bounds also apply to the tasks of
learning distributions with respect to Kolmogorov distance and of properly PAC
learning thresholds with differential privacy. The lower bound gives the first
separation between the sample complexity of properly learning a concept class
with differential privacy and learning without privacy. For
properly learning thresholds in dimensions, this lower bound extends to
.
To obtain our results, we give reductions in both directions from releasing
and properly learning thresholds and the simpler interior point problem. Given
a database of elements from , the interior point problem asks for an
element between the smallest and largest elements in . We introduce new
recursive constructions for bounding the sample complexity of the interior
point problem, as well as further reductions and techniques for proving
impossibility results for other basic problems in differential privacy.Comment: 43 page
Learning Coverage Functions and Private Release of Marginals
We study the problem of approximating and learning coverage functions. A
function is a coverage function, if
there exists a universe with non-negative weights for each
and subsets of such that . Alternatively, coverage functions can be described
as non-negative linear combinations of monotone disjunctions. They are a
natural subclass of submodular functions and arise in a number of applications.
We give an algorithm that for any , given random and uniform
examples of an unknown coverage function , finds a function that
approximates within factor on all but -fraction of the
points in time . This is the first fully-polynomial
algorithm for learning an interesting class of functions in the demanding PMAC
model of Balcan and Harvey (2011). Our algorithms are based on several new
structural properties of coverage functions. Using the results in (Feldman and
Kothari, 2014), we also show that coverage functions are learnable agnostically
with excess -error over all product and symmetric
distributions in time . In contrast, we show that,
without assumptions on the distribution, learning coverage functions is at
least as hard as learning polynomial-size disjoint DNF formulas, a class of
functions for which the best known algorithm runs in time
(Klivans and Servedio, 2004).
As an application of our learning results, we give simple
differentially-private algorithms for releasing monotone conjunction counting
queries with low average error. In particular, for any , we obtain
private release of -way marginals with average error in time
Exploiting Metric Structure for Efficient Private Query Release
We consider the problem of privately answering queries defined on databases
which are collections of points belonging to some metric space. We give simple,
computationally efficient algorithms for answering distance queries defined
over an arbitrary metric. Distance queries are specified by points in the
metric space, and ask for the average distance from the query point to the
points contained in the database, according to the specified metric. Our
algorithms run efficiently in the database size and the dimension of the space,
and operate in both the online query release setting, and the offline setting
in which they must in polynomial time generate a fixed data structure which can
answer all queries of interest. This represents one of the first subclasses of
linear queries for which efficient algorithms are known for the private query
release problem, circumventing known hardness results for generic linear
queries
- …