176 research outputs found
Almost Optimal Distribution-Free Junta Testing
We consider the problem of testing whether an unknown n-variable Boolean function is a k-junta in the distribution-free property testing model, where the distance between functions is measured with respect to an arbitrary and unknown probability distribution over {0,1}^n. Chen, Liu, Servedio, Sheng and Xie [Zhengyang Liu et al., 2018] showed that distribution-free k-junta testing can be performed, with one-sided error, by an adaptive algorithm that makes O~(k^2)/epsilon queries. In this paper, we give a simple two-sided error adaptive algorithm that makes O~(k/epsilon) queries.
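For readers new to the setting, the standard definitions behind this abstract can be written out as follows. This is a sketch using conventional notation, not text from the paper: the symbols f, g, D, J and the 2/3 success probability are the usual textbook choices and are assumptions here.

```latex
% f : \{0,1\}^n \to \{0,1\} is a k-junta if it depends on at most k coordinates:
\exists\, J \subseteq [n],\ |J| \le k,\ \text{such that } f(x) = f(y)
  \text{ whenever } x_i = y_i \text{ for all } i \in J .

% Distance with respect to an unknown distribution D over \{0,1\}^n:
\operatorname{dist}_D(f, g) \;=\; \Pr_{x \sim D}\bigl[\, f(x) \ne g(x) \,\bigr] .

% A two-sided error distribution-free tester, given samples from D and query access to f:
%   accepts with probability at least 2/3 if f is a k-junta;
%   rejects with probability at least 2/3 if
\operatorname{dist}_D(f, g) \;\ge\; \epsilon \quad \text{for every } k\text{-junta } g .
```

A one-sided error tester, as in the earlier result cited above, must additionally accept every k-junta with probability 1.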
Almost Optimal Testers for Concise Representations
We give improved and almost optimal testers for several classes of Boolean functions on n variables that have concise representation in the uniform and distribution-free models. The classes include k-Junta, k-Linear, s-Term DNF, s-Term Monotone DNF, r-DNF, Decision List, r-Decision List, size-s Decision Tree, size-s Boolean Formula, size-s Branching Program, s-Sparse Polynomial over the binary field and functions with Fourier Degree at most d.
The approach is new and combines ideas from Diakonikolas et al. [Ilias Diakonikolas et al., 2007], Bshouty [Nader H. Bshouty, 2018], Goldreich et al. [Oded Goldreich et al., 1998], and learning theory. The method can be extended to several other classes of functions over any domain that can be approximated by functions with a small number of relevant variables.
Minimizing Hitting Time between Disparate Groups with Shortcut Edges
Structural bias or segregation of networks refers to situations where two or
more disparate groups are present in the network, so that the groups are highly
connected internally, but loosely connected to each other. In many cases it is
of interest to increase the connectivity of disparate groups so as to, e.g.,
minimize social friction, or expose individuals to diverse viewpoints. A
commonly-used mechanism for increasing the network connectivity is to add edge
shortcuts between pairs of nodes. In many applications of interest, edge
shortcuts typically translate to recommendations, e.g., what video to watch, or
what news article to read next. The problem of reducing structural bias or
segregation via edge shortcuts has recently been studied in the literature, and
random walks have been an essential tool for modeling navigation and
connectivity in the underlying networks. Existing methods, however, either do
not offer approximation guarantees, or engineer the objective so that it
satisfies certain desirable properties that simplify the optimization task. In
this paper we address the problem of adding a given number of shortcut edges in
the network so as to directly minimize the average hitting time and the maximum
hitting time between two disparate groups. Our algorithm for minimizing average
hitting time is a greedy bicriteria algorithm that relies on supermodularity. In
contrast, maximum hitting time is not supermodular. Despite this, we develop an
approximation algorithm for that objective as well, by leveraging connections
with average hitting time and the asymmetric k-center problem. Comment: To appear in KDD 202
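To make the objective concrete, the following minimal sketch evaluates the average hitting time from one group to another on a small graph, before and after adding one shortcut edge. It is not the paper's algorithm: the function names, the undirected treatment of the shortcut, the simple-random-walk model and the iterative solver are all assumptions chosen for illustration.

```python
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]


def hitting_times(adj: Graph, targets: Set[Hashable],
                  tol: float = 1e-12, max_sweeps: int = 100_000) -> Dict[Hashable, float]:
    """Expected number of steps for a simple random walk to first reach `targets`.

    Solves h(u) = 0 for u in targets and
           h(u) = 1 + mean(h(v) for v adjacent to u) otherwise,
    by Gauss-Seidel sweeps (assumes every node can reach a target).
    """
    h = {u: 0.0 for u in adj}
    for _ in range(max_sweeps):
        delta = 0.0
        for u in adj:
            if u in targets:
                continue
            new = 1.0 + sum(h[v] for v in adj[u]) / len(adj[u])
            delta = max(delta, abs(new - h[u]))
            h[u] = new
        if delta < tol:
            break
    return h


def average_hitting_time(adj: Graph, sources: Iterable[Hashable],
                         targets: Set[Hashable]) -> float:
    """Mean hitting time to `targets` over the start nodes in `sources`."""
    h = hitting_times(adj, targets)
    sources = list(sources)
    return sum(h[u] for u in sources) / len(sources)


def with_shortcut(adj: Graph, u: Hashable, v: Hashable) -> Graph:
    """Return a copy of the graph with an (undirected) shortcut edge u-v added."""
    new_adj = {node: set(nbrs) for node, nbrs in adj.items()}
    new_adj[u].add(v)
    new_adj[v].add(u)
    return new_adj


# Hypothetical toy instance: a path 0-1-2-3 with groups {0, 1} and {3}.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
group_a, group_b = {0, 1}, {3}
before = average_hitting_time(path, group_a, group_b)
after = average_hitting_time(with_shortcut(path, 0, 3), group_a, group_b)
print(before, after)
```

On this toy path the single shortcut between nodes 0 and 3 lowers the average hitting time from about 8.5 to about 3.5. A greedy method in the spirit of the abstract would repeat such evaluations over candidate edges and keep the best one at each step.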
Greedy Minimization of Weakly Supermodular Set Functions
Many problems in data mining and unsupervised machine learning take the form of minimizing a set function with cardinality constraints. More explicitly, denote by [n] the set {1,...,n} and let f(S) be a function from 2^[n] to R+. Our goal is to minimize f(S) subject to |S| <= k. These problems include clustering and covering problems as well as sparse regression, matrix approximation problems and many others. These combinatorial problems are hard to minimize in general. Finding good (e.g. constant factor) approximate solutions for them requires significant sophistication and highly specialized algorithms.
In this paper we analyze the behavior of the greedy algorithm on all of these problems. We start by claiming that the functions above are special. A trivial observation is that they are non-negative and non-increasing, that is, f(S) >= f(union(S,T)) >= 0 for any S and T. This immediately shows that expanding solution sets is (at least potentially) beneficial in terms of reducing the function value. However, monotonicity is not sufficient to ensure that any number of greedy extensions of a given solution would significantly reduce the objective function.
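The greedy procedure discussed in this abstract can be sketched as below. This is a generic illustration under stated assumptions, not the paper's analysis: `greedy_minimize`, the toy `clustering_cost` objective, the `points` data and the `CAP` value used for the empty set are all hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List


def greedy_minimize(f: Callable[[FrozenSet[int]], float],
                    ground: Iterable[int], k: int) -> List[int]:
    """Build a set S with |S| <= k by repeatedly adding the element
    whose inclusion gives the smallest value of f."""
    ground = list(ground)
    current: FrozenSet[int] = frozenset()
    chosen: List[int] = []
    for _ in range(k):
        candidates = [e for e in ground if e not in current]
        if not candidates:
            break
        # Ties are broken in favor of the earliest candidate.
        best = min(candidates, key=lambda e: f(current | {e}))
        current = current | {best}
        chosen.append(best)
    return chosen


# Toy k-median-style objective on the line (assumed example data).
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
CAP = 100.0  # cost charged to a point when no center is selected


def clustering_cost(S: FrozenSet[int]) -> float:
    """Sum over points of the distance to the nearest chosen center, capped at CAP.
    Non-negative and non-increasing in S, as in the abstract."""
    return sum(min([CAP] + [abs(p - points[j]) for j in S]) for p in points)


picked = greedy_minimize(clustering_cost, range(len(points)), 2)
print(picked, clustering_cost(frozenset(picked)))
```

Here the greedy choice picks the points at indices 2 and 4 (values 2.0 and 11.0), for a total cost of 5.0. The objective is monotone by construction, which is the property the abstract notes is necessary but not sufficient for greedy to make good progress.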
Pseudorandomness via the discrete Fourier transform
We present a new approach to constructing unconditional pseudorandom
generators against classes of functions that involve computing a linear
function of the inputs. We give an explicit construction of a pseudorandom
generator that fools the discrete Fourier transforms of linear functions with
seed-length that is nearly logarithmic (up to polyloglog factors) in the input
size and the desired error parameter. Our result gives a single pseudorandom
generator that fools several important classes of tests computable in logspace
that have been considered in the literature, including halfspaces (over general
domains), modular tests and combinatorial shapes. For all these classes, our
generator is the first that achieves near logarithmic seed-length in both the
input length and the error parameter. Getting such a seed-length is a natural
challenge in its own right, which needs to be overcome in order to derandomize
RL - a central question in complexity theory.
Our construction combines ideas from a large body of prior work, ranging from
a classical construction of [NN93] to the recent gradually increasing
independence paradigm of [KMN11, CRSW13, GMRTV12], while also introducing some
novel analytic machinery which might find other applications.