
54 research outputs found

Testing +/- 1-Weight Halfspaces

Author: A. Hajnal
E. Fischer
H. Block
J. Shawe-Taylor
M. Minsky
O. Goldreich
S. Kulkarni
V.V. Petrov
W. Feller
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

We consider the problem of testing whether a Boolean function $f:\{-1,1\}^n \to \{-1,1\}$ is a ±1-weight halfspace, i.e. a function of the form $f(x) = \mathrm{sgn}(w_1 x_1 + w_2 x_2 + \cdots + w_n x_n)$ where the weights $w_i$ take values in $\{-1,1\}$. We show that the complexity of this problem is markedly different from the problem of testing whether $f$ is a general halfspace with arbitrary weights. While the latter can be done with a number of queries that is independent of $n$ [7], to distinguish whether $f$ is a ±1-weight halfspace versus $\epsilon$-far from all such halfspaces we prove that nonadaptive algorithms must make $\Omega(\log n)$ queries. We complement this lower bound with a sublinear upper bound showing that $O(\sqrt{n}\cdot \mathrm{poly}(\frac{1}{\epsilon}))$ queries suffice.
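As a rough illustration of the definitions in this abstract (not the paper's query-efficient tester), the following Python sketch evaluates a ±1-weight halfspace and computes, by exhaustive enumeration, how far a function is from every such halfspace. The helper names and the tie-breaking rule for sgn(0) are assumptions made for the example, and the brute-force search is only feasible for very small n.

```python
import itertools

def sgn(t):
    # Assumption: sgn(0) is taken to be +1; the abstract does not specify ties.
    return 1 if t >= 0 else -1

def pm1_halfspace(w):
    """Return f(x) = sgn(w_1 x_1 + ... + w_n x_n) for weights w_i in {-1, 1}."""
    assert all(wi in (-1, 1) for wi in w)
    return lambda x: sgn(sum(wi * xi for wi, xi in zip(w, x)))

def distance(f, g, n):
    """Fraction of points of {-1,1}^n on which f and g disagree."""
    points = list(itertools.product((-1, 1), repeat=n))
    return sum(f(x) != g(x) for x in points) / len(points)

def distance_to_pm1_halfspaces(f, n):
    """Minimum distance from f to any ±1-weight halfspace (exhaustive search)."""
    return min(
        distance(f, pm1_halfspace(w), n)
        for w in itertools.product((-1, 1), repeat=n)
    )

# Hypothetical example inputs: majority on 3 bits and the "dictator" x -> x_1.
majority3 = pm1_halfspace((1, 1, 1))
dictator = lambda x: x[0]
```

Here `majority3` is itself a ±1-weight halfspace, so its distance is zero, while `dictator` is a general halfspace that is not of the ±1-weight form. A function is ε-far in the abstract's sense when `distance_to_pm1_halfspaces` returns at least ε.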

CiteSeerX

DSpace@MIT

Crossref

Testing k-wise independent distributions

Author: Xie Ning, Ph. D. Massachusetts Institute of Technology
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2012

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 119-123). A probability distribution over $\{0,1\}^n$ is k-wise independent if its restriction to any k coordinates is uniform. More generally, a discrete distribution $D$ over $\Sigma_1 \times \cdots \times \Sigma_n$ is called (non-uniform) k-wise independent if for any subset of k indices $\{i_1, \ldots, i_k\}$ and for any $z_1 \in \Sigma_{i_1}, \ldots, z_k \in \Sigma_{i_k}$, $\Pr_{X \sim D}[X_{i_1} \cdots X_{i_k} = z_1 \cdots z_k] = \Pr_{X \sim D}[X_{i_1} = z_1] \cdots \Pr_{X \sim D}[X_{i_k} = z_k]$. k-wise independent distributions look random "locally" to an observer of only k coordinates, even though they may be far from random "globally". Because of this key feature, k-wise independent distributions are important concepts in probability, complexity, and algorithm design. In this thesis, we study the problem of testing (non-uniform) k-wise independent distributions over product spaces. For the problem of distinguishing k-wise independent distributions supported on the Boolean cube from those that are $\delta$-far in statistical distance from any k-wise independent distribution, we upper bound the number of required samples by $O(n^k/\delta^2)$ and lower bound it by $\Omega(n^{(k-1)/2}/\delta)$ (these bounds hold for constant k, and essentially the same bounds hold for general k). To achieve these bounds, we use novel Fourier analysis techniques to relate a distribution's statistical distance from k-wise independence to its biases, a measure of the parity imbalance it induces on a set of variables. The relationships we derive are tighter than previously known, and may be of independent interest. We then generalize our results to distributions over larger domains. For the uniform case we show an upper bound on the distance between a distribution $D$ and k-wise independent distributions in terms of the sum of Fourier coefficients of $D$ at vectors of weight at most k. For the non-uniform case, we give a new characterization of distributions being k-wise independent and further show that such a characterization is robust based on our results for the uniform case. Our results yield natural testing algorithms for k-wise independence with time and sample complexity sublinear in terms of the support size of the distribution when k is a constant. The main technical tools employed include discrete Fourier transform and the theory of linear systems of congruences. by Ning Xie. Ph.D.
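The uniform definition of k-wise independence over the Boolean cube can be illustrated with a small Python sketch. It checks the definition directly on an explicitly given distribution and is not the thesis's sublinear sample-based tester. The function name, the dictionary representation of the distribution, and the tolerance parameter are assumptions made for the example.

```python
import itertools
from collections import defaultdict

def is_k_wise_independent(dist, n, k, tol=1e-12):
    """Check whether every k-coordinate marginal of dist is uniform on {0,1}^k.

    dist maps n-bit tuples to probabilities (assumed to sum to 1).
    """
    for coords in itertools.combinations(range(n), k):
        # Marginalize dist onto the chosen k coordinates.
        marginal = defaultdict(float)
        for x, p in dist.items():
            marginal[tuple(x[i] for i in coords)] += p
        # Each of the 2^k patterns must have probability 2^-k.
        for z in itertools.product((0, 1), repeat=k):
            if abs(marginal[z] - 2 ** -k) > tol:
                return False
    return True

# Hypothetical example: two uniform bits followed by their XOR.
xor_dist = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
```

The `xor_dist` example looks random "locally": any two of its coordinates are uniform and independent, so it passes the check for k = 2. It fails for k = 3 because the third bit is determined by the first two.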

DSpace@MIT

Query-Efficient Computation in Property Testing and Learning Theory

Author: Garcia Soriano D. (David)
Publication venue
Publication date: 25/04/2012

CWI's Institutional Repository

Some Communication Complexity Results and their Applications

Author: Brody Joshua E
Publication venue: Dartmouth Digital Commons
Publication date: 01/11/2010

Communication Complexity represents one of the premier techniques for proving lower bounds in theoretical computer science. Lower bounds on communication problems can be leveraged to prove lower bounds in several different areas. In this work, we study three different communication complexity problems. The lower bounds for these problems have applications in circuit complexity, wireless sensor networks, and streaming algorithms. First, we study the multiparty pointer jumping problem. We present the first nontrivial upper bound for this problem. We also provide a suite of strong lower bounds under several restricted classes of protocols. Next, we initiate the study of several non-monotone functions in the distributed functional monitoring setting and provide several lower bounds. In particular, we give a generic adversarial technique and show that when deletions are allowed, no nontrivial protocol is possible. Finally, we study the Gap-Hamming-Distance problem and give tight lower bounds for protocols that use a constant number of messages. As a result, we take a well-known lower bound for one-pass streaming algorithms for a host of problems and extend it so it applies to streaming algorithms that use a constant number of passes

Dartmouth Digital Commons (Dartmouth College)

Analyzing massive datasets with missing entries: models and algorithms

Author: Varma Nithin
Publication venue
Publication date: 24/02/2020

We initiate a systematic study of computational models to analyze algorithms for massive datasets with missing or erased entries and study the relationship of our models with existing algorithmic models for large datasets. We focus on algorithms whose inputs are naturally represented as functions, codewords, or graphs. First, we generalize the property testing model, one of the most widely studied models of sublinear-time algorithms, to account for the presence of adversarially erased function values. We design efficient erasure-resilient property testing algorithms for several fundamental properties of real-valued functions such as monotonicity, Lipschitz property, convexity, and linearity. We then investigate the problems of local decoding and local list decoding of codewords containing erasures. We show that, in some cases, these problems are strictly easier than the corresponding problems of decoding codewords containing errors. Moreover, we use this understanding to show a separation between our erasure-resilient property testing model and the (error) tolerant property testing model. The philosophical message of this separation is that errors occurring in large datasets are, in general, harder to deal with than erasures. Finally, we develop models and notions to reason about algorithms that are intended to run on large graphs with missing edges. While running algorithms on large graphs containing several missing edges, it is desirable to output solutions that are close to the solutions output when there are no missing edges. With this motivation, we define average sensitivity, a robustness metric for graph algorithms. We discuss various useful features of our definition and design approximation algorithms with good average sensitivity bounds for several optimization problems on graphs. We also define a model of erasure-resilient sublinear-time graph algorithms and design an efficient algorithm for testing connectivity of graphs.

Boston University Institutional Repository (OpenBU)


Testing Convexity and Acyclicity, and New Constructions for Dense Graph Embeddings

Author: Sun Timothy
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019

Property testing, especially that of geometric and graph properties, is an ongoing area of research. In this thesis, we present a result from each of the two areas. For the problem of convexity testing in high dimensions, we give nearly matching upper and lower bounds for the sample complexity of algorithms that have one-sided and two-sided error, where algorithms only have access to labeled samples independently drawn from the standard multivariate Gaussian. In the realm of graph property testing, we give an improved lower bound for testing acyclicity in directed graphs of bounded degree. Central to the area of topological graph theory is the genus parameter, but the complexity of determining the genus of a graph is poorly understood when graphs become nearly complete. We summarize recent progress in understanding the space of minimum genus embeddings of such dense graphs. In particular, we classify all possible face distributions realizable by minimum genus embeddings of complete graphs, present new constructions for genus embeddings of the complete graphs, and find unified constructions for minimum triangulations of surfaces.

Columbia University Academic Commons

Online Learning in Dynamically Changing Environments

Author: Grama Ananth
Szpankowski Wojciech
Wu Changlong
Publication venue
Publication date: 31/01/2023

We study the problem of online learning and online regret minimization when samples are drawn from a general unknown non-stationary process. We introduce the concept of a dynamic changing process with cost $K$, where the conditional marginals of the process can vary arbitrarily, but the number of different conditional marginals is bounded by $K$ over $T$ rounds. For such processes we prove a tight (up to a $\sqrt{\log T}$ factor) bound $O(\sqrt{KT\cdot\mathsf{VC}(\mathcal{H})\log T})$ for the expected worst case regret of any finite VC-dimensional class $\mathcal{H}$ under absolute loss (i.e., the expected misclassification loss). We then improve this bound for general mixable losses, by establishing a tight (up to a $\log^3 T$ factor) regret bound $O(K\cdot\mathsf{VC}(\mathcal{H})\log^3 T)$. We extend these results to general smooth adversary processes with unknown reference measure by showing a sub-linear regret bound for $1$-dimensional threshold functions under a general bounded convex loss. Our results can be viewed as a first step towards regret analysis with non-stationary samples in the distribution blind (universal) regime. This also brings a new viewpoint that shifts the study of complexity of the hypothesis classes to the study of the complexity of processes generating data.

arXiv.org e-Print Archive

LIPIcs, Volume 251, ITCS 2023, Complete Volume

Author: Tauman Kalai Yael
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 14th Innovations in Theoretical Computer Science Conference (ITCS 2023)
Publication date: 01/01/2023

LIPIcs, Volume 251, ITCS 2023, Complete Volume

Dagstuhl Research Online Publication Server

Doctor of Philosophy

Author: Daruki Samira
Publication venue: University of Utah
Publication date: 01/01/2018

Dissertation. The contributions of this dissertation are centered around designing new algorithms in the general area of sublinear algorithms such as streaming, core sets and sublinear verification, with a special interest in problems arising from data analysis including data summarization, clustering, matrix problems and massive graphs. In the first part, we focus on summaries and coresets, which are among the main techniques for designing sublinear algorithms for massive data sets. We initiate the study of coresets for uncertain data and study coresets for various types of range counting queries on uncertain data. We focus mainly on the indecisive model of locational uncertainty since it comes up frequently in real-world applications when multiple readings of the same object are made. In this model, each uncertain point has a probability density describing its location, defined as $k$ distinct locations. Our goal is to construct a subset of the uncertain points, including their locational uncertainty, so that range counting queries can be answered by examining only this subset. For each type of query we provide coreset constructions with approximation-size trade-offs. We show that random sampling can be used to construct each type of coreset, and we also provide significantly improved bounds using discrepancy-based techniques on axis-aligned range queries. In the second part, we focus on designing sublinear-space algorithms for approximate computations on massive graphs. In particular, we consider graph MAXCUT and correlation clustering problems and develop sampling based approaches to construct truly sublinear ($o(n)$) sized coresets for graphs that have polynomial (i.e., $n^{\delta}$ for any $\delta > 0$) average degree. Our technique is based on analyzing properties of random induced subprograms of the linear program formulations of the problems. We demonstrate this technique with two examples. Firstly, we present a sublinear sized core set to approximate the value of the MAX CUT in a graph to a $(1+\epsilon)$ factor. To the best of our knowledge, all the known methods in this regime rely crucially on near-regularity assumptions. Secondly, we apply the same framework to construct a sublinear-sized coreset for correlation clustering. Our coreset construction also suggests 2-pass streaming algorithms for computing the MAX CUT and correlation clustering objective values which are left as future work at the time of writing this dissertation. Finally, we focus on streaming verification algorithms as another model for designing sublinear algorithms. We give the first polylog space and sublinear (in number of edges) communication protocols for any streaming verification problems in graphs. We present efficient streaming interactive proofs that can verify maximum matching exactly. Our results cover all flavors of matchings (bipartite/nonbipartite and weighted). In addition, we also present streaming verifiers for approximate metric TSP and exact triangle counting, as well as for graph primitives such as the number of connected components, bipartiteness, minimum spanning tree and connectivity. In particular, these are the first results for weighted matchings and for metric TSP in any streaming verification model. Our streaming verifiers use only polylogarithmic space while exchanging only polylogarithmic communication with the prover in addition to the output size of the relevant solution. We also initiate a study of streaming interactive proofs (SIPs) for problems in data analysis and present efficient SIPs for some fundamental problems. We present protocols for clustering and shape fitting including minimum enclosing ball (MEB), width of a point set, $k$-centers and $k$-slab problem. We also present protocols for fundamental matrix analysis problems: We provide an improved protocol for rectangular matrix problems, which in turn can be used to verify $k$ (approximate) eigenvectors of an $n \times n$ integer matrix $A$. In general our solutions use polylogarithmic rounds of communication and polylogarithmic total communication and verifier space.

The University of Utah: J. Willard Marriott Digital Library