Approximate degree in classical and quantum computing
In this book, the authors survey what is known about a particularly natural notion of approximation by polynomials: pointwise approximation over the real numbers.
On the Learnability of Monotone Functions
A longstanding lacuna in the field of computational learning theory is the learnability of succinctly representable monotone Boolean functions, i.e., functions that preserve the coordinatewise order of their inputs. This thesis makes significant progress towards understanding both the possibilities and the limitations of learning various classes of monotone functions by carefully considering the complexity measures used to evaluate them. We show that Boolean functions computed by polynomial-size monotone circuits are hard to learn assuming the existence of one-way functions. Having shown the hardness of learning general polynomial-size monotone circuits, we show that the class of Boolean functions computed by polynomial-size depth-3 monotone circuits is hard to learn using statistical queries. As a counterpoint, we give a statistical query learning algorithm that can learn random polynomial-size depth-2 monotone circuits (i.e., monotone DNF formulas). As a preliminary step towards a fully polynomial-time, proper learning algorithm for polynomial-size monotone decision trees, we also establish the relationship between the average depth of a monotone decision tree, its average sensitivity, and its variance. Finally, we return to monotone DNF formulas, and we show that they are teachable (a different model of learning) in the average case. We also show that non-monotone DNF formulas, juntas, and sparse GF(2) formulas are teachable in the average case.
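The decision-tree chapter relates a monotone tree's average depth to the function's average sensitivity and variance. As a hedged illustration of the two function-level quantities involved (not of the thesis's actual argument; the function `maj3` and all helper names are ours), here is an exact brute-force computation for a small Boolean function:

```python
from itertools import product

def avg_sensitivity_and_variance(f, n):
    """Exact average sensitivity and variance of f: {0,1}^n -> {0,1},
    computed by brute force over all 2^n inputs (feasible for small n)."""
    pts = list(product([0, 1], repeat=n))
    vals = [f(x) for x in pts]
    mean = sum(vals) / len(pts)
    var = sum((v - mean) ** 2 for v in vals) / len(pts)
    # Average sensitivity: expected number of single-bit flips
    # that change the value of f at a uniformly random input.
    total = 0
    for x in pts:
        for i in range(n):
            y = list(x)
            y[i] ^= 1
            if f(x) != f(tuple(y)):
                total += 1
    return total / len(pts), var

# 3-bit majority, a monotone function
maj3 = lambda x: int(sum(x) >= 2)
print(avg_sensitivity_and_variance(maj3, 3))  # -> (1.5, 0.25)
```

For majority on 3 bits, every input of Hamming weight 1 or 2 has sensitivity 2 and the rest have sensitivity 0, giving average sensitivity 1.5; the function is balanced, so its variance is 0.25.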
Stability is Stable: Connections between Replicability, Privacy, and Adaptive Generalization
The notion of replicable algorithms was introduced by Impagliazzo et al. [STOC '22] to describe randomized algorithms that are stable under the resampling of their inputs. More precisely, a replicable algorithm gives the same output with high probability when its randomness is fixed and it is run on a new i.i.d. sample drawn from the same distribution. Using replicable algorithms for data analysis can facilitate the verification of published results by ensuring that the results of an analysis will be the same with high probability, even when that analysis is performed on a new data set.
In this work, we establish new connections and separations between replicability and standard notions of algorithmic stability. In particular, we give sample-efficient algorithmic reductions between perfect generalization, approximate differential privacy, and replicability for a broad class of statistical problems. Conversely, we show any such equivalence must break down computationally: there exist statistical problems that are easy under differential privacy but that cannot be solved replicably without breaking public-key cryptography. Furthermore, these results are tight: our reductions are statistically optimal, and we show that any computational separation between DP and replicability must imply the existence of one-way functions. Our statistical reductions give a new algorithmic framework for translating between notions of stability, which we instantiate to answer several open questions in replicability and privacy. This includes giving sample-efficient replicable algorithms for various PAC learning, distribution estimation, and distribution testing problems, algorithmic amplification of δ in approximate DP, conversions from item-level to user-level privacy, and the existence of private agnostic-to-realizable learning reductions under structured distributions.
Comment: STOC 2023, minor typos fixed.
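The stability in the definition above can be made concrete with the random-rounding trick that recurs in the replicability literature: fix a shared random offset of a grid as the algorithm's randomness, then snap the empirical estimate to the nearest grid point, so two i.i.d. samples with nearby empirical means usually produce the identical canonical answer. A minimal sketch for mean estimation (a toy illustration under our own naming, not the paper's construction):

```python
import statistics

def replicable_mean(sample, offset, grid):
    """Snap the empirical mean to the nearest point of {offset + k*grid}.
    With the random offset fixed (shared randomness), two i.i.d. samples
    whose empirical means are close round to the same grid point unless
    a grid boundary falls between them."""
    m = statistics.mean(sample)
    k = round((m - offset) / grid)
    return offset + k * grid

# Two different samples, same shared offset -> same canonical output
print(replicable_mean([0.30, 0.50], offset=0.05, grid=0.2))
print(replicable_mean([0.35, 0.44], offset=0.05, grid=0.2))
```

Both calls print the same grid point (0.05 + 2 × 0.2 = 0.45), even though the two empirical means differ, which is exactly the resampling stability the definition asks for.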
On Tolerant Testing and Tolerant Junta Testing
Over the past few decades, property testing has become an active field of study in theoretical computer science. The algorithmic task is to determine, given access to an unknown large object (e.g., a function, graph, or probability distribution), whether it has some fixed property or is far from any object having the property. The approximate nature of these algorithms often allows a significant saving in running time, yielding sublinear-time algorithms. Nevertheless, in various settings and applications, accepting only inputs that exactly have a certain property is too restrictive, and it is more beneficial to distinguish between inputs that are close to having the property and those that are far from it. The framework of tolerant testing tackles exactly this problem. In this thesis, we focus on one of the most fundamental properties of Boolean functions: the property of being a k-junta (i.e., depending on at most k variables).
The first chapter focuses on algorithms for tolerant junta testing. In particular, we show that there exists a poly(k)-query algorithm distinguishing functions close to k-juntas from functions that are far from every k-junta. We also show how to obtain a trade-off between the tolerance of the algorithm and its query complexity.
The second chapter establishes a query lower bound for tolerant junta testing. In particular, we show that any non-adaptive tolerant junta tester must make at least Ω(k²/polylog k) queries.
The third chapter considers tolerant testing in a more general context and asks whether tolerant testing is strictly harder than standard testing. In particular, we show that for any constant ℓ there exists a property P_ℓ that can be tested with a small number of queries, yet any tolerant tester for P_ℓ requires a substantially larger number (the bounds involve the ℓ-times-iterated log function).
The final chapter focuses on applications. We show how to leverage the techniques developed in previous chapters to obtain results on tolerant isomorphism testing, unateness testing, and erasure-resilient testing.
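Junta testers in this line of work typically work by estimating coordinate influences: a function is close to a k-junta roughly when at most k coordinates carry non-negligible influence. As a hedged toy sketch of that underlying idea (not the thesis's algorithm; all names and parameters are ours), the following estimates each coordinate's influence by random bit flips:

```python
import random

def influential_coords(f, n, thresh=0.1, samples=2000, seed=0):
    """Estimate Inf_i(f) = Pr_x[f(x) != f(x with bit i flipped)] for
    each coordinate i, and return those whose estimate exceeds thresh."""
    rng = random.Random(seed)
    found = set()
    for i in range(n):
        flips = 0
        for _ in range(samples):
            x = [rng.randint(0, 1) for _ in range(n)]
            y = list(x)
            y[i] ^= 1  # flip coordinate i
            if f(x) != f(y):
                flips += 1
        if flips / samples > thresh:
            found.add(i)
    return found

# XOR of bits 0 and 3: a 2-junta on 6 variables
g = lambda x: x[0] ^ x[3]
print(influential_coords(g, 6))  # -> {0, 3}
```

Here coordinates 0 and 3 have influence 1 (flipping either always changes the XOR) while the rest have influence 0, so the sketch recovers exactly the junta's relevant coordinates. Real testers must of course work with far fewer queries and with tolerance parameters, which is precisely what the chapters above study.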
Data-Driven Recommender Systems: Sequences of recommendations
This document describes scalable and reliable methods for recommender systems from a machine-learning point of view. In particular, it addresses difficulties arising in the non-stationary case.
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Privacy and the Complexity of Simple Queries
As both the scope and scale of data collection increase, an increasingly large amount of sensitive personal information is being analyzed. In this thesis, we study the feasibility of effectively carrying out such analyses while respecting the privacy concerns of all parties involved. In particular, we consider algorithms that satisfy differential privacy [30], a stringent notion of privacy that guarantees no individual's data has a significant influence on the information released about the database. Over the past decade, there has been tremendous progress in understanding when accurate data analysis is compatible with differential privacy, with both elegant algorithms and striking impossibility results. However, if we ask further when accurate and computationally efficient data analysis is compatible with differential privacy, then our understanding lags far behind. In this thesis, we make several contributions to understanding the complexity of differentially private data analysis. We show a sharp upper bound on the number of linear queries that can be accurately answered while satisfying differential privacy by an efficient algorithm, assuming the existence of cryptographic traitor-tracing schemes. We show even stronger computational barriers for algorithms that generate private synthetic data, i.e., a new database that consists of "fake" records but preserves certain statistical properties of the original database. Under cryptographic assumptions, any efficient differentially private algorithm that generates synthetic data cannot preserve even extremely simple properties of the database, such as the pairwise correlations between the attributes. On the positive side, we design new algorithms for the widely used class of marginal queries that are faster and require less data.
Computational inefficiency is not the only barrier to effective privacy-preserving data analysis. Another potential obstacle is that many of the existing differentially private algorithms do not guarantee privacy for the data analyst, which can lead researchers with sensitive or proprietary queries to seek other means of access to the database. We also contribute to our understanding of privacy for the analyst: we design new algorithms for answering large sets of queries that guarantee differential privacy for the database and ensure differential privacy for the analysts, even if all other analysts collude.
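For concreteness, the canonical building block for answering a single counting query under differential privacy is the Laplace mechanism (standard background, not one of the thesis's new algorithms): since a counting query has sensitivity 1, adding Laplace noise of scale 1/ε yields an ε-differentially private answer. A minimal sketch, with all names ours:

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) via the inverse CDF of a uniform draw."""
    u = rng.random() - 0.5  # uniform in [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count, epsilon, rng):
    """epsilon-DP answer to a counting query (sensitivity 1):
    true answer plus Laplace(1/epsilon) noise."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
answers = [private_count(42, 1.0, rng) for _ in range(10_000)]
print(sum(answers) / len(answers))  # unbiased: concentrates near 42
```

Each individual answer is noisy (standard deviation √2/ε), but the mechanism is unbiased, which is why averages of repeated noisy releases concentrate; the hardness results above concern answering many such queries simultaneously and efficiently, where this naive per-query approach breaks down.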
Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain
The present paper explores the technical efficiency of four hotels from the Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units located in Portugal is established using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiency in the estimation process, enabling investigation of the main causes of inefficiency. Several suggestions for efficiency improvement are offered for each hotel studied.