Nearly optimal solutions for the Chow Parameters Problem and low-weight approximation of halfspaces
The \emph{Chow parameters} of a Boolean function $f: \{-1,1\}^n \to \{-1,1\}$ are its degree-0 and degree-1 Fourier coefficients. It has been known since 1961 (Chow, Tannenbaum) that the (exact values of the) Chow parameters of any linear threshold function $f$ uniquely specify $f$ within the space of all Boolean functions, but until recently (O'Donnell and Servedio) nothing was known about efficient algorithms for \emph{reconstructing} $f$ (exactly or approximately) from exact or approximate values of its Chow parameters. We refer to this reconstruction problem as the \emph{Chow Parameters Problem.}
Our main result is a new algorithm for the Chow Parameters Problem which, given (sufficiently accurate approximations to) the Chow parameters of any linear threshold function $f$, runs in time $\tilde{O}(n^2)\cdot (1/\eps)^{O(\log^2(1/\eps))}$ and with high probability outputs a representation of an LTF that is $\eps$-close to $f$. The only previous algorithm (O'Donnell and Servedio) had running time $\poly(n) \cdot 2^{2^{\tilde{O}(1/\eps^2)}}$.
As a byproduct of our approach, we show that for any linear threshold function $f$ over $\{-1,1\}^n$, there is a linear threshold function $f'$ which is $\eps$-close to $f$ and has all weights that are integers of magnitude at most $\sqrt{n} \cdot (1/\eps)^{O(\log^2(1/\eps))}$. This significantly improves the best previous result of Diakonikolas and Servedio, which gave a $\poly(n) \cdot 2^{\tilde{O}(1/\eps^{2/3})}$ weight bound, and is close to the known lower bound of $\max\{\sqrt{n}, (1/\eps)^{\Omega(\log\log(1/\eps))}\}$ (Goldberg, Servedio). Our techniques also yield improved algorithms for related problems in learning theory.
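As a quick illustration of the objects involved, the following is a minimal sketch (our own toy code, not the paper's reconstruction algorithm) that estimates the $n+1$ Chow parameters of a given LTF by Monte Carlo averaging over uniform points of $\{-1,1\}^n$; all names and parameter choices are ours.
```python
import random

def ltf(w, theta, x):
    """The linear threshold function sign(w.x - theta), with values in {-1, +1}."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else -1

def estimate_chow_parameters(w, theta, num_samples=100_000):
    """Monte Carlo estimates of the n+1 Chow parameters of sign(w.x - theta):
    the degree-0 coefficient E[f(x)] and the degree-1 coefficients E[f(x) x_i]."""
    n = len(w)
    sums = [0.0] * (n + 1)  # index 0: degree-0; indices 1..n: degree-1
    for _ in range(num_samples):
        x = [random.choice((-1, 1)) for _ in range(n)]
        fx = ltf(w, theta, x)
        sums[0] += fx
        for i in range(n):
            sums[i + 1] += fx * x[i]
    return [s / num_samples for s in sums]

# Example: majority on 5 variables; by symmetry all degree-1 parameters agree.
print(estimate_chow_parameters([1, 1, 1, 1, 1], 0))
```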
Approximate F_2-Sketching of Valuation Functions
We study the problem of constructing a linear sketch of minimum dimension that allows approximation of a given real-valued function f : F_2^n -> R with small expected squared error. We develop a general theory of linear sketching for such functions, through which we analyze the sketch dimension for the most commonly studied types of valuation functions: additive, budget-additive, coverage, alpha-Lipschitz submodular, and matroid rank functions. This gives a characterization of how many bits of information have to be stored about the input x so that one can compute f under additive updates to its coordinates.
Our results are tight in most cases, and we also give extensions to the distributional version of the problem where the input x in F_2^n is generated uniformly at random. Using known connections with dynamic streaming algorithms, both upper and lower bounds on dimension obtained in our work extend to the space complexity of algorithms evaluating f(x) under long sequences of additive updates to the input x presented as a stream. Similar results hold for simultaneous communication in a distributed setting.
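For concreteness, here is a minimal sketch of the model itself: maintaining s = Ax over F_2 under additive (coordinate-flip) updates. The random matrix, its dimension, and the parity example are illustrative assumptions of ours, not the constructions analyzed in the paper.
```python
import random

def random_sketch_matrix(k, n):
    """A k x n random matrix over F_2 (an assumed, generic choice)."""
    return [[random.randint(0, 1) for _ in range(n)] for _ in range(k)]

class F2LinearSketch:
    def __init__(self, A):
        self.A = A
        self.s = [0] * len(A)  # sketch of the all-zeros input

    def flip(self, i):
        """Additive update x <- x + e_i over F_2: XOR column i into the sketch."""
        for r in range(len(self.A)):
            self.s[r] ^= self.A[r][i]

# Example: f(x) = XOR of all coordinates is exactly computable from a
# 1-dimensional sketch, taking A to be the all-ones row.
n = 8
sketch = F2LinearSketch([[1] * n])
for i in [0, 3, 3, 5]:  # a stream of coordinate flips; the two flips of 3 cancel
    sketch.flip(i)
print(sketch.s[0])  # parity of the current x, here 0 (coordinates 0 and 5 are set)
```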
Bayesian emulation for optimization in multi-step portfolio decisions
We discuss the Bayesian emulation approach to computational solution of
multi-step portfolio studies in financial time series. "Bayesian emulation for
decisions" involves mapping the technical structure of a decision analysis
problem to that of Bayesian inference in a purely synthetic "emulating"
statistical model. This provides access to standard posterior analytic,
simulation and optimization methods that yield indirect solutions of the
decision problem. We develop this in time series portfolio analysis using
classes of economically and psychologically relevant multi-step ahead portfolio
utility functions. Studies with multivariate currency, commodity and stock
index time series illustrate the approach and show some of the practical
utility and benefits of the Bayesian emulation methodology.
Bottom-k and Priority Sampling, Set Similarity and Subset Sums with Minimal Independence
We consider bottom-k sampling for a set X, picking a sample S_k(X) consisting
of the k elements that are smallest according to a given hash function h. With
this sample we can estimate the relative size f=|Y|/|X| of any subset Y as
|S_k(X) intersect Y|/k. A standard application is the estimation of the Jaccard
similarity f=|A intersect B|/|A union B| between sets A and B. Given the
bottom-k samples from A and B, we construct the bottom-k sample of their union
as S_k(A union B)=S_k(S_k(A) union S_k(B)), and then the similarity is
estimated as |S_k(A union B) intersect S_k(A) intersect S_k(B)|/k.
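The estimator described above is short enough to state in code. The following minimal sketch implements it with a 2-independent hash family of the form h(x) = (ax + b) mod p; the family, the prime, and all names are our illustrative choices.
```python
import random

P = (1 << 61) - 1  # a Mersenne prime

def make_2indep_hash():
    """A 2-independent hash function h(x) = (a*x + b) mod P."""
    a, b = random.randrange(1, P), random.randrange(P)
    return lambda x: (a * x + b) % P

def bottom_k(elements, h, k):
    """S_k: the k elements with the smallest hash values."""
    return set(sorted(elements, key=h)[:k])

def jaccard_estimate(A, B, k, h):
    SA, SB = bottom_k(A, h, k), bottom_k(B, h, k)
    S_union = bottom_k(SA | SB, h, k)  # S_k(A union B) = S_k(S_k(A) union S_k(B))
    return len(S_union & SA & SB) / k

A = set(range(0, 1500))
B = set(range(500, 2000))  # true Jaccard similarity: 1000/2000 = 0.5
print(jaccard_estimate(A, B, k=100, h=make_2indep_hash()))
```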
We show here that even if the hash function is only 2-independent, the
expected relative error is O(1/sqrt(fk)). For fk=Omega(1) this is within a
constant factor of the expected relative error with truly random hashing.
For comparison, consider the classic approach of k×min-wise hashing, where we use k independent hash functions h_1,...,h_k, storing the smallest element under each hash function. For k×min-wise there is at least a constant bias with constant independence, and the bias is not reduced with larger k. Recently, Feigenblat et al.
showed that bottom-k circumvents the bias if the hash function is 8-independent
and k is sufficiently large. We get down to 2-independence for any k. Our
result is based on a simple union bound, transferring generic concentration
bounds for the hashing scheme to the bottom-k sample, e.g., getting stronger
probability error bounds with higher independence.
For weighted sets, we consider priority sampling which adapts efficiently to
the concrete input weights, e.g., benefiting strongly from heavy-tailed input.
This time, the analysis is much more involved, but again we show that generic
concentration bounds can be applied.
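For the weighted case, the priority sampling scheme referred to above (due to Duffield, Lund, and Thorup) is also easy to sketch: item i gets priority w_i/u_i with u_i uniform in (0,1], the k highest-priority items are kept, and a sampled item's weight is estimated as max(w_i, tau), where tau is the (k+1)-st highest priority. The implementation details below are our assumptions, not code from the paper.
```python
import random

def priority_sample(weights, k):
    """Keep the k items of highest priority w_i/u_i, u_i uniform in (0,1].
    Returns the sample as (index, weight) pairs plus the threshold tau,
    the (k+1)-st highest priority."""
    prios = sorted(((w / (1.0 - random.random()), i, w)
                    for i, w in enumerate(weights)), reverse=True)
    tau = prios[k][0]
    return [(i, w) for _, i, w in prios[:k]], tau

def estimate_subset_sum(sample, tau, subset):
    """Unbiased estimate of the total weight of `subset` (a set of indices):
    each sampled item contributes max(w_i, tau)."""
    return sum(max(w, tau) for i, w in sample if i in subset)

weights = [random.paretovariate(1.5) for _ in range(10_000)]  # heavy-tailed input
sample, tau = priority_sample(weights, k=500)
Y = set(range(5_000))
print(estimate_subset_sum(sample, tau, Y), sum(weights[i] for i in Y))
```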