14 research outputs found

    A Spectral Bound on Hypergraph Discrepancy

    Let $\mathcal{H}$ be a $t$-regular hypergraph on $n$ vertices and $m$ edges. Let $M$ be the $m \times n$ incidence matrix of $\mathcal{H}$ and let us denote $\lambda = \max_{v \perp \overline{1}, \|v\| = 1} \|Mv\|$. We show that the discrepancy of $\mathcal{H}$ is $O(\sqrt{t} + \lambda)$. As a corollary, this gives us that for every $t$, the discrepancy of a random $t$-regular hypergraph with $n$ vertices and $m \geq n$ edges is almost surely $O(\sqrt{t})$ as $n$ grows. The proof also gives a polynomial-time algorithm that takes a hypergraph as input and outputs a coloring with the above guarantee. Comment: 18 pages. arXiv admin note: substantial text overlap with arXiv:1811.01491, several changes to the presentation.
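
    The two quantities in the bound are easy to compute for a concrete instance. Below is a minimal sketch (not the paper's coloring algorithm) that evaluates the discrepancy of a given $\pm 1$ coloring and the spectral quantity $\lambda$ from the incidence matrix; the toy 2-regular hypergraph is made up for illustration.

```python
import numpy as np

def coloring_discrepancy(M, x):
    """Discrepancy of a coloring x in {-1,+1}^n: the largest |sum of colors| over edges."""
    return np.max(np.abs(M @ x))

def spectral_lambda(M):
    """lambda = max ||Mv|| over unit vectors v orthogonal to the all-ones vector,
    computed as the spectral norm of M restricted to that subspace."""
    n = M.shape[1]
    P = np.eye(n) - np.ones((n, n)) / n   # projector onto the complement of the all-ones direction
    return np.linalg.norm(M @ P, ord=2)

# Toy 2-regular hypergraph (each vertex lies in exactly 2 edges), made up for illustration.
M = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1]])
x = np.array([1, -1, 1, -1])
print(coloring_discrepancy(M, x), spectral_lambda(M))
```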

    Extending the Centerpoint Theorem to Multiple Points

    The centerpoint theorem is a well-known and widely used result in discrete geometry. It states that for any point set P of n points in R^d, there is a point c, not necessarily from P, such that each halfspace containing c contains at least n/(d+1) points of P. Such a point c is called a centerpoint, and it can be viewed as a generalization of a median to higher dimensions. In other words, a centerpoint can be interpreted as a good representative for the point set P. But what if we allow more than one representative? For example, in one-dimensional data sets, certain quantiles are often chosen as representatives instead of the median. We present a possible extension of the concept of quantiles to higher dimensions. The idea is to find a set Q of (few) points such that every halfspace that contains one point of Q contains a large fraction of the points of P, and every halfspace that contains more of Q contains an even larger fraction of P. This setting is comparable to the well-studied concepts of weak epsilon-nets and weak epsilon-approximations: it is stronger than the former but weaker than the latter. We show that for any point set of size n in R^d and for any positive alpha_1, ..., alpha_k where alpha_1 <= alpha_2 <= ... <= alpha_k and for every i, j with i+j <= k+1 we have (d-1)alpha_k + alpha_i + alpha_j <= 1, we can find Q of size k such that each halfspace containing j points of Q contains at least alpha_j n points of P. For two-dimensional point sets we further show that for every alpha and beta with alpha <= beta and alpha + beta <= 2/3, we can find Q with |Q| = 3 such that each halfplane containing one point of Q contains at least alpha n of the points of P and each halfplane containing all of Q contains at least beta n points of P. All these results generalize to the setting where P is any mass distribution. For the case where P is a point set in R^2 and |Q| = 2, we provide algorithms to find such points in time O(n log^3 n).
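
    To make the centerpoint condition concrete, here is a brute-force planar check (illustrative only, not one of the paper's algorithms): a candidate c is a centerpoint exactly when every closed halfplane containing c covers at least n/3 points, and it suffices to test halfplanes whose boundary passes through c, since the covered count only changes at directions perpendicular to some c-to-point vector.

```python
import numpy as np

def is_centerpoint_2d(P, c, tol=1e-9):
    """Check whether every closed halfplane containing c covers at least n/3 of the points P.
    Only halfplanes with boundary through c need to be tested, and the covered count is
    piecewise constant in the normal direction, changing only at critical angles; we test
    those angles and small perturbations to either side (assumes eps is small enough)."""
    P = np.asarray(P, dtype=float)
    diffs = P - np.asarray(c, dtype=float)
    n = len(P)
    base = np.arctan2(diffs[:, 1], diffs[:, 0])
    crit = np.concatenate([base + np.pi / 2, base - np.pi / 2])
    eps = 1e-6
    for theta in np.concatenate([crit, crit + eps, crit - eps]):
        u = np.array([np.cos(theta), np.sin(theta)])
        covered = np.sum(diffs @ u >= -tol)   # points in the closed halfplane {x : (x - c) . u >= 0}
        if covered < n / 3:
            return False
    return True

# Made-up example: six points with the candidate (0.5, 0.5).
P = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5), (2, 2)]
print(is_centerpoint_2d(P, (0.5, 0.5)))
```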

    Optimal Approximation of Zonoids and Uniform Approximation by Shallow Neural Networks

    We study the following two related problems. The first is to determine to what error an arbitrary zonoid in $\mathbb{R}^{d+1}$ can be approximated in the Hausdorff distance by a sum of $n$ line segments. The second is to determine optimal approximation rates in the uniform norm for shallow ReLU$^k$ neural networks on their variation spaces. The first of these problems has been solved for $d \neq 2,3$, but when $d = 2,3$ a logarithmic gap between the best upper and lower bounds remains. We close this gap, which completes the solution in all dimensions. For the second problem, our techniques significantly improve upon existing approximation rates when $k \geq 1$, and enable uniform approximation of both the target function and its derivatives.
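
    For context on the second problem, a shallow ReLU$^k$ network is a finite superposition of ridge functions $\sum_i a_i \max(0, w_i \cdot x + b_i)^k$. The sketch below (with made-up random parameters) merely evaluates such a network; it does not implement the paper's approximation scheme.

```python
import numpy as np

def shallow_relu_k(x, W, b, a, k=1):
    """Evaluate a shallow ReLU^k network: sum_i a[i] * max(0, W[i] . x + b[i])**k."""
    pre = x @ W.T + b                      # (n_samples, n_neurons) pre-activations
    return (np.maximum(pre, 0.0) ** k) @ a

# Made-up parameters: 50 neurons on inputs in R^3, illustration only.
rng = np.random.default_rng(0)
W = rng.standard_normal((50, 3))
b = rng.standard_normal(50)
a = rng.standard_normal(50) / 50
x = rng.standard_normal((10, 3))
print(shallow_relu_k(x, W, b, a, k=2).shape)   # (10,)
```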

    Discrepancy, chaining and subgaussian processes

    We show that for a typical coordinate projection of a subgaussian class of functions, the infimum over signs $\inf_{(\epsilon_i)} \sup_{f \in F} |\sum_{i=1}^k \epsilon_i f(X_i)|$ is asymptotically smaller than the expectation over signs as a function of the dimension $k$, if the canonical Gaussian process indexed by $F$ is continuous. To that end, we establish a bound on the discrepancy of an arbitrary subset of $\mathbb{R}^k$ using properties of the canonical Gaussian process the set indexes, and then obtain quantitative structural information on a typical coordinate projection of a subgaussian class. Comment: Published at http://dx.doi.org/10.1214/10-AOP575 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org).
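
    The central quantity is easy to compute by brute force when the class $F$ is finite and the sample is tiny; the sketch below (illustrative only, exponential in $k$) does exactly that with made-up values.

```python
import itertools
import numpy as np

def signed_discrepancy(values):
    """values[f, i] = f(X_i) for each function f in a finite class F and sample point X_i.
    Brute-force the infimum over sign vectors eps in {-1,+1}^k of
    sup_{f in F} |sum_i eps_i * f(X_i)|  (exponential in k, illustration only)."""
    values = np.asarray(values, dtype=float)
    k = values.shape[1]
    best = np.inf
    for eps in itertools.product((-1.0, 1.0), repeat=k):
        best = min(best, np.max(np.abs(values @ np.array(eps))))
    return best

# Made-up class of 3 functions evaluated on a sample of size k = 8.
rng = np.random.default_rng(1)
print(signed_discrepancy(rng.standard_normal((3, 8))))
```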

    Boosting Simple Learners

    Boosting is a celebrated machine learning approach which is based on the idea of combining weak and moderately inaccurate hypotheses into a strong and accurate one. We study boosting under the assumption that the weak hypotheses belong to a class of bounded capacity. This assumption is inspired by the common convention that weak hypotheses are "rules-of-thumb" from an "easy-to-learn class". (Schapire and Freund '12, Shalev-Shwartz and Ben-David '14.) Formally, we assume the class of weak hypotheses has a bounded VC dimension. We focus on two main questions: (i) Oracle Complexity: How many weak hypotheses are needed in order to produce an accurate hypothesis? We design a novel boosting algorithm and demonstrate that it circumvents a classical lower bound by Freund and Schapire ('95, '12). Whereas the lower bound shows that $\Omega(1/\gamma^2)$ weak hypotheses with $\gamma$-margin are sometimes necessary, our new method requires only $\tilde{O}(1/\gamma)$ weak hypotheses, provided that they belong to a class of bounded VC dimension. Unlike previous boosting algorithms, which aggregate the weak hypotheses by majority votes, the new boosting algorithm uses more complex ("deeper") aggregation rules. We complement this result by showing that complex aggregation rules are in fact necessary to circumvent the aforementioned lower bound. (ii) Expressivity: Which tasks can be learned by boosting weak hypotheses from a bounded VC class? Can complex concepts that are "far away" from the class be learned? Towards answering the first question we identify a combinatorial-geometric parameter which captures the expressivity of base classes in boosting. As a corollary we provide an affirmative answer to the second question for many well-studied classes, including half-spaces and decision stumps. Along the way, we establish and exploit connections with Discrepancy Theory. Comment: A minor revision according to STOC review.
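
    For contrast with the "deeper" aggregation rules above, here is a minimal sketch of classical majority-vote-style boosting (AdaBoost) over decision stumps, a standard bounded-VC base class; this is the baseline regime the paper improves on, not the paper's new algorithm, and the data set is made up.

```python
import numpy as np

def best_stump(X, y, w):
    """Weighted-error-minimizing stump predicting sign(s * (x[j] - t)) over features j, thresholds t, signs s."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(s * (X[:, j] - t) >= 0, 1, -1)
                err = np.sum(w[pred != y])
                if err < best[0]:
                    best = (err, j, t, s)
    return best

def adaboost(X, y, rounds=10):
    """Classical AdaBoost: reweight the sample each round and aggregate stumps by a weighted vote."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(rounds):
        err, j, t, s = best_stump(X, y, w)
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        pred = np.where(s * (X[:, j] - t) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def predict(ensemble, X):
    score = sum(a * np.where(s * (X[:, j] - t) >= 0, 1, -1) for a, j, t, s in ensemble)
    return np.sign(score)

# Made-up toy data: the label is the sign of a threshold on the first coordinate.
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 2))
y = np.where(X[:, 0] > 0.3, 1, -1)
print(np.mean(predict(adaboost(X, y), X) == y))
```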