9 research outputs found

    No-substitution k-means Clustering with Adversarial Order

    Full text link
    We investigate $k$-means clustering in the online no-substitution setting when the input arrives in \emph{arbitrary} order. In this setting, points arrive one after another, and the algorithm must decide instantly whether to take the current point as a center before observing the next point. Decisions are irrevocable. The goal is to minimize both the number of centers and the $k$-means cost. Previous works in this setting assume that the input's order is random, or that the input's aspect ratio is bounded. It is known that if the order is arbitrary and there is no assumption on the input, then any algorithm must take all points as centers. Moreover, assuming a bounded aspect ratio is too restrictive: it does not include natural inputs generated from mixture models. We introduce a new complexity measure that quantifies the difficulty of clustering a dataset arriving in arbitrary order. We design a new randomized algorithm and prove that if it is applied to data with complexity $d$, the algorithm takes $O(d \log(n) k \log(k))$ centers and is an $O(k^3)$-approximation. We also prove that if the data is sampled from a ``natural'' distribution, such as a mixture of $k$ Gaussians, then the new complexity measure is equal to $O(k^2 \log(n))$. This implies that for data generated from such distributions, our new algorithm takes only $\text{poly}(k \log(n))$ centers and is a $\text{poly}(k)$-approximation. In terms of negative results, we prove that the number of centers needed to achieve an $\alpha$-approximation is at least $\Omega\left(\frac{d}{k \log(n \alpha)}\right)$. Comment: accepted to ALT 202
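
    As a concrete illustration of the no-substitution setting described above, the sketch below implements a toy irrevocable decision loop: each arriving point is either taken as a center on the spot or assigned to its nearest existing center. The probabilistic distance-threshold rule and the parameter tau are illustrative assumptions in the spirit of online facility-location heuristics, not the paper's algorithm or its complexity measure.

```python
import numpy as np

def no_substitution_clustering(stream, tau, rng):
    """Toy no-substitution loop: every point is handled exactly once,
    and the decision to make it a center is irrevocable."""
    centers, cost = [], 0.0
    for x in stream:
        if not centers:
            centers.append(x)              # the first point must be a center
            continue
        d2 = min(np.sum((x - c) ** 2) for c in centers)
        # illustrative rule: become a center with probability ~ squared distance
        if rng.random() < min(1.0, d2 / tau):
            centers.append(x)
        else:
            cost += d2                     # k-means cost of assigning x
    return centers, cost

# usage on a synthetic 2-D stream
rng = np.random.default_rng(0)
stream = rng.normal(size=(200, 2))
centers, cost = no_substitution_clustering(stream, tau=5.0, rng=rng)
print(len(centers), round(cost, 2))
```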

    What relations are reliably embeddable in Euclidean space?

    Full text link
    We consider the problem of embedding a relation, represented as a directed graph, into Euclidean space. For three types of embeddings motivated by the recent literature on knowledge graphs, we obtain characterizations of which relations they are able to capture, as well as bounds on the minimal dimensionality and precision needed. Comment: submitted to COLT 201
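
    One embedding type commonly used in the knowledge-graph literature is a translation-plus-threshold ("TransE-style") embedding; the sketch below checks whether such an embedding captures a given directed relation exactly. The rule "there is an edge (u, v) iff ||x[u] + r - x[v]|| <= thresh" is an assumed example of an embedding scheme, not necessarily one of the three types characterized in the paper.

```python
import numpy as np

def captures_relation(edges, n, x, r, thresh):
    """Check whether a translation embedding reproduces a directed relation:
    edge (u, v) is predicted iff ||x[u] + r - x[v]|| <= thresh.
    Returns True when the predictions match the given edge set exactly."""
    edge_set = set(edges)
    for u in range(n):
        for v in range(n):
            if u == v:
                continue
            predicted = np.linalg.norm(x[u] + r - x[v]) <= thresh
            if predicted != ((u, v) in edge_set):
                return False
    return True

# usage: the chain 0 -> 1 -> 2 embedded on a line with a unit translation
x = np.array([[0.0], [1.0], [2.0]])
r = np.array([1.0])
print(captures_relation([(0, 1), (1, 2)], 3, x, r, thresh=0.5))  # True
```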

    Sample Complexity of Adversarially Robust Linear Classification on Separated Data

    Full text link
    We consider the sample complexity of learning with adversarial robustness. Most prior theoretical results for this problem have considered a setting where different classes in the data are close together or overlapping. Motivated by some real applications, we consider, in contrast, the well-separated case where there exists a classifier with perfect accuracy and robustness, and show that the sample complexity narrates an entirely different story. Specifically, for linear classifiers, we show a large class of well-separated distributions where the expected robust loss of any algorithm is at least $\Omega(\frac{d}{n})$, whereas the max margin algorithm has expected standard loss $O(\frac{1}{n})$. This shows a gap between the standard and robust losses that cannot be obtained via prior techniques. Additionally, we present an algorithm that, given an instance where the robustness radius is much smaller than the gap between the classes, gives a solution with expected robust loss $O(\frac{1}{n})$. This shows that for very well-separated data, convergence rates of $O(\frac{1}{n})$ are achievable, which is not the case otherwise. Our results apply to robustness measured in any $\ell_p$ norm with $p > 1$ (including $p = \infty$).
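
    For linear classifiers, the robust loss under $\ell_p$ perturbations of radius $r$ has a closed form via the dual norm: a point is robustly correct iff $y(w \cdot x + b) \ge r \lVert w \rVert_q$ with $1/p + 1/q = 1$. The sketch below evaluates this quantity for a fixed separator; the toy data and the choice of $(w, b)$ are illustrative, and this is not the paper's algorithm for the small-radius regime.

```python
import numpy as np

def robust_error(w, b, X, y, r, p):
    """Fraction of points that an l_p perturbation of radius r can flip.
    For a linear classifier, x is robustly correct iff
        y * (w . x + b) >= r * ||w||_q,  where 1/p + 1/q = 1."""
    q = np.inf if p == 1 else (1.0 if np.isinf(p) else p / (p - 1.0))
    dual = np.max(np.abs(w)) if np.isinf(q) else np.sum(np.abs(w) ** q) ** (1.0 / q)
    margins = y * (X @ w + b)
    return float(np.mean(margins < r * dual))

# usage: a small well-separated example in 2 dimensions
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = np.array([1.0, 0.0]), 0.0                   # an assumed separator
print(robust_error(w, b, X, y, r=0.5, p=np.inf))   # 0.0: every point is robust
```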

    Consistent Non-Parametric Methods for Maximizing Robustness

    Full text link
    Learning classifiers that are robust to adversarial examples has received a great deal of recent attention. A major drawback of the standard robust learning framework is that there is an artificial robustness radius $r$ that applies to all inputs. This ignores the fact that data may be highly heterogeneous, in which case it is plausible that robustness regions should be larger in some regions of the data, and smaller in others. In this paper, we address this limitation by proposing a new limit classifier, called the neighborhood optimal classifier, that extends the Bayes optimal classifier outside its support by using the label of the closest in-support point. We then argue that this classifier maximizes the size of its robustness regions subject to the constraint of having accuracy equal to the Bayes optimal. We then present sufficient conditions under which general non-parametric methods that can be represented as weight functions converge towards this limit, and show that both nearest neighbors and kernel classifiers satisfy them under certain conditions. Comment: accepted to NeurIPS 202
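
    The sketch below illustrates the limit object described above: a base classifier defined on a support set is extended to arbitrary queries by returning the prediction at the closest in-support point. The 1-nearest-neighbor lookup via scipy.spatial.cKDTree and the toy base rule are illustrative choices, not the paper's analysis of weight-function methods.

```python
import numpy as np
from scipy.spatial import cKDTree

class NeighborhoodExtension:
    """Extend a classifier defined on a support set by assigning any query
    the prediction at its nearest in-support point."""

    def __init__(self, support_points, base_predict):
        self.support = np.asarray(support_points)
        self.labels = np.asarray([base_predict(x) for x in self.support])
        self.tree = cKDTree(self.support)

    def predict(self, queries):
        _, idx = self.tree.query(np.asarray(queries), k=1)
        return self.labels[idx]

# usage: a toy base rule defined only at the two class centers
support = np.array([[0.0, 0.0], [5.0, 5.0]])
ext = NeighborhoodExtension(support, lambda x: int(x.sum() > 5))
print(ext.predict([[0.2, -0.1], [4.0, 6.0]]))  # [0 1]
```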

    Theoretical Foundations of Trustworthy Machine Learning

    No full text

    Structure from Voltage

    Full text link
    Effective resistance (ER) is an attractive way to interrogate the structure of graphs. It is an alternative to computing the eigenvectors of the graph Laplacian. Graph Laplacians are used to find low-dimensional structures in high-dimensional data. Here too, ER-based analysis has advantages over eigenvector-based methods. Unfortunately, Von Luxburg et al. (2010) show that, when vertices correspond to a sample from a distribution over a metric space, the limit of the ER between distant points converges to a trivial quantity that holds no information about the structure of the graph. We show that by scaling the resistances in a graph with $n$ vertices by $n^2$, one gets a meaningful limit of the voltages and of the effective resistances. We also show that by adding a "ground" node to a metric graph one gets a simple and natural way to compute all of the distances from a chosen point to all other points.
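
    The sketch below computes the standard two-point effective resistance from the Laplacian pseudoinverse, $\mathrm{ER}(u, v) = (e_u - e_v)^\top L^+ (e_u - e_v)$, and applies an optional rescaling of all resistances (for example by $n^2$, as discussed above). The plain numpy pseudoinverse and the small path-graph example are illustrative; the "ground" node construction from the abstract is not reproduced here.

```python
import numpy as np

def effective_resistance(W, u, v, scale=None):
    """Effective resistance between u and v from the weighted adjacency
    matrix W (entries are conductances), via the Laplacian pseudoinverse:
        ER(u, v) = (e_u - e_v)^T L^+ (e_u - e_v).
    If `scale` is given, every resistance is multiplied by `scale`
    (e.g. n**2), i.e. every conductance is divided by it."""
    W = np.asarray(W, dtype=float)
    if scale is not None:
        W = W / scale
    L = np.diag(W.sum(axis=1)) - W
    Lp = np.linalg.pinv(L)
    e = np.zeros(len(W))
    e[u], e[v] = 1.0, -1.0
    return float(e @ Lp @ e)

# usage: a path graph 0 - 1 - 2 with unit conductances
W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(effective_resistance(W, 0, 2))                    # 2.0
print(effective_resistance(W, 0, 2, scale=len(W) ** 2)) # 18.0 = n^2 * 2
```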

    Effective resistance in metric spaces

    Full text link
    Effective resistance (ER) is an attractive way to interrogate the structure of graphs. It is an alternative to computing the eigenvectors of the graph Laplacian. One attractive application of ER is to point clouds, i.e. graphs whose vertices correspond to IID samples from a distribution over a metric space. Unfortunately, it was shown that the ER between any two points converges to a trivial quantity that holds no information about the graph's structure as the size of the sample increases to infinity. In this study, we show that this trivial solution can be circumvented by considering a region-based ER between pairs of small regions rather than pairs of points, and by scaling the edge weights appropriately with respect to the underlying density in each region. By keeping the regions fixed, we show analytically that the region-based ER converges to a non-trivial limit as the number of points increases to infinity, namely the ER on the metric space itself. We support our theoretical findings with numerical experiments.
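
    One simple way to realize a region-based ER is to short-circuit each region by contracting its vertices into a super-node and then compute the usual two-point ER on the contracted graph; the sketch below does exactly that. This contraction-based reading of "region-based" and the unit-conductance cycle example are assumptions for illustration, not necessarily the paper's construction or its density-dependent weight scaling.

```python
import numpy as np

def contract(W, groups):
    """Merge each vertex group into one super-node by summing the
    conductances of edges between groups (internal edges are shorted)."""
    W = np.asarray(W, dtype=float)
    k = len(groups)
    C = np.zeros((k, k))
    for a, ga in enumerate(groups):
        for b, gb in enumerate(groups):
            if a != b:
                C[a, b] = W[np.ix_(ga, gb)].sum()
    return C

def region_resistance(W, region_a, region_b):
    """ER between two regions: contract each region to a super-node, keep
    all remaining vertices as singletons, then compute two-point ER."""
    n = len(W)
    rest = [i for i in range(n) if i not in set(region_a) | set(region_b)]
    groups = [list(region_a), list(region_b)] + [[i] for i in rest]
    C = contract(W, groups)
    L = np.diag(C.sum(axis=1)) - C
    Lp = np.linalg.pinv(L)
    e = np.zeros(len(C))
    e[0], e[1] = 1.0, -1.0
    return float(e @ Lp @ e)

# usage: a 4-cycle; regions {0} and {2} give ER 1.0 (two length-2 paths in parallel)
W = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
print(region_resistance(W, [0], [2]))  # 1.0
```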