13 research outputs found
Approximate Clustering with Same-Cluster Queries
Ashtiani et al. proposed a Semi-Supervised Active Clustering framework (SSAC), where the learner is allowed to make adaptive queries to a domain expert. The queries are of the kind "do two given points belong to the same optimal cluster?", where the answers to these queries are assumed to be consistent with a unique optimal solution. There are many clustering contexts where such same cluster queries are feasible. Ashtiani et al. exhibited the power of such queries by showing that any instance of the k-means clustering problem, with additional margin assumption, can be solved efficiently if one is allowed to make O(k^2 log{k} + k log{n}) same-cluster queries. This is interesting since the k-means problem, even with the margin assumption, is NP-hard.
In this paper, we extend the work of Ashtiani et al. to the approximation setting by showing that a few of such same-cluster queries enables one to get a polynomial-time (1+eps)-approximation algorithm for the k-means problem without any margin assumption on the input dataset. Again, this is interesting since the k-means problem is NP-hard to approximate within a factor (1+c) for a fixed constant 0 < c < 1. The number of same-cluster queries used by the algorithm is poly(k/eps) which is independent of the size n of the dataset. Our algorithm is based on the D^2-sampling technique, also known as the k-means++ seeding algorithm. We also give a conditional lower bound on the number of same-cluster queries showing that if the Exponential Time Hypothesis (ETH) holds, then any such efficient query algorithm needs to make Omega (k/poly log k) same-cluster queries. Our algorithm can be extended for the case where the query answers are wrong with some bounded probability. Another result we show for the k-means++ seeding is that a small modification of the k-means++ seeding within the SSAC framework converts it to a constant factor approximation algorithm instead of the well known O(log k)-approximation algorithm
Towards Fast Computation of Certified Robustness for ReLU Networks
Verifying the robustness property of a general Rectified Linear Unit (ReLU)
network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer
CAV17]. Although finding the exact minimum adversarial distortion is hard,
giving a certified lower bound of the minimum distortion is possible. Current
available methods of computing such a bound are either time-consuming or
delivering low quality bounds that are too loose to be useful. In this paper,
we exploit the special structure of ReLU networks and provide two
computationally efficient algorithms Fast-Lin and Fast-Lip that are able to
certify non-trivial lower bounds of minimum distortions, by bounding the ReLU
units with appropriate linear functions Fast-Lin, or by bounding the local
Lipschitz constant Fast-Lip. Experiments show that (1) our proposed methods
deliver bounds close to (the gap is 2-3X) exact minimum distortion found by
Reluplex in small MNIST networks while our algorithms are more than 10,000
times faster; (2) our methods deliver similar quality of bounds (the gap is
within 35% and usually around 10%; sometimes our bounds are even better) for
larger networks compared to the methods based on solving linear programming
problems but our algorithms are 33-14,000 times faster; (3) our method is
capable of solving large MNIST and CIFAR networks up to 7 layers with more than
10,000 neurons within tens of seconds on a single CPU core.
In addition, we show that, in fact, there is no polynomial time algorithm
that can approximately find the minimum adversarial distortion of a
ReLU network with a approximation ratio unless
=, where is the number of neurons in the network.Comment: Tsui-Wei Weng and Huan Zhang contributed equall