Discrete-Continuous ADMM for Transductive Inference in Higher-Order MRFs
This paper introduces a novel algorithm for transductive inference in
higher-order MRFs, where the unary energies are parameterized by a variable
classifier. The considered task is posed as a joint optimization problem in the
continuous classifier parameters and the discrete label variables. In contrast
to prior approaches such as convex relaxations, we propose an advantageous
decoupling of the objective function into discrete and continuous subproblems
and a novel, efficient optimization method related to ADMM. This approach
preserves integrality of the discrete label variables and guarantees global
convergence to a critical point. We demonstrate the advantages of our approach
in several experiments including video object segmentation on the DAVIS data
set and interactive image segmentation.
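To make the decoupling concrete, here is a minimal Python sketch of the alternating structure the abstract describes: a continuous step that refits the unary classifier and a discrete step that relabels under a pairwise smoothness term while keeping labels integral. The logistic unary model, the chain-graph energy, and the use of plain alternating minimization (rather than the paper's ADMM-related method with its convergence guarantee) are our simplifications.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def continuous_step(X, y, w, steps=50, lr=0.1):
    # Continuous subproblem: refit the unary classifier (here a
    # logistic regression) to the current discrete labels.
    for _ in range(steps):
        p = sigmoid(X @ w)
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def discrete_step(X, w, neighbors, lam=0.1):
    # Discrete subproblem: ICM-style sweeps over binary labels with a
    # pairwise smoothness term (a toy stand-in for the paper's
    # higher-order MRF solver); labels stay integral throughout.
    scores = X @ w
    y = (scores > 0).astype(float)
    for _ in range(5):
        for i in range(len(y)):
            smooth = lam * sum(2.0 * y[j] - 1.0 for j in neighbors[i])
            y[i] = 1.0 if scores[i] + smooth > 0 else 0.0
    return y

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < len(X)]
             for i in range(len(X))}

w = np.zeros(2)
y = rng.integers(0, 2, len(X)).astype(float)  # arbitrary initial labeling
for _ in range(10):  # outer loop alternating the two subproblems
    w = continuous_step(X, y, w)
    y = discrete_step(X, w, neighbors)
```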
Similarity Learning for Provably Accurate Sparse Linear Classification
In recent years, the crucial importance of metrics in machine learning
algorithms has led to increasing interest in optimizing distance and
similarity functions. Most state-of-the-art methods focus on learning
Mahalanobis distances (which must satisfy a positive semi-definiteness
constraint) for use in a local k-NN algorithm. However, no theoretical
link is established between the learned metrics and their performance in
classification. In this paper, we make use of the formal framework of good
similarities introduced by Balcan et al. to design an algorithm for learning a
non-PSD linear similarity optimized in a nonlinear feature space, which is then
used to build a global linear classifier. We show that our approach has uniform
stability and derive a generalization bound on the classification error.
Experiments performed on various datasets confirm the effectiveness of our
approach compared to state-of-the-art methods and provide evidence that (i) it
is fast, (ii) it is robust to overfitting, and (iii) it produces very sparse
classifiers. Comment: Appears in Proceedings of the 29th International
Conference on Machine Learning (ICML 2012).
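A minimal sketch of the pipeline the abstract describes: embed each point by its similarities to a set of landmarks (the Balcan et al. construction) and fit an L1-regularized global linear classifier on top. The fixed Gaussian similarity stands in for the learned non-PSD similarity, and the landmark count and regularization strength are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def landmark_features(X, landmarks, gamma=1.0):
    # Map each point to its similarities with the landmarks (the
    # Balcan et al. "good similarity" embedding). A fixed Gaussian
    # similarity stands in here for a learned, possibly non-PSD one.
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (100, 5)), rng.normal(1, 1, (100, 5))])
y = np.array([0] * 100 + [1] * 100)
landmarks = X[rng.choice(len(X), 20, replace=False)]

Phi = landmark_features(X, landmarks)
# L1 regularization yields the very sparse global linear classifier the
# abstract describes; most landmark weights end up exactly zero.
clf = LogisticRegression(penalty="l1", C=0.5, solver="liblinear").fit(Phi, y)
print("nonzero landmark weights:", np.count_nonzero(clf.coef_))
```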
Machine learning with Lipschitz classifiers
Magdeburg, University, Faculty of Electrical Engineering and Information Technology, Dissertation, 2010. André Stuhlsat
Low Budget Active Learning via Wasserstein Distance: An Integer Programming Approach
Given restrictions on the availability of data, active learning is the
process of training a model with limited labeled data by selecting a core
subset of an unlabeled data pool to label. Although selecting the most useful
points for training is an optimization problem, the scale of deep learning data
sets forces most selection strategies to employ efficient heuristics. Instead,
we propose a new integer optimization problem for selecting a core set that
minimizes the discrete Wasserstein distance from the unlabeled pool. We
demonstrate that this problem can be tractably solved with a Generalized
Benders Decomposition algorithm. Our strategy requires high-quality latent
features which we obtain by unsupervised learning on the unlabeled pool.
Numerical results on several data sets show that our optimization approach is
competitive with baselines and particularly outperforms them in the low budget
regime where less than one percent of the data set is labeled.
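The exact integer program and its Generalized Benders Decomposition solver are beyond a short sketch, but the objective, picking a budget-sized core set whose induced assignment cost over the pool is minimal, can be illustrated with a greedy facility-location heuristic on stand-in latent features. The heuristic and the toy features are our substitutions, not the paper's method.

```python
import numpy as np

def greedy_coreset(Z, budget):
    # Greedily add the point that most reduces the total cost of
    # assigning every pool point to its nearest selected point -- a
    # k-medoids-style heuristic standing in for the exact integer
    # program the paper solves with Generalized Benders Decomposition.
    n = len(Z)
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    selected = []
    cost = np.full(n, np.inf)  # distance of each point to nearest center
    for _ in range(budget):
        # Total assignment cost if candidate j were added.
        gains = np.minimum(cost[None, :], D).sum(axis=1)
        j = int(np.argmin(gains))
        selected.append(j)
        cost = np.minimum(cost, D[j])
    return selected

rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 16))        # stand-in for learned latent features
picks = greedy_coreset(Z, budget=5)   # low-budget regime: 1% of the pool
print("points to label:", picks)
```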
Classification with Asymmetric Label Noise: Consistency and Maximal Denoising
In many real-world classification problems, the labels of training examples
are randomly corrupted. Most previous theoretical work on classification with
label noise assumes that the two classes are separable, that the label noise is
independent of the true class label, or that the noise proportions for each
class are known. In this work, we give conditions that are necessary and
sufficient for the true class-conditional distributions to be identifiable.
These conditions are weaker than those analyzed previously, and allow for the
classes to be nonseparable and the noise levels to be asymmetric and unknown.
The conditions essentially state that a majority of the observed labels are
correct and that the true class-conditional distributions are "mutually
irreducible," a concept we introduce that limits the similarity of the two
distributions. For any label noise problem, there is a unique pair of true
class-conditional distributions satisfying the proposed conditions, and we
argue that this pair corresponds in a certain sense to maximal denoising of the
observed distributions.
Our results are facilitated by a connection to "mixture proportion
estimation," which is the problem of estimating the maximal proportion of one
distribution that is present in another. We establish a novel rate of
convergence result for mixture proportion estimation, and apply this to obtain
consistency of a discrimination rule based on surrogate loss minimization.
Experimental results on benchmark data and a nuclear particle classification
problem demonstrate the efficacy of our approach.
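A toy illustration of mixture proportion estimation, the subproblem the abstract builds on: given samples from G = kappa*F + (1-kappa)*H, the maximal proportion of F inside G satisfies kappa* = inf_S G(S)/F(S), which a crude histogram estimator approximates by the minimum bin ratio. This is a sketch of the population identity, not the estimator whose convergence rate the paper establishes.

```python
import numpy as np

def mpe_histogram(samples_g, samples_f, bins=20, min_count=25):
    # Estimate kappa* = inf over bins of G(bin)/F(bin), skipping bins
    # where F is barely observed so the ratio is not pure noise.
    lo = min(samples_g.min(), samples_f.min())
    hi = max(samples_g.max(), samples_f.max())
    edges = np.linspace(lo, hi, bins + 1)
    g_cnt, _ = np.histogram(samples_g, edges)
    f_cnt, _ = np.histogram(samples_f, edges)
    mask = f_cnt >= min_count
    g_p = g_cnt / g_cnt.sum()
    f_p = f_cnt / f_cnt.sum()
    return float(np.min(g_p[mask] / f_p[mask]))

rng = np.random.default_rng(0)
f = rng.normal(0, 1, 5000)                    # samples from F
g = np.concatenate([rng.normal(0, 1, 1500),   # G = 0.3*F + 0.7*H
                    rng.normal(4, 1, 3500)])
print("estimated kappa:", mpe_histogram(g, f))  # roughly 0.3
```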
Asymmetric Certified Robustness via Feature-Convex Neural Networks
Recent works have introduced input-convex neural networks (ICNNs) as learning
models with advantageous training, inference, and generalization properties
linked to their convex structure. In this paper, we propose a novel
feature-convex neural network architecture as the composition of an ICNN with a
Lipschitz feature map in order to achieve adversarial robustness. We consider
the asymmetric binary classification setting with one "sensitive" class, and
for this class we prove deterministic, closed-form, and easily-computable
certified robust radii for arbitrary $\ell_p$-norms. We theoretically justify
the use of these models by characterizing their decision region geometry,
extending the universal approximation theorem for ICNN regression to the
classification setting, and proving a lower bound on the probability that such
models perfectly fit even unstructured uniformly distributed data in
sufficiently high dimensions. Experiments on Malimg malware classification and
subsets of MNIST, Fashion-MNIST, and CIFAR-10 datasets show that feature-convex
classifiers attain state-of-the-art certified $\ell_1$-radii as well as
substantial $\ell_2$- and $\ell_\infty$-radii while being far more
computationally efficient than any competitive baseline. Comment: 37th
Conference on Neural Information Processing Systems (NeurIPS 2023).
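A minimal sketch of the feature-convex construction: a 1-Lipschitz feature map composed with a one-hidden-layer input-convex network, for which convexity yields a closed-form certified radius from one function and gradient evaluation. The absolute-value feature map, the tiny network shape, and the $\ell_2$ specialization of the certificate are our simplifications, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 10, 32
W = rng.normal(size=(h, d))
b = rng.normal(size=h)
a = np.abs(rng.normal(size=h))  # nonnegative output weights => g is convex
c = -1.0

def phi(x):
    # A 1-Lipschitz feature map (elementwise absolute value), a simple
    # stand-in for the paper's Lipschitz feature map.
    return np.abs(x)

def g_and_grad(z):
    # One-hidden-layer ICNN g(z) = a @ relu(W z + b) + c and its
    # gradient; each relu(w_i . z + b_i) is convex, and a >= 0 keeps
    # the nonnegative combination convex.
    pre = W @ z + b
    act = np.maximum(pre, 0.0)
    grad = W.T @ (a * (pre > 0))
    return a @ act + c, grad

def certified_radius_l2(x):
    # If g(phi(x)) > 0 (the "sensitive" class), convexity of g gives
    # the closed-form certificate r = g(phi(x)) / (Lip(phi) * ||grad||_2),
    # with Lip(phi) = 1 here: any x' within radius r keeps g positive.
    # The other class gets no certificate (the asymmetric setting).
    val, grad = g_and_grad(phi(x))
    if val <= 0:
        return 0.0
    return float(val / np.linalg.norm(grad))

x = rng.normal(size=d)
print("certified l2 radius:", certified_radius_l2(x))
```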