Scalable large margin pairwise learning algorithms
2019 Summer. Includes bibliographical references. Classification is a major task in machine learning and data mining applications. Many of these applications involve building a classification model using a large volume of imbalanced data. In such an imbalanced learning scenario, the area under the ROC curve (AUC) has proven to be a reliable performance measure to evaluate a classifier. Therefore, it is desirable to develop scalable learning algorithms that maximize the AUC metric directly. Kernelized AUC maximization machines have demonstrated better generalization than linear AUC machines. However, the computational cost of the kernelized machines hinders their scalability. To address this problem, we propose a large-scale nonlinear AUC maximization algorithm that learns a batch linear classifier on an approximate feature space computed via the k-means Nyström method. The proposed algorithm is shown empirically to achieve AUC classification performance comparable to, or even better than, that of the kernel AUC machines, while training several orders of magnitude faster. However, the computational complexity of the linear batch model compromises its scalability when training on sizable datasets. Hence, we develop second-order online AUC maximization algorithms based on a confidence-weighted model. The proposed algorithms exploit second-order information to improve the convergence rate and implement a fixed-size buffer to address the multivariate nature of the AUC objective function. We also extend our online linear algorithms to consider an approximate feature map constructed using random Fourier features in an online setting. The results show that our proposed algorithms outperform or are at least comparable to the competing online AUC maximization methods. Despite their scalability, we notice that first- and second-order online AUC maximization methods are prone to suboptimal convergence. 
This can be attributed to the limitation of the hypothesis space. A potential improvement can be attained by learning stochastic online variants. However, the vanilla stochastic methods also suffer from slow convergence because of the high variance introduced by the stochastic process. We address the problem of slow convergence by developing a fast-converging stochastic AUC maximization algorithm. The proposed stochastic algorithm is accelerated using a unique combination of scheduled regularization update and scheduled averaging. The experimental results show that the proposed algorithm performs better than the state-of-the-art online and stochastic AUC maximization methods in terms of AUC classification accuracy. Moreover, we develop a proximal variant of our accelerated stochastic AUC maximization algorithm. The proposed method applies the proximal operator to the hinge loss function. Therefore, it evaluates the gradient of the loss function at the approximated weight vector. Experiments on several benchmark datasets show that our proximal algorithm converges to the optimal solution faster than the previous AUC maximization algorithms.
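As a rough illustration of why the AUC objective is pairwise, the sketch below computes the empirical AUC as the fraction of correctly ordered (positive, negative) pairs, together with a pairwise hinge surrogate. The function names, the margin of 1.0, and the toy scores are assumptions made for illustration; this is not the thesis's implementation.

```python
def split_by_class(scores, labels):
    """Separate scores into positives (label 1) and negatives (any other label)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    return pos, neg


def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos, neg = split_by_class(scores, labels)
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def pairwise_hinge_loss(scores, labels, margin=1.0):
    """Average hinge penalty over all (positive, negative) pairs."""
    pos, neg = split_by_class(scores, labels)
    total = sum(max(0.0, margin - (p - n)) for p in pos for n in neg)
    return total / (len(pos) * len(neg))


# Hypothetical toy data: two positives and two negatives.
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
auc = empirical_auc(scores, labels)
loss = pairwise_hinge_loss(scores, labels)
```

Both functions loop over every positive-negative pair, which is the cost that the batch, online, and stochastic variants described above try to avoid.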
Pairwise Learning via Stagewise Training in Proximal Setting
Pairwise objective functions are an important part of machine learning.
Approaches that use them include differential networks in face recognition,
metric learning, bipartite learning, multiple kernel learning, and maximization
of the area under the curve (AUC). Compared to pointwise learning, the number
of training pairs in pairwise learning grows quadratically with the number of
samples, and so does its complexity. Researchers mostly address this challenge
with online learning. Recent research has, however, offered adaptive sample size
training for smooth loss functions as a better strategy in terms of convergence
and complexity, but without a comprehensive theoretical study. In a distinct
line of research, importance sampling has sparked a considerable amount of
interest in finite pointwise-sum minimization. This is because the variance of
the stochastic gradient slows convergence considerably. In this paper, we
combine adaptive sample size and importance
sampling techniques for pairwise learning, with convergence guarantees for
nonsmooth convex pairwise loss functions. In particular, the model is trained
stochastically using an expanded training set for a predefined number of
iterations derived from the stability bounds. In addition, we demonstrate that
sampling opposite instances at each iteration reduces the variance of the
gradient, hence accelerating convergence. Experiments on a broad variety of
datasets in AUC maximization confirm the theoretical results. Comment: 10 Page
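The idea of sampling opposite instances at each iteration can be sketched as follows: every stochastic step draws one positive and one negative example and takes a subgradient step on a pairwise hinge loss for a linear model. The function name, step-size schedule, regularization constant, and toy data are all assumptions for illustration; the adaptive sample size schedule and importance weights from the paper are not reproduced here.

```python
import random


def train_pairwise_sgd(X, y, steps=2000, lr=0.1, reg=0.01, seed=0):
    """Stochastic subgradient descent on a regularized pairwise hinge loss.

    Each iteration samples one positive and one negative example
    (an opposite-class pair) instead of an arbitrary pair.
    """
    rng = random.Random(seed)
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label != 1]
    w = [0.0] * len(X[0])
    for t in range(1, steps + 1):
        xp, xn = rng.choice(pos), rng.choice(neg)
        diff = [a - b for a, b in zip(xp, xn)]
        margin = sum(wi * di for wi, di in zip(w, diff))
        step = lr / t ** 0.5  # assumed decaying step size
        for j in range(len(w)):
            grad = reg * w[j]
            if margin < 1.0:  # the hinge is active for this pair
                grad -= diff[j]
            w[j] -= step * grad
    return w


def score(w, x):
    """Linear score of one example."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Hypothetical toy data: two positives followed by two negatives.
X = [[2.0, 0.1], [1.5, -0.2], [-1.0, 0.3], [-2.0, 0.0]]
y = [1, 1, 0, 0]
w = train_pairwise_sgd(X, y)
```

With n_pos positives and n_neg negatives there are n_pos * n_neg candidate pairs, so sampling one pair per step keeps the per-iteration cost independent of that product.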
Fast Objective & Duality Gap Convergence for Nonconvex-Strongly-Concave Min-Max Problems
This paper focuses on stochastic methods for solving smooth non-convex
strongly-concave min-max problems, which have received increasing attention due
to their potential applications in deep learning (e.g., deep AUC maximization,
distributionally robust optimization). However, most of the existing algorithms
are slow in practice, and their analysis revolves around the convergence to a
nearly stationary point. We consider leveraging the Polyak-Łojasiewicz (PL)
condition to design faster stochastic algorithms with stronger convergence
guarantees. Although the PL condition has been utilized for designing many
stochastic minimization algorithms, its applications to non-convex min-max
optimization remain rare. In this paper, we propose and analyze a generic
proximal epoch-based framework into which many well-known stochastic updates
can be embedded. Fast convergence is established in terms of both the primal
objective gap and the duality gap. Compared with existing studies, (i)
our analysis is based on a novel Lyapunov function consisting of the primal
objective gap and the duality gap of a regularized function, and (ii) the
results are more comprehensive with improved rates that have better dependence
on the condition number under different assumptions. We also conduct deep and
non-deep learning experiments to verify the effectiveness of our methods.
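For readers unfamiliar with the quantities named in this abstract, one common way to write them is given below. The symbols f, P, x, y, and the constant \mu are assumed notation, and the paper's exact assumptions may differ.

```latex
% Assumed notation: f(x, y) is smooth, non-convex in x, strongly concave in y.
\min_{x} \max_{y} f(x, y), \qquad P(x) := \max_{y} f(x, y)

% PL condition on the primal objective P, with constant \mu > 0:
\tfrac{1}{2}\,\|\nabla P(x)\|^{2} \;\ge\; \mu \bigl( P(x) - \min_{x'} P(x') \bigr)

% Primal objective gap and duality gap at a point (x, y):
P(x) - \min_{x'} P(x'), \qquad
\mathrm{Gap}(x, y) := \max_{y'} f(x, y') - \min_{x'} f(x', y)
```

The PL inequality bounds the objective gap by the squared gradient norm, which is why convergence can be stated in terms of these gaps and not only in terms of near-stationarity.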
Algorithmic Foundations of Empirical X-risk Minimization
This manuscript introduces a new optimization framework for machine learning
and AI, named empirical X-risk minimization (EXM). X-risk is a term
introduced to represent a family of compositional measures or objectives, in
which each data point is compared with a large number of items explicitly or
implicitly for defining a risk function. It includes surrogate objectives of
many widely used measures and non-decomposable losses, e.g., AUROC, AUPRC,
partial AUROC, NDCG, MAP, precision/recall at top positions, precision at a
certain recall level, listwise losses, p-norm push, top push, global
contrastive losses, etc. While these non-decomposable objectives and their
optimization algorithms have been studied in the literature of machine
learning, computer vision, information retrieval, etc., optimizing these
objectives has encountered some unique challenges for deep learning. In this
paper, we present recent rigorous efforts for EXM with a focus on its
algorithmic foundations and its applications. We introduce a class of
algorithmic techniques for solving EXM with smooth non-convex objectives. We
formulate EXM into three special families of non-convex optimization problems
belonging to non-convex compositional optimization, non-convex min-max
optimization and non-convex bilevel optimization, respectively. For each family
of problems, we present some strong baseline algorithms and their complexities,
which will motivate further research for improving the existing results.
Discussions about the presented results and future studies are given at the
end. Efficient algorithms for optimizing a variety of X-risks are implemented
in the LibAUC library at www.libauc.org.
Stochastic Variance Reduction Methods for Saddle-Point Problems
We consider convex-concave saddle-point problems where the objective
functions may be split in many components, and extend recent stochastic
variance reduction methods (such as SVRG or SAGA) to provide the first
large-scale linearly convergent algorithms for this class of problems which is
common in machine learning. While the algorithmic extension is straightforward,
it comes with challenges and opportunities: (a) the convex minimization
analysis does not apply and we use the notion of monotone operators to prove
convergence, showing in particular that the same algorithm applies to a larger
class of problems, such as variational inequalities, (b) there are two notions
of splits, in terms of functions, or in terms of partial derivatives, (c) the
split does need to be done with convex-concave terms, (d) non-uniform sampling
is key to an efficient algorithm, both in theory and practice, and (e) these
incremental algorithms can be easily accelerated using a simple extension of
the "catalyst" framework, leading to an algorithm which is always superior to
accelerated batch algorithms. Comment: Neural Information Processing Systems (NIPS), 2016, Barcelona, Spain
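A minimal sketch of the variance-reduction idea applied to a saddle-point problem is shown below. It assumes a toy objective f(x, y) = (1/n) * sum_i [lam/2 * x^2 + a_i * x * y - lam/2 * y^2], whose saddle point is (0, 0), and uses plain forward steps with uniform sampling. The paper itself works with proximal steps and non-uniform sampling, so this is an SVRG-style simplification and not the authors' algorithm; all names and constants are hypothetical.

```python
import random


def svrg_saddle(a, lam=1.0, eta=0.05, epochs=20, inner=50, seed=0):
    """SVRG-style updates on the operator (df/dx, -df/dy) of a toy saddle problem."""
    rng = random.Random(seed)
    n = len(a)
    a_bar = sum(a) / n
    x, y = 1.0, 1.0
    for _ in range(epochs):
        xs, ys = x, y  # snapshot point for this epoch
        # Full operator evaluated once at the snapshot.
        gx_full = lam * xs + a_bar * ys
        gy_full = lam * ys - a_bar * xs
        for _ in range(inner):
            ai = a[rng.randrange(n)]  # uniform sampling of one component
            # Variance-reduced estimate: component at the current point,
            # minus the same component at the snapshot, plus the full operator.
            vx = (lam * x + ai * y) - (lam * xs + ai * ys) + gx_full
            vy = (lam * y - ai * x) - (lam * ys - ai * xs) + gy_full
            x, y = x - eta * vx, y - eta * vy
    return x, y


# Hypothetical coupling coefficients for the n = 4 components.
x_final, y_final = svrg_saddle([0.5, 1.0, 1.5, 2.0])
```

The correction term vanishes as the iterate approaches the snapshot, so the noise in the update shrinks along with the distance to the saddle point, which is what allows a constant step size.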
Efficient Cross-Device Federated Learning Algorithms for Minimax Problems
In many machine learning applications where massive and privacy-sensitive
data are generated on numerous mobile or IoT devices, collecting data in a
centralized location may be prohibitive. Thus, it is increasingly attractive to
estimate parameters over mobile or IoT devices while keeping data localized.
Such a learning setting is known as cross-device federated learning. In this
paper, we propose the first theoretically guaranteed algorithms for general
minimax problems in the cross-device federated learning setting. Our algorithms
require only a fraction of devices in each round of training, which overcomes
the difficulty introduced by the low availability of devices. The communication
overhead is further reduced by performing multiple local update steps on
clients before communication with the server, and global gradient estimates are
leveraged to correct the bias in local update directions introduced by data
heterogeneity. By developing analyses based on novel potential functions, we
establish theoretical convergence guarantees for our algorithms. Experimental
results on AUC maximization, robust adversarial network training, and GAN
training tasks demonstrate the efficiency of our algorithms.