Quadruply Stochastic Gradient Method for Large Scale Nonlinear Semi-Supervised Ordinal Regression AUC Optimization
Semi-supervised ordinal regression (SOR) problems are ubiquitous in
real-world applications, where only a few ordered instances are labeled and
massive instances remain unlabeled. Recent research has shown that directly
optimizing the concordance index or AUC can impose a better ranking on the data
than optimizing the traditional error rate in ordinal regression (OR) problems.
In this paper, we propose an unbiased objective function for SOR AUC
optimization based on the ordinal binary decomposition approach. Moreover, to
handle large-scale kernelized learning problems, we propose a scalable algorithm
called QSORAO using the doubly stochastic gradients (DSG) framework for
functional optimization. Theoretically, we prove that our method converges
to the optimal solution at a rate of O(1/t), where t is the number of
iterations for stochastic data sampling. Extensive experimental results on
various benchmark and real-world datasets also demonstrate that our method is
efficient and effective while retaining similar generalization performance.Comment: 12 pages, 9 figures, conferenc
AUC Optimization from Multiple Unlabeled Datasets
Weakly supervised learning aims to make machine learning effective when perfect
supervision is unavailable, and it has drawn great attention from researchers.
Among various types of weak supervision, one of the most challenging cases is
to learn from multiple unlabeled (U) datasets with only limited knowledge of
the class priors, or U^m learning for short. In this paper, we study the
problem of building an AUC (area under ROC curve) optimization model from
multiple unlabeled datasets, which maximizes the pairwise ranking ability of
the classifier. We propose U^m-AUC, an AUC optimization approach that
converts the U^m data into a multi-label AUC optimization problem and can be
trained efficiently. We show that the proposed U^m-AUC is effective both
theoretically and empirically.
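One way to see why AUC is recoverable from unlabeled data alone is that, for two unlabeled samples drawn with different known class priors, the observed pairwise ranking statistic is a fixed linear function of the true AUC, which can then be inverted. The simulation below is a sketch of that identity for two sets with assumed priors theta1 and theta2, not the paper's U^m-AUC algorithm; the score distributions are chosen so the true AUC is 0.875.

```python
import random

def mixture_scores(n, prior, rng):
    """Scores from a prior-weighted mixture: positives ~ U(0.5, 1.5),
    negatives ~ U(0, 1), so the true AUC is 0.875."""
    return [0.5 + rng.random() if rng.random() < prior else rng.random()
            for _ in range(n)]

def pairwise_rank(a, b):
    """Empirical P(score in a > score in b), ties counted as 1/2."""
    wins = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    return wins / (len(a) * len(b))

def recover_auc(u1, u2, theta1, theta2):
    """Invert the mixture decomposition of the observed ranking statistic:
    auc_u = t1(1-t2)*AUC + t2(1-t1)*(1-AUC) + (t1*t2 + (1-t1)(1-t2))/2."""
    auc_u = pairwise_rank(u1, u2)
    a = theta1 * (1 - theta2)          # weight of (pos, neg) pairs
    b = theta2 * (1 - theta1)          # weight of (neg, pos) pairs
    half = 0.5 * (theta1 * theta2 + (1 - theta1) * (1 - theta2))
    return (auc_u - b - half) / (a - b)
```

With enough samples the corrected estimate concentrates around the true AUC even though no instance is ever labeled.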
A Symmetric Loss Perspective of Reliable Machine Learning
When minimizing the empirical risk in binary classification, it is a common
practice to replace the zero-one loss with a surrogate loss to make the
learning objective feasible to optimize. Examples of well-known surrogate
losses for binary classification include the logistic loss, hinge loss, and
sigmoid loss. The choice of surrogate loss can strongly influence the
performance of the trained classifier and should therefore be made carefully.
Recently, surrogate losses that satisfy a certain symmetry condition
(a.k.a. symmetric losses) have demonstrated their usefulness in
learning from corrupted labels. In this article, we provide an overview of
symmetric losses and their applications. First, we review how a symmetric loss
can yield robust classification from corrupted labels in balanced error rate
(BER) minimization and area under the receiver operating characteristic curve
(AUC) maximization. Then, we demonstrate how the robust AUC maximization method
can benefit natural language processing in the problem where we want to learn
only from relevant keywords and unlabeled documents. Finally, we conclude this
article by discussing future directions, including potential applications of
symmetric losses for reliable machine learning and the design of non-symmetric
losses that can benefit from the symmetric condition.
Comment: Preprint of an Invited Review Article
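The symmetry condition discussed above requires that l(z) + l(-z) be a constant for every margin z. A quick numerical check (a sketch with my own function names) confirms that the sigmoid loss satisfies it with constant 1, while the hinge loss does not:

```python
import math

def sigmoid_loss(z):
    """Sigmoid loss: l(z) = 1 / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(z))

def hinge_loss(z):
    """Hinge loss: l(z) = max(0, 1 - z)."""
    return max(0.0, 1.0 - z)

def is_symmetric(loss, const, margins):
    """Check l(z) + l(-z) == const on a grid of margins."""
    return all(abs(loss(z) + loss(-z) - const) < 1e-9 for z in margins)

margins = [-3.0, -0.7, 0.0, 0.4, 2.5]
```

The hinge loss fails because for |z| > 1 the sum deviates from 2 (e.g. z = 2.5 gives 0 + 3.5), which is one reason it lacks the corrupted-label robustness of symmetric losses.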
Semi-supervised novelty detection
A common setting for novelty detection assumes that labeled examples
from the nominal class are available, but that labeled examples of novelties
are unavailable. The standard (inductive) approach is to declare novelties
where the nominal density is low, which reduces the problem to density level
set estimation. In this paper, we consider the setting where an unlabeled and
possibly contaminated sample is also available at learning time. We argue
that novelty detection in this semi-supervised setting is naturally solved by
a general reduction to a binary classification problem. In particular, a
detector with a desired false positive rate can be achieved through a
reduction to Neyman-Pearson classification. Unlike the inductive approach,
semi-supervised novelty detection (SSND) yields detectors that are optimal
(e.g., statistically consistent) regardless of the distribution on novelties.
Therefore, in novelty detection, unlabeled data have a substantial impact on
the theoretical properties of the decision rule. We validate the practical
utility of SSND with an extensive experimental study. We also show that SSND
provides distribution-free, learning-theoretic solutions to two well known
problems in hypothesis testing. First, our results provide a general solution
to the two-sample problem, that is, the problem of determining
whether two random samples arise from the same distribution. Second, a
specialization of SSND coincides with the standard p-value approach to
multiple testing under the so-called random effects model. Unlike standard
rejection regions based on thresholded p-values, the general SSND framework
allows for adaptation to arbitrary alternative distributions.
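The false-positive-rate guarantee from the Neyman-Pearson reduction can be illustrated with a minimal thresholding sketch: given scores for examples (in SSND the score would come from a nominal-vs-unlabeled classifier; here the scores are taken as given), set the rejection threshold at an empirical quantile of the nominal scores. The quantile rule and names below are illustrative, not the paper's construction.

```python
import math
import random

def novelty_threshold(nominal_scores, alpha):
    """Threshold so at most an alpha fraction of nominal data is flagged.

    Uses the empirical (1 - alpha)-quantile of the nominal scores.
    """
    s = sorted(nominal_scores)
    k = min(len(s) - 1, math.ceil((1 - alpha) * len(s)))
    return s[k]

def detect(scores, threshold):
    """Flag everything scoring strictly above the threshold as a novelty."""
    return [x > threshold for x in scores]
```

On a held-out nominal sample the empirical false positive rate stays at or below alpha, while clearly anomalous scores are flagged.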
Learning on Graphs with Out-of-Distribution Nodes
Graph Neural Networks (GNNs) are state-of-the-art models for performing
prediction tasks on graphs. While existing GNNs have shown great performance on
various tasks related to graphs, little attention has been paid to the scenario
where out-of-distribution (OOD) nodes exist in the graph during training and
inference. Borrowing the concept from CV and NLP, we define OOD nodes as nodes
with labels unseen in the training set. Since many real-world graphs are
constructed automatically by programs, they are often noisy and may contain
nodes drawn from unknown distributions. In this work, we define the
problem of graph learning with out-of-distribution nodes. Specifically, we aim
to accomplish two tasks: 1) detect nodes which do not belong to the known
distribution and 2) classify the remaining nodes to be one of the known
classes. We demonstrate that the connection patterns in graphs are informative
for outlier detection, and propose Out-of-Distribution Graph Attention Network
(OODGAT), a novel GNN model which explicitly models the interaction between
different kinds of nodes and separates inliers from outliers during feature
propagation. Extensive experiments show that OODGAT outperforms existing
outlier detection methods by a large margin, while being better or comparable
in terms of in-distribution classification.
Comment: Accepted by KDD'22
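The two tasks the abstract lists, flagging OOD nodes and classifying the rest, can be sketched with a simple entropy score over per-node class probabilities. This is an illustrative baseline only, not OODGAT itself (which learns the inlier/outlier separation jointly with attention during feature propagation); the threshold tau and the function names are assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy of a class-probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def split_and_classify(node_probs, tau):
    """Task 1: flag nodes whose predictive entropy exceeds tau as OOD.
    Task 2: assign every remaining (in-distribution) node its argmax class."""
    ood, labels = [], {}
    for node, probs in node_probs.items():
        if entropy(probs) > tau:
            ood.append(node)
        else:
            labels[node] = max(range(len(probs)), key=probs.__getitem__)
    return ood, labels
```

A near-uniform prediction has entropy close to log(C) and gets flagged, while confident predictions are kept and labeled.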