Similarity-based Classification: Connecting Similarity Learning to Binary Classification
In real-world classification problems, pairwise supervision (i.e., a pair of
patterns with a binary label indicating whether they belong to the same class
or not) can often be obtained at a lower cost than ordinary class labels.
Similarity learning is a general framework to utilize such pairwise supervision
to elicit useful representations by inferring the relationship between two data
points, which encompasses various important preprocessing tasks such as metric
learning, kernel learning, graph embedding, and contrastive representation
learning. Although elicited representations are expected to perform well in
downstream tasks such as classification, little theoretical insight has been
given in the literature so far. In this paper, we reveal that a specific
formulation of similarity learning is strongly related to the objective of
binary classification, which motivates learning a binary classifier without
ordinary class labels by fitting the product of real-valued prediction
functions of pairwise patterns to their similarity. Our formulation of
similarity learning not only generalizes many existing ones, but also
admits an excess risk bound showing an explicit connection to classification.
Finally, we empirically demonstrate the practical usefulness of the proposed
method on benchmark datasets. Comment: 22 pages
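The idea of fitting the product of prediction functions to pairwise similarity can be illustrated with a minimal sketch. It is not the paper's implementation: the linear model, the squared loss, the synthetic two-cluster data, and the names `make_pairs`, `fit_pairwise`, and `demo` are all assumptions made for illustration. Because only pairwise labels are used, the learned classifier can recover the two classes only up to a global sign flip, so accuracy is reported as the better of the two label assignments.

```python
import numpy as np

MU = np.array([2.0, 2.0])  # assumed class mean; classes sit at +MU and -MU


def make_pairs(n_pairs, rng):
    """Draw pairs of points and return only their similarity (+1 same class, -1 different)."""
    y1 = rng.choice([-1, 1], size=n_pairs)
    y2 = rng.choice([-1, 1], size=n_pairs)
    x1 = y1[:, None] * MU + rng.normal(size=(n_pairs, 2))
    x2 = y2[:, None] * MU + rng.normal(size=(n_pairs, 2))
    return x1, x2, y1 * y2  # class labels themselves are discarded


def fit_pairwise(x1, x2, s, lr=0.005, steps=500, seed=0):
    """Gradient descent on mean((s - f(x1) * f(x2))**2) with a linear f(x) = w @ x."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=x1.shape[1])  # nonzero start; w = 0 is a stationary point
    for _ in range(steps):
        f1, f2 = x1 @ w, x2 @ w
        resid = s - f1 * f2
        grad = -2.0 * ((resid * f2) @ x1 + (resid * f1) @ x2) / len(s)
        w -= lr * grad
    return w


def demo(seed=0):
    """Train on pairwise labels only, then score sign(f(x)) against held-out class labels."""
    rng = np.random.default_rng(seed)
    x1, x2, s = make_pairs(2000, rng)
    w = fit_pairwise(x1, x2, s)
    y_test = rng.choice([-1, 1], size=1000)
    x_test = y_test[:, None] * MU + rng.normal(size=(1000, 2))
    acc = np.mean(np.sign(x_test @ w) == y_test)
    return max(acc, 1.0 - acc)  # classes are identifiable only up to a sign flip


if __name__ == "__main__":
    print(f"accuracy up to sign flip: {demo():.3f}")
```

In this sketch the sign of the learned function separates the two clusters even though no class label was ever seen during training, which is the connection to binary classification that the abstract describes.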
A Novel Hybrid Ordinal Learning Model with Health Care Application
Ordinal learning (OL) is a class of machine learning models with broad utility
in health care applications such as diagnosis of different grades of a disease
(e.g., mild, modest, severe) and prediction of the speed of disease progression
(e.g., very fast, fast, moderate, slow). This paper addresses the situation
in which precisely labeled samples are limited in the training set due to cost or
availability constraints, whereas there could be an abundance of samples with
imprecise labels. We focus on imprecise labels that are intervals, i.e., one
can know that a sample belongs to an interval of labels but cannot know which
unique label it has. This situation is quite common in health care datasets due
to limitations of the diagnostic instrument, sparse clinical visits, and/or
patient dropout. Limited research has been done to develop OL models with
imprecise/interval labels. We propose a new Hybrid Ordinal Learner (HOL) to
integrate samples with both precise and interval labels to train a robust OL
model. We also develop a tractable and efficient optimization algorithm to
solve the HOL formulation. We compare HOL with several recently developed OL
methods on four benchmark datasets, and the results demonstrate the superior
performance of HOL. Finally, we apply HOL to a real-world dataset for
predicting the speed of progressing to Alzheimer's Disease (AD) for individuals
with Mild Cognitive Impairment (MCI) based on a combination of multi-modality
neuroimaging and demographic/clinical datasets. HOL achieves high accuracy in
the prediction and outperforms existing methods. The capability of accurately
predicting the speed of progression to AD for each individual with MCI has the
potential to facilitate more individually optimized intervention
strategies. Comment: 16 pages, 3 figures, 2 tables
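To make the notion of an interval label concrete, the following sketch encodes ordinal grades as integers and scores a prediction against an interval. It is not the HOL objective, which the abstract does not specify: the function `interval_abs_error`, the integer encoding, and the choice of an absolute-error penalty are assumptions made purely for illustration. A precise label is treated as the special case where both endpoints coincide.

```python
def interval_abs_error(pred, lo, hi):
    """Absolute error of an ordinal prediction against an interval label [lo, hi].

    Returns 0 when the prediction falls inside the interval, otherwise the
    distance to the nearest endpoint. A precise label is the case lo == hi.
    """
    if lo > hi:
        raise ValueError("interval label must satisfy lo <= hi")
    if pred < lo:
        return lo - pred
    if pred > hi:
        return pred - hi
    return 0


# Hypothetical encoding of disease grades: mild = 0, modest = 1, severe = 2.
# A sample known only to be "modest or severe" carries the interval label [1, 2].
inside = interval_abs_error(pred=2, lo=1, hi=2)   # prediction consistent with the interval
outside = interval_abs_error(pred=0, lo=1, hi=2)  # prediction one grade below the interval
precise = interval_abs_error(pred=0, lo=2, hi=2)  # precise label "severe", prediction "mild"
```

Under this scoring, a sample with an interval label penalizes only predictions that fall outside the interval, so it still contributes information even though its unique label is unknown.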