3,787 research outputs found
Rank-based Decomposable Losses in Machine Learning: A Survey
Recent works have revealed an essential paradigm in designing loss functions
that differentiates individual losses from aggregate losses. The individual loss
measures the quality of the model on a sample, while the aggregate loss
combines individual losses/scores over all training samples. Both have a common
procedure that aggregates a set of individual values to a single numerical
value. The ranking order reflects the most fundamental relation among
individual values in designing losses. In addition, decomposability, in which a
loss can be decomposed into an ensemble of individual terms, becomes a
significant property of organizing losses/scores. This survey provides a
systematic and comprehensive review of rank-based decomposable losses in
machine learning. Specifically, we provide a new taxonomy of loss functions
that follows the perspectives of aggregate loss and individual loss. We
identify the aggregator to form such losses, which are examples of set
functions. We organize the rank-based decomposable losses into eight
categories. Following these categories, we review the literature on rank-based
aggregate losses and rank-based individual losses. We describe general formulas
for these losses and connect them with existing research topics. We also
suggest future research directions spanning unexplored, remaining, and emerging
issues in rank-based decomposable losses.
Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI)
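The abstract above gives no formulas, so the following is only an illustrative sketch of what a rank-based aggregate loss can look like, not the survey's general formulation. It uses the average of the k largest individual losses as the example aggregator; the function name `average_top_k` and the sample values are assumptions made for illustration.

```python
def average_top_k(losses, k):
    """Aggregate individual losses by averaging the k largest ones.

    The result depends on the individual losses only through their
    ranking: sort in descending order, keep the top k, take the mean.
    """
    if not 1 <= k <= len(losses):
        raise ValueError("k must be between 1 and len(losses)")
    ranked = sorted(losses, reverse=True)  # rank individual losses
    return sum(ranked[:k]) / k             # average the k largest


# Hypothetical per-sample losses, for illustration only.
individual_losses = [0.1, 2.0, 0.5, 1.5]

max_loss = average_top_k(individual_losses, 1)   # k = 1: the maximum loss
top2_loss = average_top_k(individual_losses, 2)  # mean of the two largest
mean_loss = average_top_k(individual_losses, 4)  # k = n: the plain average
```

With k = 1 the aggregator reduces to the maximum loss, and with k equal to the number of samples it reduces to the ordinary average, so the choice of k interpolates between the two.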
Active Learning of Classification Models from Enriched Label-related Feedback
Our ability to learn accurate classification models from data is often limited by the number of available labeled data instances. This limitation is of particular concern when data instances need to be manually labeled by human annotators and when the labeling process carries a significant cost. Recent years have witnessed increased research interest in methods, pursued in several directions, that can learn models from a smaller number of examples. One such direction is active learning, which finds the most informative unlabeled instances to be labeled next. Another, more recent direction showing great promise utilizes enriched label-related feedback. In this case, feedback from the human annotator provides additional information reflecting the relations among possible labels. The cost of such feedback is often negligible compared with the cost of instance review. The enriched label-related feedback may come in different forms. In this work, we propose, develop and study classification models for binary, multi-class and multi-label classification problems that utilize the different forms of enriched label-related feedback. We show that this new feedback can help us improve the quality of classification models compared with the standard class-label feedback. For each of the studied feedback forms, we also develop new active learning strategies for selecting the most informative unlabeled instances that are compatible with the respective feedback form, effectively combining two approaches for reducing the number of required labeled instances. We demonstrate the effectiveness of our new framework on both simulated and real-world datasets.
A Survey on Metric Learning for Feature Vectors and Structured Data
The need for appropriate ways to measure the distance or similarity between
data is ubiquitous in machine learning, pattern recognition and data mining,
but handcrafting such good metrics for specific problems is generally
difficult. This has led to the emergence of metric learning, which aims at
automatically learning a metric from data and has attracted a lot of interest
in machine learning and related fields for the past ten years. This survey
paper proposes a systematic review of the metric learning literature,
highlighting the pros and cons of each approach. We pay particular attention to
Mahalanobis distance metric learning, a well-studied and successful framework,
but additionally present a wide range of methods that have recently emerged as
powerful alternatives, including nonlinear metric learning, similarity learning
and local metric learning. Recent trends and extensions, such as
semi-supervised metric learning, metric learning for histogram data and the
derivation of generalization guarantees, are also covered. Finally, this survey
addresses metric learning for structured data, in particular edit distance
learning, and attempts to give an overview of the remaining challenges in
metric learning for the years to come.
Comment: Technical report, 59 pages. Changes in v2: fixed typos and improved
presentation. Changes in v3: fixed typos. Changes in v4: fixed typos and new
method
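As a rough illustration of the Mahalanobis distance framework mentioned in the abstract above, the sketch below evaluates d_M(x, y) = sqrt((x - y)^T M (x - y)) for a given matrix M. It only computes the distance; learning M from data, which is the subject of the survey, is not shown. The function name `mahalanobis` and the example matrices are assumptions made for illustration, and M is assumed to be symmetric positive semidefinite.

```python
import math


def mahalanobis(x, y, M):
    """Distance sqrt((x - y)^T M (x - y)) for a PSD matrix M (list of rows)."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    # Matrix-vector product M @ diff, written out in plain Python.
    m_diff = [sum(M[i][j] * diff[j] for j in range(n)) for i in range(n)]
    # Quadratic form diff^T (M diff); non-negative when M is PSD.
    return math.sqrt(sum(d * md for d, md in zip(diff, m_diff)))


# With M equal to the identity, the distance is the Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
euclidean = mahalanobis([0.0, 0.0], [3.0, 4.0], identity)

# A hypothetical M that weights the first feature four times as heavily.
weighted = [[4.0, 0.0], [0.0, 1.0]]
stretched = mahalanobis([1.0, 0.0], [0.0, 0.0], weighted)
```

Choosing M as the identity recovers the Euclidean distance, which is why a learned M can be read as reweighting and correlating features before measuring distance.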