Label Propagation for Learning with Label Proportions
Learning with Label Proportions (LLP) is the problem of recovering the
underlying true labels given a dataset when the data is presented in the form
of bags. This paradigm is particularly suitable in contexts where providing
individual labels is expensive and label aggregates are more easily obtained.
In the healthcare domain, it is a burden for a patient to keep a detailed diary
of their daily routines, but often they will be amenable to providing higher
level summaries of daily behavior. We present a novel and efficient graph-based
algorithm that encourages local smoothness and exploits the global structure of
the data, while preserving the `mass' of each bag.Comment: Accepted to MLSP 201
Classification with Asymmetric Label Noise: Consistency and Maximal Denoising
In many real-world classification problems, the labels of training examples
are randomly corrupted. Most previous theoretical work on classification with
label noise assumes that the two classes are separable, that the label noise is
independent of the true class label, or that the noise proportions for each
class are known. In this work, we give conditions that are necessary and
sufficient for the true class-conditional distributions to be identifiable.
These conditions are weaker than those analyzed previously, and allow for the
classes to be nonseparable and the noise levels to be asymmetric and unknown.
The conditions essentially state that a majority of the observed labels are
correct and that the true class-conditional distributions are "mutually
irreducible," a concept we introduce that limits the similarity of the two
distributions. For any label noise problem, there is a unique pair of true
class-conditional distributions satisfying the proposed conditions, and we
argue that this pair corresponds in a certain sense to maximal denoising of the
observed distributions.
Our results are facilitated by a connection to "mixture proportion
estimation," which is the problem of estimating the maximal proportion of one
distribution that is present in another. We establish a novel rate of
convergence result for mixture proportion estimation, and apply this to obtain
consistency of a discrimination rule based on surrogate loss minimization.
Experimental results on benchmark data and a nuclear particle classification
problem demonstrate the efficacy of our approach.
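To make "the maximal proportion of one distribution that is present in another" concrete: for two discrete distributions F and H on the same finite support, the largest kappa for which F = kappa*H + (1 - kappa)*G holds for some distribution G is the minimum of F(x)/H(x) over points where H(x) > 0. The sketch below assumes the probability vectors are known exactly. It is not the abstract's estimator, which works from samples, and the function name is hypothetical.

```python
def max_mixture_proportion(f, h):
    """Largest kappa in [0, 1] such that f = kappa*h + (1 - kappa)*g
    for some probability vector g.

    f and h are probability vectors over the same finite support.
    Illustrative only: assumes exact probabilities, not samples.
    """
    if len(f) != len(h):
        raise ValueError("f and h must have the same length")
    # g = (f - kappa*h) / (1 - kappa) must stay non-negative, so kappa
    # is bounded by f(x)/h(x) wherever h puts mass.
    ratios = [fi / hi for fi, hi in zip(f, h) if hi > 0]
    # The minimum ratio never exceeds 1 for true probability vectors;
    # the outer min only guards against floating-point rounding.
    return min(1.0, min(ratios))


if __name__ == "__main__":
    print(max_mixture_proportion([0.5, 0.3, 0.2], [0.2, 0.3, 0.5]))
```

When the two vectors are identical the proportion is 1, and when they have disjoint support it is 0.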
Combining similarity in time and space for training set formation under concept drift
Concept drift is a challenge in supervised learning for sequential data. It describes the phenomenon in which the data distribution changes over time. In such cases, the accuracy of a classifier benefits from selective sampling of the training data. We develop a method for training set selection that is particularly relevant when the expected drift is gradual. At each time step, the training set is selected based on the distance to the target instance, where the distance function combines similarity in space and in time. The method determines an optimal training set size online at every time step using cross-validation. It is a wrapper approach, so different base classifiers can be plugged in. The proposed method shows the best accuracy in its peer group on real and artificial drifting data. The method's complexity is reasonable for field applications.
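The selection step described above can be sketched as follows. The abstract does not give the distance function, so this sketch assumes a Euclidean distance in feature space, an absolute difference in time, and a linear combination weighted by a hypothetical parameter `alpha`. The training set size `k` is fixed here, whereas the abstract selects it online by cross-validation. All names are illustrative.

```python
import math


def combined_distance(x, t, x_target, t_target, alpha=0.5):
    """Weighted sum of feature-space distance and time distance.

    alpha = 1.0 uses space only; alpha = 0.0 uses time only.
    The linear combination is an assumption made for this sketch.
    """
    space = math.dist(x, x_target)  # Euclidean distance (Python 3.8+)
    time = abs(t - t_target)
    return alpha * space + (1 - alpha) * time


def select_training_set(history, x_target, t_target, k, alpha=0.5):
    """Return the k past instances closest to the target instance.

    history is a list of (features, timestamp, label) tuples.
    """
    ranked = sorted(
        history,
        key=lambda rec: combined_distance(rec[0], rec[1], x_target, t_target, alpha),
    )
    return ranked[:k]


if __name__ == "__main__":
    history = [
        ((0.0, 0.0), 0, "a"),    # close in space, far in time
        ((0.0, 0.0), 9, "b"),    # close in space and in time
        ((5.0, 5.0), 10, "c"),   # far in space, close in time
    ]
    chosen = select_training_set(history, (0.0, 0.0), 10, k=2)
    print([label for _, _, label in chosen])
```

Any base classifier can then be trained on the returned subset, which is what makes the approach a wrapper.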
Learning to Rank based on Analogical Reasoning
Object ranking or "learning to rank" is an important problem in the realm of
preference learning. On the basis of training data in the form of a set of
rankings of objects represented as feature vectors, the goal is to learn a
ranking function that predicts a linear order of any new set of objects. In
this paper, we propose a new approach to object ranking based on principles of
analogical reasoning. More specifically, our inference pattern is formalized in
terms of so-called analogical proportions and can be summarized as follows:
Given objects A, B, C, D, if object A is known to be preferred to B, and C
relates to D as A relates to B, then C is (supposedly) preferred to
D. Our method applies this pattern as a main building block and combines it
with ideas and techniques from instance-based learning and rank aggregation.
Based on first experimental results for data sets from various domains (sports,
education, tourism, etc.), we conclude that our approach is highly competitive.
It appears to be specifically interesting in situations in which the objects
are coming from different subdomains, and which hence require a kind of
knowledge transfer. Comment: Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), 8
page
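The inference pattern (A is preferred to B, and C relates to D as A relates to B, so C is presumably preferred to D) can be illustrated on feature vectors. The abstract does not specify how the degree of analogy is measured, so this sketch assumes features scaled to [0, 1] and scores the analogy by how closely the difference A - B matches the difference C - D. The function names and the threshold are hypothetical.

```python
def analogy_degree(a, b, c, d):
    """Degree in [0, 1] to which 'a is to b as c is to d' holds.

    Assumes all features lie in [0, 1]. Compares the per-feature
    differences a - b and c - d and averages the mismatch.
    """
    mismatches = [
        abs((ai - bi) - (ci - di)) for ai, bi, ci, di in zip(a, b, c, d)
    ]
    # A per-feature mismatch can reach 2, so clip the score at 0.
    return max(0.0, 1.0 - sum(mismatches) / len(mismatches))


def transfer_preference(a, b, c, d, threshold=0.8):
    """Given that a is preferred to b, suggest whether c is preferred to d.

    Returns True when the analogy is strong enough to carry the
    preference over; the threshold is an arbitrary illustrative choice.
    """
    return analogy_degree(a, b, c, d) >= threshold


if __name__ == "__main__":
    a, b = (0.9, 0.8), (0.4, 0.3)   # known: a preferred to b
    c, d = (0.7, 0.6), (0.2, 0.1)   # c exceeds d by the same margins
    print(transfer_preference(a, b, c, d))
    print(transfer_preference(a, b, d, c))
```

In a full ranking method, many such pairwise suggestions would be combined through rank aggregation, as the abstract describes.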