Learning with Partial Supervision for Clustering and Classification
In the field of machine learning, clustering and classification are two fundamental tasks. Traditionally, clustering is an unsupervised task, where no supervision about the data is available for learning, while classification is a supervised task, where fully labeled data are collected to train a classifier. In some scenarios, however, we may not have full labels but only partial supervision about the data, such as instance similarities or incomplete label assignments. In such cases, traditional clustering and classification methods do not directly apply. To address such problems, this thesis focuses on learning from partial supervision for clustering and classification. For clustering with partial supervision, we investigate three problems: a) constrained clustering in multi-instance multi-label learning, where the goal is to group instances into clusters that respect the background knowledge given by the bag-level labels; b) clustering with constraints, where the partial supervision is expressed as "pairwise constraints" or "relative constraints", which encode similarities between instance pairs and triplets, respectively; c) active learning of pairwise constraints for clustering, where the goal is to improve the clustering with minimum human effort by iteratively querying an oracle about the most informative pairs. For classification with partial supervision, we address multi-label learning where the data are associated with a latent label hierarchy and incomplete label assignments; the goal is to simultaneously discover the latent hierarchy and learn a multi-label classifier that is consistent with it.
Keywords: Classification, Partial Supervision, Active Learning, Clustering
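To make the idea of pairwise constraints concrete, here is a minimal sketch of COP-KMeans (Wagstaff et al., 2001), a classic constrained variant of k-means in which every point is greedily assigned to the nearest cluster that does not break a must-link or cannot-link constraint. This is a swapped-in standard technique for illustration, not the thesis's own method; the function names, the toy points, and the constraint lists are hypothetical.

```python
import math
import random


def violates(point, cluster, assign, must_link, cannot_link):
    """Return True if placing `point` in `cluster` breaks a constraint
    with a point that has already been assigned in this pass."""
    for a, b in must_link:
        other = b if a == point else (a if b == point else None)
        if other is not None and other in assign and assign[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == point else (a if b == point else None)
        if other is not None and other in assign and assign[other] == cluster:
            return True
    return False


def cop_kmeans(points, k, must_link, cannot_link, iters=20, seed=0):
    """COP-KMeans-style constrained clustering.

    Returns one cluster index per point, or None if the greedy
    assignment cannot satisfy the constraints.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = {}
    for _ in range(iters):
        assign = {}
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest; keep the first feasible one.
            order = sorted(range(k), key=lambda c: math.dist(p, centers[c]))
            for c in order:
                if not violates(i, c, assign, must_link, cannot_link):
                    assign[i] = c
                    break
            else:
                return None  # no cluster satisfies the constraints for point i
        # Move each center to the mean of its members (keep empty clusters fixed).
        new_centers = []
        for c in range(k):
            members = [points[i] for i, ci in assign.items() if ci == c]
            if members:
                new_centers.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return [assign[i] for i in range(len(points))]


labels = cop_kmeans(
    [(0, 0), (0, 1), (10, 10), (10, 11)], k=2,
    must_link=[(0, 1)], cannot_link=[(1, 2)],
)
print(labels)
```

Because assignment is greedy and order-dependent, COP-KMeans can fail (return None) even when a feasible clustering exists; that brittleness is one reason later work models constraints softly.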
Multi-Target Prediction: A Unifying View on Problems and Methods
Multi-target prediction (MTP) is concerned with the simultaneous prediction
of multiple target variables of diverse types. Due to its enormous application
potential, it has developed into an active and rapidly expanding research field
that combines several subfields of machine learning, including multivariate
regression, multi-label classification, multi-task learning, dyadic prediction,
zero-shot learning, network inference, and matrix completion. In this paper, we
present a unifying view on MTP problems and methods. First, we formally discuss
commonalities and differences between existing MTP problems. To this end, we
introduce a general framework that covers the above subfields as special cases.
As a second contribution, we provide a structured overview of MTP methods. This
is accomplished by identifying a number of key properties, which distinguish
such methods and determine their suitability for different types of problems.
Finally, we also discuss a few challenges for future research.
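One way to picture the shared structure of these problems, assumed here for illustration rather than taken from the paper's formal framework, is a partially observed instance-by-target matrix: multivariate regression and multi-label classification observe whole rows for training instances, while matrix completion observes scattered entries. The hypothetical baseline below fills each missing entry with its target column's observed mean; it ignores all features and only shows the data layout.

```python
from statistics import mean


def target_mean_baseline(Y):
    """Fill missing entries (None) of an instance-by-target matrix Y
    with the mean of the observed values in the same target column."""
    n_targets = len(Y[0])
    col_means = []
    for j in range(n_targets):
        observed = [row[j] for row in Y if row[j] is not None]
        col_means.append(mean(observed) if observed else 0.0)
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in Y
    ]


# Rows are instances, columns are targets; None marks an unobserved score.
Y = [
    [1.0, None, 0.0],
    [3.0, 4.0, None],
    [None, 6.0, 1.0],
]
print(target_mean_baseline(Y))  # → [[1.0, 5.0, 0.0], [3.0, 4.0, 0.5], [2.0, 6.0, 1.0]]
```

Real MTP methods improve on this by exploiting side information about instances, targets, or both.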
Entity Type Prediction in Knowledge Graphs using Embeddings
Open Knowledge Graphs (such as DBpedia, Wikidata, YAGO) have been recognized
as the backbone of diverse applications in the field of data mining and
information retrieval. Hence, the completeness and correctness of the Knowledge
Graphs (KGs) are vital. Most of these KGs are created either via automated
information extraction from Wikipedia snapshots, from information contributed
by users, or using heuristics. However, it has been observed that the type
information of these KGs is often noisy, incomplete, and incorrect. To deal
with this problem, a multi-label classification approach for entity typing
using KG embeddings is proposed in this work. We compare our approach with the
current state-of-the-art type prediction method and report on experiments with
the KGs.
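As a rough sketch of multi-label entity typing from embeddings, the code below uses binary relevance: one independent logistic-regression classifier per type, trained on entity embedding vectors. This is a standard multi-label baseline swapped in for illustration; the abstract does not say which classifier the authors use. The 2-d "embeddings" and type sets are hypothetical toy values, where in practice the vectors would come from a KG embedding model.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_relevance(embeddings, type_sets, all_types, epochs=200, lr=0.5):
    """Train one independent logistic-regression classifier per type with SGD."""
    dim = len(next(iter(embeddings.values())))
    models = {}
    for t in all_types:
        w, b = [0.0] * dim, 0.0
        for _ in range(epochs):
            for entity, x in embeddings.items():
                y = 1.0 if t in type_sets.get(entity, set()) else 0.0
                p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                grad = p - y  # derivative of the log loss w.r.t. the logit
                w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
                b -= lr * grad
        models[t] = (w, b)
    return models


def predict_types(models, x, threshold=0.5):
    """Return every type whose classifier scores x at or above threshold."""
    return {
        t for t, (w, b) in models.items()
        if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold
    }


# Toy 2-d "embeddings" (hypothetical values, not from any real KG).
embeddings = {
    "Berlin": [1.0, 0.0], "Paris": [0.9, 0.1],
    "Einstein": [0.0, 1.0], "Curie": [0.1, 0.9],
}
type_sets = {
    "Berlin": {"Place", "City"}, "Paris": {"Place", "City"},
    "Einstein": {"Person", "Scientist"}, "Curie": {"Person", "Scientist"},
}
models = train_binary_relevance(embeddings, type_sets, ["Place", "City", "Person", "Scientist"])
print(sorted(predict_types(models, [0.95, 0.05])))
```

Binary relevance ignores correlations between types (for example, every City is a Place); methods that model such dependencies, or the type hierarchy of the KG, can do better on noisy or incomplete type data.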
Training Complex Models with Multi-Task Weak Supervision
As machine learning models continue to increase in complexity, collecting
large hand-labeled training sets has become one of the biggest roadblocks in
practice. Instead, weaker forms of supervision that provide noisier but cheaper
labels are often used. However, these weak supervision sources have diverse and
unknown accuracies, may output correlated labels, and may label different tasks
or apply at different levels of granularity. We propose a framework for
integrating and modeling such weak supervision sources by viewing them as
labeling different related sub-tasks of a problem, which we refer to as the
multi-task weak supervision setting. We show that by solving a matrix
completion-style problem, we can recover the accuracies of these multi-task
sources given their dependency structure, but without any labeled data, leading
to higher-quality supervision for training an end model. Theoretically, we show
that the generalization error of models trained with this approach improves
with the number of unlabeled data points, and characterize the scaling with
respect to the task and dependency structures. On three fine-grained
classification problems, we show that our approach leads to average gains of
20.2 points in accuracy over a traditional supervised approach, 6.8 points over
a majority vote baseline, and 4.1 points over a previously proposed weak
supervision method that models tasks separately.
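The majority vote baseline mentioned above, and the reason that knowing source accuracies helps, can be sketched for a single data point as follows. The weighted version uses log-odds weights log(a / (1 - a)), which is the standard rule when binary sources are conditionally independent and the classes are equally likely. Here the accuracies are simply assumed known; the paper's contribution is estimating them without labels via a matrix completion-style problem, which is not reproduced. All names are hypothetical.

```python
import math
from collections import Counter

ABSTAIN = None  # a weak source that does not label this point


def majority_vote(votes):
    """Majority vote over non-abstaining sources; ties or all-abstain give ABSTAIN."""
    counts = Counter(v for v in votes if v is not ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # tie between the top labels
    return ranked[0][0]


def weighted_vote(votes, accuracies):
    """Binary (0/1) vote where each source is weighted by the log-odds of its
    accuracy; the accuracies are assumed known here."""
    score = 0.0
    for v, a in zip(votes, accuracies):
        if v is ABSTAIN:
            continue
        weight = math.log(a / (1.0 - a))
        score += weight if v == 1 else -weight
    if score == 0:
        return ABSTAIN
    return 1 if score > 0 else 0


# One accurate source disagrees with two weak ones.
votes = [1, 0, 0]
print(majority_vote(votes))                    # 0
print(weighted_vote(votes, [0.95, 0.6, 0.6]))  # 1
```

With accuracies 0.95, 0.6, and 0.6, the accurate source's weight (about 2.94) outweighs the two weak sources combined (about 0.81), so the weighted vote overturns the majority.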
Discovering visual concept structure with sparse and incomplete tags
This work was partially supported by the China Scholarship Council, Vision Semantics Limited, and Royal Society Newton Advanced Fellowship Programme (NA150459)
Leveraging Node Attributes for Incomplete Relational Data
Relational data are usually highly incomplete in practice, which inspires us
to leverage side information to improve the performance of community detection
and link prediction. This paper presents a Bayesian probabilistic approach that
incorporates various kinds of node attributes encoded in binary form in
relational models with Poisson likelihood. Our method works flexibly with both
directed and undirected relational networks. The inference can be done by
efficient Gibbs sampling which leverages sparsity of both networks and node
attributes. Extensive experiments show that our models achieve
state-of-the-art link prediction results, especially with highly incomplete
relational data.
Comment: Appearing in ICML 201
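To illustrate why a Poisson likelihood lets inference exploit network sparsity, the sketch below assumes a common choice for binary relational data, the Bernoulli-Poisson link P(A_ij = 1) = 1 - exp(-sum_k phi_ik phi_jk), with nonnegative node factors phi. It covers only an undirected network and omits the node attributes and the Gibbs sampler described in the abstract. The factor values are hypothetical. The key point is that the zeros never have to be enumerated, because the sum of rates over all pairs factorizes per latent dimension.

```python
import math


def edge_rate(phi, i, j):
    """Poisson rate for pair (i, j): inner product of nonnegative node factors."""
    return sum(a * b for a, b in zip(phi[i], phi[j]))


def link_probability(phi, i, j):
    """P(A_ij = 1) = 1 - exp(-rate) under a Bernoulli-Poisson link."""
    return -math.expm1(-edge_rate(phi, i, j))


def log_likelihood(phi, edges):
    """Log-likelihood of an undirected binary network.

    `edges` holds the observed links as (i, j) pairs with i < j (each with a
    strictly positive rate); every other pair is treated as a zero. The sum
    of rates over all pairs is obtained per factor k from
    ((sum_i phi_ik)^2 - sum_i phi_ik^2) / 2, so the cost is
    O(N*K + |edges|*K) instead of O(N^2 * K).
    """
    n_factors = len(phi[0])
    total_rate = 0.0
    for k in range(n_factors):
        column = [row[k] for row in phi]
        s = sum(column)
        total_rate += (s * s - sum(v * v for v in column)) / 2.0
    # Start as if every pair were a zero entry (log P = -rate), then correct
    # the observed edges to log(1 - exp(-rate)).
    ll = -total_rate
    for i, j in edges:
        r = edge_rate(phi, i, j)
        ll += math.log(-math.expm1(-r)) + r
    return ll


phi = [[0.5, 0.1], [0.4, 0.2], [0.05, 0.9], [0.1, 0.7]]  # hypothetical factors
edges = {(0, 1), (2, 3)}
print(log_likelihood(phi, edges))
print(link_probability(phi, 0, 1) > link_probability(phi, 0, 2))  # True
```

For link prediction, unobserved pairs can then be ranked by `link_probability`; nodes that share large weights on the same latent factor get high scores.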