59,488 research outputs found
Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and allows to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas are described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight on how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking and promising avenues for research
Active Discovery of Network Roles for Predicting the Classes of Network Nodes
Nodes in real world networks often have class labels, or underlying
attributes, that are related to the way in which they connect to other nodes.
Sometimes this relationship is simple, for instance nodes of the same class are
may be more likely to be connected. In other cases, however, this is not true,
and the way that nodes link in a network exhibits a different, more complex
relationship to their attributes. Here, we consider networks in which we know
how the nodes are connected, but we do not know the class labels of the nodes
or how class labels relate to the network links. We wish to identify the best
subset of nodes to label in order to learn this relationship between node
attributes and network links. We can then use this discovered relationship to
accurately predict the class labels of the rest of the network nodes.
We present a model that identifies groups of nodes with similar link
patterns, which we call network roles, using a generative blockmodel. The model
then predicts labels by learning the mapping from network roles to class labels
using a maximum margin classifier. We choose a subset of nodes to label
according to an iterative margin-based active learning strategy. By integrating
the discovery of network roles with the classifier optimisation, the active
learning process can adapt the network roles to better represent the network
for node classification. We demonstrate the model by exploring a selection of
real world networks, including a marine food web and a network of English
words. We show that, in contrast to other network classifiers, this model
achieves good classification accuracy for a range of networks with different
relationships between class labels and network links
- …