Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and makes it possible to leverage weakly labeled data.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas is described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight into how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking, and promising avenues for research.
Multiple Instance Learning for Heterogeneous Images: Training a CNN for Histopathology
Multiple instance (MI) learning with a convolutional neural network enables
end-to-end training in the presence of weak image-level labels. We propose a
new method for aggregating predictions from smaller regions of the image into
an image-level classification by using the quantile function. The quantile
function provides a more complete description of the heterogeneity within each
image, improving image-level classification. We also adapt image augmentation
to the MI framework by randomly selecting cropped regions on which to apply MI
aggregation during each epoch of training. This provides a mechanism to study
the importance of MI learning. We validate our method on five different
classification tasks for breast tumor histology and provide a visualization
method for interpreting local image classifications that could lead to future
insights into tumor heterogeneity.
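The quantile-based aggregation described in this abstract can be illustrated with a minimal sketch. It assumes each image has already been split into regions and that a model has produced one probability per region; the function name `quantile_features` and the choice of quartiles are illustrative assumptions, not details taken from the paper.

```python
from statistics import quantiles


def quantile_features(region_probs, n_quantiles=4):
    """Summarize per-region probabilities by their quantile cut points.

    Hypothetical helper: the abstract does not specify how many
    quantiles are used or how they feed the image-level classifier.
    """
    if len(region_probs) < 2:
        raise ValueError("need at least two region predictions")
    # quantiles() sorts the data and returns n_quantiles - 1 cut points.
    # method="inclusive" treats the regions as the full population.
    return quantiles(region_probs, n=n_quantiles, method="inclusive")


# Two images with the same mean prediction (0.5) but different spread.
homogeneous = [0.5, 0.5, 0.5, 0.5, 0.5]
heterogeneous = [1.0, 0.0, 0.5, 0.0, 1.0]

print(quantile_features(homogeneous))
print(quantile_features(heterogeneous))
```

The two example images have identical mean predictions, so mean pooling could not tell them apart, whereas their quantile vectors differ. This is one way to read the claim that the quantile function gives a more complete description of within-image heterogeneity.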
A Comparison of Multi-instance Learning Algorithms
Motivated by various challenging real-world applications, such as drug activity prediction and image retrieval, multi-instance (MI) learning has attracted considerable interest in recent years. Compared with standard supervised learning, the MI learning task is more difficult as the label information of each training example is incomplete. Many MI algorithms have been proposed. Some of them are specifically designed for MI problems whereas others have been upgraded or adapted from standard single-instance learning algorithms. Most algorithms have been evaluated on only one or two benchmark datasets, and there is a lack of systematic comparisons of MI learning algorithms.
This thesis presents a comprehensive study of MI learning algorithms that aims to compare their performance and find a suitable way to properly address different MI problems. First, it briefly reviews the history of research on MI learning. Then it discusses five general classes of MI approaches that cover a total of 16 MI algorithms. After that, it presents empirical results for these algorithms that were obtained from 15 datasets which involve five different real-world application domains. Finally, some conclusions are drawn from these results: (1) applying suitable standard single-instance learners to MI problems can often generate the best result on the datasets that were tested, (2) algorithms exploiting the standard asymmetric MI assumption do not show significant advantages over approaches using the so-called collective assumption, and (3) different MI approaches are suitable for different application domains, and no MI algorithm works best on all MI problems.
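The difference between the two assumptions named in conclusion (2) can be sketched as follows. This is a simplified illustration, not the thesis's implementation: the standard assumption is shown as "a bag is positive if at least one instance is positive", and the collective assumption is shown as an equal-weight average of instance scores, which is one common way to realize it. The function names and the 0.5 threshold are assumptions.

```python
def standard_assumption(instance_labels):
    """Asymmetric rule: one positive instance makes the bag positive."""
    return int(any(instance_labels))


def collective_assumption(instance_scores, threshold=0.5):
    """Every instance contributes equally: average the scores, then threshold.

    The threshold value is a hypothetical choice for illustration.
    """
    mean_score = sum(instance_scores) / len(instance_scores)
    return int(mean_score >= threshold)


# A bag with a single strongly positive instance among negatives.
scores = [0.1, 0.2, 0.9, 0.1]
labels = [int(s >= 0.5) for s in scores]

print(standard_assumption(labels))    # positive under the standard rule
print(collective_assumption(scores))  # negative under equal-weight averaging
```

The same bag receives different labels under the two rules, which is why the choice of assumption can matter for a given application domain.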
Automatic Emphysema Detection using Weakly Labeled HRCT Lung Images
A method for automatically quantifying emphysema regions using
High-Resolution Computed Tomography (HRCT) scans of patients with chronic
obstructive pulmonary disease (COPD) that does not require manually annotated
scans for training is presented. HRCT scans of controls and of COPD patients
with diverse disease severity are acquired at two different centers. Textural
features from co-occurrence matrices and Gaussian filter banks are used to
characterize the lung parenchyma in the scans. Two robust versions of multiple
instance learning (MIL) classifiers, miSVM and MILES, are investigated. The
classifiers are trained with the weak labels extracted from the forced
expiratory volume in one second (FEV1) and diffusing capacity of the lungs
for carbon monoxide (DLCO). At test time, the classifiers output a patient
label indicating overall COPD diagnosis and local labels indicating the
presence of emphysema. The classifier performance is compared with manual
annotations by two radiologists, a classical density-based method, and
pulmonary function tests (PFTs). The miSVM classifier performed better than
MILES on both patient and emphysema classification. The classifier has a
stronger correlation with PFT than the density-based method, the percentage of
emphysema in the intersection of annotations from both radiologists, and the
percentage of emphysema annotated by one of the radiologists. The correlation
between the classifier and the PFT is only outperformed by the second
radiologist. The method is therefore promising for facilitating assessment of
emphysema and reducing inter-observer variability.
Comment: Accepted at PLoS ON
Bag-Level Aggregation for Multiple Instance Active Learning in Instance Classification Problems
A growing number of applications, e.g. video surveillance and medical image
analysis, require training recognition systems from large amounts of weakly
annotated data while some targeted interactions with a domain expert are
allowed to improve the training process. In such cases, active learning (AL)
can reduce labeling costs for training a classifier by querying the expert to
provide the labels of most informative instances. This paper focuses on AL
methods for instance classification problems in multiple instance learning
(MIL), where data is arranged into sets, called bags, that are weakly labeled.
Most AL methods focus on single instance learning problems. These methods are
not suitable for MIL problems because they cannot account for the bag structure
of data. In this paper, new methods for bag-level aggregation of instance
informativeness are proposed for multiple instance active learning (MIAL). The
aggregated informativeness method identifies the most informative
instances based on classifier uncertainty, and queries bags incorporating the
most information. The other proposed method, called cluster-based
aggregative sampling, clusters data hierarchically in the instance space. The
informativeness of instances is assessed by considering bag labels, inferred
instance labels, and the proportion of labels that remain to be discovered in
clusters. Both proposed methods significantly outperform reference methods in
extensive experiments using benchmark data from several application domains.
Results indicate that using an appropriate strategy to address MIAL problems
yields a significant reduction in the number of queries needed to achieve the
same level of performance as single instance AL methods.
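The general idea of aggregating instance informativeness at the bag level can be sketched as below. The abstract does not give the exact scoring or aggregation rule, so this sketch makes two assumptions: instance uncertainty is measured as closeness of the predicted probability to 0.5, and a bag's score is the sum of its instances' uncertainties. The names `uncertainty` and `select_bag` are hypothetical.

```python
def uncertainty(p):
    """1.0 when the classifier is maximally unsure (p = 0.5), 0.0 when certain."""
    return 1.0 - abs(2.0 * p - 1.0)


def select_bag(bags):
    """Pick the bag whose instances carry the most total uncertainty.

    bags maps a bag name to a list of predicted positive-class
    probabilities, one per instance. Summation is an assumed
    aggregation rule, not the paper's stated method.
    """
    scores = {name: sum(uncertainty(p) for p in probs)
              for name, probs in bags.items()}
    best = max(scores, key=scores.get)
    return best, scores


bags = {
    "bag_a": [0.5, 0.5],       # two highly uncertain instances
    "bag_b": [0.0, 1.0, 1.0],  # three confidently classified instances
}
best, scores = select_bag(bags)
print(best, scores)
```

Here the expert would be asked about `bag_a`, since its instances are the ones the classifier is least sure of, even though `bag_b` contains more instances.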