Deep Lesion Graphs in the Wild: Relationship Learning and Organization of Significant Radiology Image Findings in a Diverse Large-scale Lesion Database
Radiologists in their daily work routinely find and annotate significant
abnormalities on a large number of radiology images. Such abnormalities, or
lesions, have been collected over the years and stored in hospitals' picture archiving
and communication systems. However, they are basically unsorted and lack
semantic annotations like type and location. In this paper, we aim to organize
and explore them by learning a deep feature representation for each lesion. A
large-scale and comprehensive dataset, DeepLesion, is introduced for this task.
DeepLesion contains bounding boxes and size measurements of over 32K lesions.
To model their similarity relationship, we leverage multiple supervision
information including types, self-supervised location coordinates and sizes.
They require little manual annotation effort but describe useful attributes of
the lesions. Then, a triplet network is utilized to learn lesion embeddings
with a sequential sampling strategy to depict their hierarchical similarity
structure. Experiments show promising qualitative and quantitative results on
lesion retrieval, clustering, and classification. The learned embeddings can be
further employed to build a lesion graph for various clinically useful
applications. We propose algorithms for intra-patient lesion matching and
missing annotation mining. Experimental results validate their effectiveness.
Comment: Accepted by CVPR 2018. DeepLesion URL added.
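The triplet objective mentioned in this abstract can be illustrated with a minimal sketch. It assumes a generic hinge-style triplet loss on Euclidean distances; the function names, the margin value, and the toy embeddings are hypothetical, and the paper's sequential sampling strategy is not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Generic hinge triplet loss (an assumption, not the paper's exact form).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# Toy embeddings: the positive is near the anchor, the negative is far away.
anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [3.0, 4.0]
satisfied = triplet_loss(anchor, positive, negative)
violated = triplet_loss(anchor, negative, positive)
```

In this toy case the first triplet already respects the margin, so it contributes no loss, while the swapped triplet is penalized.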
Positive Semidefinite Metric Learning with Boosting
The learning of appropriate distance metrics is a critical problem in image
classification and retrieval. In this work, we propose a boosting-based
technique, termed BoostMetric, for learning a Mahalanobis distance metric. One
of the primary difficulties in learning such a metric is to ensure that the
Mahalanobis matrix remains positive semidefinite. Semidefinite programming is
sometimes used to enforce this constraint, but does not scale well.
BoostMetric is instead based on a key observation that any positive
semidefinite matrix can be decomposed into a linear positive combination of
trace-one rank-one matrices. BoostMetric thus uses rank-one positive
semidefinite matrices as weak learners within an efficient and scalable
boosting-based learning process. The resulting method is easy to implement,
does not require tuning, and can accommodate various types of constraints.
Experiments on various datasets show that the proposed algorithm compares
favorably to state-of-the-art methods in terms of classification accuracy
and running time.
Comment: 11 pages, Twenty-Third Annual Conference on Neural Information
Processing Systems (NIPS 2009), Vancouver, Canada.
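The key observation in this abstract can be checked numerically in one direction: a nonnegative combination of trace-one rank-one matrices u u^T (with u a unit vector) always yields a valid squared Mahalanobis distance. The sketch below is only an illustration of that structure, not the boosting procedure itself; the helper names, weights, and directions are hypothetical.

```python
import math


def normalize(v):
    """Scale v to unit length so that the outer product u u^T has trace one."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_matrix(weights, directions):
    """Form M = sum_i w_i * u_i u_i^T from nonnegative weights and unit vectors."""
    n = len(directions[0])
    M = [[0.0] * n for _ in range(n)]
    for w, u in zip(weights, directions):
        for i in range(n):
            for j in range(n):
                M[i][j] += w * u[i] * u[j]
    return M


def mahalanobis_sq(x, y, weights, directions):
    """Squared distance (x-y)^T M (x-y), computed as sum_i w_i * (u_i . (x-y))^2.

    Each term is nonnegative when w_i >= 0, which is why M is positive
    semidefinite by construction.
    """
    d = [a - b for a, b in zip(x, y)]
    return sum(
        w * sum(ui * di for ui, di in zip(u, d)) ** 2
        for w, u in zip(weights, directions)
    )


# Hypothetical weak learners: two unit directions with nonnegative weights.
directions = [normalize([1.0, 0.0]), normalize([1.0, 1.0])]
weights = [2.0, 1.0]
M = build_matrix(weights, directions)
dist_sq = mahalanobis_sq([1.0, 0.0], [0.0, 1.0], weights, directions)
```

Because every rank-one term has trace one, the trace of M equals the sum of the weights, which gives a quick sanity check on the construction.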
Nearest Labelset Using Double Distances for Multi-label Classification
Multi-label classification is a type of supervised learning where an instance
may belong to multiple labels simultaneously. Predicting each label
independently has been criticized for not exploiting any correlation between
labels. In this paper we propose a novel approach, Nearest Labelset using
Double Distances (NLDD), that predicts the labelset observed in the training
data that minimizes a weighted sum of the distances in both the feature space
and the label space to the new instance. The weights specify the relative
tradeoff between the two distances. The weights are estimated from a binomial
regression of the number of misclassified labels as a function of the two
distances. Model parameters are estimated by maximum likelihood. NLDD only
considers labelsets observed in the training data, thus implicitly taking into
account label dependencies. Experiments on benchmark multi-label data sets show
that the proposed method on average outperforms other well-known approaches in
terms of Hamming loss, 0/1 loss, and multi-label accuracy and ranks second
after ECC on the F-measure.
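The prediction rule described in this abstract can be sketched as a search over training labelsets. The sketch assumes that the label-space distance is measured against per-label probability estimates for the new instance (supplied here as `p_new`, an assumption not stated in the abstract), and it takes the two weights as given rather than estimating them by binomial regression. All names and values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_labelset(x_new, p_new, train_X, train_Y, w_feat, w_label):
    """Return the training labelset minimizing a weighted sum of two distances.

    x_new: feature vector of the new instance.
    p_new: assumed per-label probability estimates for the new instance.
    w_feat, w_label: trade-off weights, taken as given in this sketch.
    """
    best_score, best_labels = None, None
    for x, y in zip(train_X, train_Y):
        score = w_feat * euclidean(x, x_new) + w_label * euclidean(y, p_new)
        if best_score is None or score < best_score:
            best_score, best_labels = score, y
    return best_labels


# Toy training data: three instances with two binary labels each.
train_X = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
train_Y = [(1, 0), (0, 1), (1, 1)]
predicted = nearest_labelset([0.9, 1.0], [0.2, 0.8], train_X, train_Y, 1.0, 1.0)
```

Since only labelsets that occur in `train_Y` can be returned, label combinations never seen together in training are never predicted, which is how the approach accounts for label dependencies implicitly.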
Multiple Instance Learning: A Survey of Problem Characteristics and Applications
Multiple instance learning (MIL) is a form of weakly supervised learning
where training instances are arranged in sets, called bags, and a label is
provided for the entire bag. This formulation is gaining interest because it
naturally fits various problems and allows weakly labeled data to be leveraged.
Consequently, it has been used in diverse application fields such as computer
vision and document classification. However, learning from bags raises
important challenges that are unique to MIL. This paper provides a
comprehensive survey of the characteristics which define and differentiate the
types of MIL problems. Until now, these problem characteristics have not been
formally identified and described. As a result, the variations in performance
of MIL algorithms from one data set to another are difficult to explain. In
this paper, MIL problem characteristics are grouped into four broad categories:
the composition of the bags, the types of data distribution, the ambiguity of
instance labels, and the task to be performed. Methods specialized to address
each category are reviewed. Then, the extent to which these characteristics
manifest themselves in key MIL application areas is described. Finally,
experiments are conducted to compare the performance of 16 state-of-the-art MIL
methods on selected problem characteristics. This paper provides insight into how
the problem characteristics affect MIL algorithms, recommendations for future
benchmarking, and promising avenues for research.
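The bag structure described at the start of this abstract can be illustrated with a small sketch. It assumes the common "standard" MIL assumption, under which a bag is positive if at least one of its instances is positive; the abstract itself does not commit to this assumption, and the names and data below are hypothetical.

```python
def bag_label(instance_labels):
    """Bag label under the standard MIL assumption (assumed here):
    positive (1) if any instance is positive, otherwise negative (0)."""
    return 1 if any(label == 1 for label in instance_labels) else 0


def bag_score(instance_scores):
    """Max-pooling of instance scores, a simple way to score a whole bag
    that is consistent with the standard assumption."""
    return max(instance_scores)


# Toy bags: in real MIL data only the bag label is observed, not the
# instance labels shown here.
bags = {
    "bag_a": [0, 0, 1],
    "bag_b": [0, 0, 0],
}
labels = {name: bag_label(instances) for name, instances in bags.items()}
```

Other assumptions change only `bag_label`, for example requiring a minimum fraction of positive instances; the bag-of-instances layout stays the same.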