Self-Supervised Learning for Spinal MRIs
A significant proportion of patients scanned in a clinical setting have
follow-up scans. We show in this work that such longitudinal scans alone can be
used as a form of 'free' self-supervision for training a deep network. We
demonstrate this self-supervised learning for the case of T2-weighted sagittal
lumbar Magnetic Resonance Images (MRIs). A Siamese convolutional neural network
(CNN) is trained using two losses: (i) a contrastive loss on whether the scan
is of the same person (i.e. longitudinal) or not, together with (ii) a
classification loss on predicting the level of vertebral bodies. The
performance of this pre-trained network is then assessed on a grading
classification task. We experiment on a dataset of 1016 subjects, 423
possessing follow-up scans, with the end goal of learning the disc degeneration
radiological gradings attached to the intervertebral discs. We show that, on
the supervised classification task, the pre-trained CNN (i) outperforms a
network trained from scratch; and (ii) needs far fewer annotated training
samples to match the performance of the network trained from scratch.
Comment: 3rd Workshop on Deep Learning in Medical Image Analysis
Unsupervised Learning of Visual Representations using Videos
Is strong supervision necessary for learning a good visual representation? Do
we really need millions of semantically-labeled images to train a Convolutional
Neural Network (CNN)? In this paper, we present a simple yet surprisingly
powerful approach for the unsupervised learning of a CNN. Specifically, we use
hundreds of thousands of unlabeled videos from the web to learn visual
representations. Our key idea is that visual tracking provides the supervision.
That is, two patches connected by a track should have similar visual
representation in deep feature space since they probably belong to the same
object or object part. We design a Siamese-triplet network with a ranking loss
function to train this CNN representation. Without using a single image from
ImageNet, just using 100K unlabeled videos and the VOC 2012 dataset, we train
an ensemble of unsupervised networks that achieves 52% mAP (no bounding box
regression). This performance comes tantalizingly close to its
ImageNet-supervised counterpart, an ensemble which achieves a mAP of 54.4%. We
also show that our unsupervised network can perform competitively in other
tasks such as surface-normal estimation.
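The supervisory signal reduces to a ranking constraint over tracked patches. Here is a minimal sketch, assuming cosine distance and a margin of 0.5 (both illustrative choices): features of two patches from the same track must be closer than features of a patch from a different video.

```python
# Minimal sketch of the triplet ranking loss: anchor and positive come from
# the ends of one track; the negative is a patch from another video.
import torch
import torch.nn.functional as F

def ranking_loss(anchor, positive, negative, margin=0.5):
    # Track-mates should be closer in feature space than unrelated patches
    # by at least `margin` (cosine distance).
    d_pos = 1 - F.cosine_similarity(anchor, positive)
    d_neg = 1 - F.cosine_similarity(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

feats = lambda n: torch.randn(n, 256)  # stand-in for CNN patch features
loss = ranking_loss(feats(16), feats(16), feats(16))
```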
Learning models for semantic classification of insufficient plantar pressure images
Establishing a reliable and stable model that predicts a target from insufficient labeled samples is feasible
and effective, particularly for sensor-generated data-sets. Inspired by algorithms for learning from
insufficient data-sets, such as metric-based learning, prototype networks and meta-learning, we propose
an insufficient data-set transfer model learning method. Firstly, two basic models for transfer learning are
introduced, followed by a classification system and its evaluation criteria. Secondly, a data-set of plantar
pressure for comfort shoe design is acquired and preprocessed through a foot scan system; using a pre-trained
AlexNet and convolution neural network (CNN)-based transfer modeling, the classification accuracy on the
plantar pressure images exceeds 93.5%. Finally, the proposed method is compared to the current classifiers
VGG, ResNet, AlexNet and the pre-trained CNN, and also with the known-scaling and shifting (SS) and
unknown-plain slot (PS) partition methods on the public test databases SUN, CUB, AWA1, AWA2, and aPY,
using precision indices (tr, ts, H) and training and evaluation time. The proposed method achieves high
performance on most indices for the plantar pressure classification task. This transfer learning-based method
can be applied to other insufficient data-sets in sensor imaging fields.
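The core transfer step the abstract describes, reusing a pre-trained AlexNet and retraining only the classifier on the small plantar-pressure set, can be sketched as follows; the class count and the frozen/unfrozen split are assumptions for illustration, not the paper's configuration.

```python
# Hedged sketch of the AlexNet transfer step: reuse ImageNet features and
# retrain only the final classifier on the small plantar-pressure data-set.
import torch.nn as nn
from torchvision import models

num_classes = 4  # assumed number of plantar-pressure classes
net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
for p in net.features.parameters():               # freeze convolutional features
    p.requires_grad = False
net.classifier[6] = nn.Linear(4096, num_classes)  # replace the final FC layer
```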
Chinese Medical Question Answer Matching Based on Interactive Sentence Representation Learning
Chinese medical question-answer matching is more challenging than
open-domain question-answer matching in English. Even though deep learning
methods have performed well in improving the performance of question-answer
matching, these methods focus only on the semantic information inside
sentences, while ignoring the semantic association between questions and
answers, thus resulting in performance deficits. In this paper, we design a
series of interactive sentence representation learning models to tackle this
problem. To better adapt to Chinese medical question-answer matching and to
exploit the advantages of different neural network structures, we propose the
Crossed BERT network to extract the deep semantic information inside the
sentence and the semantic association between question and answer, and then
combine it with a multi-scale CNN network or a BiGRU network to encode
additional semantic features into the sentence representation. Experiments on
the cMedQA V2.0 and cMedQA V1.0 datasets show that our model significantly
outperforms all existing state-of-the-art models for Chinese medical
question-answer matching.
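One way to picture the BERT-plus-recurrent combination is a pipeline that feeds BERT token states through a BiGRU before pooling into a sentence vector. The sketch below is illustrative only (it is not the authors' Crossed BERT); the checkpoint name, hidden sizes, and mean pooling are assumptions.

```python
# Illustrative BERT -> BiGRU -> pooled sentence representation pipeline.
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-chinese")
bert = AutoModel.from_pretrained("bert-base-chinese")
bigru = nn.GRU(768, 256, batch_first=True, bidirectional=True)

# A sample question ("How should the patient take this medicine?").
inputs = tok("患者应该如何服用这种药物？", return_tensors="pt")
hidden = bert(**inputs).last_hidden_state  # (1, seq_len, 768) token states
out, _ = bigru(hidden)                     # (1, seq_len, 512) recurrent features
sentence_repr = out.mean(dim=1)            # pooled sentence vector
```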
Context Embedding Networks
Low dimensional embeddings that capture the main variations of interest in
collections of data are important for many applications. One way to construct
these embeddings is to acquire estimates of similarity from the crowd. However,
similarity is a multi-dimensional concept that varies from individual to
individual. Existing models for learning embeddings from the crowd typically
make simplifying assumptions such as all individuals estimate similarity using
the same criteria, the list of criteria is known in advance, or that the crowd
workers are not influenced by the data that they see. To overcome these
limitations we introduce Context Embedding Networks (CENs). In addition to
learning interpretable embeddings from images, CENs also model worker biases
for different attributes along with the visual context, i.e. the visual
attributes highlighted by a set of images. Experiments on two noisy crowd
annotated datasets show that modeling both worker bias and visual context
results in more interpretable embeddings compared to existing approaches.Comment: CVPR 2018 spotligh