Attention-Guided Discriminative Region Localization and Label Distribution Learning for Bone Age Assessment
Bone age assessment (BAA) is clinically important as it can be used to
diagnose endocrine and metabolic disorders during child development. Existing
deep learning based methods for classifying bone age use the global image as
input, or exploit local information by annotating extra bounding boxes or key
points. However, training with the global image underutilizes discriminative
local information, while providing extra annotations is expensive and
subjective. In this paper, we propose an attention-guided approach to
automatically localize the discriminative regions for BAA without any extra
annotations. Specifically, we first train a classification model to learn the
attention maps of the discriminative regions, finding the hand region, the most
discriminative region (the carpal bones), and the next most discriminative
region (the metacarpal bones). Guided by those attention maps, we then crop the
informative local regions from the original image and aggregate different
regions for BAA. Instead of taking BAA as a general regression task, which is
suboptimal due to the label ambiguity problem in the age label space, we
propose using joint age distribution learning and expectation regression, which
makes use of the ordinal relationship among hand images with different
individual ages and leads to more robust age estimation. Extensive experiments
are conducted on the RSNA pediatric bone age data set. Using no training
annotations, our method achieves competitive results compared with existing
state-of-the-art semi-automatic deep learning-based methods that require manual
annotation. Code is available at
https://github.com/chenchao666/Bone-Age-Assessment.
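The joint age distribution learning and expectation regression described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the Gaussian soft target, the `sigma` and `lam` parameters, and all function names are assumptions introduced here.

```python
import math

def gaussian_label_distribution(age, bins, sigma=2.0):
    """Soft target (assumed form): a normalized Gaussian over the age bins, centred on `age`."""
    weights = [math.exp(-((b - age) ** 2) / (2 * sigma ** 2)) for b in bins]
    total = sum(weights)
    return [w / total for w in weights]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def joint_loss(logits, age, bins, sigma=2.0, lam=1.0):
    """KL divergence to the soft target plus an L1 penalty on the expected age.

    `lam` (hypothetical) weights the expectation-regression term.
    Returns (loss, expected_age).
    """
    target = gaussian_label_distribution(age, bins, sigma)
    pred = softmax(logits)
    # Distribution learning term: KL(target || pred).
    kl = sum(t * math.log(t / max(p, 1e-12)) for t, p in zip(target, pred) if t > 0)
    # Expectation regression term: the predicted age is the mean of the predicted distribution.
    expected_age = sum(p * b for p, b in zip(pred, bins))
    return kl + lam * abs(expected_age - age), expected_age

# Toy usage with 11 hypothetical age bins and uninformative (all-zero) logits.
bins = list(range(11))
loss, expected_age = joint_loss([0.0] * len(bins), age=5, bins=bins)
```

Because the soft target spreads probability over neighbouring ages, predictions that are close to the true age are penalized less than distant ones, which is one way to exploit the ordinal structure the abstract mentions.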
Modeling Label Ambiguity for Neural List-Wise Learning to Rank
List-wise learning to rank methods are considered to be the state-of-the-art.
One of the major problems with these methods is that the ambiguous nature of
relevance labels in learning to rank data is ignored. Ambiguity of relevance
labels refers to the phenomenon that multiple documents may be assigned the
same relevance label for a given query, so that no preference order should be
learned for those documents. In this paper we propose a novel sampling
technique for computing a list-wise loss that can take into account this
ambiguity. We show the effectiveness of the proposed method by training a
3-layer deep neural network. We compare our new loss function to two strong
baselines: ListNet and ListMLE. We show that our method generalizes better and
significantly outperforms other methods on the validation and test sets.
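To make the label ambiguity concrete, the sketch below computes the ListMLE baseline loss (the negative Plackett-Luce log-likelihood of a target ranking) for two target orderings that differ only in how a tie is broken. It does not implement the sampling technique proposed in the paper, whose details are not given in this abstract; the scores and labels are made-up toy values.

```python
import math

def listmle_loss(scores, order):
    """Negative Plackett-Luce log-likelihood of the ranking `order`.

    `order` lists document indices from best to worst.
    """
    loss = 0.0
    for i in range(len(order)):
        remaining = [scores[j] for j in order[i:]]
        # Stable log-sum-exp over the documents not yet placed.
        m = max(remaining)
        log_norm = m + math.log(sum(math.exp(s - m) for s in remaining))
        loss -= scores[order[i]] - log_norm
    return loss

# Toy query: documents 0 and 1 share the same relevance label, document 2 is less relevant.
scores = [2.0, 0.5, 1.0]
labels = [1, 1, 0]

# Both orderings are consistent with the labels, yet they yield different losses.
loss_a = listmle_loss(scores, [0, 1, 2])
loss_b = listmle_loss(scores, [1, 0, 2])
```

The gap between `loss_a` and `loss_b` shows how an arbitrary tie-break forces the model to learn a preference between equally relevant documents, which is the problem the abstract describes.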
Scene Parsing with Integration of Parametric and Non-parametric Models
We adopt Convolutional Neural Networks (CNNs) to be our parametric model to
learn discriminative features and classifiers for local patch classification.
Based on the occurrence frequency distribution of classes, an ensemble of CNNs
(CNN-Ensemble) is learned, in which each CNN component focuses on learning
different and complementary visual patterns. The local beliefs of pixels are
output by CNN-Ensemble. Considering that visually similar pixels are
indistinguishable under local context, we leverage the global scene semantics
to alleviate the local ambiguity. The global scene constraint is mathematically
achieved by adding a global energy term to the labeling energy function, and it
is practically estimated in a non-parametric framework. A large margin based
CNN metric learning method is also proposed for better global belief
estimation. In the end, the integration of local and global beliefs gives rise
to the class likelihood of pixels, based on which maximum marginal inference is
performed to generate the label prediction maps. Even without any
post-processing, we achieve state-of-the-art results on the challenging
SiftFlow and Barcelona benchmarks.
Comment: 13 Pages, 6 figures, IEEE Transactions on Image Processing (T-IP)
201
Sufficient Conditions for Idealised Models to Have No Adversarial Examples: a Theoretical and Empirical Study with Bayesian Neural Networks
We prove, under two sufficient conditions, that idealised models can have no
adversarial examples. We discuss which idealised models satisfy our conditions,
and show that idealised Bayesian neural networks (BNNs) satisfy these. We
continue by studying near-idealised BNNs using HMC inference, demonstrating the
theoretical ideas in practice. We experiment with HMC on synthetic data derived
from MNIST for which we know the ground-truth image density, showing that
near-perfect epistemic uncertainty correlates to density under image manifold,
and that adversarial images lie off the manifold in our setting. This suggests
why MC dropout, which can be seen as performing approximate inference, has been
observed to be an effective defence against adversarial examples in practice.
We highlight failure-cases of non-idealised BNNs relying on dropout, suggesting
a new attack for dropout models and a new defence as well. Lastly, we
demonstrate the defence on a cats-vs-dogs image classification task with a
VGG13 variant.
Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition
Recognizing visual relationships among any pair of
localized objects is pivotal for image understanding. Previous studies have
shown remarkable progress in exploiting linguistic priors or external textual
information to improve the performance. In this work, we investigate an
orthogonal perspective based on feature interactions. We show that by
encouraging deep message propagation and interactions between local object
features and global predicate features, one can achieve compelling performance
in recognizing complex relationships without using any linguistic priors. To
this end, we present two new pooling cells to encourage feature interactions:
(i) Contrastive ROI Pooling Cell, which has a unique deROI pooling that
inversely pools local object features to the corresponding area of global
predicate features. (ii) Pyramid ROI Pooling Cell, which broadcasts global
predicate features to reinforce local object features. The two cells constitute
a Spatiality-Context-Appearance Module (SCA-M), which can be further stacked
consecutively to form our final Zoom-Net. We further shed light on how one could
resolve ambiguous and noisy object and predicate annotations by
Intra-Hierarchical trees (IH-tree). Extensive experiments conducted on the Visual
Genome dataset demonstrate the effectiveness of our feature-oriented approach
compared to state-of-the-art methods (Acc@1 11.42% from 8.16%) that depend on
explicit modeling of linguistic interactions. We further show that SCA-M can be
incorporated seamlessly into existing approaches to improve the performance by
a large margin. The source code will be released on
https://github.com/gjyin91/ZoomNet.
Comment: 22 pages, 9 figures, accepted by ECCV 2018.
A Coupled Evolutionary Network for Age Estimation
Age estimation of unknown persons is a challenging pattern analysis task due
to the lack of training data and the varied aging mechanisms of different
people. Label distribution learning-based methods usually make distribution
assumptions to simplify age estimation. However, age label distributions are
often complex and difficult to model parametrically. Inspired by the
biological evolutionary mechanism, we propose a Coupled Evolutionary Network
(CEN) with two concurrent evolutionary processes: evolutionary label
distribution learning and evolutionary slack regression. The evolutionary network
learns and refines age label distributions iteratively.
Evolutionary label distribution learning adaptively learns and constantly
refines the age label distributions without making strong assumptions on the
distribution patterns. To further utilize the ordered and continuous
information of age labels, we accordingly propose an evolutionary slack
regression to convert the discrete age label regression into the continuous age
interval regression. Experimental results on Morph, ChaLearn15 and
MegaAge-Asian datasets show the superiority of our method.
Deep Active Object Recognition by Joint Label and Action Prediction
An active object recognition system has the advantage of being able to act in
the environment to capture images that are more suited for training and that
lead to better performance at test time. In this paper, we propose a deep
convolutional neural network for active object recognition that simultaneously
predicts the object label, and selects the next action to perform on the object
with the aim of improving recognition performance. We treat active object
recognition as a reinforcement learning problem and derive the cost function to
train the network for joint prediction of the object label and the action. A
generative model of object similarities based on the Dirichlet distribution is
proposed and embedded in the network for encoding the state of the system. The
training is carried out by simultaneously minimizing the label and action
prediction errors using gradient descent. We empirically show that the proposed
network is able to predict both the object label and the actions on GERMS, a
dataset for active object recognition. We compare the test label prediction
accuracy of the proposed model with Dirichlet and Naive Bayes state encoding.
The results of experiments suggest that the proposed model equipped with
Dirichlet state encoding is superior in performance, and selects images that
lead to better training and higher accuracy of label prediction at test time.
Tournament Based Ranking CNN for the Cataract grading
In classification problems, an imbalance in the number of samples per class
often degrades performance. In particular, when some classes dominate the
others with a large number of samples, the trained model performs poorly at
identifying the dominated classes. This is common in medical datasets: because
severe cases are relatively rare, the number of samples is imbalanced between
severe and normal cases of a disease. In addition, it is difficult to identify
the grade of medical data precisely because the boundaries between grades are
vague. To solve these problems, we propose a new convolutional neural network
architecture, the Tournament based Ranking CNN, which shows a remarkable
performance gain in identifying dominated classes at the cost of a very small
accuracy loss in dominating classes. Our approach addresses the problems that
occur when Ranking CNN, a method that aggregates the outputs of multiple binary
neural network models, is applied to medical data. By using a tournament
structure in the aggregation method and very deep pretrained binary models, our
proposed model achieves 68.36% exact match accuracy, while Ranking CNN achieves
53.40%, a pretrained ResNet 56.12%, and a CNN with linear regression 57.48%.
As a result, our proposed method is applied efficiently to cataract grading,
which has ordinal labels with an imbalanced number of samples among classes,
and can be further applied to medical problems that have features and dataset
configurations similar to those of cataract grading.
Comment: Submitted to ACCV 201
Learning to decompose the modes in few-mode fibers with deep convolutional neural network
We introduce a deep learning technique to perform complete mode decomposition
for few-mode optical fibers for the first time. Our goal is to learn a fast and
accurate mapping from near-field beam profiles to the complete mode
coefficients, including both modal amplitudes and phases. We train the
convolutional neural network with simulated beam patterns, and evaluate the
network on both simulated and real beam data. In tests on simulated beam data,
the correlation between the reconstructed and the ideal beam profiles reaches
0.9993 and 0.995 for the 3-mode and 5-mode cases, respectively. In tests on
real 3-mode beam data, the average correlation is 0.9912, and the mode
decomposition can potentially be performed at 33 Hz on a graphics processing
unit, indicating real-time processing capability. The quantitative evaluations
demonstrate the superiority of our deep learning based approach.
Distribution Aware Active Learning
Discriminative learning machines often need a large set of labeled samples
for training. Active learning (AL) settings assume that the learner has the
freedom to ask an oracle to label its desired samples. Traditional AL
algorithms heuristically choose query samples about which the current learner
is uncertain. This strategy does not make good use of the structure of the
dataset at hand and is prone to being misled by outliers. To alleviate this
problem, we propose to distill the structural information into a probabilistic
generative model which acts as a teacher in our model. The active
learner uses this information effectively at each cycle of active
learning. The proposed method is generic and does not depend on the type of
learner and teacher. We then suggest a query criterion for active learning that
is aware of the data distribution and is more robust against outliers. Our
method can be combined readily with several other query criteria for active
learning. We provide the formulation and empirically demonstrate our idea on
toy and real examples.
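For context, the traditional uncertainty-based query strategy that the abstract above contrasts itself with can be sketched as below. This is the baseline heuristic only, not the distribution-aware criterion the paper proposes; the use of predictive entropy as the uncertainty measure, the function names, and the toy probabilities are all assumptions made for illustration.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_query(pool_probs):
    """Return the index of the unlabeled sample the learner is least certain about."""
    return max(range(len(pool_probs)), key=lambda i: predictive_entropy(pool_probs[i]))

# Toy pool: the learner's class probabilities for three unlabeled samples.
pool_probs = [[0.9, 0.1], [0.55, 0.45], [0.99, 0.01]]
query_index = select_query(pool_probs)  # picks the sample closest to a 50/50 prediction
```

Since this rule looks only at the learner's uncertainty on each sample in isolation, an outlier with an ambiguous prediction can be selected even when it is unrepresentative of the data, which is the weakness the abstract points out.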