Multiple Kernel Learning and Automatic Subspace Relevance Determination for High-dimensional Neuroimaging Data
Alzheimer's disease is a major cause of dementia. Its diagnosis requires
accurate biomarkers that are sensitive to disease stages. In this respect, we
regard probabilistic classification as a method of designing a probabilistic
biomarker for disease staging. Probabilistic biomarkers naturally support the
interpretation of decisions and evaluation of uncertainty associated with them.
In this paper, we obtain probabilistic biomarkers via Gaussian Processes.
Gaussian Processes enable probabilistic kernel machines that offer flexible
means to accomplish Multiple Kernel Learning. Exploiting this flexibility, we
propose a new variation of Automatic Relevance Determination and tackle the
challenges of high dimensionality through multiple kernels. Our research
results demonstrate that the Gaussian Process models are competitive with or
better than the well-known Support Vector Machine in terms of classification
performance even in the cases of single kernel learning. Extending the basic
scheme towards the Multiple Kernel Learning, we improve the efficacy of the
Gaussian Process models and their interpretability in terms of the known
anatomical correlates of the disease. For instance, the disease pathology starts in and around the hippocampus and entorhinal cortex. Through the use of Gaussian Processes and Multiple Kernel Learning, we automatically and efficiently identify the corresponding regions of the neuroimaging data. In addition to
their interpretability, our Gaussian Process models are competitive with recent
deep learning solutions under similar settings.
Comment: The material presented here is to promote the dissemination of scholarly and technical work in a timely fashion. Data in this article are from ADNI (adni.loni.usc.edu). As such, ADNI provided data but did not participate in writing of this report.
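As a hedged illustration of the general recipe (not the authors' exact model, which combines many region-specific kernels), the sketch below shows ARD-style relevance determination with a Gaussian process classifier in scikit-learn: per-feature length scales are learned by maximizing the marginal likelihood, and their inverses act as relevance scores. The data are synthetic stand-ins for voxel features.

```python
# Minimal ARD sketch: anisotropic RBF = one length scale per feature.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                    # stand-in for voxel features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # only 2 features matter

kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(X.shape[1]))
gpc = GaussianProcessClassifier(kernel=kernel).fit(X, y)

# Inverse learned length scales act as relevance scores: large learned
# length scales mean the feature barely moves the kernel (irrelevant).
relevance = 1.0 / gpc.kernel_.k2.length_scale
print(relevance)                   # larger = more relevant
print(gpc.predict_proba(X[:3]))   # probabilistic "biomarker" output
```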
Liver segmentation in CT images using three dimensional to two dimensional fully convolutional network
The need for CT scan analysis is growing for pre-diagnosis and therapy of
abdominal organs. Automatic organ segmentation of abdominal CT scan can help
radiologists analyze the scans faster and segment organ images with fewer
errors. However, existing methods are not efficient enough to perform the segmentation process for victims of accidents and emergency situations. In this paper, we propose an efficient liver segmentation method based on our 3D-to-2D fully convolutional network (3D-2D-FCN). The segmented mask is enhanced by means of a conditional random field on the organ's border. Consequently, we segment a target liver in less than a minute with a Dice score of 93.52.
Comment: 5 pages, 2 figures
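The abstract does not spell out the architecture, so the following is only a hypothetical sketch of the 3D-to-2D idea: 3D convolutions aggregate context across neighboring CT slices, then the slice axis is collapsed so a 2D head predicts a per-pixel liver mask. All layer sizes here are invented.

```python
# Toy 3D-to-2D segmentation net (illustrative only, PyTorch).
import torch
import torch.nn as nn

class ToyFCN3Dto2D(nn.Module):
    def __init__(self, depth=5):
        super().__init__()
        self.enc3d = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Collapse the slice axis: (depth, H, W) -> (H, W) feature map.
        self.squeeze = nn.Conv3d(16, 16, kernel_size=(depth, 1, 1))
        self.head2d = nn.Conv2d(16, 1, kernel_size=1)   # liver logits

    def forward(self, x):                 # x: (B, 1, depth, H, W)
        f = self.enc3d(x)
        f = self.squeeze(f).squeeze(2)    # (B, 16, H, W)
        return self.head2d(f)             # (B, 1, H, W)

logits = ToyFCN3Dto2D()(torch.randn(2, 1, 5, 64, 64))
print(logits.shape)  # torch.Size([2, 1, 64, 64])
```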
MMD GAN: Towards Deeper Understanding of Moment Matching Network
Generative moment matching network (GMMN) is a deep generative model that
differs from Generative Adversarial Network (GAN) by replacing the
discriminator in GAN with a two-sample test based on kernel maximum mean
discrepancy (MMD). Although some theoretical guarantees of MMD have been
studied, the empirical performance of GMMN is still not as competitive as that
of GAN on challenging and large benchmark datasets. The computational
efficiency of GMMN is also less desirable in comparison with GAN, partially due
to its requirement for a rather large batch size during training. In this
paper, we propose to improve both the model expressiveness of GMMN and its
computational efficiency by introducing adversarial kernel learning techniques,
as the replacement of a fixed Gaussian kernel in the original GMMN. The new
approach combines the key ideas in both GMMN and GAN, hence we name it MMD GAN.
The new distance measure in MMD GAN is a meaningful loss that enjoys the
advantage of weak topology and can be optimized via gradient descent with
relatively small batch sizes. In our evaluation on multiple benchmark datasets,
including MNIST, CIFAR-10, CelebA and LSUN, MMD GAN significantly outperforms GMMN and is competitive with other representative GAN works.
Comment: In the Proceedings of the Thirty-first Annual Conference on Neural Information Processing Systems (NIPS 2017)
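For reference, a minimal sketch of the kernel MMD statistic that GMMN and MMD GAN build on, with a fixed Gaussian kernel (MMD GAN's contribution is to learn the kernel adversarially; the bandwidth and data below are illustrative):

```python
# Biased (V-statistic) estimate of MMD^2 between two samples.
import torch

def gaussian_kernel(a, b, sigma=1.0):
    d2 = torch.cdist(a, b).pow(2)             # pairwise squared distances
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd2_biased(x, y, sigma=1.0):
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return kxx + kyy - 2 * kxy

x = torch.randn(128, 2)           # "real" samples
y = torch.randn(128, 2) + 1.0     # "generated" samples, shifted
print(mmd2_biased(x, y).item())   # clearly > 0: distributions differ
```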
Mapping Auto-context Decision Forests to Deep ConvNets for Semantic Segmentation
We consider the task of pixel-wise semantic segmentation given a small set of
labeled training images. Two of the most popular techniques to address this task are Decision Forests (DF) and Neural Networks (NN). In this work, we
explore the relationship between two special forms of these techniques: stacked
DFs (namely Auto-context) and deep Convolutional Neural Networks (ConvNet). Our
main contribution is to show that Auto-context can be mapped to a deep ConvNet
with novel architecture, and thereby trained end-to-end. This mapping can be
used as an initialization of a deep ConvNet, enabling training even in the face
of very limited amounts of training data. We also demonstrate an approximate
mapping back from the refined ConvNet to a second stacked DF, with improved
performance over the original. We experimentally verify that these mappings
outperform stacked DFs for two different applications in computer vision and
biology: Kinect-based body part labeling from depth images, and somite
segmentation in microscopy images of developing zebrafish. Finally, we revisit
the core mapping from a Decision Tree (DT) to a NN, and show that it is also
possible to map a fuzzy DT, with sigmoidal split decisions, to a NN. This
addresses multiple limitations of the previous mapping, and yields new insights into the popular Rectified Linear Unit (ReLU) and the more recently proposed concatenated ReLU (CReLU) activation functions.
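A minimal sketch of the fuzzy decision-tree-to-network mapping for a single split ("stump"), assuming the sigmoidal split form described in the abstract: the hard test x[j] < t becomes a sigmoid gate, making the tree differentiable, and a large sharpness s recovers the hard tree. All values below are illustrative.

```python
# One soft split node: output is a sigmoid-gated mix of the two leaves.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuzzy_stump(x, j, t, leaf_left, leaf_right, s=10.0):
    p_right = sigmoid(s * (x[j] - t))   # soft version of "x[j] >= t"
    return (1 - p_right) * leaf_left + p_right * leaf_right

x = np.array([0.3, 1.7])
print(fuzzy_stump(x, j=0, t=0.5, leaf_left=-1.0, leaf_right=+1.0))           # soft
print(fuzzy_stump(x, j=0, t=0.5, leaf_left=-1.0, leaf_right=+1.0, s=1000.0)) # ~hard
```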
TensorFlow Distributions
The TensorFlow Distributions library implements a vision of probability
theory adapted to the modern deep-learning paradigm of end-to-end
differentiable computation. Building on two basic abstractions, it offers
flexible building blocks for probabilistic computation. Distributions provide
fast, numerically stable methods for generating samples and computing
statistics, e.g., log density. Bijectors provide composable volume-tracking
transformations with automatic caching. Together these enable modular
construction of high dimensional distributions and transformations not possible
with previous libraries (e.g., pixelCNNs, autoregressive flows, and reversible
residual networks). They are the workhorse behind deep probabilistic
programming systems like Edward and empower fast black-box inference in
probabilistic models built on deep-network components. TensorFlow Distributions
has proven an important part of the TensorFlow toolkit within Google and in the
broader deep learning community.
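The two abstractions compose as follows (a small example using the public tfp.distributions / tfp.bijectors API): a base Distribution pushed through a Bijector yields a new distribution whose samples and log densities account for the change of volume.

```python
# LogNormal built compositionally: push a Normal through Exp.
import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors

log_normal = tfd.TransformedDistribution(
    distribution=tfd.Normal(loc=0.0, scale=1.0),
    bijector=tfb.Exp(),
)

samples = log_normal.sample(5)        # fast, numerically stable sampling
print(log_normal.log_prob(samples))   # density via inverse + log-det-Jacobian
```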
Fully Connected Deep Structured Networks
Convolutional neural networks with many layers have recently been shown to
achieve excellent results on many high-level tasks such as image
classification, object detection and more recently also semantic segmentation.
Particularly for semantic segmentation, a two-stage procedure is often
employed. Hereby, convolutional networks are trained to provide good local
pixel-wise features for the second step being traditionally a more global
graphical model. In this work we unify this two-stage process into a single
joint training algorithm. We demonstrate our method on the semantic image
segmentation task and show encouraging results on the challenging PASCAL VOC
2012 dataset.
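A heavily hedged sketch of the joint idea: a differentiable mean-field-style refinement appended to the CNN's unary predictions, so both stages train end to end. The average-pooling message and 1x1 compatibility transform below are simplistic stand-ins; the paper's pairwise terms are richer.

```python
# Toy joint CNN + mean-field refinement (illustrative only, PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnaryPlusMeanField(nn.Module):
    def __init__(self, n_classes=21, iters=3):
        super().__init__()
        self.cnn = nn.Conv2d(3, n_classes, 3, padding=1)   # toy unary net
        self.compat = nn.Conv2d(n_classes, n_classes, 1)   # label compatibility
        self.iters = iters

    def forward(self, img):
        unary = self.cnn(img)
        q = unary
        for _ in range(self.iters):
            # Message passing: spatially smoothed class beliefs.
            msg = F.avg_pool2d(q.softmax(1), 3, stride=1, padding=1)
            q = unary - self.compat(msg)   # mean-field-style update
        return q

logits = UnaryPlusMeanField()(torch.randn(1, 3, 32, 32))
print(logits.shape)  # torch.Size([1, 21, 32, 32]) -- gradients flow end to end
```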
Probabilistic Programming with Gaussian Process Memoization
Gaussian Processes (GPs) are widely used tools in statistics, machine
learning, robotics, computer vision, and scientific computation. However,
despite their popularity, they can be difficult to apply; all but the simplest
classification or regression applications require specification and inference
over complex covariance functions that do not admit simple analytical
posteriors. This paper shows how to embed Gaussian processes in any
higher-order probabilistic programming language, using an idiom based on
memoization, and demonstrates its utility by implementing and extending classic
and state-of-the-art GP applications. The interface to Gaussian processes,
called gpmem, takes an arbitrary real-valued computational process as input and
returns a statistical emulator that automatically improves as the original
process is invoked and its input-output behavior is recorded. The flexibility
of gpmem is illustrated via three applications: (i) robust GP regression with
hierarchical hyper-parameter learning, (ii) discovering symbolic expressions
from time-series data by fully Bayesian structure learning over kernels
generated by a stochastic grammar, and (iii) a bandit formulation of Bayesian
optimization with automatic inference and action selection. All applications
share a single 50-line Python library and require fewer than 20 lines of
probabilistic code each.
Comment: 36 pages, 9 figures
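gpmem itself lives inside a higher-order probabilistic programming language, so the following is only a loose Python analogue of the memoization pattern, with an invented GPMemo class: each call to the wrapped process is recorded, and a GP emulator fit to the record predicts elsewhere.

```python
# Memoizing GP emulator sketch (hypothetical analogue of gpmem).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

class GPMemo:
    def __init__(self, f):
        self.f, self.X, self.y = f, [], []
        self.gp = GaussianProcessRegressor()

    def __call__(self, x):
        # Probe: run the real process, record the pair, refit the emulator.
        y = self.f(x)
        self.X.append([x]); self.y.append(y)
        self.gp.fit(self.X, self.y)
        return y

    def emulate(self, x):
        # Emulator: GP posterior mean and uncertainty at an unseen input.
        return self.gp.predict(np.array([[x]]), return_std=True)

memo = GPMemo(lambda x: np.sin(x))
for x in [0.0, 1.5, 3.0]:
    memo(x)                          # each call enriches the emulator
mean, std = memo.emulate(0.7)
print(mean, std)
```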
Kernel Mean Embedding of Distributions: A Review and Beyond
A Hilbert space embedding of a distribution---in short, a kernel mean
embedding---has recently emerged as a powerful tool for machine learning and
inference. The basic idea behind this framework is to map distributions into a
reproducing kernel Hilbert space (RKHS) in which the whole arsenal of kernel
methods can be extended to probability measures. It can be viewed as a
generalization of the original "feature map" common to support vector machines
(SVMs) and other kernel methods. While initially closely associated with the
latter, it has meanwhile found application in fields ranging from kernel
machines and probabilistic modeling to statistical inference, causal discovery,
and deep learning. The goal of this survey is to give a comprehensive review of
existing work and recent advances in this research area, and to discuss the
most challenging issues and open problems that could lead to new research
directions. The survey begins with a brief introduction to the RKHS and positive definite kernels, which form the backbone of this survey, followed by
a thorough discussion of the Hilbert space embedding of marginal distributions,
theoretical guarantees, and a review of its applications. The embedding of
distributions enables us to apply RKHS methods to probability measures which
prompts a wide range of applications such as kernel two-sample testing, independence testing, and learning on distributional data. Next, we discuss the
Hilbert space embedding for conditional distributions, give theoretical
insights, and review some applications. The conditional mean embedding enables
us to perform sum, product, and Bayes' rules---which are ubiquitous in
graphical models, probabilistic inference, and reinforcement learning---in a
non-parametric way. We then discuss relationships between this framework and
other related areas. Lastly, we give some suggestions on future research
directions.
Comment: 147 pages; this is a version of the manuscript after the review process
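For reference, the central definitions (standard, as surveyed): the kernel mean embedding of a distribution P, its empirical estimate from a sample x_1, ..., x_n ~ P, and the MMD it induces, for a kernel k with RKHS H:

```latex
\mu_P \;=\; \mathbb{E}_{x \sim P}\big[k(\cdot, x)\big] \in \mathcal{H},
\qquad
\hat{\mu}_P \;=\; \frac{1}{n} \sum_{i=1}^{n} k(\cdot, x_i),
\qquad
\mathrm{MMD}(P, Q) \;=\; \lVert \mu_P - \mu_Q \rVert_{\mathcal{H}}.
```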
Bayesian Convolutional Neural Networks with Bernoulli Approximate Variational Inference
Convolutional neural networks (CNNs) work well on large datasets. But
labelled data is hard to collect, and in some applications larger amounts of
data are not available. The problem then is how to use CNNs with small data --
as CNNs overfit quickly. We present an efficient Bayesian CNN, offering better robustness to over-fitting on small data than traditional approaches. We achieve this by placing a probability distribution over the CNN's kernels. We approximate
our model's intractable posterior with Bernoulli variational distributions,
requiring no additional model parameters.
On the theoretical side, we cast dropout network training as approximate
inference in Bayesian neural networks. This allows us to implement our model
using existing tools in deep learning with no increase in time complexity,
while highlighting a negative result in the field. We show a considerable
improvement in classification accuracy compared to standard techniques and
improve on published state-of-the-art results for CIFAR-10.
Comment: 12 pages, 3 figures, ICLR format, updated with reviewer comments
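A minimal sketch of the dropout-as-approximate-inference recipe the paper builds on (shown on a toy fully connected classifier; the paper applies Bernoulli dropout to the CNN's kernels): keep dropout stochastic at test time and average many forward passes.

```python
# Monte Carlo dropout: stochastic forward passes give a predictive
# distribution instead of a point estimate.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(),
                    nn.Dropout(p=0.5), nn.Linear(64, 3))

x = torch.randn(1, 10)
net.train()                      # keep dropout active at test time
with torch.no_grad():
    probs = torch.stack([net(x).softmax(-1) for _ in range(100)])
print(probs.mean(0))             # predictive mean
print(probs.std(0))              # per-class uncertainty
```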
Adaptive Sampled Softmax with Kernel Based Sampling
Softmax is the most commonly used output function for multiclass problems and
is widely used in areas such as vision, natural language processing, and
recommendation. A softmax model has linear costs in the number of classes, which makes it too expensive for many real-world problems. A common approach to speed
up training involves sampling only some of the classes at each training step.
It is known that this method is biased and that the bias increases the more the
sampling distribution deviates from the output distribution. Nevertheless, almost all recent work uses simple sampling distributions that require a large sample size to mitigate the bias. In this work, we propose a new class of
kernel based sampling methods and develop an efficient sampling algorithm.
Kernel based sampling adapts to the model as it is trained, thus resulting in
low bias. Kernel based sampling can be easily applied to many models because it
relies only on the model's last hidden layer. We empirically study the
trade-off of bias, sampling distribution and sample size and show that kernel
based sampling results in low bias with few samples.
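A hedged sketch of the mechanism on a toy softmax layer, assuming a quadratic kernel of the last hidden layer as the proposal (the constants and shapes below are invented): classes are sampled proportionally to the kernel score, and sampled logits are corrected by log q as in standard sampled softmax.

```python
# Kernel-based sampled softmax sketch (illustrative constants).
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
n_classes, dim, n_sampled = 1000, 32, 50
W = rng.normal(size=(n_classes, dim))   # output class embeddings
h = rng.normal(size=dim)                # last hidden layer
target = 7

logits = W @ h
q = logits ** 2 + 1.0                   # quadratic kernel score k(h, w_c)
q /= q.sum()                            # proposal distribution over classes

# Sample negatives from q (a sketch: a duplicate of the target among the
# negatives is possible and simply ignored here).
neg = rng.choice(n_classes, size=n_sampled, replace=False, p=q)
cand = np.concatenate(([target], neg))
corrected = logits[cand] - np.log(q[cand])   # standard logit correction
loss = logsumexp(corrected) - corrected[0]   # estimate of full softmax loss
print(loss)
```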