Sum-Product Networks for Sequence Labeling
We consider higher-order linear-chain conditional random fields (HO-LC-CRFs)
for sequence modelling, and use sum-product networks (SPNs) for representing
higher-order input- and output-dependent factors. SPNs are a recently
introduced class of deep models for which exact and efficient inference can be
performed. By combining HO-LC-CRFs with SPNs, expressive models over both the
output labels and the hidden variables are instantiated while still enabling
efficient exact inference. Furthermore, the use of higher-order factors allows
us to capture relations of multiple input segments and multiple output labels
as often present in real-world data. These relations cannot be modelled by
commonly used first-order models, nor by higher-order models whose local
factors include only a single output label. We demonstrate the effectiveness of our
proposed models for sequence labeling. In extensive experiments, we outperform
other state-of-the-art methods in optical character recognition and achieve
competitive results in phone classification.
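To make the role of higher-order output factors concrete, here is a minimal sketch of scoring a label sequence under a second-order (trigram) linear-chain CRF; the SPN-parameterised potentials described in the abstract are replaced by plain lookup tables, and all sizes and values are hypothetical.

```python
import numpy as np

# Minimal sketch of scoring under a second-order linear-chain CRF.
# The paper parameterises these factors with sum-product networks;
# here they are plain (hypothetical) lookup tables for illustration.

L = 4          # number of labels
T = 6          # sequence length
rng = np.random.default_rng(0)

unary = rng.normal(size=(T, L))          # input-dependent unary scores
pairwise = rng.normal(size=(L, L))       # first-order label-label factor
triple = rng.normal(size=(L, L, L))      # higher-order (second-order) factor

def score(labels):
    """Unnormalised log-score of one label sequence."""
    s = unary[np.arange(T), labels].sum()
    for t in range(1, T):
        s += pairwise[labels[t - 1], labels[t]]
    for t in range(2, T):
        s += triple[labels[t - 2], labels[t - 1], labels[t]]
    return s

print(score(rng.integers(0, L, size=T)))
```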
End-to-end learning potentials for structured attribute prediction
We present a structured inference approach in deep neural networks for
multiple attribute prediction. In attribute prediction, a common approach is to
learn independent classifiers on top of a good feature representation. However,
such classifiers assume the attributes are conditionally independent given the
features and do not explicitly model the dependencies between attributes during inference.
We propose to formulate attribute prediction in terms of marginal inference in
the conditional random field. We model potential functions by deep neural
networks and apply the sum-product algorithm to solve for the approximate
marginal distribution in feed-forward networks. Our message passing layer
implements sparse pairwise potentials by a softplus-linear function that is
equivalent to a higher-order classifier, and learns all the model parameters by
end-to-end backpropagation. Experimental results on the SUN Attributes and
CelebA datasets suggest that structured inference improves attribute
prediction performance and possibly uncovers hidden relationships between
attributes.
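As a rough illustration of sum-product marginal inference over attributes, the sketch below runs loopy belief propagation on a small fully connected pairwise model; the deep-network and softplus-linear potentials from the abstract are replaced by random tables, so this is an assumption-laden toy, not the paper's layer.

```python
import numpy as np

# Loopy sum-product over binary attributes on a fully connected
# pairwise model; all potentials are random, hypothetical tables.

N = 5                                              # number of attributes
rng = np.random.default_rng(0)
unary = np.exp(rng.normal(size=(N, 2)))            # psi_i(x_i)
pair = np.exp(rng.normal(size=(N, N, 2, 2)))       # psi_ij(x_i, x_j)
msg = np.ones((N, N, 2))                           # m_{i->j}(x_j)

for _ in range(20):                                # message-passing iterations
    new = np.ones_like(msg)
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            # product of messages into i, excluding the one from j
            incoming = np.prod([msg[k, i] for k in range(N) if k not in (i, j)], axis=0)
            new[i, j] = (unary[i] * incoming) @ pair[i, j]   # marginalise x_i
            new[i, j] /= new[i, j].sum()                     # normalise
    msg = new

# approximate marginals (beliefs) per attribute
beliefs = unary * np.prod(msg, axis=0)
beliefs /= beliefs.sum(axis=1, keepdims=True)
print(beliefs)
```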
End-to-end semantic face segmentation with conditional random fields as convolutional, recurrent and adversarial networks
Recent years have seen a sharp increase in the number of related yet distinct
advances in semantic segmentation. Here, we tackle this problem by leveraging
the respective strengths of these advances. That is, we formulate a conditional
random field over a four-connected graph as end-to-end trainable convolutional
and recurrent networks, and estimate them via an adversarial process.
Importantly, our model learns not only unary potentials but also pairwise
potentials, while aggregating multi-scale contexts and controlling higher-order
inconsistencies. We evaluate our model on two standard benchmark datasets for
semantic face segmentation, achieving state-of-the-art results on both.
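One way to read "a CRF over a four-connected graph as convolutional and recurrent networks" is as repeated mean-field updates over a pixel grid; the sketch below shows such an update loop, where the unary scores and label-compatibility matrix are random, hypothetical stand-ins for the learned components.

```python
import numpy as np

# Mean-field inference for a CRF on a 4-connected pixel grid, written
# as repeated ("recurrent") updates; unaries and the compatibility
# matrix are hypothetical stand-ins for the learned networks.

H, W, L = 8, 8, 3
rng = np.random.default_rng(0)
unary = rng.normal(size=(L, H, W))        # per-pixel class scores
compat = rng.normal(size=(L, L))          # pairwise label compatibility

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

q = softmax(unary)                        # initialise with unaries
for _ in range(5):                        # mean-field iterations
    # aggregate the 4-connected neighbourhood by shifting q
    nbr = np.zeros_like(q)
    nbr[:, 1:, :] += q[:, :-1, :]
    nbr[:, :-1, :] += q[:, 1:, :]
    nbr[:, :, 1:] += q[:, :, :-1]
    nbr[:, :, :-1] += q[:, :, 1:]
    # compatibility transform, then renormalise with the unaries
    pairwise = np.einsum('kl,lhw->khw', compat, nbr)
    q = softmax(unary - pairwise)

labels = q.argmax(axis=0)                 # hard segmentation
```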
Fully Connected Deep Structured Networks
Convolutional neural networks with many layers have recently been shown to
achieve excellent results on many high-level tasks such as image
classification, object detection and more recently also semantic segmentation.
Particularly for semantic segmentation, a two-stage procedure is often
employed: convolutional networks are first trained to provide good local
pixel-wise features, and a more global graphical model traditionally forms the
second stage. In this work we unify this two-stage process into a single
joint training algorithm. We demonstrate our method on the semantic image
segmentation task and show encouraging results on the challenging PASCAL VOC
2012 dataset.
Neural CRF transducers for sequence labeling
Conditional random fields (CRFs) have been shown to be one of the most
successful approaches to sequence labeling. Various linear-chain neural CRFs
(NCRFs) have been developed to implement non-linear node potentials in CRFs
while still keeping the linear-chain hidden structure. In this paper, we
propose NCRF transducers, which consist of two RNNs, one extracting features
from observations and the other capturing (theoretically infinite) long-range
dependencies between labels. Different sequence labeling methods are evaluated
on POS tagging, chunking and NER (English, Dutch). Experimental results show
that NCRF transducers achieve consistent improvements over linear-chain NCRFs
and RNN transducers across all four tasks, and can improve state-of-the-art
results.
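The linear-chain CRF layer that NCRFs build on can be summarised by the forward algorithm; the sketch below computes the log-partition function and a sequence negative log-likelihood, with random emission scores standing in for the RNN outputs and all names being illustrative.

```python
import numpy as np
from scipy.special import logsumexp

# The linear-chain CRF objective: forward algorithm gives log Z, and
# the negative log-likelihood of a tag sequence follows.  Emission
# scores stand in for RNN outputs and are random, hypothetical values.

T, K = 5, 4
rng = np.random.default_rng(0)
emit = rng.normal(size=(T, K))      # node potentials (RNN outputs in an NCRF)
trans = rng.normal(size=(K, K))     # transition potentials between tags
tags = rng.integers(0, K, size=T)   # a candidate tag sequence

# forward algorithm in log-space
alpha = emit[0]
for t in range(1, T):
    alpha = logsumexp(alpha[:, None] + trans, axis=0) + emit[t]
log_Z = logsumexp(alpha)

# unnormalised score of the tag sequence, then the NLL
score = emit[np.arange(T), tags].sum() + trans[tags[:-1], tags[1:]].sum()
nll = log_Z - score
print(nll)
```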
Physics-Constrained Deep Learning for High-dimensional Surrogate Modeling and Uncertainty Quantification without Labeled Data
Surrogate modeling and uncertainty quantification tasks for PDE systems are
most often considered as supervised learning problems where input and output
data pairs are used for training. The construction of such emulators is by
definition a small data problem which poses challenges to deep learning
approaches that have been developed to operate in the big data regime. Even in
cases where such models have been shown to have good predictive capability in
high dimensions, they fail to address constraints in the data implied by the
PDE model. This paper provides a methodology that incorporates the governing
equations of the physical model in the loss/likelihood functions. The resulting
physics-constrained, deep learning models are trained without any labeled data
(e.g. employing only input data) and provide comparable predictive responses
with data-driven models while obeying the constraints of the problem at hand.
This work employs a convolutional encoder-decoder neural network approach as
well as a conditional flow-based generative model for the solution of PDEs,
surrogate model construction, and uncertainty quantification tasks. The
methodology is posed as a minimization problem of the reverse Kullback-Leibler
(KL) divergence between the model predictive density and the reference
conditional density, where the latter is defined as the Boltzmann-Gibbs
distribution at a given inverse temperature with the underlying potential
relating to the PDE system of interest. The generalization capability of these
models to out-of-distribution input is considered. Quantification and
interpretation of the predictive uncertainty is provided for a number of
problems. Comment: 51 pages, 18 figures, submitted to Journal of Computational Physics
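The core idea of training without labeled output data can be illustrated with a loss built from a PDE residual; the sketch below uses a Poisson equation on a uniform grid as a hypothetical example, which is simpler than the reverse-KL objective described in the abstract but shows where the governing equations enter the loss.

```python
import numpy as np

# Physics-constrained loss sketch: instead of matching labeled outputs,
# penalise the residual of a governing PDE evaluated on the network's
# prediction.  A Poisson equation laplacian(u) = f on a uniform grid is
# a hypothetical stand-in for the PDE system of interest.

h = 1.0 / 32                                   # grid spacing
x = np.linspace(0, 1, 33)
xx, yy = np.meshgrid(x, x, indexing='ij')
f = np.sin(np.pi * xx) * np.sin(np.pi * yy)    # source term (input data only)
u = np.random.default_rng(0).normal(size=xx.shape)  # stand-in for a network output

def physics_loss(u, f, h):
    """Mean squared PDE residual on interior points (no labels needed)."""
    lap = (u[2:, 1:-1] + u[:-2, 1:-1] + u[1:-1, 2:] + u[1:-1, :-2]
           - 4.0 * u[1:-1, 1:-1]) / h**2
    residual = lap - f[1:-1, 1:-1]
    return np.mean(residual ** 2)

print(physics_loss(u, f, h))
```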
Gaussian Filter in CRF Based Semantic Segmentation
Artificial intelligence is making great changes in academia and industry with
the fast development of deep learning, a branch of machine learning and
statistical learning. The fully convolutional network [1] is the standard model
for semantic segmentation. Conditional random fields coded as CNN [2] or RNN
[3] and connected with an FCN have been successfully applied in object detection
[4]. In this paper, we introduce a multi-resolution neural network for FCN and
apply a Gaussian filter to the extended CRF kernel neighborhood and the label
image to reduce the oscillating effect of CRF neural network segmentation, thus
achieving higher precision and faster training. Comment: 11 pages, 9 figures, 2 tables
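A plausible reading of the Gaussian-filter step is smoothing the per-class score maps before the CRF/argmax stage; the short sketch below does that with scipy, where the sigma value and the score maps are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Gaussian smoothing of per-class score maps before the argmax/CRF
# step, as one way to damp oscillating segmentation boundaries.
# The score maps and sigma are hypothetical.

H, W, C = 64, 64, 5
scores = np.random.default_rng(0).normal(size=(C, H, W))  # FCN class scores

smoothed = np.stack([gaussian_filter(scores[c], sigma=2.0) for c in range(C)])
labels = smoothed.argmax(axis=0)          # smoothed segmentation
```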
Structured Prediction using cGANs with Fusion Discriminator
We propose the fusion discriminator, a single unified framework for
incorporating conditional information into a generative adversarial network
(GAN) for a variety of distinct structured prediction tasks, including image
synthesis, semantic segmentation, and depth estimation. Much like commonly used
convolutional neural network -- conditional Markov random field (CNN-CRF)
models, the proposed method is able to enforce higher-order consistency in the
model, but without being limited to a very specific class of potentials. The
method is conceptually simple and flexible, and our experimental results
demonstrate improvement on several diverse structured prediction tasks. Comment: 13 pages, 5 figures, 3 tables
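As a hedged sketch of the fusion idea, the module below encodes the conditioning image and the predicted map in separate convolutional branches and fuses their feature maps (here by addition) before a patch-level real/fake head; the layer sizes and the fusion operator are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

# A "fusion"-style conditional discriminator sketch: condition and
# prediction are encoded separately and fused at the feature level.
# Architecture details here are assumptions for illustration.

class FusionDiscriminator(nn.Module):
    def __init__(self, cond_ch=3, pred_ch=1, width=32):
        super().__init__()
        self.cond_branch = nn.Sequential(
            nn.Conv2d(cond_ch, width, 4, stride=2, padding=1), nn.LeakyReLU(0.2))
        self.pred_branch = nn.Sequential(
            nn.Conv2d(pred_ch, width, 4, stride=2, padding=1), nn.LeakyReLU(0.2))
        self.head = nn.Sequential(
            nn.Conv2d(width, 2 * width, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(2 * width, 1, 4, stride=1, padding=1))   # patch-level logits

    def forward(self, cond, pred):
        fused = self.cond_branch(cond) + self.pred_branch(pred)   # feature fusion
        return self.head(fused)

d = FusionDiscriminator()
logits = d(torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64))
```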
A Structured Learning Approach with Neural Conditional Random Fields for Sleep Staging
Sleep plays a vital role in human health, both mental and physical. Sleep
disorders like sleep apnea are increasing in prevalence, with the rapid
increase in factors like obesity. Sleep apnea is most commonly treated with
Continuous Positive Airway Pressure (CPAP) therapy. Presently, however, there is
no mechanism to monitor a patient's progress with CPAP. Accurate detection of
sleep stages from CPAP flow signal is crucial for such a mechanism. We propose,
for the first time, an automated sleep staging model based only on the flow
signal. Deep neural networks have recently shown high accuracy on sleep staging
by eliminating handcrafted features. However, these methods focus exclusively
on extracting informative features from the input signal, without paying much
attention to the dynamics of sleep stages in the output sequence. We propose an
end-to-end framework that uses a combination of deep convolution and recurrent
neural networks to extract high-level features from raw flow signal with a
structured output layer based on a conditional random field to model the
temporal transition structure of the sleep stages. Using our model, we improve
upon previous methods by 10%, and the model can also be used to augment previous
deep learning sleep staging methods. We also show that our method can be used to
accurately track sleep metrics like sleep efficiency calculated from sleep
stages that can be deployed for monitoring the response of CPAP therapy on
sleep apnea patients. Apart from the technical contributions, we expect this
study to motivate new research questions in sleep science. Comment: Accepted at the IEEE International Conference on BigData 201
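The structured output layer can be illustrated by Viterbi decoding over a stage-transition matrix, which is the kind of inference a CRF layer adds on top of per-epoch network scores; the scores and transitions below are random, hypothetical values.

```python
import numpy as np

# Viterbi decoding over a sleep-stage transition matrix: the structured
# step a CRF output layer adds on top of per-epoch CNN/RNN scores.
# Scores and transitions here are random, hypothetical values.

T, S = 10, 5                          # epochs, sleep stages (e.g. W, N1-N3, REM)
rng = np.random.default_rng(0)
emit = rng.normal(size=(T, S))        # per-epoch stage scores from the network
trans = rng.normal(size=(S, S))       # learned stage-transition scores

delta = emit[0]
back = np.zeros((T, S), dtype=int)
for t in range(1, T):
    cand = delta[:, None] + trans     # score of moving prev -> current stage
    back[t] = cand.argmax(axis=0)
    delta = cand.max(axis=0) + emit[t]

path = [int(delta.argmax())]
for t in range(T - 1, 0, -1):
    path.append(int(back[t, path[-1]]))
stages = path[::-1]                   # most likely stage sequence
print(stages)
```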
Rethinking Monocular Depth Estimation with Adversarial Training
Monocular depth estimation is an extensively studied computer vision problem
with a vast variety of applications. Deep learning-based methods have
demonstrated promise for both supervised and unsupervised depth estimation from
monocular images. Most existing approaches treat depth estimation as a
regression problem with a local pixel-wise loss function. In this work, we
innovate beyond existing approaches by using adversarial training to learn a
context-aware, non-local loss function. Such an approach penalizes the joint
configuration of predicted depth values at the patch-level instead of the
pixel-level, which allows networks to incorporate more global information. In
this framework, the generator learns a mapping from RGB images to their
corresponding depth maps, while the discriminator learns to distinguish
predicted depth map and RGB pairs from ground-truth pairs. This conditional GAN depth estimation
framework is stabilized using spectral normalization to prevent mode collapse
when learning from diverse datasets. We test this approach using a diverse set
of generators that include U-Net and joint CNN-CRF. We benchmark this approach
on the NYUv2, Make3D and KITTI datasets, and observe that adversarial training
reduces relative error by several fold, achieving state-of-the-art performance.
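The stabilisation mentioned here amounts to bounding the discriminator's Lipschitz constant; the sketch below wraps each convolution of a small patch discriminator over concatenated RGB and depth inputs in PyTorch's spectral_norm, with sizes chosen purely for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Spectral normalisation sketch: wrap each discriminator convolution so
# its spectral norm (and hence Lipschitz constant) is bounded during
# adversarial training.  The tiny patch discriminator over concatenated
# RGB + depth input is a hypothetical stand-in.

def sn_conv(in_ch, out_ch):
    return spectral_norm(nn.Conv2d(in_ch, out_ch, 4, stride=2, padding=1))

disc = nn.Sequential(
    sn_conv(3 + 1, 32), nn.LeakyReLU(0.2),   # RGB (3) + depth (1) channels
    sn_conv(32, 64), nn.LeakyReLU(0.2),
    spectral_norm(nn.Conv2d(64, 1, 3, padding=1)),  # patch real/fake logits
)

rgb = torch.randn(2, 3, 64, 64)
depth = torch.randn(2, 1, 64, 64)
logits = disc(torch.cat([rgb, depth], dim=1))
```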