3,887 research outputs found
Structure Regularized Bidirectional Recurrent Convolutional Neural Network for Relation Classification
Relation classification is an important semantic processing task in the field
of natural language processing (NLP). In this paper, we present a novel model,
Structure Regularized Bidirectional Recurrent Convolutional Neural
Network(SR-BRCNN), to classify the relation of two entities in a sentence, and
the new dataset of Chinese Sanwen for named entity recognition and relation
classification. Some state-of-the-art systems concentrate on modeling the
shortest dependency path (SDP) between two entities leveraging convolutional or
recurrent neural networks. We further explore how to make full use of the
dependency relations information in the SDP and how to improve the model by the
method of structure regularization. We propose a structure regularized model to
learn relation representations along the SDP extracted from the forest formed
by the structure regularized dependency tree, which benefits reducing the
complexity of the whole model and helps improve the score by 10.3.
Experimental results show that our method outperforms the state-of-the-art
approaches on the Chinese Sanwen task and performs as well on the SemEval-2010
Task 8 dataset\footnote{The Chinese Sanwen corpus this paper developed and used
will be released in the further.Comment: arXiv admin note: text overlap with arXiv:1411.6243 by other author
Understand Scene Categories by Objects: A Semantic Regularized Scene Classifier Using Convolutional Neural Networks
Scene classification is a fundamental perception task for environmental
understanding in today's robotics. In this paper, we have attempted to exploit
the use of popular machine learning technique of deep learning to enhance scene
understanding, particularly in robotics applications. As scene images have
larger diversity than the iconic object images, it is more challenging for deep
learning methods to automatically learn features from scene images with less
samples. Inspired by human scene understanding based on object knowledge, we
address the problem of scene classification by encouraging deep neural networks
to incorporate object-level information. This is implemented with a
regularization of semantic segmentation. With only 5 thousand training images,
as opposed to 2.5 million images, we show the proposed deep architecture
achieves superior scene classification results to the state-of-the-art on a
publicly available SUN RGB-D dataset. In addition, performance of semantic
segmentation, the regularizer, also reaches a new record with refinement
derived from predicted scene labels. Finally, we apply our SUN RGB-D dataset
trained model to a mobile robot captured images to classify scenes in our
university demonstrating the generalization ability of the proposed algorithm
Multi-modal Face Pose Estimation with Multi-task Manifold Deep Learning
Human face pose estimation aims at estimating the gazing direction or head
postures with 2D images. It gives some very important information such as
communicative gestures, saliency detection and so on, which attracts plenty of
attention recently. However, it is challenging because of complex background,
various orientations and face appearance visibility. Therefore, a descriptive
representation of face images and mapping it to poses are critical. In this
paper, we make use of multi-modal data and propose a novel face pose estimation
method that uses a novel deep learning framework named Multi-task Manifold Deep
Learning . It is based on feature extraction with improved deep neural
networks and multi-modal mapping relationship with multi-task learning. In the
proposed deep learning based framework, Manifold Regularized Convolutional
Layers (MRCL) improve traditional convolutional layers by learning the
relationship among outputs of neurons. Besides, in the proposed mapping
relationship learning method, different modals of face representations are
naturally combined to learn the mapping function from face images to poses. In
this way, the computed mapping model with multiple tasks is improved.
Experimental results on three challenging benchmark datasets DPOSE, HPID and
BKHPD demonstrate the outstanding performance of
Representation Learning: A Review and New Perspectives
The success of machine learning algorithms generally depends on data
representation, and we hypothesize that this is because different
representations can entangle and hide more or less the different explanatory
factors of variation behind the data. Although specific domain knowledge can be
used to help design representations, learning with generic priors can also be
used, and the quest for AI is motivating the design of more powerful
representation-learning algorithms implementing such priors. This paper reviews
recent work in the area of unsupervised feature learning and deep learning,
covering advances in probabilistic models, auto-encoders, manifold learning,
and deep networks. This motivates longer-term unanswered questions about the
appropriate objectives for learning good representations, for computing
representations (i.e., inference), and the geometrical connections between
representation learning, density estimation and manifold learning
Learning Spatial-Aware Regressions for Visual Tracking
In this paper, we analyze the spatial information of deep features, and
propose two complementary regressions for robust visual tracking. First, we
propose a kernelized ridge regression model wherein the kernel value is defined
as the weighted sum of similarity scores of all pairs of patches between two
samples. We show that this model can be formulated as a neural network and thus
can be efficiently solved. Second, we propose a fully convolutional neural
network with spatially regularized kernels, through which the filter kernel
corresponding to each output channel is forced to focus on a specific region of
the target. Distance transform pooling is further exploited to determine the
effectiveness of each output channel of the convolution layer. The outputs from
the kernelized ridge regression model and the fully convolutional neural
network are combined to obtain the ultimate response. Experimental results on
two benchmark datasets validate the effectiveness of the proposed method.Comment: To appear in CVPR201
Deep Learning in Cardiology
The medical field is creating large amount of data that physicians are unable
to decipher and use efficiently. Moreover, rule-based expert systems are
inefficient in solving complicated medical tasks or for creating insights using
big data. Deep learning has emerged as a more accurate and effective technology
in a wide range of medical problems such as diagnosis, prediction and
intervention. Deep learning is a representation learning method that consists
of layers that transform the data non-linearly, thus, revealing hierarchical
relationships and structures. In this review we survey deep learning
application papers that use structured data, signal and imaging modalities from
cardiology. We discuss the advantages and limitations of applying deep learning
in cardiology that also apply in medicine in general, while proposing certain
directions as the most viable for clinical use.Comment: 27 pages, 2 figures, 10 table
Hyperspectral Image Classification with Markov Random Fields and a Convolutional Neural Network
This paper presents a new supervised classification algorithm for remotely
sensed hyperspectral image (HSI) which integrates spectral and spatial
information in a unified Bayesian framework. First, we formulate the HSI
classification problem from a Bayesian perspective. Then, we adopt a
convolutional neural network (CNN) to learn the posterior class distributions
using a patch-wise training strategy to better use the spatial information.
Next, spatial information is further considered by placing a spatial smoothness
prior on the labels. Finally, we iteratively update the CNN parameters using
stochastic gradient decent (SGD) and update the class labels of all pixel
vectors using an alpha-expansion min-cut-based algorithm. Compared with other
state-of-the-art methods, the proposed classification method achieves better
performance on one synthetic dataset and two benchmark HSI datasets in a number
of experimental settings
Deep Covariance Descriptors for Facial Expression Recognition
In this paper, covariance matrices are exploited to encode the deep
convolutional neural networks (DCNN) features for facial expression
recognition. The space geometry of the covariance matrices is that of Symmetric
Positive Definite (SPD) matrices. By performing the classification of the
facial expressions using Gaussian kernel on SPD manifold, we show that the
covariance descriptors computed on DCNN features are more efficient than the
standard classification with fully connected layers and softmax. By
implementing our approach using the VGG-face and ExpNet architectures with
extensive experiments on the Oulu-CASIA and SFEW datasets, we show that the
proposed approach achieves performance at the state of the art for facial
expression recognition
Revisit Multinomial Logistic Regression in Deep Learning: Data Dependent Model Initialization for Image Recognition
We study in this paper how to initialize the parameters of multinomial
logistic regression (a fully connected layer followed with softmax and cross
entropy loss), which is widely used in deep neural network (DNN) models for
classification problems. As logistic regression is widely known not having a
closed-form solution, it is usually randomly initialized, leading to several
deficiencies especially in transfer learning where all the layers except for
the last task-specific layer are initialized using a pre-trained model. The
deficiencies include slow convergence speed, possibility of stuck in local
minimum, and the risk of over-fitting. To address those deficiencies, we first
study the properties of logistic regression and propose a closed-form
approximate solution named regularized Gaussian classifier (RGC). Then we adopt
this approximate solution to initialize the task-specific linear layer and
demonstrate superior performance over random initialization in terms of both
accuracy and convergence speed on various tasks and datasets. For example, for
image classification, our approach can reduce the training time by 10 times and
achieve 3.2% gain in accuracy for Flickr-style classification. For object
detection, our approach can also be 10 times faster in training for the same
accuracy, or 5% better in terms of mAP for VOC 2007 with slightly longer
training.Comment: tech repor
Bidirectional-Convolutional LSTM Based Spectral-Spatial Feature Learning for Hyperspectral Image Classification
This paper proposes a novel deep learning framework named
bidirectional-convolutional long short term memory (Bi-CLSTM) network to
automatically learn the spectral-spatial feature from hyperspectral images
(HSIs). In the network, the issue of spectral feature extraction is considered
as a sequence learning problem, and a recurrent connection operator across the
spectral domain is used to address it. Meanwhile, inspired from the widely used
convolutional neural network (CNN), a convolution operator across the spatial
domain is incorporated into the network to extract the spatial feature.
Besides, to sufficiently capture the spectral information, a bidirectional
recurrent connection is proposed. In the classification phase, the learned
features are concatenated into a vector and fed to a softmax classifier via a
fully-connected operator. To validate the effectiveness of the proposed
Bi-CLSTM framework, we compare it with several state-of-the-art methods,
including the CNN framework, on three widely used HSIs. The obtained results
show that Bi-CLSTM can improve the classification performance as compared to
other methods
- …