Efficient Deep Feature Learning and Extraction via StochasticNets
Deep neural networks are a powerful tool for feature learning and extraction
given their ability to model high-level abstractions in highly complex data.
One area worth exploring in feature learning and extraction using deep neural
networks is efficient neural connectivity formation for faster feature learning
and extraction. Motivated by findings of stochastic synaptic connectivity
formation in the brain as well as the brain's uncanny ability to efficiently
represent information, we propose the efficient learning and extraction of
features via StochasticNets, where sparsely-connected deep neural networks can
be formed via stochastic connectivity between neurons. To evaluate the
feasibility of such a deep neural network architecture for feature learning and
extraction, we train deep convolutional StochasticNets to learn abstract
features using the CIFAR-10 dataset, and extract the learned features from
images to perform classification on the SVHN and STL-10 datasets. Experimental
results show that features learned using deep convolutional StochasticNets,
with fewer neural connections than conventional deep convolutional neural
networks, can achieve classification accuracy better than or comparable to that of
conventional deep neural networks: a relative test error decrease of ~4.5% for
classification on the STL-10 dataset and ~1% for classification on the SVHN
dataset. Furthermore, it was shown that the deep features extracted using deep
convolutional StochasticNets can provide comparable classification accuracy
even when only 10% of the training data is used for feature learning. Finally,
it was also shown that significant gains in feature extraction speed can be
achieved in embedded applications using StochasticNets. As such, StochasticNets
allow for faster feature learning and extraction while providing
better or comparable accuracy.
Comment: 10 pages. arXiv admin note: substantial text overlap with
arXiv:1508.0546
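The idea of forming a sparsely connected layer through stochastic connectivity can be illustrated with a minimal sketch. It assumes each possible connection is kept independently with a fixed probability; the abstract does not specify the connectivity distribution, and the names `stochastic_mask`, `sparse_forward`, and `p_connect` are hypothetical, not taken from the paper.

```python
import random

def stochastic_mask(n_in, n_out, p_connect, seed=0):
    """Return an n_out x n_in 0/1 mask; each connection is kept
    independently with probability p_connect (an assumed rule)."""
    rng = random.Random(seed)
    return [[1 if rng.random() < p_connect else 0 for _ in range(n_in)]
            for _ in range(n_out)]

def sparse_forward(x, weights, mask):
    """ReLU layer that only uses the connections kept by the mask."""
    out = []
    for w_row, m_row in zip(weights, mask):
        s = sum(w * xi for w, m, xi in zip(w_row, m_row, x) if m)
        out.append(max(0.0, s))
    return out

# Toy example: 8 inputs, 4 output neurons, roughly half the connections kept.
mask = stochastic_mask(n_in=8, n_out=4, p_connect=0.5, seed=0)
weight_rng = random.Random(1)
weights = [[weight_rng.uniform(-1, 1) for _ in range(8)] for _ in range(4)]
features = sparse_forward([1.0] * 8, weights, mask)
kept = sum(map(sum, mask))  # number of connections actually formed
```

Because the mask is fixed once it is drawn, skipped connections never need to be stored or multiplied, which is where a sparsely connected network of this kind could save computation.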
Time-Efficient Hybrid Approach for Facial Expression Recognition
Facial expression recognition is an emerging research area for improving human-computer interaction. This research plays a significant role in social communication, commercial enterprise, law enforcement, and other computer interactions. In this paper, we propose a time-efficient hybrid design for facial expression recognition that combines image pre-processing steps with different Convolutional Neural Network (CNN) structures, providing better accuracy and greatly improved training time. We predict seven basic emotions of human faces: sadness, happiness, disgust, anger, fear, surprise and neutral. The model performs well on challenging cases where the expressed emotion could be one of several with quite similar facial characteristics, such as anger, disgust, and sadness. The experiment to test the model was conducted across multiple databases and different facial orientations, and to the best of our knowledge, the model provided an accuracy of about 89.58% on the KDEF dataset, 100% on the JAFFE dataset, and 71.975% on the combined (KDEF + JAFFE + SFEW) dataset across these different scenarios. Performance was evaluated with cross-validation techniques to avoid bias towards a specific set of images from a database.
Structure fusion based on graph convolutional networks for semi-supervised classification
Because of the diversity and complexity of multi-view data in
semi-supervised classification, most existing graph convolutional networks
focus on network architecture construction or on preserving the salient graph
structure, and ignore the contribution of the complete graph structure to
semi-supervised classification. To mine the more complete distribution structure
from multi-view data with the consideration of the specificity and the
commonality, we propose structure fusion based on graph convolutional networks
(SF-GCN) for improving the performance of semi-supervised classification.
SF-GCN can not only retain the special characteristic of each view data by
spectral embedding, but also capture the common style of multi-view data by
distance metric between multi-graph structures. Assuming a linear relationship
between multi-graph structures, we can construct the optimization function of
structure fusion model by balancing the specificity loss and the commonality
loss. By solving this function, we can simultaneously obtain the fused
spectral embedding of the multi-view data and the fused structure, which serves
as the adjacency matrix input to graph convolutional networks for
semi-supervised classification. Experiments demonstrate that SF-GCN
outperforms state-of-the-art methods on three challenging citation network
datasets: Cora, Citeseer and Pubmed.
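How a fused structure can act as the adjacency matrix of a graph convolutional layer can be sketched as follows. This is an illustration under stated assumptions, not the SF-GCN optimization itself: the fusion is a plain weighted sum of per-view adjacency matrices with hand-picked weights `alphas` (the abstract only states that a linear relationship is assumed), and the propagation uses the widely used symmetric normalization with self-loops, which the abstract does not specify. All function names are hypothetical.

```python
import math

def fuse(adjs, alphas):
    """Linear combination of per-view adjacency matrices (assumed fusion rule)."""
    n = len(adjs[0])
    return [[sum(a * adj[i][j] for a, adj in zip(alphas, adjs))
             for j in range(n)] for i in range(n)]

def normalize(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(adj_norm, feats, weight):
    """One propagation step: ReLU(A_norm @ X @ W)."""
    z = matmul(matmul(adj_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in z]

# Two toy views of the same three nodes, fused with equal (assumed) weights.
view_a = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
view_b = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
fused = fuse([view_a, view_b], alphas=[0.5, 0.5])
adj_norm = normalize(fused)
feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weight = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
hidden = gcn_layer(adj_norm, feats, weight)
```

In the toy example, node 0 is linked to node 1 in one view and to node 2 in the other, so the fused graph lets it aggregate from both neighbors in a single layer.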
Depth from Monocular Images using a Semi-Parallel Deep Neural Network (SPDNN) Hybrid Architecture
Deep neural networks have been applied to a wide range of problems in recent years.
In this work, a Convolutional Neural Network (CNN) is applied to the problem of
determining the depth from a single camera image (monocular depth). Eight
different networks are designed to perform depth estimation, each of them
suitable for a feature level. Networks with different pooling sizes determine
different feature levels. After designing a set of networks, these models may
be combined into a single network topology using graph optimization techniques.
This "Semi Parallel Deep Neural Network (SPDNN)" eliminates duplicated common
network layers, and can be further optimized by retraining to achieve an
improved model compared to the individual topologies. In this study, four SPDNN
models are trained and evaluated in two stages on the KITTI dataset.
The ground truth images in the first part of the experiment are provided by the
benchmark, and for the second part, the ground truth images are the depth map
results from applying a state-of-the-art stereo matching method. The results of
this evaluation demonstrate that using post-processing techniques to refine the
target of the network increases the accuracy of depth estimation on individual
mono images. The second evaluation shows that using segmentation data alongside
the original data as the input can improve the depth estimation results to a
point where performance is comparable with stereo depth estimation. The
computational time is also discussed in this study.
Comment: 44 pages, 25 figures
Image Denoising with Graph-Convolutional Neural Networks
Recovering an image from a noisy observation is a key problem in signal
processing. Recently, it has been shown that data-driven approaches employing
convolutional neural networks can outperform classical model-based techniques,
because they can capture more powerful and discriminative features. However,
since these methods are based on convolutional operations, they are only
capable of exploiting local similarities without taking into account non-local
self-similarities. In this paper we propose a convolutional neural network that
employs graph-convolutional layers in order to exploit both local and non-local
similarities. The graph-convolutional layers dynamically construct
neighborhoods in the feature space to detect latent correlations in the feature
maps produced by the hidden layers. The experimental results show that the
proposed architecture outperforms classical convolutional neural networks for
the denoising task.
Comment: IEEE International Conference on Image Processing (ICIP) 201
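Dynamically constructing neighborhoods in feature space can be illustrated with a small k-nearest-neighbor sketch. It assumes squared Euclidean distance between feature vectors and uses a plain mean over each neighborhood as a stand-in for the learned graph-convolutional aggregation; the abstract specifies neither, and the names `knn_graph` and `aggregate` are hypothetical.

```python
def knn_graph(features, k):
    """For each feature vector, return the indices of its k nearest
    neighbors by squared Euclidean distance (the node itself excluded)."""
    neighbors = []
    for i, fi in enumerate(features):
        dists = [(sum((a - b) ** 2 for a, b in zip(fi, fj)), j)
                 for j, fj in enumerate(features) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def aggregate(features, neighbors):
    """Average each feature vector with its neighbors (a stand-in for a
    learned aggregation, used here only for illustration)."""
    out = []
    for fi, nbrs in zip(features, neighbors):
        group = [fi] + [features[j] for j in nbrs]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

# Four "pixels" described by 2-D feature vectors: two similar pairs.
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
neighbors = knn_graph(features, k=1)
smoothed = aggregate(features, neighbors)
```

Because the neighbors are chosen by feature similarity rather than by position, two pixels that look alike can exchange information even when they are far apart in the image, which is the non-local behavior the abstract describes.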
Semantic Object Parsing with Graph LSTM
By taking the semantic object parsing task as an exemplar application
scenario, we propose the Graph Long Short-Term Memory (Graph LSTM) network,
which is the generalization of LSTM from sequential data or multi-dimensional
data to general graph-structured data. In particular, instead of evenly and
fixedly dividing an image into pixels or patches as in existing multi-dimensional
LSTM structures (e.g., Row, Grid and Diagonal LSTMs), we take each
arbitrary-shaped superpixel as a semantically consistent node, and adaptively
construct an undirected graph for each image, where the spatial relations of
the superpixels are naturally used as edges. Constructed on such an adaptive
graph topology, the Graph LSTM is more naturally aligned with the visual
patterns in the image (e.g., object boundaries or appearance similarities) and
provides a more economical information propagation route. Furthermore, for each
optimization step over Graph LSTM, we propose to use a confidence-driven scheme
to update the hidden and memory states of nodes progressively till all nodes
are updated. In addition, for each node, the forget gates are adaptively
learned to capture different degrees of semantic correlation with neighboring
nodes. Comprehensive evaluations on four diverse semantic object parsing
datasets demonstrate the significant superiority of our Graph LSTM over
other state-of-the-art solutions.
Comment: 18 pages
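The graph construction step, in which superpixels become nodes and their spatial relations become undirected edges, can be sketched as follows. The sketch assumes a precomputed superpixel label map and treats two superpixels as related when they share a 4-connected pixel boundary; the abstract does not define the adjacency rule, and `superpixel_graph` is a hypothetical name.

```python
def superpixel_graph(labels):
    """Build undirected edges between superpixels that touch.

    labels is a 2-D grid of superpixel ids; an edge (a, b) with a < b is
    added whenever two horizontally or vertically adjacent pixels carry
    different ids (an assumed definition of spatial relation).
    """
    edges = set()
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            # Checking right and down neighbors covers every adjacent pair once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    if a != b:
                        edges.add((min(a, b), max(a, b)))
    return edges

# A 3x3 image split into three irregular superpixels.
labels = [[0, 0, 1],
          [0, 2, 1],
          [2, 2, 1]]
edges = superpixel_graph(labels)
```

The resulting edge set depends on the image content rather than on a fixed grid, so each image gets its own graph topology.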