Rethinking Convolutional Semantic Segmentation Learning
Deep convolutional semantic segmentation (DCSS) learning does not converge to
an optimal local minimum with random parameter initializations; a model
pre-trained on the same domain becomes necessary to achieve convergence. In this
work, we propose a joint cooperative end-to-end learning method for DCSS. It
addresses many drawbacks of existing deep semantic segmentation learning: the
proposed approach simultaneously learns both segmentation and classification,
removing the need for a pre-trained model to achieve learning
convergence. We present an improved Inception-based architecture with partial
attention gating (PAG) over encoder information. The PAG also helps achieve
faster convergence and better accuracy on the segmentation task. We show the
effectiveness of this learning on a diabetic retinopathy classification and
segmentation dataset.
Shortcut-Stacked Sentence Encoders for Multi-Domain Inference
We present a simple sequential sentence encoder for multi-domain natural
language inference. Our encoder is based on stacked bidirectional LSTM-RNNs
with shortcut connections and fine-tuning of word embeddings. The overall
supervised model uses the above encoder to encode two input sentences into two
vectors, and then uses a classifier over the vector combination to label the
relationship between these two sentences as that of entailment, contradiction,
or neutral. Our Shortcut-Stacked sentence encoders achieve strong improvements
over existing encoders on matched and mismatched multi-domain natural language
inference (top non-ensemble single-model result in the EMNLP RepEval 2017
Shared Task (Nangia et al., 2017)). Moreover, they achieve the new
state-of-the-art encoding result on the original SNLI dataset (Bowman et al.,
2015).
Comment: EMNLP 2017 RepEval Multi-NLI Shared Task (6 pages
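The abstract above says only that a classifier operates over "the vector combination" of the two sentence vectors. As a minimal sketch, assuming a commonly used combination (concatenation, element-wise absolute difference, and element-wise product), the feature construction might look like the following. The function name `combine_sentence_vectors` and the toy vectors are hypothetical and are not taken from the source.

```python
from typing import List


def combine_sentence_vectors(u: List[float], v: List[float]) -> List[float]:
    """Build classifier input features from two sentence vectors.

    Assumed combination (not confirmed by the abstract):
    [u ; v ; |u - v| ; u * v], all element-wise.
    """
    if len(u) != len(v):
        raise ValueError("sentence vectors must have the same length")
    abs_diff = [abs(a - b) for a, b in zip(u, v)]
    product = [a * b for a, b in zip(u, v)]
    return list(u) + list(v) + abs_diff + product


# Toy premise and hypothesis encodings (hypothetical values).
premise_vec = [0.5, -1.0, 2.0]
hypothesis_vec = [0.5, 1.0, -1.0]
features = combine_sentence_vectors(premise_vec, hypothesis_vec)
```

The resulting feature vector is four times the encoder dimension and would be fed to a classifier that outputs one of the three relationship labels.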
D-PCN: Parallel Convolutional Networks for Image Recognition via a Discriminator
In this paper, we introduce a simple but quite effective recognition
framework dubbed D-PCN, which aims to enhance the feature extraction ability of
CNNs. The framework consists of two parallel CNNs, a discriminator, and an extra
classifier that takes integrated features from the parallel networks and gives
the final prediction. The discriminator is the core component: it drives the
parallel networks to focus on different regions and learn complementary
representations. A corresponding joint training strategy is introduced to
ensure that the discriminator is utilized. We validate D-PCN with several CNN
models on two benchmark datasets, CIFAR-100 and ImageNet32x32; D-PCN enhances
all models. In particular, it yields state-of-the-art performance on CIFAR-100
compared with related works. We also conduct a visualization experiment on the
fine-grained Stanford Dogs dataset and verify our motivation. Additionally, we
apply D-PCN to segmentation on PASCAL VOC 2012 and also observe an improvement.
Comment: 20 pages, 8 figures, 7 table
Parallel Convolutional Networks for Image Recognition via a Discriminator
In this paper, we introduce a simple but quite effective recognition
framework dubbed D-PCN, which aims to enhance the feature extraction ability of
CNNs. The framework consists of two parallel CNNs, a discriminator, and an extra
classifier that takes integrated features from the parallel networks and gives
the final prediction. The discriminator is the core component: it drives the
parallel networks to focus on different regions and learn different
representations. A corresponding training strategy is introduced to ensure that
the discriminator is utilized. We validate D-PCN with several CNN models on
benchmark datasets, CIFAR-100 and ImageNet; D-PCN enhances all models. In
particular, it yields state-of-the-art performance on CIFAR-100 compared with
related works. We also conduct a visualization experiment on the fine-grained
Stanford Dogs dataset to verify our motivation. Additionally, we apply D-PCN to
segmentation on PASCAL VOC 2012 and also observe an improvement.
Comment: Accepted by ACCV 201
Identify Speakers in Cocktail Parties with End-to-End Attention
In scenarios where multiple speakers talk at the same time, it is important
to be able to identify the talkers accurately. This paper presents an
end-to-end system that integrates speech source extraction and speaker
identification, and proposes a new way to jointly optimize these two parts by
max-pooling the speaker predictions along the channel dimension. Residual
attention permits us to learn spectrogram masks that are optimized for the
purpose of speaker identification, while residual forward connections permit
dilated convolution with a sufficiently large context window to guarantee
correct streaming across syllable boundaries. End-to-end training results in a
system that recognizes one speaker in a two-speaker broadcast speech mixture
with 99.9% accuracy and both speakers with 93.9% accuracy, and that recognizes
all speakers in three-speaker scenarios with 81.2% accuracy.
Comment: Accepted by Interspeech 2020 for presentation;
https://github.com/JunzheJosephZhu/Identify-Speakers-in-Cocktail-Parties-with-E2E-Attentio
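The joint optimization described above max-pools speaker predictions along the channel dimension. The sketch below illustrates that pooling step only, assuming each extracted channel yields one score per enrolled speaker. The names `pool_speaker_predictions` and `top_speakers` and the score values are hypothetical; the actual system is in the linked repository.

```python
from typing import List


def pool_speaker_predictions(per_channel: List[List[float]]) -> List[float]:
    """Take, for each speaker, the maximum score over all channels."""
    if not per_channel:
        raise ValueError("need at least one channel")
    n_speakers = len(per_channel[0])
    if any(len(ch) != n_speakers for ch in per_channel):
        raise ValueError("every channel must score the same speakers")
    return [max(ch[s] for ch in per_channel) for s in range(n_speakers)]


def top_speakers(scores: List[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring speakers, sorted."""
    ranked = sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)
    return sorted(ranked[:k])


# Two extracted channels, four enrolled speakers (hypothetical scores).
channel_scores = [
    [0.9, 0.1, 0.2, 0.05],  # channel 0 mostly contains speaker 0
    [0.1, 0.05, 0.8, 0.1],  # channel 1 mostly contains speaker 2
]
pooled = pool_speaker_predictions(channel_scores)
active = top_speakers(pooled, k=2)
```

Because the maximum ignores which channel a speaker landed in, a loss on the pooled scores does not depend on the order of the extracted sources.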
Brain Tumor Segmentation Based on Refined Fully Convolutional Neural Networks with A Hierarchical Dice Loss
As a basic task in computer vision, semantic segmentation can provide
fundamental information for object detection and instance segmentation to help
artificial intelligence better understand the real world. Since the proposal of
the fully convolutional neural network (FCNN), it has been widely used in
semantic segmentation because of its high accuracy of pixel-wise classification
as well as high precision of localization. In this paper, we apply several
well-known FCNNs to brain tumor segmentation, making comparisons and adjusting
network architectures to achieve better performance measured by metrics such as
precision, recall, mean intersection over union (mIoU) and Dice score
coefficient (DSC). The adjustments to the classic FCNN include adding more
connections between convolutional layers, enlarging decoders after upsampling
layers and changing the way shallower layers' information is reused. Besides
the structure modification, we also propose a new classifier with a
hierarchical Dice loss. Inspired by the containing relationship between
classes, the loss function converts multi-class classification into multiple
binary classifications in order to counteract the negative effect caused by an
imbalanced data set. Extensive experiments have been done on the training set
and testing set in order to assess our refined fully convolutional neural
networks and new types of loss function. Competitive results indicate that they
are more effective than their predecessors.
Comment: 14 pages, 7 figures, 6 table
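To make the idea of converting a multi-class problem into nested binary problems concrete, here is a minimal sketch of a Dice coefficient and a loss averaged over nested class groups. The label nesting in `NESTED_GROUPS`, the equal weighting of levels, and the use of hard labels instead of predicted probabilities are all assumptions made for illustration; the abstract does not specify the actual formulation.

```python
from typing import List, Set


def dice(pred: List[int], target: List[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical nesting: each group is contained in the previous one.
NESTED_GROUPS: List[Set[int]] = [{1, 2, 3}, {2, 3}, {3}]


def to_nested_binary(labels: List[int], groups: List[Set[int]]) -> List[List[int]]:
    """Turn one multi-class label map into one binary mask per group."""
    return [[1 if lab in group else 0 for lab in labels] for group in groups]


def hierarchical_dice_loss(pred: List[int], target: List[int],
                           groups: List[Set[int]]) -> float:
    """Average (1 - Dice) over the nested binary problems."""
    pred_masks = to_nested_binary(pred, groups)
    target_masks = to_nested_binary(target, groups)
    losses = [1.0 - dice(p, t) for p, t in zip(pred_masks, target_masks)]
    return sum(losses) / len(losses)


# Flattened toy label maps; 0 is background (hypothetical values).
true_labels = [0, 1, 2, 3, 3, 0]
pred_labels = [0, 1, 2, 2, 3, 1]
loss = hierarchical_dice_loss(pred_labels, true_labels, NESTED_GROUPS)
```

Each binary problem compares a foreground group against everything else, so a rare innermost class gets its own Dice term instead of being swamped by the background pixels.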
Adversarial Information Factorization
We propose a novel generative model architecture designed to learn
representations for images that factor out a single attribute from the rest of
the representation. A single object may have many attributes which when altered
do not change the identity of the object itself. Consider the human face; the
identity of a particular person is independent of whether or not they happen to
be wearing glasses. The attribute of wearing glasses can be changed without
changing the identity of the person. However, the ability to manipulate and
alter image attributes without altering the object identity is not a trivial
task. Here, we are interested in learning a representation of the image that
separates the identity of an object (such as a human face) from an attribute
(such as 'wearing glasses'). We demonstrate the success of our factorization
approach by using the learned representation to synthesize the same face with
and without a chosen attribute. We refer to this specific synthesis process as
image attribute manipulation. We further demonstrate that our model achieves
scores competitive with the state of the art on a facial attribute
classification task.
Residual-CNDS for Grand Challenge Scene Dataset
Increasing the depth of convolutional neural networks (CNNs) is a highly
promising method of increasing their accuracy. Increased CNN depth
also results in an increased layer count (parameters), leading to slow
backpropagation convergence that is prone to overfitting. We trained our model
(Residual-CNDS) to classify the very large-scale scene datasets MIT Places 205
and MIT Places 365-Standard. The results on the two datasets show that our
proposed model (Residual-CNDS) effectively handles slow convergence,
overfitting, and degradation. CNNs that include deep supervision (CNDS) add
supplementary branches to the deep convolutional neural network in specified
layers by calculating vanishing, effectively addressing delayed convergence and
overfitting. Nevertheless, CNDS does not resolve degradation; hence, we add
residual learning to the CNDS in certain layers after studying the best place
in which to add it. With this approach we overcome degradation in the very deep
network. We have built two models, Residual-CNDS 8 and Residual-CNDS 10.
Moreover, we tested our models on two large-scale datasets, and we compared our
results with other recently introduced cutting-edge networks in terms of
top-1 and top-5 classification accuracy. As a result, both models have shown
good improvement, which supports the assertion that the addition of residual
connections enhances CNDS network accuracy without adding any computational
complexity.
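The comparison above is reported in top-1 and top-5 classification accuracy. As a minimal sketch of how that metric is usually computed, the snippet below counts a sample as correct when its true label is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores are hypothetical and unrelated to the reported results.

```python
from typing import List


def top_k_accuracy(scores: List[List[float]], labels: List[int], k: int) -> float:
    """Fraction of samples whose true label is among the k top-scoring classes."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("need one label per score row, and at least one row")
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Four samples, six classes (hypothetical scores).
toy_scores = [
    [0.10, 0.50, 0.20, 0.10, 0.05, 0.05],
    [0.30, 0.10, 0.25, 0.20, 0.10, 0.05],
    [0.05, 0.10, 0.10, 0.20, 0.25, 0.30],
    [0.60, 0.10, 0.10, 0.10, 0.05, 0.05],
]
toy_labels = [1, 2, 0, 0]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top5 = top_k_accuracy(toy_scores, toy_labels, k=5)
```

Top-5 accuracy is always at least as high as top-1, which is why both are typically reported for datasets with hundreds of scene categories.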
Over the Air Deep Learning Based Radio Signal Classification
We conduct an in-depth study of the performance of deep-learning-based radio
signal classification for radio communications signals. We consider a rigorous
baseline method using higher order moments and strong boosted gradient tree
classification and compare performance between the two approaches across a
range of configurations and channel impairments. We consider the effects of
carrier frequency offset, symbol rate, and multi-path fading in simulation and
conduct over-the-air measurement of radio classification performance in the lab
using software radios and compare performance and training strategies for both.
Finally, we conclude with a discussion of remaining problems and design
considerations for using such techniques.
Comment: 13 pages, 22 figure
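The baseline above relies on higher-order moments of the received signal as classifier features. The sketch below computes moments in the common form M_pq = E[x^(p-q) · conj(x)^q] for ideal, noise-free symbol sets. The choice of which moments to show, the unit-energy constellations, and the function name `moment` are assumptions for illustration; the abstract does not list the exact feature set.

```python
import cmath
import math
from typing import List


def moment(samples: List[complex], p: int, q: int) -> complex:
    """Sample estimate of M_pq = E[x^(p-q) * conj(x)^q]."""
    if not samples:
        raise ValueError("need at least one sample")
    total = sum(x ** (p - q) * x.conjugate() ** q for x in samples)
    return total / len(samples)


# Ideal unit-energy constellations (no noise, offset, or fading).
bpsk = [1 + 0j, -1 + 0j]
qpsk = [cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)) for k in range(4)]

features = {
    "bpsk": [moment(bpsk, 2, 0), moment(bpsk, 2, 1), moment(bpsk, 4, 0)],
    "qpsk": [moment(qpsk, 2, 0), moment(qpsk, 2, 1), moment(qpsk, 4, 0)],
}
```

For these ideal constellations the second-order moment M_20 already separates the two modulations (1 for BPSK, approximately 0 for QPSK). Channel impairments such as carrier frequency offset distort these values, which is part of what the study compares against learned features.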
Handwritten Bangla Character Recognition Using The State-of-Art Deep Convolutional Neural Networks
In spite of advances in object recognition technology, Handwritten Bangla
Character Recognition (HBCR) remains largely unsolved due to the presence of
many ambiguous handwritten characters and excessively cursive Bangla
handwritings. Even the best existing recognizers do not lead to satisfactory
performance for practical applications related to Bangla character recognition
and have much lower performance than those developed for English alpha-numeric
characters. To improve the performance of HBCR, we herein present the
application of the state-of-the-art Deep Convolutional Neural Networks (DCNN)
including VGG Network, All Convolution Network (All-Conv Net), Network in
Network (NiN), Residual Network, FractalNet, and DenseNet for HBCR. The deep
learning approaches have the advantage of extracting and using feature
information, improving the recognition of 2D shapes with a high degree of
invariance to translation, scaling and other distortions. We systematically
evaluated the performance of DCNN models on the publicly available Bangla
handwritten character dataset called CMATERdb and achieved superior
recognition accuracy when using DCNN models. This improvement would help in
building an automatic HBCR system for practical applications.
Comment: 12 pages, 22 figures, 5 tables. arXiv admin note: text overlap with
arXiv:1705.0268