Coupled Ensembles of Neural Networks
We investigate in this paper the architecture of deep convolutional networks.
Building on existing state of the art models, we propose a reconfiguration of
the model parameters into several parallel branches at the global network
level, with each branch being a standalone CNN. We show that this arrangement
is an efficient way to significantly reduce the number of parameters without
losing performance, or to significantly improve performance for the same
parameter budget. The use of branches brings an additional form of
regularization. In addition to the split into parallel branches, we propose a
tighter coupling of these branches by placing the "fuse (averaging) layer"
before the Log-Likelihood and SoftMax layers during training. This gives
another significant performance improvement, the tighter coupling favouring the
learning of better representations, even at the level of the individual
branches. We refer to this branched architecture as "coupled ensembles". The
approach is very generic and can be applied with almost any DCNN architecture.
With coupled ensembles of DenseNet-BC and a parameter budget of 25M, we obtain
error rates of 2.92%, 15.68% and 1.50% respectively on CIFAR-10, CIFAR-100 and
SVHN tasks. For the same budget, DenseNet-BC has error rates of 3.46%, 17.18%,
and 1.8% respectively. With ensembles of coupled ensembles of DenseNet-BC
networks, with 50M total parameters, we obtain error rates of 2.72%, 15.13% and
1.42% respectively on these tasks.
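The tighter coupling described above can be illustrated with a minimal numpy sketch: each branch produces class logits, the branches' log-probabilities are averaged by a fuse layer, and a single negative log-likelihood loss is computed on the fused output so that gradients reach every branch. The function names are illustrative, not from the paper's code.

```python
import numpy as np

def log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def coupled_ensemble_log_probs(branch_logits):
    """Fuse e parallel branches by averaging their log-probabilities,
    i.e. the "fuse (averaging) layer" placed before the loss, so that
    all branches are trained jointly through one objective.

    branch_logits: shape (e, batch, classes)
    returns: fused scores, shape (batch, classes)
    """
    return log_softmax(branch_logits).mean(axis=0)

def nll_loss(fused_log_probs, targets):
    """Negative log-likelihood of the fused prediction."""
    return -fused_log_probs[np.arange(len(targets)), targets].mean()

# Two branches, one sample, three classes: both branches prefer class 0,
# so the fused prediction does too.
logits = np.array([[[2.0, 0.0, -1.0]],
                   [[1.5, 0.5, -0.5]]])
fused = coupled_ensemble_log_probs(logits)
loss = nll_loss(fused, np.array([0]))
```

Averaging log-probabilities (a geometric mean of the branch distributions) rather than averaging probabilities is what makes the coupling "tight": each branch is penalized directly through the shared loss, which the abstract credits for better per-branch representations.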
A Dilated Inception Network for Visual Saliency Prediction
Recently, with the advent of deep convolutional neural networks (DCNNs),
visual saliency prediction research has seen impressive improvements. One
possible direction to approach the next improvement is to fully characterize
the multi-scale saliency-influential factors with a computationally-friendly
module in DCNN architectures. In this work, we propose an end-to-end dilated
inception network (DINet) for visual saliency prediction. It captures
multi-scale contextual features effectively with very limited extra parameters.
Instead of utilizing parallel standard convolutions with different kernel
sizes, as the existing inception module does, our proposed dilated inception
module (DIM)
uses parallel dilated convolutions with different dilation rates which can
significantly reduce the computation load while enriching the diversity of
receptive fields in feature maps. Moreover, the performance of our saliency
model is further improved by using a set of linear normalization-based
probability distribution distance metrics as loss functions. As such, we can
formulate saliency prediction as a probability distribution prediction task for
global saliency inference instead of a typical pixel-wise regression problem.
Experimental results on several challenging saliency benchmark datasets
demonstrate that our DINet with proposed loss functions can achieve
state-of-the-art performance with shorter inference time.
Comment: Accepted by IEEE Transactions on Multimedia. The source codes are
available at https://github.com/ysyscool/DINe
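The parameter saving claimed for the dilated inception module follows from a standard identity: a dilated convolution with dilation rate d inserts d-1 gaps between kernel taps, so a k-tap kernel covers the receptive field of a larger kernel while keeping only k x k learnable weights. A small sketch of that arithmetic (not taken from the DINet code):

```python
def effective_kernel_size(kernel_size, dilation):
    """Receptive field of one dilated convolution: a k-tap kernel with
    dilation d spans k + (k-1)*(d-1) input positions, but still has
    only k*k weights per input/output channel pair."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def conv_params(kernel_size, in_ch, out_ch):
    """Weight count of a 2-D convolution (bias ignored); dilation
    does not change this number."""
    return kernel_size * kernel_size * in_ch * out_ch

# Parallel 3x3 branches with dilation rates 1, 2, 4 cover the same
# receptive fields as 3x3, 5x5, and 9x9 kernels would.
fields = [effective_kernel_size(3, d) for d in (1, 2, 4)]
# Cost of the dilated branches vs. standard kernels of those sizes,
# for 64 input and 64 output channels.
dilated_cost = 3 * conv_params(3, 64, 64)
standard_cost = sum(conv_params(k, 64, 64) for k in fields)
```

Here three dilated branches cost the same as three 3x3 convolutions, while standard convolutions with matching receptive fields would need 9 + 25 + 81 weights per channel pair, which is the "very limited extra parameters" trade-off the abstract describes.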
A Feature Learning Siamese Model for Intelligent Control of the Dynamic Range Compressor
In this paper, a siamese DNN model is proposed to learn the characteristics
of the audio dynamic range compressor (DRC). This facilitates an intelligent
control system that uses audio examples to configure the DRC, a widely used
non-linear audio signal conditioning technique in the areas of music
production, speech communication and broadcasting. Several alternative siamese
DNN architectures are proposed to learn feature embeddings that can
characterise subtle effects due to dynamic range compression. These models are
compared with each other as well as with handcrafted features proposed in
previous work. An evaluation of the relations between the DNN hyperparameters
and the DRC parameters is also provided. The best model is able to produce a
universal
feature embedding that is capable of predicting multiple DRC parameters
simultaneously, which is a significant improvement from our previous research.
The feature embedding shows better performance than handcrafted audio features
when predicting DRC parameters for both mono-instrument audio loops and
polyphonic music pieces.
Comment: 8 pages, accepted in IJCNN 201
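The defining property of a siamese model like the one above is that both inputs pass through the same shared-weight encoder, so distances in the embedding space reflect differences in compression character rather than differences between two separately trained branches. A minimal numpy sketch under that assumption; the one-layer encoder and its random weights are purely illustrative stand-ins for the paper's learned DNN over audio features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights for a tiny shared encoder (one linear layer
# followed by tanh); the real model is a deeper network trained on
# pairs of uncompressed/compressed audio examples.
W = rng.standard_normal((16, 8))

def encode(x):
    """Shared-weight branch: the same encoder is applied to both
    inputs of the siamese pair."""
    return np.tanh(x @ W)

def embedding_distance(x1, x2):
    """Euclidean distance between the two branch embeddings; a
    regressor on top of such embeddings could predict DRC parameters."""
    return np.linalg.norm(encode(x1) - encode(x2))

a = rng.standard_normal(16)   # e.g. features of the reference audio
b = rng.standard_normal(16)   # e.g. features of the compressed audio
d_same = embedding_distance(a, a)
d_diff = embedding_distance(a, b)
```

Because the weights are shared, an input compared with itself always maps to distance zero, which is the invariance a siamese training objective exploits.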