Large-scale weakly supervised audio classification using gated convolutional neural network
In this paper, we present a gated convolutional neural network and a temporal
attention-based localization method for audio classification, which won the 1st
place in the large-scale weakly supervised sound event detection task of
Detection and Classification of Acoustic Scenes and Events (DCASE) 2017
challenge. The audio clips in this task, which are extracted from YouTube
videos, are manually labeled with one or a few audio tags but without
timestamps of the audio events; such data is referred to as weakly labeled. Two
sub-tasks are defined in this challenge, namely audio tagging and sound event
detection using this weakly labeled data. A convolutional recurrent neural
network (CRNN) with a learnable gated linear unit (GLU) non-linearity applied
on the log Mel spectrogram is proposed. In addition, a temporal attention
method is proposed along the frames to predict the locations of each audio
event in a chunk from the weakly labeled data. Our team ranked 1st and 2nd in
these two sub-tasks of the DCASE 2017 challenge, with an F value of 55.6% and an
Equal error of 0.73, respectively.
Comment: submitted to ICASSP2018, summary on the 1st place system in DCASE2017
task4 challeng
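The two mechanisms named in the abstract above, a gated linear unit and attention over frames, can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the abstract does not give the exact formulation, so the sigmoid gate, the softmax-over-time attention weights, and the function names `glu` and `attention_pool` are all hypothetical choices, not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def glu(linear_part, gate_part):
    """Gated linear unit: the linear branch is scaled by a sigmoid gate.

    Both inputs are assumed to be outputs of two separate learnable
    layers applied to the same input (an assumption for this sketch).
    """
    return linear_part * sigmoid(gate_part)


def attention_pool(frame_probs, attention_logits):
    """Pool frame-level probabilities into clip-level probabilities.

    frame_probs:      (T, K) per-frame probabilities for K audio tags.
    attention_logits: (T, K) unnormalized attention scores per frame.
    Returns (clip_probs of shape (K,), weights of shape (T, K)).
    """
    # Softmax over the time axis, shifted for numerical stability.
    shifted = attention_logits - attention_logits.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights = weights / weights.sum(axis=0, keepdims=True)
    clip_probs = (weights * frame_probs).sum(axis=0)
    return clip_probs, weights
```

With weak labels, only `clip_probs` can be compared against the clip-level tags during training; the per-frame `weights` are what would then hint at where in time each event occurs.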
Efficient Yet Deep Convolutional Neural Networks for Semantic Segmentation
Semantic segmentation using deep convolutional neural networks poses a more
complex challenge for any GPU-intensive task. Because millions of
parameters have to be computed, memory consumption is very high. Moreover, extracting finer
features and conducting supervised training tend to increase the complexity.
Since its introduction, the Fully Convolutional Neural Network, which uses finer
strides and utilizes deconvolutional layers for upsampling, has been a go-to
for any image segmentation task. In this paper, we propose two segmentation
architectures which not only need one-third of the parameters to compute but also
give better accuracy than similar architectures. The model weights were
transferred from popular neural networks, VGG19 and VGG16, which were trained
on the ImageNet classification data-set. We then transform all the fully connected
layers into convolutional layers and use dilated convolution to decrease the number of
parameters. Lastly, we add finer strides and attach four skip architectures
which are element-wise summed with the deconvolutional layers in steps. We
train and test on different sparse and fine data-sets such as Pascal VOC2012,
Pascal-Context and NYUDv2 and show how much better our model performs on these tasks.
In addition, our model has a faster inference time and consumes less
memory for training and testing on NVIDIA Pascal GPUs, making it a more efficient
and less memory-consuming architecture for pixel-wise segmentation.
Comment: 8 page
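To make the role of dilated convolution in the abstract above concrete, here is a minimal NumPy sketch. It is one-dimensional for brevity (the paper works on 2-D images), it computes cross-correlation as deep learning frameworks do, and the function names are hypothetical. The point it illustrates is that dilation widens the receptive field without adding weights.

```python
import numpy as np


def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D dilated cross-correlation with no padding.

    The kernel taps are spaced `dilation` samples apart, so the same
    number of weights covers a wider stretch of the input.
    """
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = receptive_field(len(kernel), dilation)
    out_len = len(signal) - span + 1
    if out_len <= 0:
        raise ValueError("signal is shorter than the dilated kernel span")
    out = np.zeros(out_len)
    for i in range(out_len):
        # Pick every `dilation`-th sample inside the current window.
        taps = signal[i:i + span:dilation]
        out[i] = np.dot(taps, kernel)
    return out
```

For instance, a 3-tap kernel with `dilation=2` spans 5 input samples, the same as an undilated 5-tap kernel, while keeping only 3 weights.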
Holistic, Instance-Level Human Parsing
Object parsing -- the task of decomposing an object into its semantic parts
-- has traditionally been formulated as a category-level segmentation problem.
Consequently, when there are multiple objects in an image, current methods
cannot count the number of objects in the scene, nor can they determine which
part belongs to which object. We address this problem by segmenting the parts
of objects at an instance-level, such that each pixel in the image is assigned
a part label, as well as the identity of the object it belongs to. Moreover, we
show how this approach benefits us in obtaining segmentations at coarser
granularities as well. Our proposed network is trained end-to-end given
detections, and begins with a category-level segmentation module. Thereafter, a
differentiable Conditional Random Field, defined over a variable number of
instances for every input image, reasons about the identity of each part by
associating it with a human detection. In contrast to other approaches, our
method can handle the varying number of people in each image and our holistic
network produces state-of-the-art results in instance-level part and human
segmentation, together with competitive results in category-level part
segmentation, all achieved by a single forward-pass through our neural network.
Comment: Poster at BMVC 201
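The output representation described in the abstract above, where each pixel carries both a part label and an object identity, can be illustrated with two small label maps. This sketch does not implement the network or the Conditional Random Field; it only shows, under assumed conventions (label 0 means background, the names below are hypothetical), how coarser segmentations and an object count follow from such an output.

```python
import numpy as np

BACKGROUND = 0  # assumed convention for this sketch


def to_instance_segmentation(part_labels, instance_ids):
    """Collapse parts: keep only the instance id of each foreground pixel."""
    return np.where(part_labels != BACKGROUND, instance_ids, BACKGROUND)


def count_instances(instance_ids):
    """Count distinct non-background instance ids in the image."""
    ids = np.unique(instance_ids)
    return int(np.sum(ids != BACKGROUND))


def part_mask(part_labels, instance_ids, part, instance):
    """Boolean mask of one part belonging to one specific instance."""
    return (part_labels == part) & (instance_ids == instance)


# Toy 2x4 image: parts 1 and 2 are hypothetical labels (e.g. head, torso).
parts = np.array([[1, 2, 0, 1],
                  [1, 2, 0, 2]])
instances = np.array([[1, 1, 0, 2],
                      [1, 1, 0, 2]])
```

Here `count_instances(instances)` distinguishes the two people, and `part_mask(parts, instances, part=1, instance=2)` selects the first part of the second person only, which a purely category-level part map cannot do.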