Impostor Networks for Fast Fine-Grained Recognition
In this work we introduce impostor networks, an architecture that makes it possible to
perform fine-grained recognition with high accuracy using a lightweight
convolutional network, making it particularly suitable for fine-grained
applications on low-power platforms without GPU support. Impostor networks
compensate for the lightness of their 'backend' network by combining it with a
lightweight non-parametric classifier. The combination of the convolutional
network and the non-parametric classifier is trained end-to-end.
Like convolutional neural networks, impostor networks can fit
large-scale training datasets very well while still generalizing to
new data points. At the same time, the bulk of the computation in impostor
networks happens through nearest-neighbor search in high-dimensional space. Such search
can be performed efficiently on a variety of architectures, including standard
CPUs, where deep convolutional networks are inefficient. In a series of
experiments on three fine-grained datasets, we show that impostor networks
considerably boost the classification accuracy of a moderate-sized convolutional
network at a very small computational cost.
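The abstract places most of the inference cost in nearest-neighbor search over embeddings. The block below is a generic illustration of that kind of computation: a distance-weighted k-nearest-neighbor vote. It is not the impostor-network classifier itself. Several details are assumptions made for illustration: embeddings are given directly as plain float lists (standing in for the output of the lightweight CNN), the function name `knn_predict` is hypothetical, and the exp(-d²) weighting is chosen for the example.

```python
import heapq
import math
from collections import defaultdict

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_predict(query, stored_embeddings, stored_labels, k=3):
    """Predict a label for `query` by a distance-weighted vote of its k nearest
    stored embeddings.

    Hypothetical setup: `query` stands in for the CNN backend's output and is
    simply a list of floats here.
    """
    # Exhaustive search over all stored points; large-scale systems would
    # typically use an (approximate) nearest-neighbor index instead.
    neighbors = heapq.nsmallest(
        k,
        zip(stored_embeddings, stored_labels),
        key=lambda pair: squared_distance(query, pair[0]),
    )
    votes = defaultdict(float)
    for embedding, label in neighbors:
        # Weight each neighbor by exp(-d^2), so closer neighbors count more.
        votes[label] += math.exp(-squared_distance(query, embedding))
    return max(votes, key=votes.get)

# Usage: two well-separated toy classes in a 2-D embedding space.
stored = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 0.9]]
labels = ["a", "a", "b", "b"]
print(knn_predict([0.05, 0.0], stored, labels))  # prints a
print(knn_predict([1.0, 0.95], stored, labels))  # prints b
```

Because the search itself is only distance computations and comparisons, it runs on ordinary CPUs without GPU-specific kernels. That is the property the abstract relies on.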
Cross-domain Deep Feature Combination for Bird Species Classification with Audio-visual Data
Over the past decade, many state-of-the-art algorithms for image classification as
well as audio classification have achieved notable success with the
development of deep convolutional neural networks (CNNs). However, most
works exploit only a single type of training data. In this paper, we present a
study on classifying bird species by exploiting the combination of both visual
(images) and audio (sounds) data using CNNs, a combination that has rarely been
studied so far. Specifically, we propose CNN-based multimodal learning models with three
types of fusion strategies (early, middle, late) to address the problem of
combining training data across domains. The advantage of our proposed method
lies in the fact that we can use CNNs not only to extract features from
image and audio data (spectrograms) but also to combine the features across
modalities. In our experiments, we train and evaluate the network structure on the
comprehensive CUB-200-2011 standard dataset, combined with our own collected
audio dataset covering the same species. We observe that a model which
uses the combination of both data types outperforms models trained with only
one type of data. We also show that transfer learning can significantly
increase classification performance.
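The abstract names three fusion strategies: early, middle, and late. The sketch below shows where the two modalities are merged in each case, using a toy linear classifier in place of the CNNs. Everything here is hypothetical and for illustration only:

- the function names and the plain-list "inputs" are made up;
- the encoder callables are placeholders for per-modality CNNs;
- the equal-weight average used for late fusion is an assumed choice.

None of this is the paper's implementation.

```python
import math

def softmax(scores):
    """Convert raw class scores to probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(x, weights):
    """Score each class as the dot product of x with that class's weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def early_fusion(image_input, audio_input, weights):
    # Early: concatenate the raw (flattened) inputs, then classify with one model.
    return softmax(linear(image_input + audio_input, weights))

def middle_fusion(image_input, audio_input, image_encoder, audio_encoder, weights):
    # Middle: extract features per modality, concatenate them, then classify.
    features = image_encoder(image_input) + audio_encoder(audio_input)
    return softmax(linear(features, weights))

def late_fusion(image_probs, audio_probs, alpha=0.5):
    # Late: classify each modality separately, then average class probabilities.
    return [alpha * p + (1 - alpha) * q for p, q in zip(image_probs, audio_probs)]

# Usage: per-class probabilities from two hypothetical single-modality models.
image_probs = [0.6, 0.4]
audio_probs = [0.2, 0.8]
fused = late_fusion(image_probs, audio_probs)
print(fused.index(max(fused)))  # prints 1
```

The strategies differ only in where the merge happens. Early fusion needs inputs that can share one model. Middle fusion lets each modality keep its own feature extractor. Late fusion can combine models that were trained completely independently.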