7,447 research outputs found

Hierarchical Sound Event Classification

Authors: Fan Jianyu; Nichols Eric; Tompkins Daniel
Publication venue: 'New York University'
Publication date: 01/10/2019

Task 5 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge is "urban sound tagging". Given a set of known sound categories and sub-categories, the goal is to build a multi-label audio classification model to predict whether each sound category is present or absent in an audio recording. We developed a model composed of a preprocessing layer that converts audio to a log-mel spectrogram, a VGG-inspired Convolutional Neural Network (CNN) that generates an embedding for the spectrogram, a pre-trained VGGish network that generates a separate audio embedding, and finally a series of fully-connected layers that convert these two embeddings (concatenated) into a multi-label classification. This model directly outputs both “fine” and “coarse” labels; it treats the task as a 37-way multi-label classification problem. One version of this network did better at the coarse labels (CNN+VGGish1); another did better with fine labels on Micro AUPRC (CNN+VGGish2). A separate family of CNN models was also trained to take into account the hierarchical nature of the labels (Hierarchical1, Hierarchical2, and Hierarchical3). The hierarchical models perform better on Micro AUPRC with fine-level classification.

New York University Faculty Digital Archive
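
The fusion step described in the Hierarchical Sound Event Classification abstract above (two embeddings concatenated, then mapped to 37 independent label probabilities) can be illustrated with a minimal sketch. This is not the authors' implementation: the embedding sizes, the random weights, the single dense layer (the abstract mentions a series of fully-connected layers), and the 0.5 decision threshold are all assumptions chosen for illustration.

```python
import math
import random

NUM_LABELS = 37  # from the abstract: fine and coarse labels predicted together


def sigmoid(x):
    """Logistic function; maps a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(vec, weights, bias):
    """One fully-connected layer: weights is a list of rows, one per output."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def multilabel_head(cnn_embedding, vggish_embedding, weights, bias):
    """Concatenate both embeddings and score every label independently."""
    fused = list(cnn_embedding) + list(vggish_embedding)
    logits = dense(fused, weights, bias)
    # Sigmoid (not softmax): several labels may be present at the same time.
    return [sigmoid(z) for z in logits]


# Hypothetical stand-ins for the two embeddings and the learned parameters.
rng = random.Random(0)
cnn_emb = [rng.uniform(-1, 1) for _ in range(8)]
vgg_emb = [rng.uniform(-1, 1) for _ in range(4)]
W = [[rng.uniform(-0.5, 0.5) for _ in range(12)] for _ in range(NUM_LABELS)]
b = [0.0] * NUM_LABELS

probs = multilabel_head(cnn_emb, vgg_emb, W, b)
present = [p >= 0.5 for p in probs]  # assumed threshold for "present"
```

Each label gets its own sigmoid output, so the predictions do not have to sum to one. That is what distinguishes a 37-way multi-label problem from a 37-class single-label one.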

HD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale Visual Recognition

Authors: DeCoste Dennis; Di Wei; Jagadeesh Vignesh; Piramuthu Robinson; Yan Zhicheng; Yu Yizhou; Zhang Hao
Publication date: 01/01/2015

In image classification, visual separability between different object categories is highly uneven, and some categories are more difficult to distinguish than others. Such difficult categories demand more dedicated classifiers. However, existing deep convolutional neural networks (CNNs) are trained as flat N-way classifiers, and few efforts have been made to leverage the hierarchical structure of categories. In this paper, we introduce hierarchical deep CNNs (HD-CNNs) by embedding deep CNNs into a category hierarchy. An HD-CNN separates easy classes using a coarse category classifier while distinguishing difficult classes using fine category classifiers. During HD-CNN training, component-wise pretraining is followed by global finetuning with a multinomial logistic loss regularized by a coarse category consistency term. In addition, conditional executions of fine category classifiers and layer parameter compression make HD-CNNs scalable for large-scale visual recognition. We achieve state-of-the-art results on both CIFAR100 and large-scale ImageNet 1000-class benchmark datasets. In our experiments, we build up three different HD-CNNs and they lower the top-1 error of the standard CNNs by 2.65%, 3.1% and 1.1%, respectively.
Comment: Add new results on ImageNet using VGG-16-layer building block ne

arXiv.org e-Print Archive

HKU Scholars Hub
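
The coarse-then-fine routing described in the HD-CNN abstract above can be sketched as follows. The abstract does not spell out how the classifiers are combined, so the weighted average of fine predictions (weighted by coarse probability), the 0.1 execution threshold, and the toy classifiers below are assumptions for illustration and not the paper's exact procedure.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_predict(coarse_probs, fine_classifiers, features, threshold=0.1):
    """Run only the fine classifiers whose coarse category is likely enough,
    then average their outputs weighted by renormalized coarse probability."""
    active = [k for k, p in enumerate(coarse_probs) if p >= threshold]
    if not active:  # always keep at least the most likely coarse category
        active = [max(range(len(coarse_probs)), key=lambda k: coarse_probs[k])]
    total = sum(coarse_probs[k] for k in active)
    combined = None
    for k in active:
        fine = fine_classifiers[k](features)  # conditional execution
        weight = coarse_probs[k] / total
        if combined is None:
            combined = [0.0] * len(fine)
        for i, p in enumerate(fine):
            combined[i] += weight * p
    return combined, active


calls = []  # records which fine classifiers actually ran


def make_fine_classifier(index, logits):
    """Hypothetical fine classifier that ignores its input and returns fixed scores."""
    def classify(features):
        calls.append(index)
        return softmax(logits)
    return classify


fine_classifiers = [
    make_fine_classifier(0, [2.0, 1.0, 0.0, 0.0]),
    make_fine_classifier(1, [0.0, 0.0, 1.0, 2.0]),
    make_fine_classifier(2, [0.0, 0.0, 0.0, 0.0]),
]
coarse_probs = [0.70, 0.25, 0.05]  # assumed output of the coarse classifier

combined, active = hierarchical_predict(coarse_probs, fine_classifiers, features=None)
```

In this toy run the third fine classifier is never executed because its coarse probability falls below the threshold. That skipping is the saving that conditional execution is meant to provide.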

Dual Skipping Networks

Authors: Cheng Changmao; Feng Jianfeng; Fu Yanwei; Jiang Yu-Gang; Liu Wei; Lu Wenlian; Xue Xiangyang
Publication date: 27/05/2018

Inspired by recent neuroscience studies on the left-right asymmetry of the human brain in processing low and high spatial frequency information, this paper introduces a dual skipping network which carries out coarse-to-fine object categorization. Such a network has two branches to simultaneously deal with both coarse and fine-grained classification tasks. Specifically, we propose a layer-skipping mechanism that learns a gating network to predict which layers to skip in the testing stage. This layer-skipping mechanism endows the network with good flexibility and capability in practice. Evaluations are conducted on several widely used coarse-to-fine object categorization benchmarks, and promising results are achieved by our proposed network model.
Comment: CVPR 2018 (poster); fix typ

arXiv.org e-Print Archive

Crossref
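
The layer-skipping idea in the Dual Skipping Networks abstract above can be pictured with a small sketch: each layer is paired with a gate, and a layer is bypassed when its gate score is low. In the paper the gating network is learned; here the gates, the layers, and the 0.5 threshold are hand-written, hypothetical stand-ins, and only a single branch is shown.

```python
import math


def sigmoid(x):
    """Logistic function; maps a score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_forward(x, layers, gates, threshold=0.5):
    """Apply each layer only if its gate, evaluated on the current activation,
    scores at least `threshold`; otherwise pass the activation through unchanged."""
    executed = []
    for index, (layer, gate) in enumerate(zip(layers, gates)):
        if gate(x) >= threshold:
            x = layer(x)
            executed.append(index)
    return x, executed


# Toy layers that preserve the vector length, so skipping one is always valid.
layers = [
    lambda v: [a + 1.0 for a in v],  # layer 0: shift
    lambda v: [a * 2.0 for a in v],  # layer 1: scale up
    lambda v: [a * 0.5 for a in v],  # layer 2: scale down
]

# Hypothetical gates: always on, always off, and input-dependent.
gates = [
    lambda v: 1.0,
    lambda v: 0.0,
    lambda v: sigmoid(sum(v)),
]

out, executed = gated_forward([1.0, 2.0], layers, gates)
```

Because a skipped layer acts as the identity, every layer has to keep the activation shape unchanged. Residual-style blocks meet that requirement naturally.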

Fine-grained Image Classification by Exploring Bipartite-Graph Labels

Authors: Lin Yuanqing; Zhou Feng
Publication date: 10/12/2015

Given a food image, can a fine-grained object recognition engine tell "which restaurant which dish" the food belongs to? Such ultra-fine grained image recognition is the key for many applications like search by images, but it is very challenging because it needs to discern subtle differences between classes while dealing with the scarcity of training data. Fortunately, the ultra-fine granularity naturally brings rich relationships among object classes. This paper proposes a novel approach to exploit the rich relationships through bipartite-graph labels (BGL). We show how to model BGL within an overall convolutional neural network, and the resulting system can be optimized through back-propagation. We also show that it is computationally efficient in inference thanks to the bipartite structure. To facilitate the study, we construct a new food benchmark dataset, which consists of 37,885 food images collected from 6 restaurants and 975 menus in total. Experimental results on this new food dataset and three other datasets demonstrate that BGL advances previous works in fine-grained object recognition. An online demo is available at http://www.f-zhou.com/fg_demo/

arXiv.org e-Print Archive

Crossref

DISC: Deep Image Saliency Computing via Progressive Representation Learning

Authors: Chen Tianshui; Li Xuelong; Lin Liang; Liu Lingbo; Luo Xiaonan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/12/2015

Salient object detection increasingly receives attention as an important component or step in several pattern recognition and image processing tasks. Although a variety of powerful saliency models have been proposed, they usually involve heavy feature (or model) engineering based on priors (or assumptions) about the properties of objects and backgrounds. Inspired by the effectiveness of recently developed feature learning, we provide a novel Deep Image Saliency Computing (DISC) framework for fine-grained image saliency computing. In particular, we model the image saliency from both the coarse- and fine-level observations, and utilize the deep convolutional neural network (CNN) to learn the saliency representation in a progressive manner. Specifically, our saliency model is built upon two stacked CNNs. The first CNN generates a coarse-level saliency map by taking the overall image as the input, roughly identifying saliency regions in the global context. Furthermore, we integrate superpixel-based local context information in the first CNN to refine the coarse-level saliency map. Guided by the coarse saliency map, the second CNN focuses on the local context to produce a fine-grained and accurate saliency map while preserving object details. For a testing image, the two CNNs collaboratively conduct the saliency computing in one shot. Our DISC framework is capable of uniformly highlighting the objects-of-interest against complex backgrounds while preserving object details well. Extensive experiments on several standard benchmarks suggest that DISC outperforms other state-of-the-art methods and that it also generalizes well across datasets without additional training. The executable version of DISC is available online: http://vision.sysu.edu.cn/projects/DISC.
Comment: This manuscript is the accepted version for IEEE Transactions on Neural Networks and Learning Systems (T-NNLS), 201

arXiv.org e-Print Archive

Institutional Repository of Xi'an Institute of Optics and Precision Mechanics, CAS