
    CNN Architectures for Large-Scale Audio Classification

    Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both the training set and the label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and that larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.
    Comment: Accepted for publication at ICASSP 2017. Changes: added definitions of mAP, AUC, and d-prime; updated mAP/AUC/d-prime numbers for Audio Set based on the latest Audio Set revision; reworded to fit the 4-page limit with the new additions.
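    The CNN architectures above consume fixed-size log-mel spectrogram patches rather than raw waveforms. A minimal sketch of such a front end is shown below; the crude mel-like binning and all parameter values are illustrative stand-ins, not the paper's actual feature pipeline.

```python
import numpy as np

def log_mel_patches(waveform, n_fft=512, hop=160, n_mels=64, patch_frames=96):
    """Slice a waveform into fixed-size log-mel-like patches usable as CNN input.
    Simplified stand-in: the mel filterbank is approximated by averaging
    groups of FFT bins instead of triangular mel filters."""
    # Short-time Fourier transform magnitudes, frame by frame
    frames = []
    for start in range(0, len(waveform) - n_fft + 1, hop):
        win = waveform[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(win)))
    spec = np.array(frames)                          # (time, n_fft // 2 + 1)
    # Crude mel-like binning: average contiguous groups of FFT bins
    edges = np.linspace(0, spec.shape[1], n_mels + 1).astype(int)
    mel = np.stack([spec[:, a:b].mean(axis=1)
                    for a, b in zip(edges[:-1], edges[1:])], axis=1)
    logmel = np.log(mel + 1e-6)
    # Non-overlapping (patch_frames x n_mels) patches, one CNN example each
    n_patches = logmel.shape[0] // patch_frames
    return logmel[:n_patches * patch_frames].reshape(n_patches, patch_frames, n_mels)
```

    Each returned patch plays the role of an "image" for the AlexNet/VGG/Inception/ResNet analogs; video-level labels are attached to every patch from that video.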

    Middle-Level Features for the Explanation of Classification Systems by Sparse Dictionary Methods.

    Machine learning (ML) systems are affected by a pervasive lack of transparency. The eXplainable Artificial Intelligence (XAI) research area addresses this problem and the related issue of explaining the behavior of ML systems in terms that are understandable to human beings. In many XAI approaches, the outputs of ML systems are explained in terms of low-level features of their inputs. However, these approaches leave a substantive explanatory burden with human users, insofar as the latter are required to map low-level properties onto more salient and readily understandable parts of the input. To alleviate this cognitive burden, an alternative model-agnostic framework is proposed here. This framework is instantiated to address explanation problems in the context of ML image classification systems, without relying on pixel relevance maps and other low-level features of the input. More specifically, one obtains sets of middle-level properties of classification inputs that are perceptually salient by applying sparse dictionary learning techniques. These middle-level properties are used as building blocks for explanations of image classifications. The achieved explanations are parsimonious, owing to their reliance on a limited set of middle-level image properties. And they can be contrastive, because the set of middle-level image properties can be used to explain why the system advanced the proposed classification over other antagonist classifications. In view of its model-agnostic character, the proposed framework is adaptable to a variety of other ML systems and explanation problems.
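    The sparse-dictionary step can be sketched as follows: learn a small dictionary over flattened image patches, so each input is represented as a sparse combination of a few atoms that serve as candidate middle-level properties. The random data and all sizes below are illustrative stand-ins, not the paper's actual setup.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
patches = rng.normal(size=(200, 64))   # stand-in for 200 flattened 8x8 image patches

dico = DictionaryLearning(n_components=16, alpha=1.0, max_iter=10, random_state=0)
codes = dico.fit_transform(patches)    # sparse codes: which atoms "explain" each patch
atoms = dico.components_               # rows = candidate middle-level properties
```

    An explanation then points at the few atoms with large coefficients in `codes` for a given input, instead of at individual pixels.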

    Variation Level Set Method for Multiphase Image Classification

    In this paper, a multiphase image classification model based on the variational level set method is presented. In recent years, many classification algorithms based on the level set method have been proposed for image classification. However, all of them have defects to some degree, such as parameter estimation and re-initialization of the level set functions. To solve this problem, a new model with parameter estimation capability is proposed. Even for noisy images, the parameters need not be predefined. This model also includes a new term that forces the level set function to stay close to a signed distance function. In addition, a boundary alignment term is included for the segmentation of thin structures. Finally, the proposed model is applied to both synthetic and real images with promising results.
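    The signed-distance term mentioned above can be sketched as a penalty that drives |∇φ| toward 1, which is what removes the need to re-initialize the level set function. Below is a minimal explicit-update sketch of such a regularizer on a grid; it is only this one term, not the full multiphase classification model, and the time step is an illustrative choice.

```python
import numpy as np

def distance_regularizer_step(phi, dt=0.1):
    """One explicit step of the penalty 0.5 * (|grad phi| - 1)^2, whose
    gradient flow is d(phi)/dt = div((1 - 1/|grad phi|) * grad phi).
    It nudges phi toward a signed distance function (|grad phi| = 1)."""
    gy, gx = np.gradient(phi)
    mag = np.sqrt(gx ** 2 + gy ** 2) + 1e-8   # avoid division by zero
    vx = (1.0 - 1.0 / mag) * gx
    vy = (1.0 - 1.0 / mag) * gy
    div = np.gradient(vy, axis=0) + np.gradient(vx, axis=1)
    return phi + dt * div
```

    When φ is already a signed distance function the velocity field vanishes, so the step leaves it (numerically) unchanged.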

    A Novel Approach Based on Decreased Dimension and Reduced Gray Level Range Matrix Features for Stone Texture Classification

    The human eye can easily identify the type of texture in house flooring and in digital images. In this work, stone textures are grouped into four categories: brick, marble, granite and mosaic. A novel approach is developed for decreasing the dimension of a stone image and reducing its gray-level range without any loss of significant feature information. This model is named the “Decreased Dimension and Reduced Gray level Range Matrix (DDRGRM)” model. The DDRGRM model consists of three stages. In stage 1, each 5×5 sub-dimension of the stone image is reduced to a 2×2 sub-dimension without losing important qualities, primitives, or other local information. In stage 2, the gray-level range of the image is reduced from 0-255 to 0-4 using fuzzy concepts. In stage 3, Co-occurrence Matrix (CM) features are derived from the DDRGRM representation of the stone image for texture classification. Based on the feature-set values, a user-defined algorithm is developed to classify the stone texture image into one of the four categories, i.e. marble, brick, granite and mosaic. The proposed method is also tested using the K-Nearest Neighbor classification algorithm with the derived texture features. To demonstrate its efficiency, the proposed method is evaluated on different stone texture image databases, yielding a high classification rate compared with other existing methods.
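    The gray-level reduction and co-occurrence feature steps can be sketched as below. Note the assumptions: plain uniform binning stands in for the paper's fuzzy quantization, only the horizontal-neighbor co-occurrence direction is shown, and the three features are standard GLCM statistics rather than the paper's exact feature set.

```python
import numpy as np

def reduce_gray_levels(img, levels=5):
    """Quantize 0-255 gray values down to 0..levels-1 (uniform binning as a
    simple stand-in for the paper's fuzzy-concept reduction to 0-4)."""
    return (img.astype(np.int64) * levels // 256).clip(0, levels - 1)

def cooccurrence(img, levels=5):
    """Horizontal-neighbor co-occurrence matrix on the reduced image."""
    cm = np.zeros((levels, levels), dtype=np.int64)
    left, right = img[:, :-1].ravel(), img[:, 1:].ravel()
    np.add.at(cm, (left, right), 1)
    return cm

def glcm_features(cm):
    """Contrast, energy and homogeneity from a co-occurrence matrix."""
    p = cm / cm.sum()
    i, j = np.indices(p.shape)
    contrast = ((i - j) ** 2 * p).sum()
    energy = (p ** 2).sum()
    homogeneity = (p / (1.0 + np.abs(i - j))).sum()
    return np.array([contrast, energy, homogeneity])
```

    Feature vectors produced this way per image can then be fed to a K-Nearest Neighbor classifier, as in the evaluation described above.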

    Zero-Shot Visual Classification with Guided Cropping

    Pretrained vision-language models, such as CLIP, show promising zero-shot performance across a wide variety of datasets. For closed-set classification tasks, however, there is an inherent limitation: CLIP image encoders are typically designed to extract generic image-level features that summarize superfluous or confounding information for the target task. This results in degraded classification performance, especially when objects of interest cover small areas of the input images. In this work, we propose CLIP with Guided Cropping (GC-CLIP), where we use an off-the-shelf zero-shot object detection model in a preprocessing step to increase the focus of the zero-shot classifier on the object of interest and to minimize the influence of extraneous image regions. We empirically show that our approach improves zero-shot classification results across architectures and datasets, particularly for small objects.
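    The guided-cropping preprocessing step can be sketched as follows. The detector itself is not shown, and the relative-margin expansion is an illustrative assumption, not necessarily the paper's exact cropping rule; `box` would come from any off-the-shelf zero-shot detector.

```python
import numpy as np

def guided_crop(image, box, margin=0.1):
    """Crop to the detector box expanded by a relative margin, so the
    downstream zero-shot classifier (e.g. CLIP) sees mostly the object
    of interest. `box` is (x0, y0, x1, y1) in pixels."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    mx, my = margin * (x1 - x0), margin * (y1 - y0)
    x0, x1 = max(0, int(x0 - mx)), min(w, int(np.ceil(x1 + mx)))
    y0, y1 = max(0, int(y0 - my)), min(h, int(np.ceil(y1 + my)))
    return image[y0:y1, x0:x1]
```

    The cropped region is then encoded by the CLIP image encoder in place of the full image, which is what reduces the influence of extraneous regions for small objects.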