CNN Architectures for Large-Scale Audio Classification
Convolutional Neural Networks (CNNs) have proven very effective in image
classification and show promise for audio. We use various CNN architectures to
classify the soundtracks of a dataset of 70M training videos (5.24 million
hours) with 30,871 video-level labels. We examine fully connected Deep Neural
Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We
investigate varying the size of both training set and label vocabulary, finding
that analogs of the CNNs used in image classification do well on our audio
classification task, and larger training and label sets help up to a point. A
model using embeddings from these classifiers does much better than raw
features on the Audio Set [5] Acoustic Event Detection (AED) classification
task.
Comment: Accepted for publication at ICASSP 2017. Changes: Added definitions of
mAP, AUC, and d-prime. Updated mAP/AUC/d-prime numbers for Audio Set based on
changes in the latest Audio Set revision. Changed wording to fit the 4-page limit
with the new additions.
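The comment above mentions definitions of mAP, AUC, and d-prime. As a minimal sketch, assuming the conventional definitions (per-class average precision averaged over classes for mAP, the ROC area for AUC, and d-prime derived from AUC under an equal-variance Gaussian assumption), these per-class metrics could be computed in plain Python as follows. The function names are hypothetical and not taken from the paper.

```python
import math
from statistics import NormalDist


def average_precision(scores, labels):
    """AP for one class: mean of precision@k taken at the rank k of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precision_sum = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as 1/2. Assumes at least one positive and one negative example.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def d_prime(auc):
    """d' = sqrt(2) * inverse standard-normal CDF evaluated at the AUC."""
    return math.sqrt(2) * NormalDist().inv_cdf(auc)


# mAP would then be the mean of average_precision(...) over all classes.
```

An AUC of 0.5 (chance level) maps to d' = 0, and d' grows without bound as AUC approaches 1, which is why d-prime can separate strong classifiers whose AUCs all look close to 1.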
Middle-Level Features for the Explanation of Classification Systems by Sparse Dictionary Methods.
Machine learning (ML) systems are affected by a pervasive lack of transparency. The eXplainable Artificial Intelligence (XAI) research area addresses this problem and the related issue of explaining the behavior of ML systems in terms that are understandable to human beings. In many XAI approaches, the outputs of ML systems are explained in terms of low-level features of their inputs. However, these approaches leave a substantial explanatory burden with human users, insofar as the latter are required to map low-level properties onto more salient and readily understandable parts of the input. To alleviate this cognitive burden, an alternative model-agnostic framework is proposed here. This framework is instantiated to address explanation problems in the context of ML image classification systems, without relying on pixel relevance maps or other low-level features of the input. More specifically, sparse dictionary learning techniques are applied to obtain sets of perceptually salient middle-level properties of classification inputs. These middle-level properties are used as building blocks for explanations of image classifications. The resulting explanations are parsimonious, because they rely on a limited set of middle-level image properties. They can also be contrastive, because the set of middle-level image properties can be used to explain why the system advanced the proposed classification over other, antagonist classifications. In view of its model-agnostic character, the proposed framework is adaptable to a variety of other ML systems and explanation problems.
Variation Level Set Method for Multiphase Image Classification
In this paper, a multiphase image classification model based on the variation level set method is presented. In recent years, many classification algorithms based on the level set method have been proposed for image classification. However, all of them suffer to some degree from defects such as the need for parameter estimation and for re-initialization of the level set functions. To solve this problem, a new model with built-in parameter estimation is proposed; even for noisy images, the parameters need not be predefined. This model also includes a new term that forces the level set function to stay close to a signed distance function. In addition, the model includes a boundary alignment term used for segmenting thin structures. Finally, the proposed model has been applied to both synthetic and real images with promising results.
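For illustration only: the abstract does not give the exact form of its signed-distance term. One widely used penalty of this kind, the distance regularization term popularized by Li et al. (2005), exploits the fact that a signed distance function φ satisfies |∇φ| = 1 on the image domain Ω. It is shown here as an assumed example, not necessarily the term used in this paper:

```latex
\mathcal{P}(\phi) = \int_{\Omega} \frac{1}{2}\bigl(|\nabla \phi| - 1\bigr)^{2}\, dx
```

Minimizing this penalty alongside the data terms drives |∇φ| toward 1, so φ remains approximately a signed distance function without periodic re-initialization.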
A Novel Approach Based on Decreased Dimension and Reduced Gray Level Range Matrix Features for Stone Texture Classification
The human eye can easily identify texture types visually, both in house flooring and in digital images. In this work, stone textures are grouped into four categories: brick, marble, granite, and mosaic. A novel approach is developed for decreasing the dimension of a stone image and reducing its gray-level range without any loss of significant feature information. This model is named the “Decreased Dimension and Reduced Gray level Range Matrix (DDRGRM)” model. The DDRGRM model consists of 3 stages. In stage 1, each 5×5 sub-block of the stone image is reduced to a 2×2 sub-block without losing important qualities, primitives, or other local features. In stage 2, the gray-level range of the image is reduced from 0-255 to 0-4 using fuzzy concepts. In stage 3, Co-occurrence Matrix (CM) features are derived from the DDRGRM model of the stone image for stone texture classification. Based on the feature set values, a user-defined algorithm is developed to classify the stone texture image into one of the 4 categories, i.e., marble, brick, granite, and mosaic. The proposed method is tested using the K-Nearest Neighbor classification algorithm with the derived texture features. To demonstrate the efficiency of the proposed method, it is tested on different stone texture image databases. The proposed method achieved a high classification rate compared with other existing methods.
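Stages 2 and 3 plus the K-Nearest Neighbor step can be sketched as below. This is a minimal sketch under stated assumptions: stage 1 is omitted because the abstract does not say how a 5×5 block is mapped to 2×2; the fuzzy gray-level reduction is replaced by plain uniform binning into 5 levels (a swapped-in technique); contrast, energy, and homogeneity are assumed as the co-occurrence features; and the paper's "user-defined algorithm" is not reproduced. All function names are hypothetical.

```python
import math
from collections import Counter

LEVELS = 5  # gray levels 0..4, matching the range used in stage 2


def reduce_gray_levels(image, levels=LEVELS):
    # Stand-in for the fuzzy reduction: uniform binning of 0-255 into 0..levels-1.
    return [[min(p * levels // 256, levels - 1) for p in row] for row in image]


def cooccurrence_matrix(image, levels=LEVELS, dx=1, dy=0):
    # Normalized counts of gray-level pairs (a, b) at offset (dx, dy), with dx, dy >= 0.
    matrix = [[0.0] * levels for _ in range(levels)]
    pairs = 0
    for r in range(len(image) - dy):
        for c in range(len(image[0]) - dx):
            matrix[image[r][c]][image[r + dy][c + dx]] += 1
            pairs += 1
    return [[v / pairs for v in row] for row in matrix] if pairs else matrix


def texture_features(matrix):
    # Assumed feature set: contrast, energy, and homogeneity of the co-occurrence matrix.
    n = len(matrix)
    cells = [(i, j, matrix[i][j]) for i in range(n) for j in range(n)]
    contrast = sum((i - j) ** 2 * p for i, j, p in cells)
    energy = sum(p * p for _, _, p in cells)
    homogeneity = sum(p / (1 + abs(i - j)) for i, j, p in cells)
    return (contrast, energy, homogeneity)


def knn_classify(training, query, k=1):
    # training: list of (feature_vector, label); majority vote among the k nearest.
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Usage: compute `texture_features(cooccurrence_matrix(reduce_gray_levels(image)))` for each labeled training image, then pass the query image's feature vector to `knn_classify`. Reducing to 5 gray levels keeps the co-occurrence matrix at 5×5 entries, which is the practical payoff of stage 2.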
Zero-Shot Visual Classification with Guided Cropping
Pretrained vision-language models, such as CLIP, show promising zero-shot
performance across a wide variety of datasets. For closed-set classification
tasks, however, there is an inherent limitation: CLIP image encoders are
typically designed to extract generic image-level features that summarize
superfluous or confounding information for the target tasks. This results in
degradation of classification performance, especially when objects of interest
cover small areas of input images. In this work, we propose CLIP with Guided
Cropping (GC-CLIP), where we use an off-the-shelf zero-shot object detection
model in a preprocessing step to increase the zero-shot classifier's focus on
the object of interest and minimize the influence of extraneous image regions.
We empirically show that our approach improves zero-shot classification results
across architectures and datasets, with particularly favorable results for small objects.
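The preprocessing idea can be pictured as: take the detector's candidate boxes, keep the most confident one, enlarge it slightly, and crop the image before it reaches the CLIP image encoder. The sketch below covers only that box geometry under assumed inputs. The single-box choice, the `margin` value, the fallback to the full image, and the function names are hypothetical and are not GC-CLIP's actual procedure; the detector and CLIP themselves are not shown.

```python
import math


def guided_crop_box(detections, width, height, margin=0.2):
    """Pick the highest-scoring detection and pad its box by `margin` on each side.

    detections: list of (score, (x0, y0, x1, y1)) in pixel coordinates.
    Returns None when nothing was detected, so the caller can keep the full image.
    """
    if not detections:
        return None
    _, (x0, y0, x1, y1) = max(detections, key=lambda d: d[0])
    pad_x, pad_y = margin * (x1 - x0), margin * (y1 - y0)
    # Clamp the padded box to the image bounds.
    return (max(0, x0 - pad_x), max(0, y0 - pad_y),
            min(width, x1 + pad_x), min(height, y1 + pad_y))


def crop(image, box):
    # image: list of pixel rows; box edges are rounded outward to whole pixels.
    x0, y0, x1, y1 = box
    return [row[math.floor(x0):math.ceil(x1)]
            for row in image[math.floor(y0):math.ceil(y1)]]
```

Padding the box rather than cropping it tightly keeps some surrounding context, which a classifier may still need when the detected box cuts off part of the object.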