6,986 research outputs found

    Acoustic Scene Classification

    Get PDF
    This work was supported by the Centre for Digital Music Platform (grant EP/K009559/1) and a Leadership Fellowship (EP/G007144/1) both from the United Kingdom Engineering and Physical Sciences Research Council

    Real time automatic scene classification

    Get PDF
    This work has been done as part of the EU VICAR (IST) project and the EU SCOFI project (IAP). The aim of the first project was to develop a real time video indexing classification annotation and retrieval system. For our systems, we have adapted the approach of Picard and Minka [3], who categorized elements of a scene automatically with so-called ’stuff’ categories (e.g., grass, sky, sand, stone). Campbell et al. [1] use similar concepts to describe certain parts of an image, which they named “labeled image regions”. However, they did not use these elements to classify the topic of the scene. Subsequently, we developed a generic approach for the recognition of visual scenes, where an alphabet of basic visual elements (or “typed patches”) is used to classify the topic of a scene. We define a new image element: a patch, which is a group of adjacent pixels within an image, described by a specific local pixel distribution, brightness, and color. In contrast with pixels, a patch as a whole can incorporate semantics. A patch is described by a HSI color histogram with 16 bins and by three texture features (i.e., the variance and two values based on the two eigen values of the covariance matrix of the Intensity values of a mask ran over the image. For more details on the features used we refer to Israel et al. [2]. We aimed at describing each image as a vector with a fixed size and with information about the position of patches that is not strict (strict position would limit generalization). Therefore, a fixed grid is placed over the image and each grid cell is segmented into patches, which are then categorized by a patch classifier. For each grid cell a frequency vector of its classified patches is calculated. These vectors are concate- nated. The resulting vector describes the complete image. Several grids were applied and several patch sizes with the grid cells were tested. Grid size of 3x2 combined with patches of size 16x16 provided the best system performance. For the two classification phases of our system, back-propagation networks were trained: (i) classification of the patches and (ii) classification of the image vector, as a whole. The system was tested on the classification of eight categories of scenes from the Corel database: interiors, city/street, forest, agriculture/countryside, desert, sea, portrait, and crowds. Each of these categories were relevant for the VICAR project. Based upon their relevance for these eight categories of scenes, we choose nine categories for the classification of the patches: building, crowd, grass, road, sand, skin, sky, tree, and water. This approach was found to be successful (for classification of the patches 87.5% correct, and classification of the scenes 73.8% correct). An advantage of our method is its low computational complexity. Moreover, the classified patches themselves are intermediate image representations and can be used for image classification, image segmentation as well as for image matching. A disadvantage is that the patches with which the classifiers were trained had to be manually classified. To solve this drawback, we currently develop algorithms for automatic extraction of relevant patch types. Within the IST project VICAR, a video indexing system was built for the Netherlands Institute for Sound and Vision1, consisting of four independent mod- ules: car recognition, face recognition, movement recognition (of people) and scene recognition. The latter module was based upon the afore mentioned approach. Within the IAP project SCOFI, a real time Internet pornography filter was built, based upon this approach. The system is currently running on several schools in Europe. Within the SCOFI filtering system, our image classification system (with a performance of 92% correct) works together with a text classi- fication system that includes a proxy server (FilterX, developed by Demokritos, Greece) to classify web-pages. Its total performance is 0% overblocking and 1% underblocking

    ARTSCENE: A Neural System for Natural Scene Classification

    Full text link
    How do humans rapidly recognize a scene? How can neural models capture this biological competence to achieve state-of-the-art scene classification? The ARTSCENE neural system classifies natural scene photographs by using multiple spatial scales to efficiently accumulate evidence for gist and texture. ARTSCENE embodies a coarse-to-fine Texture Size Ranking Principle whereby spatial attention processes multiple scales of scenic information, ranging from global gist to local properties of textures. The model can incrementally learn and predict scene identity by gist information alone and can improve performance through selective attention to scenic textures of progressively smaller size. ARTSCENE discriminates 4 landscape scene categories (coast, forest, mountain and countryside) with up to 91.58% correct on a test set, outperforms alternative models in the literature which use biologically implausible computations, and outperforms component systems that use either gist or texture information alone. Model simulations also show that adjacent textures form higher-order features that are also informative for scene recognition.National Science Foundation (NSF SBE-0354378); Office of Naval Research (N00014-01-1-0624
    • 

    corecore