    Interactive Perception Based on Gaussian Process Classification for House-Hold Objects Recognition and Sorting

    We present an interactive perception model for object sorting based on Gaussian Process (GP) classification that is capable of recognizing object categories from point cloud data. In our approach, FPFH features are extracted from point clouds to describe the local 3D shape of objects, and a Bag-of-Words coding method is used to obtain an object-level vocabulary representation. Multi-class Gaussian Process classification is employed to provide a probable estimate of the identity of the object and plays a key role in the interactive perception cycle: modelling perception confidence. We show results from simulated input data on both SVM and GP based multi-class classifiers to validate the recognition accuracy of our proposed perception model. Our results demonstrate that by using a GP-based classifier, we obtain true positive classification rates of up to 80%. Our semi-autonomous object sorting experiments show that the proposed GP based interactive sorting approach outperforms random sorting by up to 30% when applied to scenes comprising configurations of household objects.
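
    The Bag-of-Words coding step mentioned in this abstract can be sketched roughly as follows. This is a minimal illustration of generic hard-assignment coding, not the authors' implementation: the function name `bow_histogram`, the 2-D toy descriptors (real FPFH descriptors are much higher-dimensional), and the hand-written codebook are all assumptions made for the example.

```python
from math import dist


def bow_histogram(descriptors, codebook):
    """Quantise local descriptors against a codebook (hard assignment)
    and return a normalised histogram of codeword counts.

    Hypothetical helper for illustration; not taken from the paper.
    """
    counts = [0] * len(codebook)
    for d in descriptors:
        # Index of the codeword closest to this descriptor (Euclidean distance).
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy data: a 3-word codebook and four local descriptors (assumed values).
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (0.1, 0.9)]
hist = bow_histogram(descriptors, codebook)
print(hist)
```

    The resulting fixed-length histogram is the kind of object-level vector that could then be passed to a multi-class classifier such as a GP or an SVM.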

    Contextual modulation of primary visual cortex by auditory signals

    Early visual cortex receives non-feedforward input from lateral and top-down connections (Muckli & Petro 2013 Curr. Opin. Neurobiol. 23, 195–201. (doi:10.1016/j.conb.2013.01.020)), including long-range projections from auditory areas. Early visual cortex can code for high-level auditory information, with neural patterns representing natural sound stimulation (Vetter et al. 2014 Curr. Biol. 24, 1256–1262. (doi:10.1016/j.cub.2014.04.020)). We discuss a number of questions arising from these findings. What is the adaptive function of bimodal representations in visual cortex? What type of information projects from auditory to visual cortex? What are the anatomical constraints of auditory information in V1, for example, periphery versus fovea, superficial versus deep cortical layers? Is there a putative neural mechanism we can infer from human neuroimaging data and recent theoretical accounts of cortex? We also present data showing we can read out high-level auditory information from the activation patterns of early visual cortex even when visual cortex receives simple visual stimulation, suggesting independent channels for visual and auditory signals in V1. We speculate about which cellular mechanisms allow V1 to be contextually modulated by auditory input to facilitate perception, cognition and behaviour. Beyond cortical feedback that facilitates perception, we argue that there is also feedback serving counterfactual processing during imagery, dreaming and mind wandering, which is not relevant for immediate perception but for behaviour and cognition over a longer time frame. This article is part of the themed issue ‘Auditory and visual scene analysis’.

    Colour appearance descriptors for image browsing and retrieval

    In this paper, we focus on the development of whole-scene colour appearance descriptors for classification to be used in browsing applications. The descriptors can classify a whole-scene image into various categories of semantically-based colour appearance. Colour appearance is an important feature and has been extensively used in image analysis, retrieval and classification. First, using pre-existing global CIELAB colour histograms, we develop metrics for whole-scene colour appearance: “colour strength”, “high/low lightness” and “multicoloured”. Second, we propose methods that use these metrics, either alone or combined, to classify whole-scene images into five categories of appearance: strong, pastel, dark, pale and multicoloured. Experiments show positive results and indicate that the global colour histogram is useful for whole-scene colour appearance classification. We have also conducted a small-scale human evaluation test on whole-scene colour appearance. The results show that, with suitable threshold settings, the proposed methods can describe the whole-scene colour appearance of images in a way that is close to human classification. The descriptors were tested on thousands of images from various scenes: paintings, natural scenes, objects, photographs and documents. The colour appearance classifications are being integrated into an image browsing system, which allows them to also be used to refine browsing.
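
    As a rough illustration of how threshold-based metrics could map CIELAB values to the five categories named in this abstract, consider the sketch below. Everything in it is an assumption made for the example: the function name, the decision order, the threshold values, and the use of raw (L*, a*, b*) pixel tuples in place of a global histogram. The abstract does not specify the paper's actual metrics or thresholds, so this should not be read as the authors' method.

```python
from math import atan2, degrees, hypot


def classify_colour_appearance(pixels, chroma_hi=40.0, chroma_lo=10.0,
                               chroma_min=20.0, light_lo=35.0, min_sectors=4):
    """Label a scene as strong, pastel, dark, pale or multicoloured.

    pixels: list of (L*, a*, b*) tuples. All thresholds are hypothetical.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    sum_l = 0.0
    sum_c = 0.0
    sectors = set()  # 60-degree hue sectors that contain chromatic pixels
    for l_star, a_star, b_star in pixels:
        chroma = hypot(a_star, b_star)  # C* = sqrt(a*^2 + b*^2)
        sum_l += l_star
        sum_c += chroma
        if chroma >= chroma_min:
            hue = degrees(atan2(b_star, a_star)) % 360.0
            sectors.add(int(hue // 60))
    mean_l = sum_l / len(pixels)
    mean_c = sum_c / len(pixels)
    if len(sectors) >= min_sectors:   # many distinct hues present
        return "multicoloured"
    if mean_c >= chroma_hi:           # high colour strength
        return "strong"
    if mean_l < light_lo:             # low lightness
        return "dark"
    if mean_c < chroma_lo:            # weak colour, not dark
        return "pale"
    return "pastel"                   # moderate colour, not dark


print(classify_colour_appearance([(20.0, 5.0, 5.0)] * 10))
print(classify_colour_appearance([(50.0, 60.0, 10.0)] * 10))
```

    In practice the thresholds would need tuning against human judgements, which is consistent with the abstract's remark about "suitable threshold settings".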

    ARTSCENE: A Neural System for Natural Scene Classification

    How do humans rapidly recognize a scene? How can neural models capture this biological competence to achieve state-of-the-art scene classification? The ARTSCENE neural system classifies natural scene photographs by using multiple spatial scales to efficiently accumulate evidence for gist and texture. ARTSCENE embodies a coarse-to-fine Texture Size Ranking Principle whereby spatial attention processes multiple scales of scenic information, ranging from global gist to local properties of textures. The model can incrementally learn and predict scene identity by gist information alone and can improve performance through selective attention to scenic textures of progressively smaller size. ARTSCENE discriminates 4 landscape scene categories (coast, forest, mountain and countryside) with up to 91.58% correct on a test set, outperforms alternative models in the literature which use biologically implausible computations, and outperforms component systems that use either gist or texture information alone. Model simulations also show that adjacent textures form higher-order features that are also informative for scene recognition. National Science Foundation (NSF SBE-0354378); Office of Naval Research (N00014-01-1-0624)

    Audio Caption: Listen and Tell

    An increasing amount of research has shed light on machine perception of audio events, most of which concerns detection and classification tasks. However, human-like perception of audio scenes involves not only detecting and classifying audio sounds, but also summarizing the relationship between different audio events. Comparable research such as image captioning has been conducted, yet the audio field is still largely unexplored. This paper introduces a manually-annotated dataset for audio captioning. The purpose is to automatically generate natural sentences for audio scene description and to bridge the gap between machine perception of audio and image. The whole dataset is labelled in Mandarin and we also include translated English annotations. A baseline encoder-decoder model is provided for both English and Mandarin. Similar BLEU scores are obtained for both languages: our model can generate understandable and data-related captions based on the dataset. Comment: accepted by ICASSP201

    Visual pathways from the perspective of cost functions and multi-task deep neural networks

    Vision research has been shaped by the seminal insight that we can understand the higher-tier visual cortex from the perspective of multiple functional pathways with different goals. In this paper, we try to give a computational account of the functional organization of this system by reasoning from the perspective of multi-task deep neural networks. Machine learning has shown that tasks become easier to solve when they are decomposed into subtasks with their own cost function. We hypothesize that the visual system optimizes multiple cost functions of unrelated tasks and this causes the emergence of a ventral pathway dedicated to vision for perception, and a dorsal pathway dedicated to vision for action. To evaluate the functional organization in multi-task deep neural networks, we propose a method that measures the contribution of a unit towards each task, applying it to two networks that have been trained on either two related or two unrelated tasks, using an identical stimulus set. Results show that the network trained on the unrelated tasks shows a decreasing degree of feature representation sharing towards higher-tier layers, while the network trained on related tasks uniformly shows a high degree of sharing. We conjecture that the method we propose can be used to analyze the anatomical and functional organization of the visual system and beyond. We predict that the degree to which tasks are related is a good descriptor of the degree to which they share downstream cortical units. Comment: 16 pages, 5 figures

    Building Machines That Learn and Think Like People

    Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn, and how they learn it. Specifically, we argue that these machines should (a) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (b) ground learning in intuitive theories of physics and psychology, to support and enrich the knowledge that is learned; and (c) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes towards these goals that can combine the strengths of recent neural network advances with more structured cognitive models. Comment: In press at Behavioral and Brain Sciences. Open call for commentary proposals (until Nov. 22, 2016). https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/information/calls-for-commentary/open-calls-for-commentar