
    Intelligent visual media processing: when graphics meets vision

    The computer graphics and computer vision communities have been working closely together in recent years, and a variety of algorithms and applications have been developed to analyze and manipulate the visual media around us. There are three major driving forces behind this phenomenon: i) the availability of big data from the Internet has created a demand for dealing with the ever increasing, vast amount of resources; ii) powerful processing tools, such as deep neural networks, provide effective ways for learning how to deal with heterogeneous visual data; iii) new data capture devices, such as the Kinect, bridge between algorithms for 2D image understanding and 3D model analysis. These driving forces have emerged only recently, and we believe that the computer graphics and computer vision communities are still in the beginning of their honeymoon phase. In this work we survey recent research on how computer vision techniques benefit computer graphics techniques and vice versa, and cover research on analysis, manipulation, synthesis, and interaction. We also discuss existing problems and suggest possible further research directions.

    Texture Segregation By Visual Cortex: Perceptual Grouping, Attention, and Learning

    A neural model is proposed of how laminar interactions in the visual cortex may learn and recognize object texture and form boundaries. The model brings together five interacting processes: region-based texture classification, contour-based boundary grouping, surface filling-in, spatial attention, and object attention. The model shows how form boundaries can determine regions in which surface filling-in occurs; how surface filling-in interacts with spatial attention to generate a form-fitting distribution of spatial attention, or attentional shroud; how the strongest shroud can inhibit weaker shrouds; and how the winning shroud regulates learning of texture categories, and thus the allocation of object attention. The model can discriminate abutted textures with blurred boundaries and is sensitive to texture boundary attributes like discontinuities in orientation and texture flow curvature as well as to relative orientations of texture elements. The model quantitatively fits a large set of human psychophysical data on orientation-based textures. Object boundary output of the model is compared to computer vision algorithms using a set of human segmented photographic images. The model classifies textures and suppresses noise using a multiple scale oriented filterbank and a distributed Adaptive Resonance Theory (dART) classifier. The matched signal between the bottom-up texture inputs and top-down learned texture categories is utilized by oriented competitive and cooperative grouping processes to generate texture boundaries that control surface filling-in and spatial attention. Top-down modulatory attentional feedback from boundary and surface representations to early filtering stages results in enhanced texture boundaries and more efficient learning of texture within attended surface regions. Surface-based attention also provides a self-supervising training signal for learning new textures. The importance of the surface-based attentional feedback in texture learning and classification is tested using a set of textured images from the Brodatz micro-texture album. Benchmark results range from 95.1% to 98.6% with attention, and from 90.6% to 93.2% without attention.
    Funding: Air Force Office of Scientific Research (F49620-01-1-0397, F49620-01-1-0423); National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624).

    Retinal Vessel Segmentation using Tensor Voting

    Medical imaging studies generate tremendous amounts of data that are reviewed manually by physicians every day. Medical image segmentation aims to automate the process of extracting (segmenting) “interesting” structures from background structures in the images, saving physicians time and opening the door to more sophisticated analysis such as automatically correlating studies over time. This work focuses on segmenting blood vessels (in particular the retinal vasculature), a task that requires integrating both local and global properties of the vasculature to produce good quality segmentations. We use the Tensor Voting framework as it naturally groups structures together based on the consensus of locally voting segments. We investigate several ways of encoding the image data as tensors and compare our results quantitatively with a publicly available hand-labeled data set. We demonstrate competitive performance versus previously published techniques.

    Geometry-based shading for shape depiction enhancement

    Recent works on Non-Photorealistic Rendering (NPR) show that object shape enhancement requires sophisticated effects such as surface detail detection and stylized shading. To date, some rendering techniques have been proposed to address this issue, but most are limited in their ability to correlate shape enhancement functionalities with surface feature variations. The problem therefore persists, especially in NPR. This paper attempts to address it by presenting a new approach for enhancing shape depiction of 3D objects in NPR. We first introduce a tweakable shape descriptor that offers versatile functionalities for describing the salient features of 3D objects. Then, to enhance the classical shading models, we propose a new technique called Geometry-based Shading. This technique controls reflected lighting intensities based on local geometry. Our approach works without any constraint on the choice of material or illumination. We demonstrate results obtained with Blinn-Phong shading, Gooch shading, and cartoon shading. These results show that our approach produces more satisfying results than previous shape depiction techniques. Finally, our approach runs on modern graphics hardware in real time and works efficiently with interactive 3D visualization.
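    As a rough illustration of the idea that reflected intensity can be steered by local geometry, the sketch below evaluates the standard Blinn-Phong model and then rescales the result by a per-point curvature value. The rescaling function `geometry_weighted_shade`, its `strength` parameter, and the use of a single scalar curvature are assumptions made only for this example; they are not the shape descriptor or the shading technique described in the abstract.

```python
import math


def normalize(v):
    """Return v scaled to unit length (zero vector is returned unchanged)."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def blinn_phong(normal, light_dir, view_dir, kd=0.7, ks=0.3, shininess=32.0):
    """Standard Blinn-Phong intensity: diffuse term plus half-vector specular term."""
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    h = normalize(tuple(a + b for a, b in zip(l, v)))  # half vector
    diffuse = kd * max(0.0, dot(n, l))
    specular = ks * max(0.0, dot(n, h)) ** shininess
    return diffuse + specular


def geometry_weighted_shade(intensity, curvature, strength=0.5):
    """Hypothetical modulation: brighten convex (positive curvature) regions
    and darken concave ones, clamped to [0, 1]. Not the paper's descriptor."""
    scaled = intensity * (1.0 + strength * curvature)
    return min(1.0, max(0.0, scaled))


if __name__ == "__main__":
    base = blinn_phong((0, 0, 1), (0, 1, 1), (0, 0, 1))
    print(base, geometry_weighted_shade(base, curvature=-0.5))
```

    Because the modulation is applied to the final scalar intensity, the same hypothetical weighting could be placed after any other shading model that returns an intensity.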

    Deep Learning for Audio Signal Processing

    Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e. audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified. Comment: 15 pages, 2 PDF figures.

    Attention modulates spatial priority maps in the human occipital, parietal and frontal cortices.

    Computational theories propose that attention modulates the topographical landscape of spatial 'priority' maps in regions of the visual cortex so that the location of an important object is associated with higher activation levels. Although studies of single-unit recordings have demonstrated attention-related increases in the gain of neural responses and changes in the size of spatial receptive fields, the net effect of these modulations on the topography of region-level priority maps has not been investigated. Here we used functional magnetic resonance imaging and a multivariate encoding model to reconstruct spatial representations of attended and ignored stimuli using activation patterns across entire visual areas. These reconstructed spatial representations reveal the influence of attention on the amplitude and size of stimulus representations within putative priority maps across the visual hierarchy. Our results suggest that attention increases the amplitude of stimulus representations in these spatial maps, particularly in higher visual areas, but does not substantively change their size.

    Accurate and discernible photocollages

    There currently exist several techniques for selecting and combining images from a digital image library into a single image so that the result meets certain prespecified visual criteria. Image mosaic methods, first explored by Connors and Trivedi [18], arrange library images according to some tiling arrangement, often a regular grid, so that the combination of images, when viewed as a whole, resembles some input target image. Other techniques, such as Autocollage of Rother et al. [78], seek only to combine images in an interesting and visually pleasing manner, according to certain composition principles, without attempting to approximate any target image. Each of these techniques provides a myriad of creative options for artists who wish to combine several levels of meaning into a single image or who wish to exploit the meaning and symbolism contained in each of a large set of images through an efficient and easy process. We first examine the most notable and successful of these methods, and summarize the advantages and limitations of each. We then formulate a set of goals for an image collage system that combines the advantages of these methods while addressing and mitigating the drawbacks. In particular, we propose a system for creating photocollages that approximate a target image as an aggregation of smaller images, chosen from a large library, so that interesting visual correspondences between images are exploited. In this way, we allow users to create collages in which multiple layers of meaning are encoded, with meaningful visual links between each layer. In service of this goal, we ensure that the images used are as large as possible and are combined in such a way that boundaries between images are not immediately apparent, as in Autocollage. This has required us to apply a multiscale approach to searching and comparing images from a large database, which achieves both speed and accuracy. We also propose a new framework for color post-processing, and propose novel techniques for decomposing images according to object and texture information.
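    The regular-grid image mosaic mentioned in this abstract can be sketched in a few lines. The example below is a minimal illustration only: it assumes images are plain lists of RGB tuples and matches each grid cell to the library image with the closest mean color. The function names (`mean_color`, `build_mosaic`) and the mean-color criterion are assumptions for the sketch, not the multiscale search the abstract describes.

```python
def mean_color(pixels):
    """Average RGB value of a non-empty list of (r, g, b) tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def color_distance(a, b):
    """Squared Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_mosaic(target, library, cell):
    """Choose one library image per grid cell of the target.

    target  -- 2D list (rows of (r, g, b) tuples)
    library -- dict mapping an image name to its list of pixels
    cell    -- side length of each square grid cell, in pixels
    Returns a 2D list of library image names, one per cell.
    """
    library_means = {name: mean_color(px) for name, px in library.items()}
    rows, cols = len(target), len(target[0])
    layout = []
    for top in range(0, rows, cell):
        row_choice = []
        for left in range(0, cols, cell):
            block = [
                target[y][x]
                for y in range(top, min(top + cell, rows))
                for x in range(left, min(left + cell, cols))
            ]
            block_mean = mean_color(block)
            best = min(
                library_means,
                key=lambda name: color_distance(library_means[name], block_mean),
            )
            row_choice.append(best)
        layout.append(row_choice)
    return layout


if __name__ == "__main__":
    red, blue = (255, 0, 0), (0, 0, 255)
    target = [[red, red, blue, blue], [red, red, blue, blue]]
    library = {
        "sunset": [(250, 10, 10)],
        "ocean": [(10, 10, 240)],
        "forest": [(10, 200, 10)],
    }
    print(build_mosaic(target, library, cell=2))
```

    Matching on mean color alone ignores structure inside each cell; a comparison at several resolutions would be one way to move toward the visual correspondences the abstract aims for.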

    The role of multisensory integration in the bottom-up and top-down control of attentional object selection

    Selective spatial attention and multisensory integration have been traditionally considered as separate domains in psychology and cognitive neuroscience. However, theoretical and methodological advancements in the last two decades have paved the way for studying different types of interactions between spatial attention and multisensory integration. In the present thesis, two types of such interactions are investigated. In the first part of the thesis, the role of audiovisual synchrony as a source of bottom-up bias in visual selection was investigated. In six out of seven experiments, a variant of the spatial cueing paradigm was used to compare attentional capture by visual and audiovisual distractors. In another experiment, single-frame search arrays were presented to investigate whether multisensory integration can bias spatial selection via salience-based mechanisms. Behavioural and electrophysiological results demonstrated that the ability of visual objects to capture attention was enhanced when they were accompanied by noninformative auditory signals. They also showed evidence for the bottom-up nature of these audiovisual enhancements of attentional capture by revealing that these enhancements occurred irrespective of the task-relevance of visual objects. In the second part of this thesis, four experiments are reported that investigated the spatial selection of audiovisual relative to visual objects and the guidance of their selection by bimodal object templates. Behavioural and ERP results demonstrated that the ability of task-irrelevant target-matching visual objects to capture attention was reduced during search for audiovisual as compared to purely visual targets, suggesting that bimodal search is guided by integrated audiovisual templates. However, the observation that unimodal target-matching visual events retained some ability to capture attention indicates that bimodal search is controlled to some extent by modality-specific representations of task-relevant information. In summary, the present thesis has contributed to our knowledge of how attention is controlled in real-life environments by demonstrating that spatial selective attention can be biased towards bimodal objects via salience-driven as well as goal-based mechanisms.