Intelligent visual media processing: when graphics meets vision
The computer graphics and computer vision communities have been working closely together in recent years, and a variety of algorithms and applications have been developed to analyze and manipulate the visual media around us. There are three major driving forces behind this phenomenon: i) the availability of big data from the Internet has created a demand for dealing with an ever-increasing, vast amount of resources; ii) powerful processing tools, such as deep neural networks, provide effective ways of learning how to deal with heterogeneous visual data; iii) new data capture devices, such as the Kinect, bridge the gap between algorithms for 2D image understanding and 3D model analysis. These driving forces have emerged only recently, and we believe that the computer graphics and computer vision communities are still at the beginning of their honeymoon phase. In this work we survey recent research on how computer vision techniques benefit computer graphics techniques and vice versa, and cover research on analysis, manipulation, synthesis, and interaction. We also discuss existing problems and suggest possible further research directions.
Texture Segregation By Visual Cortex: Perceptual Grouping, Attention, and Learning
A neural model is proposed of how laminar interactions in the visual cortex may learn and recognize object texture and form boundaries. The model brings together five interacting processes: region-based texture classification, contour-based boundary grouping, surface filling-in, spatial attention, and object attention. The model shows how form boundaries can determine regions in which surface filling-in occurs; how surface filling-in interacts with spatial attention to generate a form-fitting distribution of spatial attention, or attentional shroud; how the strongest shroud can inhibit weaker shrouds; and how the winning shroud regulates learning of texture categories, and thus the allocation of object attention. The model can discriminate abutted textures with blurred boundaries and is sensitive to texture boundary attributes such as discontinuities in orientation and texture flow curvature, as well as to the relative orientations of texture elements. The model quantitatively fits a large set of human psychophysical data on orientation-based textures. The object boundary output of the model is compared with computer vision algorithms using a set of human-segmented photographic images. The model classifies textures and suppresses noise using a multiple-scale oriented filterbank and a distributed Adaptive Resonance Theory (dART) classifier. The matched signal between the bottom-up texture inputs and the top-down learned texture categories is used by oriented competitive and cooperative grouping processes to generate texture boundaries that control surface filling-in and spatial attention. Top-down modulatory attentional feedback from boundary and surface representations to early filtering stages results in enhanced texture boundaries and more efficient learning of texture within attended surface regions. Surface-based attention also provides a self-supervising training signal for learning new textures.
The importance of surface-based attentional feedback in texture learning and classification is tested using a set of textured images from the Brodatz micro-texture album. Benchmark results range from 95.1% to 98.6% with attention and from 90.6% to 93.2% without attention.
Air Force Office of Scientific Research (F49620-01-1-0397, F49620-01-1-0423); National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
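The abstract says only that the model's front end is a multiple-scale oriented filterbank. As a rough illustration of what such a front end computes, the sketch below builds one from even-symmetric Gabor kernels in NumPy. The Gabor form, the scales, the wavelength-to-scale ratio, and all function names are assumptions made for illustration, not the model's actual filters; the dART classifier and the grouping stages are not shown.

```python
import numpy as np


def gabor_kernel(sigma, theta, wavelength):
    """Even-symmetric (cosine-phase) Gabor kernel at orientation theta (radians)."""
    half = int(np.ceil(3 * sigma))  # cover +/- 3 sigma of the Gaussian envelope
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)   # axis across the preferred stripes
    yr = -x * np.sin(theta) + y * np.cos(theta)  # axis along the preferred stripes
    envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    kernel = envelope * np.cos(2 * np.pi * xr / wavelength)
    return kernel - kernel.mean()  # remove the DC term so uniform regions give ~0


def convolve_same(image, kernel):
    """2-D linear convolution via FFT, cropped to the input size ('same' mode)."""
    H, W = image.shape
    kh, kw = kernel.shape
    shape = (H + kh - 1, W + kw - 1)  # zero-pad to avoid circular wrap-around
    spectrum = np.fft.rfft2(image, shape) * np.fft.rfft2(kernel, shape)
    full = np.fft.irfft2(spectrum, shape)
    top, left = kh // 2, kw // 2
    return full[top:top + H, left:left + W]


def oriented_filterbank(image, sigmas=(1.0, 2.0, 4.0), n_orient=4):
    """Rectified filter responses with shape (n_scales, n_orient, H, W)."""
    image = np.asarray(image, float)
    out = np.empty((len(sigmas), n_orient) + image.shape)
    for i, sigma in enumerate(sigmas):
        for j in range(n_orient):
            theta = j * np.pi / n_orient
            # Assumption: the preferred wavelength grows in proportion to the scale.
            k = gabor_kernel(sigma, theta, wavelength=4 * sigma)
            out[i, j] = np.abs(convolve_same(image, k))  # full-wave rectification
    return out
```

For an image of vertical stripes, the `theta = 0` channel at a matching scale responds much more strongly than the orthogonal channel, which is the orientation selectivity a texture classifier downstream would rely on.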
Retinal Vessel Segmentation using Tensor Voting
Medical imaging studies generate tremendous amounts of data that are reviewed manually by physicians every day. Medical image segmentation aims to automate the process of extracting (segmenting) “interesting” structures from background structures in the images, saving physicians time and opening the door to more sophisticated analysis, such as automatically correlating studies over time. This work focuses on segmenting blood vessels (in particular the retinal vasculature), a task that requires integrating both local and global properties of the vasculature to produce good-quality segmentations. We use the Tensor Voting framework because it naturally groups structures together based on the consensus of locally voting segments. We investigate several ways of encoding the image data as tensors and compare our results quantitatively with a publicly available hand-labeled data set. We demonstrate competitive performance versus previously published techniques.
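The abstract does not spell out the voting step. Assuming the standard 2-D stick voting field of the Tensor Voting framework (a decay that penalizes arc length and curvature, and a 45-degree voting cone), a small NumPy sketch of how sparse vessel tokens with estimated normals could accumulate votes looks like this. The parameter values, the function names, and the point-plus-normal token encoding are hypothetical and are not the encodings compared in the work.

```python
import numpy as np


def stick_vote(p, n, q, sigma, c):
    """2x2 tensor cast at receiver q by a stick voter at p with unit normal n."""
    v = q - p
    l = np.linalg.norm(v)
    if l == 0.0:
        return np.outer(n, n)  # a token's vote to itself is its own stick tensor
    t = np.array([-n[1], n[0]])  # tangent direction, perpendicular to the normal
    x, y = float(v @ t), float(v @ n)
    if x < 0:
        # The field is symmetric: flipping the local frame only flips the sign
        # of the vote normal, which the outer product below ignores.
        x, y = -x, -y
    theta = np.arctan2(y, x)
    if abs(theta) > np.pi / 4:
        return np.zeros((2, 2))  # outside the 45-degree voting cone
    if theta == 0.0:
        s, kappa = l, 0.0  # straight continuation
    else:
        s = abs(theta) * l / abs(np.sin(theta))  # arc length along the osculating circle
        kappa = 2.0 * abs(np.sin(theta)) / l     # curvature of that circle
    decay = np.exp(-(s**2 + c * kappa**2) / sigma**2)
    normal = -np.sin(2 * theta) * t + np.cos(2 * theta) * n  # circle normal at q
    return decay * np.outer(normal, normal)


def tensor_voting(points, normals, sigma=5.0, c=1.0):
    """Dense stick voting among tokens; returns stick saliency and refined normals."""
    points = np.asarray(points, float)
    normals = np.asarray(normals, float)
    normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    saliency = np.zeros(len(points))
    refined = np.zeros_like(points)
    for i, q in enumerate(points):
        T = np.zeros((2, 2))
        for p, n in zip(points, normals):
            T += stick_vote(p, n, q, sigma, c)
        evals, evecs = np.linalg.eigh(T)  # eigenvalues in ascending order
        saliency[i] = evals[1] - evals[0]  # lambda1 - lambda2: curve saliency
        refined[i] = evecs[:, 1]            # dominant normal direction
    return saliency, refined
```

On a toy input, tokens lying along a straight line reinforce one another and acquire high saliency, while an isolated token off the line keeps only its own vote; this consensus behaviour is what lets voting bridge small gaps in a vessel and suppress isolated noise.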
The role of HG in the analysis of temporal iteration and interaural correlation
Geometry-based shading for shape depiction enhancement
Recent works on Non-Photorealistic Rendering (NPR) show that object shape enhancement requires sophisticated effects such as surface detail detection and stylized shading. To date, several rendering techniques have been proposed to address this issue, but most of them are limited to correlating shape enhancement functionalities with surface feature variations. Therefore, this problem persists, especially in NPR. This paper attempts to address this problem by presenting a new approach for enhancing the shape depiction of 3D objects in NPR. We first introduce a tweakable shape descriptor that offers versatile functionalities for describing the salient features of 3D objects. Then, to enhance the classical shading models, we propose a new technique called Geometry-based Shading. This technique controls reflected lighting intensities based on local geometry. Our approach works without any constraint on the choice of material or illumination. We demonstrate results obtained with Blinn-Phong shading, Gooch shading, and cartoon shading. These results show that our approach produces more satisfying results than previous shape depiction techniques. Finally, our approach runs in real time on modern graphics hardware and works efficiently with interactive 3D visualization.
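Geometry-based Shading is described as controlling the reflected intensity of classical shading models according to local geometry. The sketch below implements two of the classical models named in the abstract, Blinn-Phong and a simplified Gooch cool-to-warm tone (without the object's diffuse-color term), plus a hypothetical `geometry_modulated` hook standing in for the paper's shape descriptor. The constants, the colors, and the modulation rule are illustrative assumptions, not the authors' formulation.

```python
import numpy as np


def normalize(v):
    v = np.asarray(v, float)
    return v / np.linalg.norm(v)


def blinn_phong(n, l, v, ka=0.1, kd=0.7, ks=0.2, shininess=32.0):
    """Scalar Blinn-Phong intensity for normal n, light direction l, view direction v."""
    n, l, v = normalize(n), normalize(l), normalize(v)
    h = normalize(l + v)  # half vector between light and viewer (assumes l != -v)
    diffuse = max(0.0, float(n @ l))
    specular = max(0.0, float(n @ h)) ** shininess
    return ka + kd * diffuse + ks * specular


def gooch(n, l, k_cool=(0.0, 0.0, 0.55), k_warm=(0.3, 0.3, 0.0)):
    """Simplified Gooch tone: blend from cool to warm color by (1 + n.l) / 2."""
    t = (1.0 + float(normalize(n) @ normalize(l))) / 2.0
    return t * np.asarray(k_warm, float) + (1.0 - t) * np.asarray(k_cool, float)


def geometry_modulated(intensity, geometry_term, strength=0.5):
    """HYPOTHETICAL hook: scale a shaded value by a per-point geometry term in [-1, 1],
    e.g. a normalized curvature value. This is not the paper's descriptor."""
    return intensity * (1.0 + strength * geometry_term)
```

Because the hook only rescales the output of an existing model, the same function can wrap `blinn_phong`, `gooch`, or a quantized cartoon shader without changing the underlying lighting model.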
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side by side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.
Comment: 15 pages, 2 pdf figures
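Log-mel spectra are named as one of the dominant input representations. A minimal NumPy sketch of how one is typically computed (Hann-windowed short-time Fourier transform, power spectrum, triangular filters spaced evenly on the HTK mel scale, then a logarithm) follows. The frame length, hop size, number of mel bands, and the small `eps` floor are assumed values; dedicated audio libraries offer tuned implementations with more options.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft//2 + 1)."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)  # center frequency of each rfft bin
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(signal, sr, n_fft=512, hop=160, n_mels=40, eps=1e-10):
    """Frame the signal, take the windowed power spectrum, apply mel filters, take the log."""
    signal = np.asarray(signal, float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2  # (n_frames, n_fft//2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T          # (n_frames, n_mels)
    return np.log(mel + eps)
```

For a one-second 1 kHz tone at 16 kHz, the result has one row per frame and one column per mel band, and the strongest band is one whose triangle covers 1 kHz. The mel spacing gives low frequencies finer resolution than high ones, which is one reason this representation is a common network input.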
Attention modulates spatial priority maps in the human occipital, parietal and frontal cortices.
Computational theories propose that attention modulates the topographical landscape of spatial 'priority' maps in regions of the visual cortex so that the location of an important object is associated with higher activation levels. Although studies of single-unit recordings have demonstrated attention-related increases in the gain of neural responses and changes in the size of spatial receptive fields, the net effect of these modulations on the topography of region-level priority maps has not been investigated. Here we used functional magnetic resonance imaging and a multivariate encoding model to reconstruct spatial representations of attended and ignored stimuli using activation patterns across entire visual areas. These reconstructed spatial representations reveal the influence of attention on the amplitude and size of stimulus representations within putative priority maps across the visual hierarchy. Our results suggest that attention increases the amplitude of stimulus representations in these spatial maps, particularly in higher visual areas, but does not substantively change their size.
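The multivariate encoding model is only named in the abstract. One common form, the inverted encoding model, assumes that voxel activity is a linear mix of a few spatially tuned channels, B = W C, estimates W on training trials, and then inverts it to reconstruct channel responses on held-out trials. The 1-D sketch below follows that recipe on simulated data. The Gaussian tuning curves, the channel count, and all function names are assumptions, not the study's 2-D basis set or analysis pipeline.

```python
import numpy as np


def channel_responses(positions, centers, width=1.0):
    """Hypothetical Gaussian spatial tuning: (n_channels, n_trials) channel activations."""
    d = np.subtract.outer(np.asarray(centers, float), np.asarray(positions, float))
    return np.exp(-0.5 * (d / width) ** 2)


def fit_encoding_weights(B_train, C_train):
    """Least-squares estimate of W in B = W @ C.

    B_train: (n_voxels, n_trials) activations; C_train: (n_channels, n_trials)."""
    # Solve C^T W^T = B^T for W^T, i.e. W = B C^T (C C^T)^-1 when C C^T is invertible.
    W_T, *_ = np.linalg.lstsq(C_train.T, B_train.T, rcond=None)
    return W_T.T  # (n_voxels, n_channels)


def invert_encoding_model(W, B_test):
    """Reconstruct channel responses C = (W^T W)^-1 W^T B for held-out trials."""
    C_hat, *_ = np.linalg.lstsq(W, B_test, rcond=None)
    return C_hat  # (n_channels, n_trials)
```

With noiseless simulated data the inversion recovers the true channel responses, and the channel centered nearest the stimulus peaks. On real fMRI data, noise and the choice of basis set determine how faithful the reconstructed amplitude and size are, which are the two quantities the study compares across attention conditions.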
Accurate and discernible photocollages
There currently exist several techniques for selecting and combining images from a digital image library into a single image so that the result meets certain prespecified visual criteria. Image mosaic methods, first explored by Connors and Trivedi [18], arrange library images in a tiling, often a regular grid, so that the combination of images, when viewed as a whole, resembles some input target image. Other techniques, such as Autocollage of Rother et al. [78], seek only to combine images in an interesting and visually pleasing manner, according to certain composition principles, without attempting to approximate any target image. Each of these techniques provides a myriad of creative options for artists who wish to combine several levels of meaning into a single image or who wish to exploit the meaning and symbolism contained in each of a large set of images through an efficient and easy process. We first examine the most notable and successful of these methods and summarize the advantages and limitations of each. We then formulate a set of goals for an image collage system that combines the advantages of these methods while addressing and mitigating their drawbacks. In particular, we propose a system for creating photocollages that approximate a target image as an aggregation of smaller images, chosen from a large library, so that interesting visual correspondences between images are exploited. In this way, we allow users to create collages in which multiple layers of meaning are encoded, with meaningful visual links between each layer. In service of this goal, we ensure that the images used are as large as possible and are combined in such a way that boundaries between images are not immediately apparent, as in Autocollage. This has required us to apply a multiscale approach to searching and comparing images from a large database, which achieves both speed and accuracy.
We also propose a new framework for color post-processing, as well as novel techniques for decomposing images according to object and texture information.
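The abstract credits a multiscale search with achieving both speed and accuracy. A minimal coarse-to-fine sketch of that idea, assuming equally sized tiles compared by sum of squared differences (SSD), is shown below: rank the whole library on block-averaged thumbnails, keep a short list of candidates, and rerank only those at full resolution. The downsampling factor, the shortlist size, the SSD metric, and the function names are assumptions, not the authors' search procedure.

```python
import numpy as np


def downsample(img, factor):
    """Block-average an (H, W) or (H, W, C) image by an integer factor."""
    img = np.asarray(img, float)
    h = img.shape[0] // factor * factor
    w = img.shape[1] // factor * factor
    img = img[:h, :w]  # crop so both sides divide evenly
    blocks = img.reshape((h // factor, factor, w // factor, factor) + img.shape[2:])
    return blocks.mean(axis=(1, 3))


def coarse_to_fine_match(target, library, factor=4, keep=5):
    """Index of the library image closest to target (SSD), searched coarse-to-fine.

    Assumes every library image has the same shape as target."""
    target = np.asarray(target, float)
    coarse_target = downsample(target, factor)
    # Coarse pass: cheap comparison of block-averaged thumbnails for every image.
    coarse = np.array([np.sum((downsample(im, factor) - coarse_target) ** 2) for im in library])
    shortlist = np.argsort(coarse)[:keep]
    # Fine pass: full-resolution SSD only for the shortlisted candidates.
    fine = [np.sum((np.asarray(library[i], float) - target) ** 2) for i in shortlist]
    return int(shortlist[int(np.argmin(fine))])
```

In practice the library thumbnails would be computed once and cached, so each query compares full-resolution pixels for only `keep` images; a larger `keep` trades speed for a lower risk of pruning the true best match at the coarse level.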
The role of multisensory integration in the bottom-up and top-down control of attentional object selection
Selective spatial attention and multisensory integration have traditionally been considered separate domains in psychology and cognitive neuroscience. However, theoretical and methodological advancements in the last two decades have paved the way for studying different types of interactions between spatial attention and multisensory integration. In the present thesis, two types of such interactions are investigated.
In the first part of the thesis, the role of audiovisual synchrony as a source of bottom-up bias in visual selection was investigated. In six out of seven experiments, a variant of the spatial cueing paradigm was used to compare attentional capture by visual and audiovisual distractors. In the remaining experiment, single-frame search arrays were presented to investigate whether multisensory integration can bias spatial selection via salience-based mechanisms. Behavioural and electrophysiological results demonstrated that the ability of visual objects to capture attention was enhanced when they were accompanied by non-informative auditory signals. They also provided evidence for the bottom-up nature of these audiovisual enhancements of attentional capture by revealing that the enhancements occurred irrespective of the task relevance of the visual objects.
In the second part of this thesis, four experiments are reported that investigated the spatial selection of audiovisual relative to visual objects and the guidance of their selection by bimodal object templates. Behavioural and ERP results demonstrated that the ability of task-irrelevant target-matching visual objects to capture attention was reduced during search for audiovisual as compared to purely visual targets, suggesting that bimodal search is guided by integrated audiovisual templates. However, the observation that unimodal target-matching visual events retained some ability to capture attention indicates that bimodal search is controlled to some extent by modality-specific representations of task-relevant information.
In summary, the present thesis has contributed to our knowledge of how attention is controlled in real-life environments by demonstrating that spatial selective attention can be biased towards bimodal objects via salience-driven as well as goal-based mechanisms
- …