
    Deep Learning for Audio Signal Processing

    Given the recent surge of developments in deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side by side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified. Comment: 15 pages, 2 PDF figures
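The abstract names log-mel spectra as one of the dominant input representations. As a rough illustration only (this is not code from the reviewed article), the sketch below computes a log-mel spectrogram with NumPy alone. It assumes the HTK-style mel formula, a Hann window, no edge padding, and hypothetical parameter values (`n_fft=1024`, `hop=256`, `n_mels=64`); audio libraries often differ in mel-scale variant and filter normalization.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other variants exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping n_fft//2+1 linear-frequency bins to n_mels mel bands."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 points equally spaced on the mel scale give each filter its
    # left edge, centre and right edge
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(x, sr, n_fft=1024, hop=256, n_mels=64, eps=1e-10):
    """Return an (n_mels, n_frames) log-mel spectrogram of a mono signal x (len(x) >= n_fft)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Slice the signal into overlapping windowed frames
    frames = np.stack([x[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2          # (n_frames, n_fft//2+1)
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T   # (n_frames, n_mels)
    # eps avoids log(0) in empty or silent bands
    return np.log(mel_power + eps).T
```

For a one-second 440 Hz sine at 16 kHz, `log_mel_spectrogram(x, 16000)` yields a 64 × 59 array whose strongest band sits around 440 Hz. The log compression is what makes the representation roughly match perceived loudness differences.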

    Cortical Dynamics of Contextually-Cued Attentive Visual Learning and Search: Spatial and Object Evidence Accumulation

    How do humans use predictive contextual information to facilitate visual search? How are consistently paired scenic objects and positions learned and used to more efficiently guide search in familiar scenes? For example, a certain combination of objects can define a context for a kitchen and trigger a more efficient search for a typical object, such as a sink, in that context. A neural model, ARTSCENE Search, is developed to illustrate the neural mechanisms of such memory-based contextual learning and guidance, and to explain challenging behavioral data on positive/negative, spatial/object, and local/distant global cueing effects during visual search. The model proposes how global scene layout at a first glance rapidly forms a hypothesis about the target location. This hypothesis is then incrementally refined by enhancing target-like objects in space as a scene is scanned with saccadic eye movements. The model clarifies the functional roles of neuroanatomical, neurophysiological, and neuroimaging data in visual search for a desired goal object. In particular, the model simulates the interactive dynamics of spatial and object contextual cueing in the cortical What and Where streams starting from early visual areas through the medial temporal lobe to the prefrontal cortex. After learning, model dorsolateral prefrontal cortical cells (area 46) prime possible target locations in posterior parietal cortex based on goal-modulated percepts of spatial scene gist represented in parahippocampal cortex, whereas model ventral prefrontal cortical cells (area 47/12) prime possible target object representations in inferior temporal cortex based on the history of viewed objects represented in perirhinal cortex.
    The model hereby predicts how the cortical What and Where streams cooperate during scene perception, learning, and memory to accumulate evidence over time to drive efficient visual search of familiar scenes. CELEST, an NSF Science of Learning Center (SBE-0354378); SyNAPSE program of Defense Advanced Research Projects Agency (HR0011-09-3-0001, HR0011-09-C-0011)

    The Cat Is On the Mat. Or Is It a Dog? Dynamic Competition in Perceptual Decision Making

    Recent neurobiological findings suggest that the brain solves simple perceptual decision-making tasks by means of a dynamic competition in which evidence is accumulated in favor of the alternatives. However, it is unclear if and how the same process applies in more complex, real-world tasks, such as the categorization of ambiguous visual scenes, and which elements count as evidence in this case. Furthermore, dynamic decision models typically consider evidence accumulation as a passive process, disregarding the role of active perception strategies. In this paper, we adopt the principles of dynamic competition and active vision for the realization of a biologically motivated computational model, which we test in a visual categorization task. Moreover, our system uses the predictive power of features as the main dimension for both evidence accumulation and the guidance of active vision. Comparison of human and synthetic data in a common experimental setup suggests that the proposed model captures essential aspects of how the brain solves perceptual ambiguities in time. Our results point to the importance of the proposed principles of dynamic competition, parallel specification, and selection of multiple alternatives through prediction, as well as active guidance of perceptual strategies for perceptual decision-making and the resolution of perceptual ambiguities. These principles could apply to both the simple perceptual decision problems studied in neuroscience and the more complex ones addressed by vision research. Peer reviewed
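To make "dynamic competition in which evidence is accumulated in favor of the alternatives" concrete, here is a minimal sketch. It does not reproduce the paper's model. As a stand-in, it uses a generic leaky competing accumulator in the style of Usher and McClelland: each alternative integrates its evidence, leaks, and is inhibited by the others until one crosses a threshold. All parameter values and the function name are hypothetical.

```python
import numpy as np

def leaky_competing_accumulator(evidence, leak=0.2, inhibition=0.3, noise=0.1,
                                threshold=1.0, dt=0.1, max_steps=1000, seed=0):
    """Race between alternatives (hypothetical parameters).

    Returns (index of winning alternative, step at which it crossed the
    threshold), or (None, max_steps) if no alternative reached it.
    """
    rng = np.random.default_rng(seed)
    evidence = np.asarray(evidence, dtype=float)
    x = np.zeros_like(evidence)                 # activation of each alternative
    for step in range(1, max_steps + 1):
        lateral = inhibition * (x.sum() - x)    # inhibition from all competitors
        dx = (evidence - leak * x - lateral) * dt
        dx += noise * np.sqrt(dt) * rng.standard_normal(x.shape)
        x = np.maximum(0.0, x + dx)             # activations stay non-negative
        if x.max() >= threshold:
            return int(np.argmax(x)), step
    return None, max_steps
```

With `evidence=[0.6, 0.3]` and `noise=0.0`, alternative 0 wins. With perfectly ambiguous evidence (`[0.3, 0.3]`) and no noise, both activations settle at 0.6 and no decision is reached. Because inhibition exceeds leak here, the symmetric state is unstable, so any noise eventually breaks the tie, a simple caricature of resolving a perceptual ambiguity over time.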

    Modelling multimodal language processing
