
    Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

    Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that still remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and, with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech, with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks.
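    As a point of reference for the "traditional unsupervised approaches" the abstract contrasts against, spectral subtraction is a classical baseline for additive noise. The sketch below is illustrative only; the function name and the `alpha`/`beta` parameters are hypothetical conventions, not anything from the surveyed paper.

    ```python
    # Hypothetical sketch of spectral subtraction, a classical unsupervised
    # baseline for additive-noise removal. Operates on per-frame magnitude
    # spectra represented as plain lists of floats.

    def spectral_subtraction(noisy_mag, noise_mag, alpha=2.0, beta=0.01):
        """Subtract an estimated noise magnitude spectrum from the noisy one.

        noisy_mag -- magnitude spectrum of the noisy frame
        noise_mag -- estimated noise spectrum (e.g. averaged over silent frames)
        alpha     -- over-subtraction factor to suppress residual noise
        beta      -- spectral floor: keep a small fraction of the noisy
                     spectrum instead of negative values, which limits the
                     "musical noise" artifacts this method is known for
        """
        cleaned = []
        for y, n in zip(noisy_mag, noise_mag):
            s = y - alpha * n
            cleaned.append(max(s, beta * y))  # floor negative estimates
        return cleaned

    # Toy example: a flat noise estimate under a spectrum with one peak.
    noisy = [1.0, 1.0, 5.0, 1.0, 1.0]
    noise = [0.4, 0.4, 0.4, 0.4, 0.4]
    enhanced = spectral_subtraction(noisy, noise)
    ```

    The floor term is the design choice worth noting: clamping to `beta * y` rather than zero is what distinguishes practical spectral subtraction from naive subtraction, at the cost of leaving some noise in.
    
    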

    Methodology to assess safety effects of future Intelligent Transport Systems on railway level crossings

    There is consistent evidence showing that driver behaviour contributes to crashes and near-miss incidents at railway level crossings (RLXs). The development of emerging Vehicle-to-Vehicle and Vehicle-to-Infrastructure technologies is a highly promising approach to improve RLX safety. To date, research has not comprehensively evaluated the potential effects of such technologies on driving behaviour at RLXs. This paper presents an on-going research programme assessing the impacts of such new technologies on human factors and drivers’ situational awareness at RLXs. Additionally, requirements for the design of such promising technologies and ways to display safety information to drivers were systematically reviewed. Finally, a methodology which comprehensively assesses the effects of in-vehicle and road-based interventions warning the driver of incoming trains at RLXs is discussed, with a focus on both benefits and potential negative behavioural adaptations. The methodology is designed for implementation in a driving simulator and covers compliance, control of the vehicle, distraction, mental workload and drivers’ acceptance. This study has the potential to provide a broad understanding of the effects of deploying new in-vehicle and road-based technologies at RLXs and hence inform policy makers on planning safety improvements for RLXs.

    Deep Learning for Audio Signal Processing

    Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.
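    To make the "dominant feature representation" concrete, the sketch below computes log-mel features for one frame using only the standard library (a naive DFT instead of an FFT). The frame size, mel formula, and filter count follow common convention but are assumptions; real toolkits differ in windowing, normalization, and filter placement.

    ```python
    import math

    # Illustrative log-mel feature extraction for a single frame.
    # Everything here (n_mels, sample_rate, the HTK-style mel formula)
    # is a conventional assumption, not taken from the reviewed article.

    def hz_to_mel(f):
        return 2595.0 * math.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    def power_spectrum(frame):
        """Naive DFT power spectrum (O(n^2); fine for a demo)."""
        n = len(frame)
        spec = []
        for k in range(n // 2 + 1):
            re = sum(x * math.cos(-2 * math.pi * k * i / n) for i, x in enumerate(frame))
            im = sum(x * math.sin(-2 * math.pi * k * i / n) for i, x in enumerate(frame))
            spec.append(re * re + im * im)
        return spec

    def log_mel(frame, sample_rate=16000, n_mels=8):
        spec = power_spectrum(frame)
        half = len(spec)
        # Filter center frequencies spaced evenly on the mel scale.
        top = hz_to_mel(sample_rate / 2)
        mels = [i * top / (n_mels + 1) for i in range(n_mels + 2)]
        bins = [int((half - 1) * mel_to_hz(m) / (sample_rate / 2)) for m in mels]
        feats = []
        for j in range(1, n_mels + 1):
            lo, c, hi = bins[j - 1], bins[j], bins[j + 1]
            energy = 1e-10  # floor keeps the log finite for silent bands
            for k in range(lo, hi + 1):
                if k <= c:
                    w = (k - lo) / max(c - lo, 1)   # rising edge of triangle
                else:
                    w = (hi - k) / max(hi - c, 1)   # falling edge
                energy += w * spec[k]
            feats.append(math.log(energy))
        return feats

    # A 440 Hz tone framed at 16 kHz.
    frame = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(256)]
    features = log_mel(frame)
    ```

    The mel spacing is the point of the exercise: filters are narrow at low frequencies and wide at high ones, which is why log-mel spectra compress pitch information the way the reviewed models expect.
    
    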

    Unsupervised decoding of long-term, naturalistic human neural recordings with automated video and audio annotations

    Fully automated decoding of human activities and intentions from direct neural recordings is a tantalizing challenge in brain-computer interfacing. Most ongoing efforts have focused on training decoders on specific, stereotyped tasks in laboratory settings. Implementing brain-computer interfaces (BCIs) in natural settings requires adaptive strategies and scalable algorithms that need minimal supervision. Here we propose an unsupervised approach to decoding neural states from human brain recordings acquired in a naturalistic context. We demonstrate our approach on continuous long-term electrocorticographic (ECoG) data recorded over many days from the brain surface of subjects in a hospital room, with simultaneous audio and video recordings. We first discovered clusters in high-dimensional ECoG recordings and then annotated coherent clusters using speech and movement labels extracted automatically from audio and video recordings. To our knowledge, this represents the first time techniques from computer vision and speech processing have been used for natural ECoG decoding. Our results show that our unsupervised approach can discover distinct behaviors from ECoG data, including moving, speaking and resting. We verify the accuracy of our approach by comparing to manual annotations. By projecting the discovered cluster centers back onto the brain, this technique opens the door to automated functional brain mapping in natural settings.
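    The two-stage pipeline the abstract describes (cluster first, then name clusters with automatically extracted labels) can be sketched in miniature. The k-means routine, the synthetic 2-D "features", and the labels below are all stand-ins; the paper's actual clustering method and feature dimensionality are not specified in the abstract.

    ```python
    import random

    # Miniature, hypothetical version of the cluster-then-annotate pipeline:
    # (1) unsupervised clustering of feature vectors (standing in for
    #     high-dimensional ECoG features), then
    # (2) naming each cluster by the majority label extracted from
    #     simultaneous audio/video annotations.

    def kmeans(points, k, iters=20, seed=0):
        rng = random.Random(seed)
        centers = rng.sample(points, k)
        for _ in range(iters):
            # Assignment step: nearest center by squared Euclidean distance.
            groups = [[] for _ in range(k)]
            for p in points:
                j = min(range(k),
                        key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                groups[j].append(p)
            # Update step: move each center to the mean of its members.
            for j, g in enumerate(groups):
                if g:
                    centers[j] = tuple(sum(d) / len(g) for d in zip(*g))
        assign = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                  for p in points]
        return centers, assign

    def annotate(assign, labels, k):
        """Name each cluster by the most common label among its members."""
        names = []
        for j in range(k):
            members = [labels[i] for i, a in enumerate(assign) if a == j]
            names.append(max(set(members), key=members.count) if members else None)
        return names

    # Two well-separated synthetic "behaviors".
    rest = [(0.1 * i % 1.0, 0.0) for i in range(20)]
    speak = [(5.0 + 0.1 * i % 1.0, 5.0) for i in range(20)]
    points = rest + speak
    labels = ["rest"] * 20 + ["speak"] * 20
    centers, assign = kmeans(points, k=2)
    names = annotate(assign, labels, 2)
    ```

    The key property being exercised is that the labels never touch the clustering itself; they are applied only afterward, which is what makes the decoding step unsupervised.
    
    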

    The emotional contents of the ‘space’ in spatial music

    Human spatial perception is how we understand places. Beyond understanding what is where (William James’ formulation of the psychological approach to perception), there are holistic qualities to places. We perceive places as busy, crowded, exciting, threatening or peaceful, calm, comfortable and so on. Designers of places spend a great deal of time and effort on these qualities; scientists rarely do. In the scientific world-view, physical qualities and our emotive responses to them are neatly divided in the objective-subjective dichotomy. In this context, music has traditionally constituted an item in a place. Over the last two decades, development of “spatial music” has been within the prevailing engineering paradigm, informed by psychophysical data; here, space is an abstract, Euclidean 3-dimensional ‘container’ for events. The emotional consequence of spatial arrangements is not the main focus in this approach. This paper argues that a paradigm shift is appropriate, from ‘music-in-a-place’ to ‘music-as-a-place’, requiring a fundamental philosophical realignment of ‘meaning’ away from subjective response to include consequences-in-the-environment. Hence the hegemony of the objective-subjective dichotomy is questioned. There are precedents for this, for example in the ecological approach to perception (Gibson). An ecological approach to music-as-environment intrinsically treats the emotional consequences of spatio-musical arrangement holistically. A simplified taxonomy of the attributes of artificial spatial sound in this context will be discussed.