57,438 research outputs found
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that stills
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks
Methodology to assess safety effects of future Intelligent Transport Systems on railway level crossings
There is consistent evidence showing that driver behaviour contributes to crashes and near miss incidents at railway level crossings (RLXs). The development of emerging Vehicle-to-Vehicle and Vehicle-to-Infrastructure technologies is a highly promising approach to improve RLX safety. To date, research has not evaluated comprehensively the potential effects of such technologies on driving behaviour at RLXs. This paper presents an on-going research programme assessing the impacts of such new technologies on human factors and driversâ situational awareness at RLX. Additionally, requirements for the design of such promising technologies and ways to display safety information to drivers were systematically reviewed. Finally, a methodology which comprehensively assesses the effects of in-vehicle and road-based interventions warning the driver of incoming trains at RLXs is discussed, with a focus on both benefits and potential negative behavioural adaptations. The methodology is designed for implementation in a driving simulator and covers compliance, control of the vehicle, distraction, mental workload and driversâ acceptance. This study has the potential to provide a broad understanding of the effects of deploying new in-vehicle and road-based technologies at RLXs and hence inform policy makers on safety improvements planning for RLX
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified.Comment: 15 pages, 2 pdf figure
Unsupervised decoding of long-term, naturalistic human neural recordings with automated video and audio annotations
Fully automated decoding of human activities and intentions from direct
neural recordings is a tantalizing challenge in brain-computer interfacing.
Most ongoing efforts have focused on training decoders on specific, stereotyped
tasks in laboratory settings. Implementing brain-computer interfaces (BCIs) in
natural settings requires adaptive strategies and scalable algorithms that
require minimal supervision. Here we propose an unsupervised approach to
decoding neural states from human brain recordings acquired in a naturalistic
context. We demonstrate our approach on continuous long-term
electrocorticographic (ECoG) data recorded over many days from the brain
surface of subjects in a hospital room, with simultaneous audio and video
recordings. We first discovered clusters in high-dimensional ECoG recordings
and then annotated coherent clusters using speech and movement labels extracted
automatically from audio and video recordings. To our knowledge, this
represents the first time techniques from computer vision and speech processing
have been used for natural ECoG decoding. Our results show that our
unsupervised approach can discover distinct behaviors from ECoG data, including
moving, speaking and resting. We verify the accuracy of our approach by
comparing to manual annotations. Projecting the discovered cluster centers back
onto the brain, this technique opens the door to automated functional brain
mapping in natural settings
The emotional contents of the âspaceâ in spatial music
Human spatial perception is how we understand places. Beyond understanding what is where (William Jamesâ formulation of the psychological approach to perception); there are holistic qualities to places. We perceive places as busy, crowded, exciting, threatening or peaceful, calm, comfortable and so on. Designers of places spend a great deal of time and effort on these qualities; scientists rarely do. In the scientific world-view physical qualities and our emotive responses to them are neatly divided in the objective-subjective dichotomy. In this context, music has traditionally constituted an item in a place. Over the last two decades, development of âspatial musicâ has been within the prevailing engineering paradigm, informed by psychophysical data; here, space is an abstract, Euclidean 3-dimensional âcontainerâ for events. The emotional consequence of spatial arrangements is not the main focus in this approach. This paper argues that a paradigm shift is appropriate, from âmusic-in-a-placeâ to âmusic-as-a-placeâ requiring a fundamental philosophical realignment of âmeaningâ away from subjective response to include consequences-in-the-environment. Hence the hegemony of the subjective-objective dichotomy is questioned. There are precedents for this, for example in the ecological approach to perception (Gibson). An ecological approach to music-as-environment intrinsically treats the emotional consequences of spatio-musical arrangement holistically. A simplified taxonomy of the attributes of artificial spatial sound in this context will be discussed
- âŠ