1,853 research outputs found

    Microphone array signal processing for robot audition

    Robot audition for humanoid robots interacting naturally with humans in an unconstrained real-world environment is a hitherto unsolved challenge. The recorded microphone signals are usually distorted by background and interfering noise sources (speakers) as well as room reverberation. In addition, the movements of a robot and its actuators cause ego-noise which degrades the recorded signals significantly. The movement of the robot body and its head also complicates the detection and tracking of the desired, possibly moving, sound sources of interest. This paper presents an overview of the concepts in microphone array processing for robot audition and some recent achievements

    The selective use of gaze in automatic speech recognition

    The performance of automatic speech recognition (ASR) degrades significantly in natural environments compared with laboratory assessments. Being a major source of interference, acoustic noise affects speech intelligibility during the ASR process. Acoustic noise causes two main problems. The first is contamination of the speech signal. The second is change in the speakers' vocal and non-vocal behaviour. These phenomena create a mismatch between the ASR training and recognition conditions, which leads to considerable performance degradation. Popular approaches to improving noise-robustness exploit prior knowledge of the acoustic noise in speech enhancement, feature extraction and recognition models. An alternative approach presented in this thesis is to introduce eye gaze as an extra modality. Eye gaze behaviours have roles in interaction and contain information about cognition and visual attention, but not all behaviours are relevant to speech. Therefore, gaze behaviours are used selectively to improve ASR performance. This is achieved by inference procedures using noise-dependent models of gaze behaviours and their temporal and semantic relationship with speech. 'Selective gaze-contingent ASR' systems are proposed and evaluated on a corpus of eye movement and related speech in a range of clean and noisy environments. The best performing systems utilise both acoustic and language model adaptation

    Assistive listening headsets for high noise environments: Protection and communication

    © 2015 IEEE. In industrial noise environments, the use of assistive listening headsets is a means to provide adequate access to voice communication while wearing hearing protection. This paper presents a performance evaluation and comparison of two different methods for providing binaural speech enhancement in real industrial noise scenarios. The investigated binaural methods, based on differential beamforming and the multi-channel Wiener filter, show different strengths and weaknesses. A transient noise suppression algorithm is also proposed and evaluated. Performance evaluation shows that this algorithm, together with the binaural multi-channel Wiener filter approach, can successfully reduce hammering noise. This can be observed from the PESQ scores and the signal characteristics

    A Study into Speech Enhancement Techniques in Adverse Environment

    This dissertation develops speech enhancement techniques that improve speech quality in applications such as mobile communications, teleconferencing and smart loudspeakers. For these applications it is necessary to suppress noise and reverberation. The contribution of this dissertation is therefore twofold: a single-channel speech enhancement system that exploits the temporal and spectral diversity of the received microphone signal for noise suppression, and a multi-channel speech enhancement method that employs spatial diversity to reduce reverberation

    Speech enhancement in binaural hearing protection devices

    The capability of people to operate safely and effectively under extreme noise conditions depends on their access to adequate voice communication while using hearing protection. This thesis develops speech enhancement algorithms that can be implemented in binaural hearing protection devices to improve communication and situation awareness in the workplace. The developed algorithms, which emphasize low computational complexity, suppress noise while enhancing speech

    Workplace Health and Its Impact on Human Capital: Seven Key Performance Indicators of Workplace Health

    Health, a state of complete physical, mental and social well-being according to the World Health Organization, is a critical issue in the workplace as it is directly related to human capital, the most important and expensive asset of an organization. When it comes to workplace health, there are seven key performance indicators to consider. These include physical fitness, physical comfort, physical nourishment, cognitive well-being, social well-being, emotional well-being, and environmental well-being. Various environmental attributes in these seven KPIs in the workplace affect not only health but also performance and engagement of employees via their physical, mental, and social interactions within the environment. For instance, ergonomics, acoustics, lighting, thermal comfort, and olfactory comfort address the overall physical comfort while biophilic components contribute to employee cognitive functions as well as their capacity to cope with mental stress and fatigue. These seven KPIs of workplace health ultimately contribute to five positive organizational outcomes, including healthy organizational culture, higher productivity, improved individual health and safety, financial savings, and enhanced reputation of the organization. This chapter discusses critical health factors in the workplace and their contributions to the capacity of human capital at the individual as well as organizational levels

    Bio-motivated features and deep learning for robust speech recognition

    International Mention in the doctoral degree. In spite of the enormous leap forward that Automatic Speech Recognition (ASR) technologies have experienced over the last five years, their performance under hard environmental conditions is still far from that of humans, preventing their adoption in several real applications. In this thesis the challenge of robustness of modern automatic speech recognition systems is addressed following two main research lines. The first one focuses on modeling the human auditory system to improve the robustness of the feature extraction stage, yielding novel auditory motivated features. Two main contributions are produced. On the one hand, a model of the masking behaviour of the Human Auditory System (HAS) is introduced, based on the non-linear filtering of a speech spectro-temporal representation applied simultaneously to both frequency and time domains. This filtering is accomplished by using image processing techniques, in particular mathematical morphology operations with a specifically designed Structuring Element (SE) that closely resembles the masking phenomena that take place in the cochlea. On the other hand, the temporal patterns of auditory-nerve firings are modeled. Most conventional acoustic features are based on short-time energy per frequency band, discarding the information contained in the temporal patterns. Our contribution is the design of several types of feature extraction schemes based on the synchrony effect of auditory-nerve activity, showing that the modeling of this effect can indeed improve speech recognition accuracy in the presence of additive noise. Both models are further integrated into the well-known Power Normalized Cepstral Coefficients (PNCC). The second research line addresses the problem of robustness in noisy environments by means of Deep Neural Network (DNN)-based acoustic modeling and, in particular, Convolutional Neural Network (CNN) architectures.
A deep residual network scheme is proposed and adapted for our purposes, allowing Residual Networks (ResNets), originally intended for image processing tasks, to be used in speech recognition where the network input is small in comparison with usual image dimensions. We have observed that ResNets on their own already enhance the robustness of the whole system against noisy conditions. Moreover, our experiments demonstrate that their combination with the auditory motivated features devised in this thesis provides significant improvements in recognition accuracy in comparison to other state-of-the-art CNN-based ASR systems under mismatched conditions, while maintaining the performance in matched scenarios. The proposed methods have been thoroughly tested and compared with other state-of-the-art proposals for a variety of datasets and conditions. The obtained results prove that our methods outperform other state-of-the-art approaches and reveal that they are suitable for practical applications, especially where the operating conditions are unknown. The aim of this thesis is to propose solutions to the problem of robust speech recognition; to this end, two lines of research have been pursued. In the first, novel feature extraction schemes are proposed, based on modeling the behaviour of the human auditory system, in particular the masking and synchrony phenomena. In the second, deep learning techniques are used together with the proposed features to improve recognition rates. The main goal of the proposed methods is to improve the accuracy of the recognition system when the operating conditions are unknown, although the opposite case has also been addressed.
Specifically, our main proposals are the following: Simulating the human auditory system in order to improve the recognition rate under difficult conditions, mainly in high-noise situations, by proposing novel feature extraction schemes. Following this direction, our main proposals are detailed below: • Modeling the masking behaviour of the human auditory system using image processing techniques applied to the spectrum, specifically by designing a morphological filter that captures this effect. • Modeling the synchrony effect that takes place in the auditory nerve. • Integrating both models into the well-known Power Normalized Cepstral Coefficients (PNCC). Applying deep learning techniques in order to make the system more robust to noise, in particular through the use of deep convolutional neural networks such as residual networks. Finally, applying the proposed features in combination with deep neural networks, with the main goal of obtaining significant improvements when the training and test conditions do not match. Official Doctoral Programme in Multimedia and Communications. Chair: Javier Ferreiros López. Secretary: Fernando Díaz de María. Member: Rubén Solera Ureñ

    A survey on artificial intelligence-based acoustic source identification

    The concept of Acoustic Source Identification (ASI), which refers to the process of identifying noise sources, has attracted increasing attention in recent years. ASI technology can be used for surveillance, monitoring, and maintenance applications in a wide range of sectors, such as defence, manufacturing, healthcare, and agriculture. Acoustic signature analysis and pattern recognition remain the core technologies for noise source identification. Manual identification of acoustic signatures, however, has become increasingly challenging as dataset sizes grow. As a result, the use of Artificial Intelligence (AI) techniques for identifying noise sources has become increasingly relevant and useful. In this paper, we provide a comprehensive review of AI-based acoustic source identification techniques. We analyze the strengths and weaknesses of AI-based ASI processes and associated methods proposed by researchers in the literature. Additionally, we present a detailed survey of ASI applications in machinery, underwater applications, environment/event source recognition, healthcare, and other fields. We also highlight relevant research directions