13 research outputs found

    Deep Audio Zooming: Beamwidth-Controllable Neural Beamformer

    Full text link
    Audio zooming, a signal processing technique, enables selective focusing on and enhancement of sound signals from a specified region while attenuating others. Traditional beamforming and neural beamforming techniques, centered on creating a directional array, require the designation of a single target direction and often overlook the concept of a field of view (FOV), which defines an angular area. In this paper, we propose a simple yet effective FOV feature, amalgamating all directional attributes within the user-defined field. In conjunction, we introduce a counter FOV feature capturing directional aspects outside the desired field. These advancements ensure refined sound capture, particularly at the FOV's boundaries, and guarantee enhanced capture of all desired sound sources inside the user-defined field. Experimental results demonstrate the efficacy of the introduced angular FOV feature and its seamless incorporation into a low-power subband model suited for real-time applications. Comment: 6 pages, 5 figures
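
    A minimal sketch of how an angular FOV feature of this kind might be computed is given below. The paper's exact feature definition is not reproduced here; the cosine-similarity directional score, the max-pooling over directions, and the planar array geometry are illustrative assumptions.

```python
import numpy as np

def steering_vector(mic_xy, angle_rad, freq, c=343.0):
    """Far-field steering vector for a planar array at one frequency."""
    direction = np.array([np.cos(angle_rad), np.sin(angle_rad)])
    delays = mic_xy @ direction / c                      # per-mic delays (s)
    return np.exp(-2j * np.pi * freq * delays)           # shape: (n_mics,)

def fov_features(stft_frame, mic_xy, freq, fov_deg=(-30.0, 30.0), n_dirs=72):
    """Pool a simple directional score (cosine similarity to the steering
    vector) over candidate directions inside and outside a user-defined FOV."""
    angles = np.deg2rad(np.linspace(-180.0, 180.0, n_dirs, endpoint=False))
    sims = []
    for a in angles:
        sv = steering_vector(mic_xy, a, freq)
        num = np.abs(np.vdot(sv, stft_frame))
        den = np.linalg.norm(sv) * np.linalg.norm(stft_frame) + 1e-9
        sims.append(num / den)
    sims = np.array(sims)
    inside = (angles >= np.deg2rad(fov_deg[0])) & (angles <= np.deg2rad(fov_deg[1]))
    fov_feat = sims[inside].max()          # evidence of a source inside the FOV
    counter_feat = sims[~inside].max()     # evidence of interference outside it
    return fov_feat, counter_feat
```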

    Adaptive Algorithms for Intelligent Acoustic Interfaces

    Get PDF
    Modern speech communications are evolving in a new direction that involves users in a more perceptive way. That is the immersive experience, which may be considered the "last-mile" problem of telecommunications. One of the main features of immersive communications is distant talking, i.e., hands-free (in the broad sense) speech communication without body-worn or tethered microphones, which takes place in a multisource environment where interfering signals may degrade the communication quality and the intelligibility of the desired speech source. In order to preserve speech quality, intelligent acoustic interfaces may be used. An intelligent acoustic interface may comprise multiple microphones and loudspeakers, and its distinguishing feature is that it models the acoustic channel in order to adapt to user requirements and environment conditions. This is the reason why intelligent acoustic interfaces are based on adaptive filtering algorithms. Acoustic path modelling entails a set of problems which have to be taken into account in designing an adaptive filtering algorithm. Such problems may be generated by either a linear or a nonlinear process and can be tackled respectively by linear or nonlinear adaptive algorithms. In this work we consider such modelling problems and propose novel, effective adaptive algorithms that allow acoustic interfaces to be robust against interfering signals, thus preserving the perceived quality of desired speech signals. As regards linear adaptive algorithms, a class of adaptive filters based on the sparse nature of the acoustic impulse response has recently been proposed. We adopt this class of adaptive filters, named proportionate adaptive filters, and derive a general framework from which any linear adaptive algorithm can be obtained. Using this framework we also propose some efficient proportionate adaptive algorithms, expressly designed to tackle problems of a linear nature. On the other hand, in order to address problems deriving from a nonlinear process, we propose a novel filtering model which performs nonlinear transformations by means of functional links. Using this nonlinear model, we propose functional link adaptive filters, which provide an efficient solution to the modelling of a nonlinear acoustic channel. Finally, we introduce robust filtering architectures based on adaptive combinations of filters that allow acoustic interfaces to adapt more effectively to environment conditions, thus providing a powerful means for immersive speech communications.
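
    To make the proportionate idea concrete, the sketch below shows a textbook PNLMS-style update, in which coefficients with larger magnitude receive proportionally larger step sizes, exploiting the sparsity of acoustic impulse responses. This is the standard member of the proportionate family rather than any of the algorithms proposed in the thesis, and the parameter values are placeholders.

```python
import numpy as np

def pnlms_step(w, x_buf, d, mu=0.5, rho=0.01, delta=1e-4):
    """One PNLMS update.

    w     : current filter coefficients, shape (L,)
    x_buf : most recent L input samples (newest first), shape (L,)
    d     : desired (reference) sample at the current time
    """
    e = d - w @ x_buf                                   # a priori error
    l_inf = np.max(np.abs(w))
    # Per-coefficient gains: proportional to coefficient magnitude, floored
    # so that inactive coefficients still adapt.
    gamma = np.maximum(np.abs(w), rho * max(l_inf, delta))
    g = gamma / np.sum(gamma)
    norm = x_buf @ (g * x_buf) + delta                  # regularized normalization
    w = w + mu * e * g * x_buf / norm                   # proportionate NLMS step
    return w, e
```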

    Machine Learning for Beamforming in Audio, Ultrasound, and Radar

    Get PDF
    Multi-sensor signal processing plays a crucial role in the working of several everyday technologies, from correctly understanding speech on smart home devices to ensuring aircraft fly safely. A specific type of multi-sensor signal processing called beamforming forms a central part of this thesis. Beamforming works by combining the information from several spatially distributed sensors to directionally filter information, boosting the signal from a certain direction while suppressing others. The idea of beamforming is key to the domains of audio, ultrasound, and radar. Machine learning is the other central part of this thesis. Machine learning, and especially its sub-field of deep learning, has enabled breakneck progress in tackling several problems that were previously thought intractable. Today, machine learning powers many of the cutting-edge systems we see on the internet for image classification, speech recognition, language translation, and more. In this dissertation, we look at beamforming pipelines in audio, ultrasound, and radar through a machine learning lens and endeavor to improve different parts of the pipelines using ideas from machine learning. We start off in the audio domain and derive a machine-learning-inspired beamformer to tackle the problem of ensuring the audio captured by a camera matches its visual content, a problem we term audiovisual zooming. Staying in the audio domain, we then demonstrate how deep learning can be used to improve the perceptual quality of speech by mitigating speech clipping, codec distortions, and gaps in speech. Transitioning to the ultrasound domain, we improve the performance of short-lag spatial coherence ultrasound imaging by applying robust principal component analysis to exploit the differences in tissue texture at each short-lag value. Next, we use deep learning as an alternative to beamforming in ultrasound and improve the information extraction pipeline by simultaneously generating both a segmentation map and a high-quality B-mode image directly from raw received ultrasound data. Finally, we move to the radar domain and study how deep learning can be used to improve signal quality in ultra-wideband synthetic aperture radar by suppressing radio frequency interference, random spectral gaps, and contiguous block spectral gaps. Because the networks are trained and applied on raw single-aperture data prior to beamforming, the approach can work with myriad sensor geometries and different beamforming equations, a crucial requirement in synthetic aperture radar.
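
    The directional filtering described above can be illustrated with a basic frequency-domain delay-and-sum beamformer. This is a minimal sketch of classical beamforming, not one of the learned pipelines studied in the dissertation; the far-field model and planar array geometry are assumptions.

```python
import numpy as np

def delay_and_sum(stft, mic_xy, look_angle_deg, freqs, c=343.0):
    """Frequency-domain delay-and-sum beamformer.

    stft    : complex STFT, shape (n_mics, n_freqs, n_frames)
    mic_xy  : microphone coordinates in metres, shape (n_mics, 2)
    freqs   : frequency of each STFT bin in Hz, shape (n_freqs,)
    Returns the single-channel beamformed STFT, shape (n_freqs, n_frames).
    """
    angle = np.deg2rad(look_angle_deg)
    direction = np.array([np.cos(angle), np.sin(angle)])
    delays = mic_xy @ direction / c                          # (n_mics,)
    # Steering weights that compensate the per-mic propagation delay per frequency.
    weights = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])  # (n_mics, n_freqs)
    aligned = stft * weights[:, :, None]                     # phase-align channels
    return aligned.mean(axis=0)                              # coherent average
```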

    Array processing techniques for direction of arrival estimation, communications, and localization in vehicular and wireless sensor networks

    Get PDF
    Doctoral thesis, Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2018. Supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). Array signal processing in wireless communication has been a topic of interest in research for over three decades. In the fourth generation (4G) of wireless communication systems, also known as Long Term Evolution (LTE), multi-antenna systems were adopted according to Release 9 of the 3rd Generation Partnership Project (3GPP). For the fifth generation (5G) of wireless communication systems, hundreds of antennas should be incorporated into the devices in a massive multi-user Multiple Input Multiple Output (MIMO) architecture. The presence of multiple antennas provides array gain, diversity gain, spatial gain, and interference reduction. Furthermore, arrays enable spatial filtering and parameter estimation, which can be used to help solve problems that could not previously be addressed from a signal processing perspective. The aim of this thesis is to bridge some gaps between signal processing theory and real-world applications. Array processing techniques traditionally assume an ideal array. Therefore, in order to exploit such techniques, a robust set of methods for array interpolation is fundamental; such methods are developed in this work. Problems in the field of wireless sensor networks and vehicular networks are also addressed from an array signal processing perspective. In this dissertation, novel methods for array interpolation are presented and their performance in real-world scenarios is evaluated. Signal processing concepts are implemented in the context of a wireless sensor network. These concepts provide a level of synchronization sufficient for distributed multi-antenna communication to be applied, resulting in improved lifetime and improved overall network behaviour. Array signal processing methods are proposed to solve the problem of radio-based localization in vehicular network scenarios, with applications in road safety and pedestrian protection.
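
    The array interpolation referred to above can be illustrated with its classical least-squares formulation, in which a transformation matrix maps responses of the real, imperfect array onto those of an idealized virtual array. This is the textbook construction rather than the novel methods developed in the thesis; the calibration-angle sampling and the geometries are assumptions for illustration.

```python
import numpy as np

def interpolation_matrix(real_manifold, virtual_manifold):
    """Least-squares interpolation matrix B solving  min_B || B @ A_real - A_virt ||_F.

    real_manifold    : A_real, responses of the measured array at calibration
                       angles, shape (n_real_elements, n_calibration_angles)
    virtual_manifold : A_virt, responses of the desired virtual array at the
                       same angles, shape (n_virt_elements, n_calibration_angles)
    """
    # Transpose so that lstsq solves A_real.T @ X = A_virt.T, with X = B.T.
    B_T, *_ = np.linalg.lstsq(real_manifold.T, virtual_manifold.T, rcond=None)
    return B_T.T

# Usage: x_virtual = B @ x_real maps a measured snapshot onto the virtual array,
# after which standard array processing for ideal geometries can be applied.
```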

    Deep Learning for Distant Speech Recognition

    Full text link
    Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among other achievements, building computers that understand speech represents a crucial leap towards intelligent machines. Despite the great efforts of the past decades, however, natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. These disturbances severely hamper the intelligibility of a speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses this scenario and proposes novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with a particular emphasis on DNN training with simulated data. We then investigate approaches for better exploiting speech contexts, proposing some original methodologies for both feed-forward and recurrent neural networks. Lastly, inspired by the idea that cooperation across different DNNs could be key to counteracting the harmful effects of noise and reverberation, we propose a novel deep learning paradigm called a network of deep neural networks. The analysis of the proposed concepts was based on extensive experimental validations conducted on both real and simulated data, considering different corpora, microphone configurations, environments, noise conditions, and ASR tasks. Comment: PhD Thesis Unitn, 201
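
    The realistic data contamination mentioned above is commonly implemented by convolving clean speech with a measured or simulated room impulse response and adding noise at a target signal-to-noise ratio. The sketch below shows that generic recipe, not the thesis's specific contamination pipeline; the SNR scaling and the assumption that the noise clip is at least as long as the speech are illustrative choices.

```python
import numpy as np
from scipy.signal import fftconvolve

def contaminate(clean, rir, noise, snr_db):
    """Simulate distant-talking speech: reverberate clean speech with a room
    impulse response, then add noise scaled to the requested SNR.

    clean, rir, noise are 1-D float arrays at the same sampling rate;
    noise is assumed to be at least as long as the clean signal."""
    reverberant = fftconvolve(clean, rir)[: len(clean)]
    noise = noise[: len(reverberant)]
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    # Gain that sets 10*log10(speech_power / (gain**2 * noise_power)) = snr_db.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10.0)))
    return reverberant + gain * noise
```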

    Audio Zoom for Smartphones Based on Multiple Adaptive Beamformers

    No full text
    Some recent smartphones offer a so-called audio zoom feature, which focuses sound capture in the front direction while progressively attenuating surrounding sounds in step with the video zoom. This paper proposes a complete implementation of such a function involving two major steps. First, the targeted sound source is extracted by a novel approach that combines multiple adaptive beamformers having different look directions with a post-processing algorithm. Second, the spatial zooming effect is created by leveraging the microphone signals and the enhanced target source. A subjective test with real-world audio recordings, using a mock-up simulating the usual shape of a smartphone, confirms the rich user experience provided by the proposed system.
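
    A minimal sketch of how the second step, the spatial zooming effect, might blend the enhanced target with the raw microphone capture as a function of zoom level is shown below. The linear blending law and the zoom parameterization are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def audio_zoom_mix(mic_signal, enhanced_target, zoom):
    """Blend the raw microphone capture with the extracted target source.

    mic_signal      : a reference microphone channel, shape (n_samples,)
    enhanced_target : output of the target-extraction stage, same shape
    zoom            : 0.0 (wide, natural capture) .. 1.0 (fully zoomed in)
    """
    zoom = float(np.clip(zoom, 0.0, 1.0))
    # As the user zooms in, the focused source dominates and the ambient
    # capture is progressively attenuated.
    return (1.0 - zoom) * mic_signal + zoom * enhanced_target
```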