13 research outputs found

    Deep Audio Zooming: Beamwidth-Controllable Neural Beamformer

    Full text link
    Audio zooming, a signal processing technique, enables selective focusing on and enhancement of sound signals from a specified region while attenuating others. Traditional beamforming and neural beamforming techniques, centered on creating a directional array, require the designation of a single target direction and often overlook the concept of a field of view (FOV), which defines an angular area. In this paper, we propose a simple yet effective FOV feature, amalgamating all directional attributes within the user-defined field. In conjunction, we introduce a counter FOV feature capturing directional aspects outside the desired field. These advancements ensure refined sound capture, particularly at the FOV's boundaries, and guarantee enhanced capture of all desired sound sources inside the user-defined field. Experimental results demonstrate the efficacy of the introduced angular FOV feature and its seamless incorporation into a low-power subband model suited for real-time applications. Comment: 6 pages, 5 figures
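
    A minimal sketch of how an angular FOV feature of this kind might be computed is given below. The paper's exact feature definition is not reproduced here; the cosine-similarity directional score, the max-pooling over directions, and the planar array geometry are illustrative assumptions.

```python
import numpy as np

def steering_vector(mic_xy, angle_rad, freq, c=343.0):
    """Far-field steering vector for a planar array at one frequency."""
    direction = np.array([np.cos(angle_rad), np.sin(angle_rad)])
    delays = mic_xy @ direction / c                      # per-mic delays (s)
    return np.exp(-2j * np.pi * freq * delays)           # shape: (n_mics,)

def fov_features(stft_frame, mic_xy, freq, fov_deg=(-30.0, 30.0), n_dirs=72):
    """Pool a simple directional score (cosine similarity to the steering
    vector) over candidate directions inside and outside a user-defined FOV."""
    angles = np.deg2rad(np.linspace(-180.0, 180.0, n_dirs, endpoint=False))
    sims = []
    for a in angles:
        sv = steering_vector(mic_xy, a, freq)
        num = np.abs(np.vdot(sv, stft_frame))
        den = np.linalg.norm(sv) * np.linalg.norm(stft_frame) + 1e-9
        sims.append(num / den)
    sims = np.array(sims)
    inside = (angles >= np.deg2rad(fov_deg[0])) & (angles <= np.deg2rad(fov_deg[1]))
    fov_feat = sims[inside].max()          # evidence of a source inside the FOV
    counter_feat = sims[~inside].max()     # evidence of interference outside it
    return fov_feat, counter_feat
```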

    Adaptive Algorithms for Intelligent Acoustic Interfaces

    Get PDF
    Modern speech communications are evolving in a new direction that involves users in a more perceptive way. That is the immersive experience, which may be considered the "last-mile" problem of telecommunications. One of the main features of immersive communications is distant talking, i.e., hands-free (in the broad sense) speech communication without body-worn or tethered microphones, which takes place in a multisource environment where interfering signals may degrade the communication quality and the intelligibility of the desired speech source. In order to preserve speech quality, intelligent acoustic interfaces may be used. An intelligent acoustic interface may comprise multiple microphones and loudspeakers, and its distinguishing feature is that it models the acoustic channel in order to adapt to user requirements and environment conditions. This is the reason why intelligent acoustic interfaces are based on adaptive filtering algorithms. Acoustic path modelling entails a set of problems which have to be taken into account in designing an adaptive filtering algorithm. Such problems may be generated by either a linear or a nonlinear process and can be tackled respectively by linear or nonlinear adaptive algorithms. In this work we consider such modelling problems and propose novel, effective adaptive algorithms that allow acoustic interfaces to be robust against interfering signals, thus preserving the perceived quality of desired speech signals. As regards linear adaptive algorithms, a class of adaptive filters based on the sparse nature of the acoustic impulse response has recently been proposed. We adopt this class of adaptive filters, named proportionate adaptive filters, and derive a general framework from which any linear adaptive algorithm can be obtained. Using this framework we also propose some efficient proportionate adaptive algorithms, expressly designed to tackle problems of a linear nature. On the other hand, in order to address problems deriving from a nonlinear process, we propose a novel filtering model which performs nonlinear transformations by means of functional links. Using this nonlinear model, we propose functional link adaptive filters, which provide an efficient solution to the modelling of a nonlinear acoustic channel. Finally, we introduce robust filtering architectures based on adaptive combinations of filters that allow acoustic interfaces to adapt more effectively to environment conditions, thus providing a powerful means for immersive speech communications.
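
    To make the proportionate idea concrete, the sketch below shows a textbook PNLMS-style update, in which coefficients with larger magnitude receive proportionally larger step sizes, exploiting the sparsity of acoustic impulse responses. This is the standard member of the proportionate family rather than any of the algorithms proposed in the thesis, and the parameter values are placeholders.

```python
import numpy as np

def pnlms_step(w, x_buf, d, mu=0.5, rho=0.01, delta=1e-4):
    """One PNLMS update.

    w     : current filter coefficients, shape (L,)
    x_buf : most recent L input samples (newest first), shape (L,)
    d     : desired (reference) sample at the current time
    """
    e = d - w @ x_buf                                   # a priori error
    l_inf = np.max(np.abs(w))
    # Per-coefficient gains: proportional to coefficient magnitude, floored
    # so that inactive coefficients still adapt.
    gamma = np.maximum(np.abs(w), rho * max(l_inf, delta))
    g = gamma / np.sum(gamma)
    norm = x_buf @ (g * x_buf) + delta                  # regularized normalization
    w = w + mu * e * g * x_buf / norm                   # proportionate NLMS step
    return w, e
```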

    Machine Learning for Beamforming in Audio, Ultrasound, and Radar

    Get PDF
    Multi-sensor signal processing plays a crucial role in the working of several everyday technologies, from correctly understanding speech on smart home devices to ensuring aircraft fly safely. A specific type of multi-sensor signal processing called beamforming forms a central part of this thesis. Beamforming works by combining the information from several spatially distributed sensors to directionally filter information, boosting the signal from a certain direction while suppressing others. The idea of beamforming is key to the domains of audio, ultrasound, and radar. Machine learning is the other central part of this thesis. Machine learning, and especially its sub-field of deep learning, has enabled breakneck progress in tackling several problems that were previously thought intractable. Today, machine learning powers many of the cutting-edge systems we see on the internet for image classification, speech recognition, language translation, and more. In this dissertation, we look at beamforming pipelines in audio, ultrasound, and radar through a machine learning lens and endeavor to improve different parts of the pipelines using ideas from machine learning. We start off in the audio domain and derive a machine-learning-inspired beamformer to tackle the problem of ensuring the audio captured by a camera matches its visual content, a problem we term audiovisual zooming. Staying in the audio domain, we then demonstrate how deep learning can be used to improve the perceptual quality of speech by mitigating speech clipping, codec distortions, and gaps in speech. Transitioning to the ultrasound domain, we improve the performance of short-lag spatial coherence ultrasound imaging by applying robust principal component analysis to exploit the differences in tissue texture at each short-lag value. Next, we use deep learning as an alternative to beamforming in ultrasound and improve the information extraction pipeline by simultaneously generating both a segmentation map and a high-quality B-mode image directly from raw received ultrasound data. Finally, we move to the radar domain and study how deep learning can be used to improve signal quality in ultra-wideband synthetic aperture radar by suppressing radio frequency interference, random spectral gaps, and contiguous block spectral gaps. Because the networks are trained and applied on raw single-aperture data prior to beamforming, the approach can work with myriad sensor geometries and different beamforming equations, a crucial requirement in synthetic aperture radar.
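
    The directional filtering described above can be illustrated with a basic frequency-domain delay-and-sum beamformer. This is a minimal sketch of classical beamforming, not one of the learned pipelines studied in the dissertation; the far-field model and planar array geometry are assumptions.

```python
import numpy as np

def delay_and_sum(stft, mic_xy, look_angle_deg, freqs, c=343.0):
    """Frequency-domain delay-and-sum beamformer.

    stft    : complex STFT, shape (n_mics, n_freqs, n_frames)
    mic_xy  : microphone coordinates in metres, shape (n_mics, 2)
    freqs   : frequency of each STFT bin in Hz, shape (n_freqs,)
    Returns the single-channel beamformed STFT, shape (n_freqs, n_frames).
    """
    angle = np.deg2rad(look_angle_deg)
    direction = np.array([np.cos(angle), np.sin(angle)])
    delays = mic_xy @ direction / c                          # (n_mics,)
    # Steering weights that compensate the per-mic propagation delay per frequency.
    weights = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])  # (n_mics, n_freqs)
    aligned = stft * weights[:, :, None]                     # phase-align channels
    return aligned.mean(axis=0)                              # coherent average
```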

    Array processing techniques for direction of arrival estimation, communications, and localization in vehicular and wireless sensor networks

    Get PDF
    Doctoral thesis, Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2018. Supported by Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). Array signal processing in wireless communication has been a topic of interest in research for over three decades. In the fourth generation (4G) of wireless communication systems, also known as Long Term Evolution (LTE), multi-antenna systems were adopted according to Release 9 of the 3rd Generation Partnership Project (3GPP). For the fifth generation (5G) of wireless communication systems, hundreds of antennas should be incorporated into the devices in a massive multi-user Multiple Input Multiple Output (MIMO) architecture. The presence of multiple antennas provides array gain, diversity gain, spatial gain, and interference reduction. Furthermore, arrays enable spatial filtering and parameter estimation, which can be used to help solve problems that could not previously be addressed from a signal processing perspective. The aim of this thesis is to bridge some gaps between signal processing theory and real-world applications. Array processing techniques traditionally assume an ideal array. Therefore, in order to exploit such techniques, a robust set of methods for array interpolation is fundamental; such methods are developed in this work. Problems in the field of wireless sensor networks and vehicular networks are also addressed from an array signal processing perspective. In this dissertation, novel methods for array interpolation are presented and their performance in real-world scenarios is evaluated. Signal processing concepts are implemented in the context of a wireless sensor network. These concepts provide a level of synchronization sufficient for distributed multi-antenna communication to be applied, resulting in improved lifetime and improved overall network behaviour. Array signal processing methods are proposed to solve the problem of radio-based localization in vehicular network scenarios, with applications in road safety and pedestrian protection.
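
    The array interpolation referred to above can be illustrated with its classical least-squares formulation, in which a transformation matrix maps responses of the real, imperfect array onto those of an idealized virtual array. This is the textbook construction rather than the novel methods developed in the thesis; the calibration-angle sampling and the geometries are assumptions for illustration.

```python
import numpy as np

def interpolation_matrix(real_manifold, virtual_manifold):
    """Least-squares interpolation matrix B solving  min_B || B @ A_real - A_virt ||_F.

    real_manifold    : A_real, responses of the measured array at calibration
                       angles, shape (n_real_elements, n_calibration_angles)
    virtual_manifold : A_virt, responses of the desired virtual array at the
                       same angles, shape (n_virt_elements, n_calibration_angles)
    """
    # Transpose so that lstsq solves A_real.T @ X = A_virt.T, with X = B.T.
    B_T, *_ = np.linalg.lstsq(real_manifold.T, virtual_manifold.T, rcond=None)
    return B_T.T

# Usage: x_virtual = B @ x_real maps a measured snapshot onto the virtual array,
# after which standard array processing for ideal geometries can be applied.
```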

    Deep Learning for Distant Speech Recognition

    Full text link
    Deep learning is an emerging technology that is considered one of the most promising directions for reaching higher levels of artificial intelligence. Among other achievements, building computers that understand speech represents a crucial leap towards intelligent machines. Despite the great efforts of the past decades, however, natural and robust human-machine speech interaction still appears to be out of reach, especially when users interact with a distant microphone in noisy and reverberant environments. These disturbances severely hamper the intelligibility of a speech signal, making Distant Speech Recognition (DSR) one of the major open challenges in the field. This thesis addresses this scenario and proposes novel techniques, architectures, and algorithms to improve the robustness of distant-talking acoustic models. We first elaborate on methodologies for realistic data contamination, with a particular emphasis on DNN training with simulated data. We then investigate approaches for better exploiting speech contexts, proposing some original methodologies for both feed-forward and recurrent neural networks. Lastly, inspired by the idea that cooperation across different DNNs could be key to counteracting the harmful effects of noise and reverberation, we propose a novel deep learning paradigm called a network of deep neural networks. The analysis of the proposed concepts was based on extensive experimental validations conducted on both real and simulated data, considering different corpora, microphone configurations, environments, noise conditions, and ASR tasks. Comment: PhD Thesis Unitn, 201
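
    The realistic data contamination mentioned above is commonly implemented by convolving clean speech with a measured or simulated room impulse response and adding noise at a target signal-to-noise ratio. The sketch below shows that generic recipe, not the thesis's specific contamination pipeline; the SNR scaling and the assumption that the noise clip is at least as long as the speech are illustrative choices.

```python
import numpy as np
from scipy.signal import fftconvolve

def contaminate(clean, rir, noise, snr_db):
    """Simulate distant-talking speech: reverberate clean speech with a room
    impulse response, then add noise scaled to the requested SNR.

    clean, rir, noise are 1-D float arrays at the same sampling rate;
    noise is assumed to be at least as long as the clean signal."""
    reverberant = fftconvolve(clean, rir)[: len(clean)]
    noise = noise[: len(reverberant)]
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    # Gain that sets 10*log10(speech_power / (gain**2 * noise_power)) = snr_db.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10.0)))
    return reverberant + gain * noise
```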

    Audio Zoom for Smartphones Based on Multiple Adaptive Beamformers

    No full text
    Some recent smartphones offer a so-called audio zoom feature, which focuses sound capture in the front direction while progressively attenuating surrounding sounds in step with the video zoom. This paper proposes a complete implementation of such a function involving two major steps. First, the targeted sound source is extracted by a novel approach that combines multiple adaptive beamformers having different look directions with a post-processing algorithm. Second, the spatial zooming effect is created by leveraging the microphone signals and the enhanced target source. A subjective test with real-world audio recordings, using a mock-up simulating the usual shape of a smartphone, confirms the rich user experience provided by the proposed system.
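
    A minimal sketch of how the second step, the spatial zooming effect, might blend the enhanced target with the raw microphone capture as a function of zoom level is shown below. The linear blending law and the zoom parameterization are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def audio_zoom_mix(mic_signal, enhanced_target, zoom):
    """Blend the raw microphone capture with the extracted target source.

    mic_signal      : a reference microphone channel, shape (n_samples,)
    enhanced_target : output of the target-extraction stage, same shape
    zoom            : 0.0 (wide, natural capture) .. 1.0 (fully zoomed in)
    """
    zoom = float(np.clip(zoom, 0.0, 1.0))
    # As the user zooms in, the focused source dominates and the ambient
    # capture is progressively attenuated.
    return (1.0 - zoom) * mic_signal + zoom * enhanced_target
```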