853 research outputs found

    Robust Nearfield Wideband Beamforming Design Based on Adaptive-Weighted Convex Optimization

    Get PDF
    Nearfield wideband beamformers for microphone arrays have wide applications in multichannel speech enhancement. The nearfield wideband beamformer design based on convex optimization is one of the typical representatives of robust approaches. However, in this approach, the coefficient of convex optimization is a constant, which has not used all the freedom provided by the weighting coefficient efficiently. Therefore, it is still necessary to further improve the performance. To solve this problem, we developed a robust nearfield wideband beamformer design approach based on adaptive-weighted convex optimization. The proposed approach defines an adaptive-weighted function by the adaptive array signal processing theory and adjusts its value flexibly, which has improved the beamforming performance. During each process of the adaptive updating of the weighting function, the convex optimization problem can be formulated as a SOCP (Second-Order Cone Program) problem, which could be solved efficiently using the well-established interior-point methods. This method is suitable for the case where the sound source is in the nearfield range, can work well in the presence of microphone mismatches, and is applicable to arbitrary array geometries. Several design examples are presented to verify the effectiveness of the proposed approach and the correctness of the theoretical analysis

    Sparse Array Design for Wideband Beamforming with Reduced Complexity in Tapped Delay-lines

    Get PDF
    Sparse wideband array design for sensor location optimization is highly nonlinear and it is traditionally solved by genetic algorithms (GAs) or other similar optimization methods. This is an extremely time-consuming process and an optimum solution is not always guaranteed. In this work, this problem is studied from the viewpoint of compressive sensing (CS). Although there have been CS-based methods proposed for the design of sparse narrowband arrays, its extension to the wideband case is not straightforward, as there are multiple coefficients associated with each sensor and they have to be simultaneously minimized in order to discard the corresponding sensor locations. At first, sensor location optimization for both general wideband beamforming and frequency invariant beamforming is considered. Then, sparsity in the tapped delay-line (TDL) coefficients associated with each sensor is considered in order to reduce the implementation complexity of each TDL. Finally, design of robust wideband arrays against norm-bounded steering vector errors is addressed. Design examples are provided to verify the effectiveness of the proposed methods, with comparisons drawn with a GA-based design method

    Advanced algorithms for audio and image processing

    Get PDF
    The objective of the thesis is the development of a set of innovative algorithms around the topic of beamforming in the field of acoustic imaging, audio and image processing, aimed at significantly improving the performance of devices that exploit these computational approaches. Therefore the context is the improvement of devices (ultrasound machines and video/audio devices) already on the market or the development of new ones which, through the proposed studies, can be introduced on new the markets with the launch of innovative high-tech start-ups. This is the motivation and the leitmotiv behind the doctoral work carried out. In fact, in the first part of the work an innovative image reconstruction algorithm in the field of ultrasound biomedical imaging is presented, which is connected to the development of such equipment that exploits the computing opportunities currently offered nowadays at low cost by GPUs (Moore\u2019s law). The proposed target is to obtain a new pipeline of the reconstruction of the image abandoning the architecture of such hardware based In the first part of the thesis I faced the topic of the reconstruction of ultrasound images for applications hypothesized on a software based device through image reconstruction algorithms processed in the frequency domain. An innovative beamforming algorithm based on seismic migration is presented, in which a transformation of the RF data is carried out and the reconstruction algorithm can evaluate a masking of the k-space of the data, speeding up the reconstruction process and reducing the computational burden. The analysis and development of the algorithms responsible for carrying out the thesis has been approached from a feasibility point in an off-line context and on the Matlab platform, processing both synthetic simulated generated data and real RF data: the subsequent development of these algorithms within of the future ultrasound biomedical equipment will exploit an high-performance computing framework capable of processing customized kernel pipelines (henceforth called \u2019filters\u2019) on CPU/GPU. The type of filters implemented involved the topic of Plane Wave Imaging (PWI), an alternative method of acquiring the ultrasound image compared to the state of the art of the traditional standard B-mode which currently exploit sequential sequence of insonification of the sample under examination through focused beams transmitted by the probe channels. The PWI mode is interesting and opens up new scenarios compared to the usual signal acquisition and processing techniques, with the aim of making signal processing in general and image reconstruction in particular faster and more flexible, and increasing importantly the frame rate opens up and improves clinical applications. The innovative idea is to introduce in an offline seismic reconstruction algorithm for ultrasound imaging a further filter, named masking matrix. The masking matrices can be computed offline knowing the system parameters, since they do not depend from acquired data. Moreover, they can be pre-multiplied to propagation matrices, without affecting the overall computational load. Subsequently in the thesis, the topic of beamforming in audio processing on super-direct linear arrays of microphones is addressed. The aim is to make an in depth analysis of two main families of data-independent approaches and algorithms present in the literature by comparing their performances and the trade-off between directivity and frequency invariance, which is not yet known at to the state-of-the-art. The goal is to validate the best algorithm that allows, from the perspective of an implementation, to experimentally verify performance, correlating it with the characteristics and error statistics. Frequency-invariant beam patterns are often required by systems using an array of sensors to process broadband signals. In some experimental conditions, the array spatial aperture is shorter than the involved wavelengths. In these conditions, superdirective beamforming is essential for an efficient system. I present a comparison between two methods that deal with a data-independent beamformer based on a filter-and-sum structure. Both methods (the first one numerical, the second one analytic) formulate a mathematical convex minimization problem, in which the variables to be optimized are the filters coefficients or frequency responses. In the described simulations, I have chosen a geometry and a set-up of parameters that allows us to make a fair comparison between the performances of the two different design methods analyzed. In particular, I addressed a small linear array for audio capture with different purposes (hearing aids, audio surveillance system, video-conference system, multimedia device, etc.). The research activity carried out has been used for the launch of a high-tech device through an innovative start-up in the field of glasses/audio devices (https://acoesis.com/en/). It has been proven that the proposed algorithm gives the possibility of obtaining higher performances than the state of the art of similar algorithms, additionally providing the possibility of connecting directivity or better generalized directivity to the statistics of phase errors and gain of sensors, extremely important in superdirective arrays in the case of real and industrial implementation. Therefore, the method proposed by the comparison is innovative because it quantitatively links the physical construction characteristics of the array to measurable and experimentally verifiable quantities, making the real implementation process controllable. The third topic faced is the reconstruction of the Room Impluse Response (RIR) using audio processing blind methods. Given an unknown audio source, the estimation of time differences-of-arrivals (TDOAs) can be efficiently and robustly solved using blind channel identification and exploiting the cross-correlation identity (CCI). Prior blind works have improved the estimate of TDOAs by means of different algorithmic solutions and optimization strategies, while always sticking to the case N = 2 microphones. But what if we can obtain a direct improvement in performance by just increasing N? In the fourth Chapter I tried to investigate this direction, showing that, despite the arguable simplicity, this is capable of (sharply) improving upon state-of-the-art blind channel identification methods based on CCI, without modifying the computational pipeline. Inspired by our results, we seek to warm up the community and the practitioners by paving the way (with two concrete, yet preliminary, examples) towards joint approaches in which advances in the optimization are combined with an increased number of microphones, in order to achieve further improvements. Sound source localisation applications can be tackled by inferring the time-difference-of-arrivals (TDOAs) between a sound-emitting source and a set of microphones. Among the referred applications, one can surely list room-aware sound reproduction, room geometry\u2019s estimation, speech enhancement. Despite a broad spectrum of prior works estimate TDOAs from a known audio source, even when the signal emitted from the acoustic source is unknown, TDOAs can be inferred by comparing the signals received at two (or more) spatially separated microphones, using the notion of cross-corrlation identity (CCI). This is the key theoretical tool, not only, to make the ordering of microphones irrelevant during the acquisition stage, but also to solve the problem as blind channel identification, robustly and reliably inferring TDOAs from an unknown audio source. However, when dealing with natural environments, such \u201cmutual agreement\u201d between microphones can be tampered by a variety of audio ambiguities such as ambient noise. Furthermore, each observed signal may contain multiple distorted or delayed replicas of the emitting source due to reflections or generic boundary effects related to the (closed) environment. Thus, robustly estimating TDOAs is surely a challenging problem and CCI-based approaches cast it as single-input/multi-output blind channel identification. Such methods promote robustness in the estimate from the methodological standpoint: using either energy-based regularization, sparsity or positivity constraints, while also pre-conditioning the solution space. Last but not least, the Acoustic Imaging is an imaging modality that exploits the propagation of acoustic waves in a medium to recover the spatial distribution and intensity of sound sources in a given region. Well known and widespread acoustic imaging applications are, for example, sonar and ultrasound. There are active and passive imaging devices: in the context of this thesis I consider a passive imaging system called Dual Cam that does not emit any sound but acquires it from the environment. In an acoustic image each pixel corresponds to the sound intensity of the source, the whose position is described by a particular pair of angles and, in the case in which the beamformer can, as in our case, work in near-field, from a distance on which the system is focused. In the last part of this work I propose the use of a new modality characterized by a richer information content, namely acoustic images, for the sake of audio-visual scene understanding. Each pixel in such images is characterized by a spectral signature, associated to a specific direction in space and obtained by processing the audio signals coming from an array of microphones. By coupling such array with a video camera, we obtain spatio-temporal alignment of acoustic images and video frames. This constitutes a powerful source of self-supervision, which can be exploited in the learning pipeline we are proposing, without resorting to expensive data annotations. However, since 2D planar arrays are cumbersome and not as widespread as ordinary microphones, we propose that the richer information content of acoustic images can be distilled, through a self-supervised learning scheme, into more powerful audio and visual feature representations. The learnt feature representations can then be employed for downstream tasks such as classification and cross-modal retrieval, without the need of a microphone array. To prove that, we introduce a novel multimodal dataset consisting in RGB videos, raw audio signals and acoustic images, aligned in space and synchronized in time. Experimental results demonstrate the validity of our hypothesis and the effectiveness of the proposed pipeline, also when tested for tasks and datasets different from those used for training. Chapter 6 closes the thesis, presenting a development activity of a new Dual Cam POC to build-up from it a spin-off, assuming to apply for an innovation project for hi-tech start- ups (such as a SME instrument H2020) for a 50Keuro grant, following the idea of the technology transfer. A deep analysis of the reference market, technologies and commercial competitors, business model and the FTO of intellectual property is then conducted. Finally, following the latest technological trends (https://www.flir.eu/products/si124/) a new version of the device (planar audio array) with reduced dimensions and improved technical characteristics is simulated, simpler and easier to use than the current one, opening up new interesting possibilities of development not only technical and scientific but also in terms of business fallout

    Optimal Space-Time-Frequency Design of Microphone Networks

    Get PDF
    Consider a sensing system using a large number of N microphones placed in multiple dimensions to monitor a acoustic field. Using all the microphones at once is impractical because of the amount data generated. Instead, we choose a subset of D microphones to be active. Specifically, we wish to find the D set of microphones that minimizes the largest interference gain at multiple frequencies while monitoring a target of interest. A direct, combinatorial approach - testing all N choose D subsets of microphones - is impractical because of problem size. Instead, we use a convex optimization technique that induces sparsity through a l1-penalty to determine which subset of microphones to use. Our work investigates not only the optimal placement (space) of microphones but also how to process the output of each microphone (time/frequency). We explore this problem for both single and multi-frequency sources, optimizing both microphone weights and positions simultaneously. In addition, we explore this problem for random sources where the output of each of the N microphones is processed by an individual multirate filterbank. The N processed filterbank outputs are then combined to form one final signal. In this case, we fix all the analysis filters and optimize over all the synthesis filters. We show how to convert the continuous frequency problem to a discrete frequency approximation that is computationally tractable. In this random source/multirate filterbank case, we once again optimize over space-time-frequency simultaneously

    Efficient Synthesis of Room Acoustics via Scattering Delay Networks

    Get PDF
    An acoustic reverberator consisting of a network of delay lines connected via scattering junctions is proposed. All parameters of the reverberator are derived from physical properties of the enclosure it simulates. It allows for simulation of unequal and frequency-dependent wall absorption, as well as directional sources and microphones. The reverberator renders the first-order reflections exactly, while making progressively coarser approximations of higher-order reflections. The rate of energy decay is close to that obtained with the image method (IM) and consistent with the predictions of Sabine and Eyring equations. The time evolution of the normalized echo density, which was previously shown to be correlated with the perceived texture of reverberation, is also close to that of IM. However, its computational complexity is one to two orders of magnitude lower, comparable to the computational complexity of a feedback delay network (FDN), and its memory requirements are negligible

    Reducing the number of elements in the synthesis of a broadband linear array with multiple simultaneous frequency-invariant beam patterns

    Full text link
    © 2018 IEEE. The problem of reducing the number of elements in a broadband linear array with multiple simultaneous crossover frequency-invariant (FI) patterns is considered. Different from the single FI pattern array case, every element channel in the multiple FI pattern array is divided and followed by multiple finite-impulse-response (FIR) filters, and each of the multiple FIR filters has a set of coefficients. In this situation, a collective filter coefficient vector and its energy bound are introduced for each element, and then the problem of reducing the number of elements is transformed as minimizing the number of active collective filter coefficient vectors. In addition, the radiation characteristics including beam pointing direction, mainlobe FI property, sidelobe level, and space-frequency notching requirement for each of the multiple patterns can be formulated as multiple convex constraints. The whole synthesis method is implemented by performing an iterative second-order cone programming (SOCP). This method can be considered as a significant extension of the original SOCP for synthesizing broadband sparse array with single FI pattern. Numerical synthesis results show that the proposed method by synthesizing multiple discretized crossover FI patterns can save more elements than the original iterative SOCP by using a single continuously scannable FI pattern for covering the same space range. Moreover, even for multiple FI-patterns case with complicated space-frequency notching, the proposed method is still effective in the reduction of the number of elements

    The Impact of the White Noise Gain (WNG) of a Virtual Artificial Head on the Appraisal of Binaural Sound Reproduction

    Get PDF
    As an individualized alternative to traditional artificial heads, individual head-related transfer functions (HRTFs) can be synthesized with a microphone array and digital filtering. This strategy is referred to as "virtual artificial head" (VAH). The VAH filter coefficients are calculated by incorporating regularization to account for small errors in the characteristics and/or the position of the microphones. A common way to increase robustness is to impose a socalled white noise gain (WNG) constraint. The higher the WNG, the more robust the HRTF synthesis will be. On the other hand, this comes at the cost of decreasing the synthesis accuracy for the given sample of the HRTF set in question. Thus, a compromise between robustness and accuracy must be found, which furthermore depends on the used setup (sensor noise, mechanical stability etc.). In this study, different WNG are evaluated perceptually by four expert listeners for two different microphone arrays. The aim of the study is to find microphone array-dependent WNG regions which result in appropriate perceptual performances. It turns out that the perceptually optimal WNG varies with the microphone array, depending on the sensor noise and mechanical stability but also on the individual HRTFs and preferences. These results may be used to optimize VAH regularization strategies with respect to microphone characteristics, in particular self noise and stability.BMBF, 17080X10,ViKK: Virtueller KunstkopfDFG, EXC 1077/1, Hören für alle: Modelle, Technologien und Lösungsansätze für Diagnostik, Wiederherstellung und Unterstützung des Hören
    • …
    corecore