56 research outputs found

    Enhancements to the Generalized Sidelobe Canceller for Audio Beamforming in an Immersive Environment

    The Generalized Sidelobe Canceller is an adaptive algorithm for optimally estimating the parameters for beamforming, the signal processing technique of combining data from an array of sensors to improve SNR at a point in space. This work focuses on the algorithm’s application to widely separated microphone arrays with irregular distributions used for human voice capture. Methods are presented for improving the performance of the algorithm’s blocking matrix, the stage that creates a noise reference for elimination: a stochastic model for amplitude correction, and enhanced use of cross-correlation for phase correction and time-difference-of-arrival estimation via a correlation coefficient threshold. This correlation technique is also applied to a multilateration algorithm for an efficient method of explicit target tracking. In addition, the underlying microphone array geometry is studied, with parameters and guidelines for evaluation proposed. Finally, an analysis of the stability of the system is performed with respect to its adaptation parameters.
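
    The idea of accepting a time-difference-of-arrival estimate only when the cross-correlation peak is strong enough can be sketched as follows. This is a minimal illustration, not the method of the work above: the function name `estimate_tdoa`, the default threshold of 0.5 and the use of a plain time-domain normalized cross-correlation are all assumptions made for the example.

```python
import numpy as np

def estimate_tdoa(x, y, fs, threshold=0.5):
    """Estimate the delay of y relative to x (in seconds).

    Returns (delay, peak), where peak is the normalized cross-correlation
    coefficient at the best lag. delay is None when peak < threshold,
    i.e. when the two channels are not similar enough to trust the lag.
    """
    # Remove the mean so the coefficient behaves like a correlation coefficient.
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    norm = np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    if norm == 0.0:
        return None, 0.0
    # c[k] = sum_n y[n + k] * x[n]; a positive lag means y lags behind x.
    cc = np.correlate(y, x, mode="full") / norm
    lags = np.arange(-(len(x) - 1), len(y))
    k = int(np.argmax(cc))
    peak = float(cc[k])
    if peak < threshold:
        return None, peak
    return lags[k] / fs, peak
```

    A positive returned delay means the second channel receives the signal later than the first. In practice the threshold would have to be tuned to the array and room; the value used here is only a placeholder.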

    Adaptive Algorithms for Intelligent Acoustic Interfaces

    Modern speech communications are evolving in a new direction that involves users in a more perceptive way: the immersive experience, which may be considered the “last-mile” problem of telecommunications. One of the main features of immersive communications is distant talking, i.e. hands-free (in the broad sense) speech communication without body-worn or tethered microphones, which takes place in a multisource environment where interfering signals may degrade the communication quality and the intelligibility of the desired speech source. To preserve speech quality, intelligent acoustic interfaces may be used. An intelligent acoustic interface may comprise multiple microphones and loudspeakers, and its distinguishing feature is that it models the acoustic channel in order to adapt to user requirements and environmental conditions. This is why intelligent acoustic interfaces are based on adaptive filtering algorithms. Acoustic path modelling entails a set of problems that have to be taken into account when designing an adaptive filtering algorithm. Such problems may be generated by a linear or a nonlinear process and can be tackled by linear or nonlinear adaptive algorithms, respectively. In this work we consider such modelling problems and propose novel, effective adaptive algorithms that make acoustic interfaces robust against any interfering signals, thus preserving the perceived quality of the desired speech signals. As regards linear adaptive algorithms, a class of adaptive filters based on the sparse nature of the acoustic impulse response has recently been proposed. We adopt this class of adaptive filters, named proportionate adaptive filters, and derive a general framework from which it is possible to derive any linear adaptive algorithm. Using this framework we also propose some efficient proportionate adaptive algorithms, expressly designed to tackle problems of a linear nature.
On the other hand, to address problems deriving from a nonlinear process, we propose a novel filtering model that performs nonlinear transformations by means of functional links. Using this nonlinear model, we propose functional link adaptive filters, which provide an efficient solution for modelling a nonlinear acoustic channel. Finally, we introduce robust filtering architectures based on adaptive combinations of filters that allow acoustic interfaces to adapt more effectively to environmental conditions, thus providing a powerful means for immersive speech communications.
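
    To illustrate what a proportionate adaptive filter does, the sketch below implements the classic proportionate NLMS (PNLMS) update, in which each coefficient gets a step size proportional to its current magnitude so that the few large taps of a sparse impulse response converge quickly. It is shown only as a well-known member of the proportionate family, not as one of the algorithms proposed in the thesis; the function name `pnlms` and the default values of `mu`, `rho`, `delta_p` and `eps` are assumptions chosen for the example.

```python
import numpy as np

def pnlms(x, d, num_taps, mu=0.5, rho=0.01, delta_p=0.01, eps=1e-6):
    """Identify an FIR system from input x and desired signal d with PNLMS.

    Returns (w, err): the final coefficient vector and the a-priori error.
    """
    w = np.zeros(num_taps)
    buf = np.zeros(num_taps)  # input regressor, most recent sample first
    err = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        err[n] = d[n] - w @ buf
        # Proportionate gains: large coefficients get large step sizes,
        # with a floor so that small or zero coefficients still adapt.
        floor = rho * max(delta_p, np.max(np.abs(w)))
        gamma = np.maximum(floor, np.abs(w))
        g = gamma / np.mean(gamma)
        # Normalized update weighted by the gain vector g.
        w = w + mu * err[n] * g * buf / (buf @ (g * buf) + eps)
    return w, err
```

    With all gains equal to one, the update reduces to ordinary NLMS, which is why NLMS can be viewed as a special case of the proportionate update.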

    A Novel Voice Activity Detection for Multi-Channel Noise Reduction

    In this study, a voice activity detection technique is designed using three features: short-term energy, periodicity and spectral flatness. With these three features, the desired results are obtained even at low signal-to-noise ratios. In addition, the performance of multi-channel noise reduction algorithms, namely the speech-distortion-weighted Wiener filter, spatial prediction and the minimum variance distortionless response filter, is compared using the proposed voice activity detection. Two different audio signals and three different noise types are used in the experiment. The proposed voice activity detection algorithm detects the noisy-speech regions and the noise-only regions. After this detection, the filter coefficients are calculated for each filter algorithm. The calculated filter coefficients are multiplied by the frequency components of the signal received from the reference microphone to obtain an enhanced signal. Segmental signal-to-noise ratio, an objective measure, and mean opinion score, a subjective measure, are used to evaluate the performance of the filters. The speech-distortion-weighted Wiener filter is found to give the best noise reduction performance. This work was supported by the Tekirdağ Namık Kemal University Scientific Research Project Commission under Grant NKUBAP.06.YL.18.156.
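
    The three features named in this abstract can each be computed per frame in a few lines. The sketch below uses common textbook definitions, which may differ from those in the paper: mean squared amplitude for short-term energy, the ratio of geometric to arithmetic mean of the power spectrum for spectral flatness, and the largest normalized autocorrelation value in a pitch-lag range for periodicity. The function names and the 60 to 400 Hz pitch range are assumptions made for the example.

```python
import numpy as np

def short_term_energy(frame):
    """Mean squared amplitude of the frame."""
    frame = np.asarray(frame, dtype=float)
    return float(np.mean(frame ** 2))

def spectral_flatness(frame, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum (0..1).

    Values near 1 indicate a noise-like (flat) spectrum, values near 0 a
    tonal or harmonic one.
    """
    power = np.abs(np.fft.rfft(np.asarray(frame, dtype=float))) ** 2 + eps
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))

def periodicity(frame, fs, fmin=60.0, fmax=400.0):
    """Largest normalized autocorrelation value within the pitch-lag range."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)
    # Keep non-negative lags only; ac[0] is the frame energy.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0.0:
        return 0.0
    lo = int(fs / fmax)
    hi = min(int(fs / fmin), len(frame) - 1)
    return float(np.max(ac[lo:hi + 1]) / ac[0])
```

    Voiced speech frames tend to show high energy, high periodicity and low flatness, whereas stationary noise tends to show the opposite pattern. How the three values are combined into a decision is not specified here.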

    User-Symbiotic Speech Enhancement for Hearing Aids

    Audio source separation into the wild

    This review chapter is dedicated to multichannel audio source separation in real-life environments. We explore some of the major achievements in the field and discuss some of the remaining challenges. We explore several important practical scenarios, e.g. moving sources and/or microphones, a varying number of sources and sensors, high reverberation levels, spatially diffuse sources, and synchronization problems. Several applications, such as smart assistants, cellular phones, hearing aids and robots, are discussed. Our perspectives on the future of the field are given as concluding remarks.

    Voice inactivity ranking for enhancement of speech on microphone arrays

    Motivated by the problem of improving the performance of speech enhancement algorithms in non-stationary acoustic environments with low SNR, a framework is proposed for identifying signal frames of noisy speech that are unlikely to contain voice activity. Such voice-inactive frames can then be incorporated into an adaptation strategy to improve the performance of existing speech enhancement algorithms. This adaptive approach is applicable to single-channel as well as multi-channel algorithms for noisy speech. In both cases, the adaptive versions of the enhancement algorithms are observed to improve SNR levels by 20 dB, as indicated by PESQ and WER criteria. In advanced speech enhancement algorithms, it is often of interest to identify some regions of the signal that have a high likelihood of being noise only, i.e. with no speech present. This is in contrast to advanced speech recognition, speaker recognition, and pitch tracking algorithms, in which we are interested in identifying all regions that have a high likelihood of containing speech, as well as regions that have a high likelihood of not containing speech. In other terms, this would mean minimizing the false positive and false negative rates, respectively. In the context of speech enhancement, the identification of some speech-absent regions prompts the minimization of false positives while setting an acceptable tolerance on false negatives, as determined by the performance of the enhancement algorithm. Typically, Voice Activity Detectors (VADs) are used for identifying speech-absent regions for the application of speech enhancement. In recent years a myriad of Deep Neural Network (DNN) based approaches have been proposed to improve the performance of VADs at low SNR levels by training on combinations of speech and noise. Training on such an exhaustive dataset is combinatorially explosive.
For this dissertation, we propose a voice inactivity ranking framework, in which the identification of voice-inactive frames is performed using a machine learning (ML) approach that uses only clean speech utterances for training and is robust to high levels of noise. In the proposed framework, input frames of noisy speech are ranked by ‘voice inactivity score’ to acquire definitely speech inactive (DSI) frame sequences. These DSI regions serve as a noise estimate and are adaptively used by the underlying speech enhancement algorithm to enhance speech from a speech mixture. The proposed voice-inactivity ranking framework was used to perform speech enhancement in single-channel and multi-channel systems. In the context of microphone arrays, the proposed framework was used to determine parameters for spatial filtering using adaptive beamformers. We achieved an average Word Error Rate (WER) improvement of 50% at SNR levels below 0 dB compared to the noisy signal, which is 7 ± 2.5% more than the framework in which a state-of-the-art VAD decision was used for spatial filtering. For monaural signals, we propose a multi-frame multiband spectral-subtraction (MF-MBSS) speech enhancement system that uses the voice inactivity framework to compute and update the noise statistics on overlapping frequency bands. The proposed MF-MBSS achieved not only an average PESQ improvement of 16%, with a maximum improvement of 56%, when compared to state-of-the-art spectral subtraction, but also a 5 ± 1.5% improvement in the Word Error Rate (WER) of the spatially filtered output signal, in non-stationary acoustic environments.
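
    The way ranked voice-inactive frames can feed a noise estimate is sketched below with plain full-band power spectral subtraction. This is a simplified stand-in, not the MF-MBSS system described above: the inactivity scores are taken as given (the ML ranking model is out of scope), and the function name `spectral_subtract`, the number of noise frames and the spectral floor are assumptions made for the example.

```python
import numpy as np

def spectral_subtract(power_frames, inactivity_scores, n_noise_frames=5, floor=0.01):
    """Subtract a noise PSD estimated from the most voice-inactive frames.

    power_frames:      array of shape (num_frames, num_bins), power spectra.
    inactivity_scores: array of shape (num_frames,), higher = more likely noise only.
    Returns (cleaned_power, noise_psd).
    """
    power_frames = np.asarray(power_frames, dtype=float)
    scores = np.asarray(inactivity_scores, dtype=float)
    # Rank frames by inactivity score and keep the top-ranked ones as noise.
    noise_idx = np.argsort(scores)[-n_noise_frames:]
    noise_psd = power_frames[noise_idx].mean(axis=0)
    # Subtract, then apply a spectral floor to avoid negative power.
    cleaned = power_frames - noise_psd
    return np.maximum(cleaned, floor * power_frames), noise_psd
```

    Using only the highest-ranked frames reflects the emphasis in the abstract on finding some frames that are definitely speech-free, rather than classifying every frame.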

    Speech enhancement algorithms for audiological applications

    Text in English, with abstracts in English and Spanish. Extraordinary Doctorate Award of the UAH, academic year 2013-2014. Speech enhancement is a problem that, although it has been addressed for many years, remains open. The growing popularity of applications such as hands-free systems and automatic speech recognition, together with the ever-increasing demands of people with hearing loss, has given a decisive boost to this research area. This doctoral thesis focuses on speech enhancement in audiological applications. Most of the research work developed in this thesis is aimed at improving speech intelligibility in digital hearing aids, taking into account the limitations of this type of device. The combination of source separation and spatial filtering techniques with machine learning and evolutionary computation techniques has produced novel and interesting algorithms, which are included in this thesis. The thesis is divided into two main parts. The first part contains a preliminary study of the problem and an exhaustive review of the state of the art in speech enhancement algorithms, which serves to define the objectives of this thesis. The second part describes the research work carried out to meet those objectives, together with the experiments and the results obtained. First, the speech enhancement problem is formally described in the time-frequency domain, and the main requirements and constraints of digital hearing aids are defined. After describing the problem, an extensive review of the state of the art is presented. The review covers single-channel and multi-channel speech enhancement algorithms, considering both noise reduction and source separation techniques. In addition, the application of these algorithms to digital hearing aids is evaluated.
The first problem addressed in the thesis is the separation of sound sources in underdetermined mixtures in the time-frequency domain, without considering any computational constraints. The performance of the well-known DUET algorithm, which can separate speech sources from only two mixtures, has been evaluated in a variety of scenarios, including non-reverberant linear and binaural mixtures, reverberant mixtures, and mixtures of speech with other types of sources such as noise and music. The study reveals the lack of robustness of the DUET algorithm, whose performance degrades severely for reverberant mixtures, binaural mixtures, and mixtures of speech with music and noise. To improve performance in these cases, a novel source separation algorithm is presented that combines the mean shift clustering technique with the core of the DUET algorithm. The clustering stage of DUET, which is based on a weighted histogram, is replaced by a modified mean shift algorithm that introduces a weighted Gaussian kernel. The analysis of the results shows a clear improvement of the proposed algorithm over both the original DUET algorithm and a modification that uses k-means. Furthermore, the proposed algorithm has been extended to use a microphone array of any size and geometry. Next, the problem of speech source enumeration, which is related to the source separation problem, is addressed. A novel algorithm is proposed based on an information-theoretic criterion and on the estimation of the relative delays caused by the sources between a pair of microphones. The algorithm obtains excellent results and is robust in the enumeration of non-reverberant mixtures of up to 5 speech sources. Its capability for enumerating sources in reverberant mixtures is also demonstrated. The remainder of the thesis focuses on digital hearing aids.
The first problem treated is the improvement of speech intelligibility in monaural hearing aids. First, a study of the computational resources available in state-of-the-art digital hearing aids is carried out. The results of this study have been used to bound the computational cost of the speech enhancement algorithms for hearing aids proposed in this thesis. To solve this first problem, a low-computational-cost single-channel speech enhancement algorithm is proposed. The goal is to estimate a continuous time-frequency mask that yields the highest output PESQ score. The algorithm combines a generalized version of the least-squares estimator with a tailored feature selection algorithm, using a novel feature set. The algorithm obtains excellent results even at low signal-to-noise ratios. The next problem addressed is the design of speech enhancement algorithms for wirelessly connected binaural hearing aids. These systems have an additional problem: the wireless link increases power consumption. The goal in this thesis is to design low-cost speech enhancement algorithms that increase the energy efficiency of wirelessly connected binaural hearing aids. Two solutions are proposed. The first is an algorithm of extremely low computational cost that maximizes the WDO parameter and is based on estimating a binary mask by means of a quadratic discriminant that uses the ILD and ITD values of each time-frequency point to classify it as speech or noise. The second proposed algorithm, also of low cost, additionally uses information from neighbouring time-frequency points to estimate the IBM by means of a generalized version of the LS-LDA.
In addition, a weighted MSE is proposed for estimating the IBM while maximizing the WDO parameter at the same time. For both algorithms, an energy-efficient transmission scheme is proposed, based on quantizing the amplitude and phase values of each frequency band with a different number of bits. The bit allocation across frequencies is optimized by means of evolutionary computation techniques. The last work included in this thesis deals with the design of spatial filters for hearing aids customized to a specific person. The filter coefficients can be adapted to a person provided that their HRTF is known. Unfortunately, this information is not available when a patient visits the audiologist, which causes gain losses and distortions. With this problem in mind, three methods are proposed for designing spatial filters that maximize the gain and minimize the average distortion over a set of design HRTFs.

    Towards low-cost gigabit wireless systems at 60 GHz

    The world-wide availability of the huge amount of license-free spectral space in the 60 GHz band provides wide room for gigabit-per-second (Gb/s) wireless applications. A commercial (read: low-cost) 60-GHz transceiver will, however, provide limited system performance due to the stringent link budget and the substantial RF imperfections. The work presented in this thesis is intended to support the design of low-cost 60-GHz transceivers for Gb/s transmission over short distances (a few meters). Typical applications are the transfer of high-definition streaming video and high-speed download. The presented work comprises research into the characteristics of typical 60-GHz channels, the evaluation of the transmission quality as well as the development of suitable baseband algorithms. This can be summarized as follows. In the first part, the characteristics of the wave propagation at 60 GHz are charted out by means of channel measurements and ray-tracing simulations for both narrow-beam and omni-directional configurations. Both line-of-sight (LOS) and non-line-of-sight (NLOS) are considered. This study reveals that antennas that produce a narrow beam can be used to boost the received power by tens of dBs when compared with omnidirectional configurations. Meanwhile, the time-domain dispersion of the channel is reduced to the order of nanoseconds, which facilitates Gb/s data transmission over 60-GHz channels considerably. Besides the execution of measurements and simulations, the influence of antenna radiation patterns is analyzed theoretically. It is indicated to what extent the signal-to-noise ratio, Rician-K factor and channel dispersion are improved by application of narrow-beam antennas and to what extent these parameters will be influenced by beam pointing errors. From both experimental and analytical work it can be concluded that the problem of the stringent link-budget can be solved effectively by application of beam-steering techniques. 
The second part treats wideband transmission methods and relevant baseband algorithms. The considered schemes include orthogonal frequency division multiplexing (OFDM), multi-carrier code division multiple access (MC-CDMA) and single carrier with frequency-domain equalization (SC-FDE), which are promising candidates for Gb/s wireless transmission. In particular, the optimal linear equalization in the frequency domain and associated implementation issues such as synchronization and channel estimation are examined. Bit error rate (BER) expressions are derived to evaluate the transmission performance. Besides the linear equalization techniques, a low-complexity inter-symbol interference cancellation technique is proposed to achieve much better performance of code-spreading systems such as MC-CDMA and SC-FDE. Both theoretical analysis and simulations demonstrate that the proposed scheme offers great advantages as regards both complexity and performance. This makes it particularly suitable for 60-GHz applications in multipath environments. The third part treats the influence of quantization and RF imperfections on the considered transmission methods in the context of 60-GHz radios. First, expressions for the BER are derived and the influence of nonlinear distortions caused by the digital-to-analog converters, analog-to-digital converters and power amplifiers on the BER performance is examined. Next, the BER performance under the influence of phase noise and IQ imbalance is evaluated both with and without digital compensation techniques in the receiver. Finally, a baseline design of a low-cost Gb/s 60-GHz transceiver is presented. It is shown that, by application of beam-steering in combination with SC-FDE without advanced channel coding, a data rate on the order of 2 Gb/s can be achieved over a distance of 10 meters in a typical NLOS indoor scenario.
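
    A rough sense of why the link budget at 60 GHz is described as stringent comes from the free-space path loss formula, FSPL = 20 log10(4πdf/c). The sketch below evaluates it; the free-space model is an assumption for illustration only (the thesis relies on channel measurements and ray tracing, and NLOS links typically lose more), and the function name is hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def free_space_path_loss_db(distance_m, frequency_hz):
    """Free-space path loss in dB between isotropic antennas (Friis model)."""
    return 20.0 * math.log10(
        4.0 * math.pi * distance_m * frequency_hz / SPEED_OF_LIGHT
    )

loss_60ghz = free_space_path_loss_db(10.0, 60e9)  # roughly 88 dB
loss_5ghz = free_space_path_loss_db(10.0, 5e9)    # same distance at 5 GHz
extra_loss = loss_60ghz - loss_5ghz               # 20*log10(12), about 21.6 dB
```

    Under this model, a 10 m link at 60 GHz loses about 21.6 dB more than the same link at 5 GHz, which is the kind of gap that the narrow-beam antenna gain of tens of dB reported in the abstract can recover.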