40 research outputs found

    Sparse Modeling of Grouped Line Spectra

    Get PDF
    This licentiate thesis focuses on clustered parametric models for the estimation of line spectra, when the spectral content of a signal source is assumed to exhibit some form of grouping. Unlike previous parametric approaches, which generally require explicit knowledge of the model orders, this thesis exploits sparse modeling, where the orders are chosen implicitly. For line spectra, the non-linear parametric model is approximated by a linear system containing an overcomplete basis of candidate frequencies, called a dictionary, and a large set of linear response variables that select and weight the components in the dictionary. Frequency estimates are obtained by solving a convex optimization program in which the sum of squared residuals is minimized. To discourage overfitting and to infer structure in the solution, different convex penalty functions are introduced into the optimization. The cost trade-off between fit and penalty is set by user parameters, so as to approximate the true number of spectral lines in the signal, which implies that the response variable will be sparse, i.e., have few non-zero elements. Thus, instead of being set explicitly, the model orders are set implicitly by this trade-off. For grouped variables, the dictionary is customized, and appropriate convex penalties are selected, so that the solution becomes group sparse, i.e., has few groups with non-zero variables. In an array of sensors, the specific time delays and attenuations depend on the source and sensor positions. By modeling this, one may estimate the location of a source. In this thesis, a novel joint location and grouped frequency estimator is proposed, which exploits sparse modeling for both the spectral and spatial estimates and shows robustness against sources with overlapping frequency content. For audio signals, this thesis uses two different features for clustering.
Pitch is a perceptual property of sound that may be described by the harmonic model, i.e., by a group of spectral lines at integer multiples of a fundamental frequency, which we estimate by exploiting a novel adaptive total variation penalty. The other feature, chroma, is a concept in music theory that collects pitches whose frequencies are related by powers of 2 into groups. Using a chroma dictionary together with appropriate group sparse penalties, we propose an automatic transcription of the chroma content of a signal.
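As a concrete illustration of the dictionary-based sparse approach described above, the sketch below solves an l1-penalized least-squares (LASSO) problem over an overcomplete grid of candidate frequencies, using plain ISTA as the convex solver. The solver choice, grid size, and penalty weight are illustrative assumptions, not the thesis's actual algorithms or settings.

```python
import numpy as np

def ista_lasso(A, y, lam, n_iter=500):
    """Minimize 0.5*||A x - y||^2 + lam*||x||_1 via ISTA (complex-valued)."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1], dtype=complex)
    for _ in range(n_iter):
        g = x - (A.conj().T @ (A @ x - y)) / L                  # gradient step
        x = np.exp(1j * np.angle(g)) * np.maximum(np.abs(g) - lam / L, 0.0)
    return x

# Signal: two complex sinusoids (frequencies 0.10 and 0.23 cycles/sample) in noise
N = 128
t = np.arange(N)
y = np.exp(2j * np.pi * 0.10 * t) + 0.5 * np.exp(2j * np.pi * 0.23 * t)
y = y + 0.05 * np.random.default_rng(0).standard_normal(N)

# Overcomplete dictionary of candidate frequencies (unit-norm atoms)
freqs = np.linspace(0, 0.5, 256, endpoint=False)
A = np.exp(2j * np.pi * np.outer(t, freqs)) / np.sqrt(N)

x = ista_lasso(A, y, lam=1.0)
est = freqs[np.abs(x) > 0.5]   # surviving (non-zero) dictionary atoms
```

The trade-off the abstract describes is visible here: larger `lam` drives more coefficients to exactly zero, implicitly lowering the model order; the non-zero entries of `x` cluster around the true frequencies.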

    Ambient acoustics as indicator of environmental change in the Beaufort Sea: experiments & methods for analysis

    Get PDF
    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology and the Woods Hole Oceanographic Institution, June 2021. The Arctic Ocean is a vital component of Earth’s climate system that is experiencing dramatic environmental changes. These changes are reflected in its underwater ambient soundscape, whose origin and propagation depend primarily on the properties of the ice cover and water column. The first component of this work examines the effects of changes in the Beaufort Sea sound speed profile (SSP) and ice cover on ambient noise characteristics. Specifically, the emergence of a warm water intrusion near 70 m depth has altered the historical Arctic SSP, while the ice cover has become thinner and younger due to the rise in average global temperature. Hypothesized shifts in the ambient soundscape and surface noise generation due to these changes are verified by comparing noise data measured during two experiments to modeled results. These changes include a broadside notch in the vertical directionality of the noise, as well as a shift from uniform surface noise generation to discrete generation at specific ranges. Motivated by our data analyses, the second component presents several tools to facilitate ambient noise characterization and generation monitoring. One is a convolutional neural network (CNN) approach to noise range estimation. Its robustness to SSP and bottom depth mismatch is compared with conventional matched field processing. We further explore how the CNN approach achieves its performance by examining its intermediate outputs. Another tool is a frequency-domain transient event detection algorithm that leverages image processing and hierarchical clustering to identify and categorize noise transients in data spectrograms. The spectral content retained by this method enables insight into the mechanism by which the ice cover generates the detected events.
Lastly, we present the deployment of a seismo-acoustic system to localize transient events. Two forward approaches that utilize time-difference-of-arrival are described and compared with a more conventional inverse technique. The examination of this system’s performance prompts recommendations for future deployments. With our ambient noise analysis and algorithm development, we hope these contributions provide a stronger foundation for continued study of the Arctic ambient soundscape as the region continues to grow in significance. Office of Naval Research (ONR) via the University of California - San Diego (UCSD) under award number N00014-16-1-2129. Defense Advanced Research Projects Agency (DARPA) via Applied Physical Sciences Corp. (APS) under award number HR0011-18-C-0008. Office of Naval Research (ONR) under award number N00014-17-1-2474. Office of Naval Research (ONR) under award number N00014-19-1-2741. National Science Foundation (NSF) under grant number 2389237.
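The transient-detection idea above, image processing on spectrograms followed by grouping of adjacent detections, can be sketched minimally as thresholding against a per-frequency noise floor and 4-connected component grouping. This is a simplified stand-in for the dissertation's detector (which also uses hierarchical clustering to categorize events); the threshold values and function name are assumptions.

```python
import numpy as np
from collections import deque

def detect_transients(spec_db, thresh_db=10.0, min_bins=4):
    """Group time-frequency bins that exceed the per-frequency median noise
    floor by thresh_db into 4-connected events; drop tiny detections."""
    floor = np.median(spec_db, axis=1, keepdims=True)   # per-frequency noise floor
    mask = (spec_db - floor) > thresh_db
    seen = np.zeros_like(mask, dtype=bool)
    events = []
    F, T = mask.shape
    for f0 in range(F):
        for t0 in range(T):
            if mask[f0, t0] and not seen[f0, t0]:
                comp, q = [], deque([(f0, t0)])         # flood-fill one component
                seen[f0, t0] = True
                while q:
                    f, t = q.popleft()
                    comp.append((f, t))
                    for df, dt in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nf, nt = f + df, t + dt
                        if 0 <= nf < F and 0 <= nt < T and mask[nf, nt] and not seen[nf, nt]:
                            seen[nf, nt] = True
                            q.append((nf, nt))
                if len(comp) >= min_bins:
                    events.append(comp)
    return events

# Synthetic spectrogram: noise floor plus one broadband event and one isolated spike
rng = np.random.default_rng(0)
spec = rng.standard_normal((64, 100))   # ~0 dB floor, sigma = 1 dB
spec[10:14, 20:25] += 20.0              # genuine transient event
spec[40, 70] += 20.0                    # single-bin spike, below min_bins
events = detect_transients(spec)
```

Because each detected event keeps its constituent time-frequency bins, its spectral extent remains available for the kind of generation-mechanism analysis the abstract describes.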

    Object-based Modeling of Audio for Coding and Source Separation

    Get PDF
    This thesis studies several data decomposition algorithms for obtaining an object-based representation of an audio signal. The estimation of the representation parameters is coupled with audio-specific criteria, such as the spectral redundancy, sparsity, perceptual relevance, and spatial position of sounds. The objective is to obtain an audio signal representation composed of meaningful entities, called audio objects, that reflect the properties of real-world sound objects and events. The estimation of the object-based model is based on magnitude spectrogram redundancy, using non-negative matrix factorization with extensions to multichannel and complex-valued data. The benefits of working with object-based audio representations over conventional time-frequency bin-wise processing are studied. The two main applications of the object-based audio representations proposed in this thesis are spatial audio coding and sound source separation from multichannel microphone array recordings. In the proposed spatial audio coding algorithm, the audio objects are estimated from the multichannel magnitude spectrogram. The audio objects are used for recovering the content of each original channel from a single downmixed signal, using time-frequency filtering. The perceptual relevance of modeling the audio signal is considered in the estimation of the parameters of the object-based model, and the sparsity of the model is utilized in encoding its parameters. Additionally, a quantization of the model parameters is proposed that reflects the perceptual relevance of each quantized element. The proposed object-based spatial audio coding algorithm is evaluated via listening tests, comparing the overall perceptual quality to that of conventional time-frequency block-wise methods at the same bitrates.
The proposed approach is found to produce comparable coding efficiency while providing additional functionality via the object-based coding domain representation, such as the blind separation of the mixture of sound sources in the encoded channels. For sound source separation from multichannel audio recorded by a microphone array, a method combining an object-based magnitude model with spatial covariance matrix estimation is considered. A direction-of-arrival-based model for the spatial covariance matrices of the sound sources is proposed. Unlike conventional approaches, the estimation of the parameters of the proposed spatial covariance matrix model ensures a spatially coherent solution for the spatial parameterization of the sound sources. The separation quality is measured with objective criteria, and the proposed method is shown to improve over state-of-the-art sound source separation methods for recordings made with a small microphone array.
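The core decomposition above, factoring a magnitude spectrogram into spectral objects and their time-varying gains, is standard NMF. A minimal sketch with Euclidean-distance Lee-Seung multiplicative updates (without the thesis's multichannel, complex-valued, or perceptual extensions) follows; the synthetic data and rank are illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate V (freq x time magnitude spectrogram) as W @ H with
    non-negative factors, via multiplicative updates for squared error."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps    # spectral basis ("audio objects")
    H = rng.random((rank, T)) + eps    # time-varying gains
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

# Two synthetic "objects": low- and high-frequency spectral shapes alternating in time
F, T = 32, 40
w1 = np.exp(-0.5 * ((np.arange(F) - 5) / 2.0) ** 2)
w2 = np.exp(-0.5 * ((np.arange(F) - 25) / 2.0) ** 2)
h1 = (np.arange(T) < 20).astype(float)
h2 = (np.arange(T) >= 20).astype(float)
V = np.outer(w1, h1) + np.outer(w2, h2)

W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Each column of `W` is a candidate audio object's spectrum, and the corresponding row of `H` is its activation over time, which is what makes object-wise filtering and coding possible.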

    Acoustic Source Localisation in constrained environments

    Get PDF
    Acoustic Source Localisation (ASL) is a problem with real-world applications across multiple domains, from smart assistants to acoustic detection and tracking. And yet, despite the level of attention in recent years, a technique for rapid and robust ASL remains elusive – not least in the constrained environments in which such techniques are most likely to be deployed. In this work, we seek to address some of these limitations by presenting improvements to ASL methods under three commonly encountered constraints: the number and configuration of sensors; the limited signal sampling potentially available; and the nature and volume of training data required to accurately estimate Direction of Arrival (DOA) when deploying a particular supervised machine learning technique. In regard to the number and configuration of sensors, we find that accuracy can be maintained at the level of the state-of-the-art Steered Response Power (SRP) method while reducing computation sixfold, based on direct optimisation of well-known ASL formulations. Moreover, we find that the circular microphone configuration is the least desirable, as it yields the highest localisation error. In regard to signal sampling, we demonstrate that the computer-vision-inspired algorithm presented in this work, which extracts selected keypoints from the signal spectrogram and uses them to select signal samples, outperforms an audio fingerprinting baseline while maintaining a compression ratio of 40:1.
In regard to the training data employed in machine learning ASL techniques, we show that music training data yields a 19% improvement over a noise-data baseline while maintaining accuracy using only 25% of the training data, and that training with speech as opposed to noise improves DOA estimation by an average of 17%, outperforming the Generalised Cross-Correlation technique by 125% in scenarios in which the test and training acoustic environments are matched. Heriot-Watt University James Watt Scholarship (JSW) in the School of Engineering & Physical Sciences.
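Both SRP and the Generalised Cross-Correlation baseline mentioned above build on cross-correlating microphone pairs. As background, a standard GCC-PHAT time-delay estimator can be sketched as follows; the function name and parameter choices are illustrative, not this thesis's implementation.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay of x2 relative to x1 (seconds, positive if x2
    arrives later) using the phase transform (PHAT) weighting."""
    n = len(x1) + len(x2)
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    R = np.conj(X1) * X2
    R /= np.abs(R) + 1e-12                      # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(R, n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))   # lags -max..+max
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# Synthetic pair: x2 is x1 delayed by 25 samples
fs = 16000
rng = np.random.default_rng(3)
s = rng.standard_normal(4096)
x1 = s
x2 = np.concatenate((np.zeros(25), s))[:4096]
tau = gcc_phat(x1, x2, fs, max_tau=0.01)
```

SRP can be viewed as summing such pairwise correlations over candidate directions, which is why reducing the number of evaluated pairs or directions cuts its computational cost.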

    Measurement-Based Automatic Parameterization of a Virtual Acoustic Room Model

    Get PDF
    Modern auralization techniques can make the headphone listening experience similar to listening with loudspeakers, the reproduction method most audio content is produced for.
Room acoustic modeling is an essential part of a plausible auralization system, but specifying the parameters for the room model requires expertise and time. In this thesis, a system is developed for automatically deriving the parameters from room acoustic measurements. The parameterization is based on room impulse responses measured with a microphone array and can be divided into two parts: the analysis of the direct sound and early reflections, and the analysis of the late reverberation. The direct sounds are separated from the impulse responses using various signal processing techniques and used in a matching pursuit algorithm to find the reflections in the impulse responses. The sound sources and their reflection images are localized using time-difference-of-arrival-based localization, and frequency-dependent propagation path effects are estimated for use in an image source model. The late reverberation of the auralization is implemented using a feedback delay network. Its parameterization requires the analysis of the frequency-dependent reverberation time and the frequency response of the late reverberation. Normalized echo density is used to determine the beginning of the late reverberation in the measurements and to set the starting point of the modeled late field. The reverberation times are analyzed using the energy decay relief. A formal listening test shows that the automatic parameterization system outperforms parameters set manually from approximate geometrical data. Problems remain, especially in the precision of the late reverberation equalization, but the system works well considering the relative simplicity of the processing methods used.
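Reverberation-time analysis of the kind described above is commonly based on backward-integrated energy decay. The sketch below estimates a broadband RT60 from the Schroeder decay curve of an impulse response; it is a single-band simplification of the frequency-dependent energy decay relief the thesis uses, and the fit range and function name are assumptions.

```python
import numpy as np

def rt60_from_rir(rir, fs, db_lo=-25.0, db_hi=-5.0):
    """Estimate RT60 by fitting a line to the Schroeder energy decay curve
    between db_hi and db_lo, then extrapolating the slope to -60 dB."""
    edc = np.cumsum(rir[::-1] ** 2)[::-1]            # Schroeder backward integration
    edc_db = 10 * np.log10(edc / edc[0] + 1e-12)
    i = np.where((edc_db <= db_hi) & (edc_db >= db_lo))[0]
    slope, _ = np.polyfit(i / fs, edc_db[i], 1)      # decay slope in dB/s
    return -60.0 / slope

# Synthetic decaying-noise tail with a known RT60 of 0.5 s
fs, rt = 8000, 0.5
t = np.arange(int(fs * 1.0)) / fs
rir = np.random.default_rng(1).standard_normal(t.size) * 10 ** (-3 * t / rt)
est = rt60_from_rir(rir, fs)
```

In a full parameterization system, this estimate would be computed per frequency band and used to tune the attenuation filters of the feedback delay network.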

    Inferring Room Geometries

    No full text
    Determining the geometry of an acoustic enclosure using microphone arrays has become an active area of research. Knowledge gained about the acoustic environment, such as the location of reflectors, can be advantageous for applications such as sound source localization, dereverberation and adaptive echo cancellation, by assisting in tracking environment changes and helping the initialization of such algorithms. A methodology to blindly infer the geometry of an acoustic enclosure, by estimating the location of reflective surfaces from acoustic measurements with an arbitrary array geometry, is developed and analyzed. The starting point of this work is a geometric constraint, valid in both two and three dimensions, that converts time-of-arrival and time-difference-of-arrival information into elliptical constraints on the location of reflectors. Multiple constraints are combined to yield the line or plane parameters of the reflectors by minimizing a specific cost function in the least-squares sense. An iterative constrained least-squares estimator, along with a closed-form estimator that performs optimally in the noise-free scenario, solves the associated common-tangent estimation problem arising from the geometric constraint. Additionally, a Hough-transform-based data fusion and estimation technique, which considers acquisitions from multiple source positions, refines the reflector localization even in adverse conditions. An extension to the geometric inference framework, which includes estimating the actual speed of sound to improve accuracy under temperature variations, is presented; it also reduces the required prior information, so that only the relative microphone positions in the array are needed for the localization of acoustic reflectors. Simulated and real-world experiments demonstrate the feasibility of the proposed method.
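A useful geometric fact behind reflector inference of this kind: once an image source has been localized, the reflecting surface is the perpendicular bisector of the segment joining the true source and its image. A minimal 2-D sketch of that final step (the function name is hypothetical, and this is not the thesis's common-tangent estimator):

```python
import numpy as np

def reflector_from_image(src, img):
    """Given a source and its localized image source (2-D points), return the
    reflecting line as (unit normal n, offset d) satisfying n . x = d."""
    src, img = np.asarray(src, float), np.asarray(img, float)
    n = img - src                 # the wall is perpendicular to this segment...
    n /= np.linalg.norm(n)
    mid = 0.5 * (src + img)       # ...and passes through its midpoint
    return n, float(n @ mid)

# A wall along x = 2 reflects a source at (0.5, 1.0) into an image at (3.5, 1.0)
n, d = reflector_from_image([0.5, 1.0], [3.5, 1.0])
```

The elliptical constraints described in the abstract serve exactly to pin down such image-source positions (or, equivalently, the common tangent line) from arrival-time measurements.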

    Interference Mitigation and Localization Based on Time-Frequency Analysis for Navigation Satellite Systems

    Get PDF
    Nowadays, the operation of global navigation satellite systems (GNSS) is imperative across a multitude of applications worldwide. The increasing reliance on accurate positioning and timing information has made the consequences of possible service outages in satellite navigation systems more serious than ever. Among the threats, interference is regarded as the primary one to their operation. Due to the recent proliferation of portable interferers, notably jammers, it has become common for GNSS receivers to endure simultaneous attacks from multiple sources of interference, which are likely spatially distributed and transmit different modulations. To the best knowledge of the author, the present dissertation is the first publication to investigate the use of the S-transform (ST) to devise countermeasures to interference. The original contributions in this context are mainly:
• the formulation of a complexity-scalable ST, implementable in real time as a bank of filters;
• a method for characterizing and localizing multiple in-car jammers through interference snapshots that are collected by separate receivers and analysed with a clever use of the ST;
• a preliminary assessment of novel methods for mitigating generic interference at the receiver end by means of the ST and of more computationally efficient variants of the transform.
Besides GNSS, the proposed countermeasures to interference are equally applicable to protect any direct-sequence spread spectrum (DS-SS) communication system.
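For reference, the discrete S-transform can be computed in the frequency domain as a bank of Gaussian-windowed inverse FFTs; the sketch below follows the common textbook formulation, not the complexity-scalable filter-bank variant proposed in the dissertation.

```python
import numpy as np

def stockwell(x):
    """Discrete S-transform of a signal x. Returns an array of shape
    (N//2, N): rows are frequencies n = 1..N/2, columns are time samples.
    Row n is the inverse FFT of the spectrum shifted by n and weighted by a
    Gaussian whose width scales with frequency."""
    N = len(x)
    X = np.fft.fft(x)
    m = np.fft.fftfreq(N, 1.0 / N)      # wrapped frequency index -N/2 .. N/2-1
    S = np.empty((N // 2, N), dtype=complex)
    for n in range(1, N // 2 + 1):
        gauss = np.exp(-2 * np.pi ** 2 * m ** 2 / n ** 2)   # frequency-domain Gaussian
        S[n - 1] = np.fft.ifft(np.roll(X, -n) * gauss)
    return S

# A pure tone at 8 cycles per record should concentrate in the n = 8 row
N = 64
t = np.arange(N)
x = np.cos(2 * np.pi * 8 * t / N)
S = stockwell(x)
```

Because the Gaussian window widens with frequency, the ST gives the frequency-dependent resolution that makes chirp-like jammer signatures stand out in the time-frequency plane.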

    Robust speaker diarization for meetings

    Get PDF
    This thesis presents research on speaker diarization for meeting rooms. It covers the algorithms and the implementation of an offline speaker segmentation and clustering system for meeting recordings, where more than one microphone is usually available for processing. The main research and system implementation were carried out during a two-year stay at the International Computer Science Institute (ICSI, Berkeley, California). Speaker diarization is a well-studied topic in the domain of broadcast news recordings.
Most of the proposed systems involve some sort of hierarchical clustering of the data into acoustic clusters, where the optimum number of speakers and their identities are unknown a priori. A very commonly used method is bottom-up clustering, where multiple initial clusters are iteratively merged until the optimum number of clusters is reached, according to some stopping criterion. Such systems are based on a single input channel, which prevents their direct application to the meetings domain. Although some efforts have been made to adapt such systems to multichannel data, at the start of this thesis no effective implementation had been proposed. Furthermore, many of these speaker diarization algorithms involve some sort of model training or parameter tuning using external data, which limits their usability with data different from what they were adapted to. The implementation proposed in this thesis works towards solving the aforementioned problems. Taking the existing hierarchical bottom-up mono-channel speaker diarization system from ICSI as a starting point, it first uses flexible acoustic beamforming to extract speaker location information and obtain a single enhanced signal from all available microphones. It then applies a train-free speech/non-speech detector to this signal and processes the resulting speech segments with an improved version of the mono-channel speaker diarization system. That system has been modified to use speaker location information (when available), and several algorithms have been adapted or newly created to tune the system behavior to each particular recording by obtaining information directly from the acoustics, making it less dependent on development data. The resulting system is flexible with respect to the meeting-room layout, regarding both the number of microphones and their placement. It is train-free, making it easy to adapt to different sorts of data and domains of application.
Finally, it takes a step forward in the use of parameters that are more robust to changes in the acoustic data. Two versions of the system were submitted, with excellent results, to the RT05s and RT06s NIST Rich Transcription evaluations for meetings, where data from two different subdomains (lectures and conferences) was evaluated. In addition, experiments using the RT datasets from all meetings evaluations were run to test the different proposed algorithms, proving their suitability to the task.
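The merge decision in bottom-up speaker clustering is commonly made with a delta-BIC test between single-Gaussian cluster models: merge two clusters when modeling them jointly costs less than modeling them separately, after a complexity penalty. A minimal sketch of that criterion (not ICSI's exact implementation; the penalty weight `lam` and the full-covariance model are assumptions):

```python
import numpy as np

def delta_bic(X1, X2, lam=1.0):
    """BIC-based merge score for two clusters of d-dimensional feature
    vectors, each modeled by a full-covariance Gaussian. Negative => merge."""
    def half_n_logdet(X):
        cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        return 0.5 * len(X) * np.log(np.linalg.det(cov))
    n1, n2 = len(X1), len(X2)
    d = X1.shape[1]
    # penalty for the extra parameters of keeping two separate Gaussians
    penalty = 0.5 * lam * (d + d * (d + 1) / 2) * np.log(n1 + n2)
    return (half_n_logdet(np.vstack((X1, X2)))
            - half_n_logdet(X1) - half_n_logdet(X2) - penalty)

rng = np.random.default_rng(2)
# Same speaker: both clusters drawn from one distribution -> score below zero
same = delta_bic(rng.normal(0, 1, (200, 4)), rng.normal(0, 1, (200, 4)))
# Different speakers: well-separated distributions -> score above zero
diff = delta_bic(rng.normal(0, 1, (200, 4)), rng.normal(6, 1, (200, 4)))
```

The same score doubles as a stopping criterion: clustering halts when no pair of clusters yields a negative score, which is one way the number of speakers can be chosen without external tuning data.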

    Speech enhancement algorithms for audiological applications

    Get PDF
    Text in English, with abstracts in English and Spanish. Extraordinary Doctorate Award of the UAH, academic year 2013-2014. Speech enhancement is a problem that, although studied for many years, remains open. The growing popularity of applications such as hands-free systems and automatic speech recognition, together with the ever-increasing demands of people with hearing loss, has given this research area a decisive impulse. This doctoral thesis focuses on speech enhancement for audiological applications. Most of the research developed in this thesis is aimed at improving speech intelligibility in digital hearing aids, taking into account the limitations of such devices. Combining source separation and spatial filtering techniques with machine learning and evolutionary computation has produced the novel algorithms presented in this thesis. The thesis is divided into two main parts. The first contains a preliminary study of the problem and an exhaustive review of the state of the art in speech enhancement algorithms, which serves to define the objectives of the thesis. The second describes the research carried out to meet those objectives, together with the experiments and results obtained. First, the speech enhancement problem is formally described in the time-frequency domain, and the main requirements and constraints of digital hearing aids are defined. A broad state-of-the-art review follows, covering single-channel and multi-channel speech enhancement algorithms, including both noise reduction and source separation techniques, and evaluating their applicability to digital hearing aids.
The first problem addressed in the thesis is the separation of sound sources from underdetermined mixtures in the time-frequency domain, without any computational constraint. The performance of the well-known DUET algorithm, which can separate speech sources from only two mixtures, is evaluated in several scenarios, including non-reverberant linear and binaural mixtures, reverberant mixtures, and mixtures of speech with other source types such as noise and music. The study reveals the lack of robustness of DUET, whose performance degrades severely on reverberant mixtures, binaural mixtures, and mixtures of speech with music or noise. To improve performance in these cases, a novel source separation algorithm is presented that combines mean-shift clustering with the core of DUET: the clustering stage of DUET, based on a weighted histogram, is replaced by a modified mean-shift algorithm that uses a weighted Gaussian kernel. Analysis of the results shows a clear improvement of the proposed algorithm over the original DUET and over a variant using k-means. The proposed algorithm is also extended to microphone arrays of any size and geometry. Next, the problem of counting speech sources, which is related to source separation, is addressed. A novel algorithm is proposed, based on an information-theoretic criterion and on estimating the relative delays caused by the sources between a pair of microphones. The algorithm obtains excellent results and is robust in counting non-reverberant mixtures of up to 5 speech sources; its power for source counting in reverberant mixtures is also demonstrated. The rest of the thesis focuses on digital hearing aids.
The first problem treated is improving speech intelligibility in monaural hearing aids. A study of the computational resources available in state-of-the-art digital hearing aids is carried out, and its results are used to bound the computational cost of the hearing-aid speech enhancement algorithms proposed in this thesis. To solve this first problem, a low-cost single-channel speech enhancement algorithm is proposed. The goal is to estimate a continuous time-frequency mask that maximizes the output PESQ score. The algorithm combines a generalized version of the least-squares estimator with a purpose-built feature selection algorithm using a novel feature set, and obtains excellent results even at low signal-to-noise ratios. The next problem addressed is the design of speech enhancement algorithms for wirelessly connected binaural hearing aids. These systems have an additional problem: the wireless link increases power consumption. The objective here is therefore to design low-cost speech enhancement algorithms that increase the energy efficiency of wirelessly connected binaural hearing aids. Two solutions are proposed. The first is an extremely low-cost algorithm that maximizes the WDO measure and is based on estimating a binary mask with a quadratic discriminant that uses the ILD and ITD values of each time-frequency point to classify it as speech or noise. The second proposed algorithm, also low-cost, additionally uses information from neighboring time-frequency points to estimate the IBM through a generalized version of LS-LDA.
Furthermore, a weighted MSE is proposed to estimate the IBM while simultaneously maximizing the WDO measure. For both algorithms, an energy-efficient transmission scheme is proposed that quantizes the amplitude and phase values of each frequency band with a different number of bits, with the bit allocation across frequencies optimized by evolutionary computation techniques. The last work included in this thesis concerns the design of spatial filters for hearing aids personalized to a given user. The filter coefficients can be adapted to a person provided that their HRTF is known. Unfortunately, this information is not available when a patient visits the audiologist, which causes gain losses and distortions. With this problem in mind, three methods are proposed for designing spatial filters that maximize the gain and minimize the average distortion over a set of design HRTFs.
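The mean-shift clustering that replaces DUET's weighted-histogram stage can be sketched in its plain form as follows: each point (e.g., an attenuation-delay feature per time-frequency bin) is moved iteratively toward the local density maximum, and points that converge together form one cluster per source. The bandwidth, the unweighted Gaussian kernel, and the mode-grouping rule are illustrative assumptions, not the thesis's weighted-kernel variant.

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=50, tol=1e-4):
    """Gaussian-kernel mean shift. Returns (cluster centers, per-point modes)."""
    modes = points.astype(float).copy()
    for _ in range(n_iter):
        shifted = np.empty_like(modes)
        for i, p in enumerate(modes):
            w = np.exp(-np.sum((points - p) ** 2, axis=1) / (2 * bandwidth ** 2))
            shifted[i] = w @ points / w.sum()     # kernel-weighted mean
        done = np.abs(shifted - modes).max() < tol
        modes = shifted
        if done:
            break
    centers = []                                  # group modes within one bandwidth
    for m in modes:
        for c in centers:
            if np.linalg.norm(m - c) < bandwidth:
                break
        else:
            centers.append(m)
    return np.array(centers), modes

# Two well-separated feature clouds, standing in for two sources' DUET features
rng = np.random.default_rng(4)
pts = np.vstack((rng.normal(0, 0.3, (100, 2)), rng.normal(5, 0.3, (100, 2))))
centers, modes = mean_shift(pts, bandwidth=1.0)
```

Unlike the histogram peak-picking in the original DUET, no fixed grid is imposed on the feature space, and the number of clusters (sources) emerges from the data.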