    Speech Enhancement Exploiting the Source-Filter Model

    Imagining everyday life without mobile telephony is hardly possible nowadays. Calls are made in every conceivable situation and environment. Hence, the microphone picks up not only the user's speech but also sound from the surroundings, which is likely to impede understanding by the conversational partner. Modern speech enhancement systems are able to mitigate such effects, and most users are not even aware of their existence. This thesis presents the development of a modern single-channel speech enhancement approach that uses the divide-and-conquer principle to combat environmental noise in microphone signals. Though initially motivated by mobile telephony applications, this approach can be applied whenever speech is to be retrieved from a corrupted signal. The approach uses the so-called source-filter model to divide the problem into two subproblems, which are then conquered by enhancing the source (the excitation signal) and the filter (the spectral envelope) separately. Both enhanced signals are then used to denoise the corrupted signal. The estimation of spectral envelopes has a considerable history, and some approaches already exist for speech enhancement. However, they typically neglect the excitation signal, which leaves them unable to enhance the spectral fine structure properly. Both individual enhancement approaches exploit benefits of the cepstral domain, which offers, e.g., advantageous mathematical properties and straightforward synthesis of excitation-like signals. In this thesis we investigate traditional model-based schemes such as Gaussian mixture models (GMMs), classical signal-processing-based schemes, and modern deep neural network (DNN)-based approaches. The enhanced signals are not used directly to enhance the corrupted signal (e.g., to synthesize a clean speech signal) but serve as a so-called a priori signal-to-noise ratio (SNR) estimate in a traditional statistical speech enhancement system. 
Such a traditional system consists of a noise power estimator, an a priori SNR estimator, and a spectral weighting rule that is usually driven by the outputs of these two estimators and then applied to the noisy observation to retrieve the clean speech estimate. As a result, the new approach achieves significantly higher noise attenuation than current state-of-the-art systems while maintaining comparable speech component quality and speech intelligibility. Consequently, the overall quality of the enhanced speech signal is superior to that of state-of-the-art speech enhancement approaches.
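To make the traditional statistical chain concrete, here is a minimal sketch of how an a priori SNR estimate drives a spectral weighting rule. It assumes the well-known decision-directed a priori SNR estimator (Ephraim and Malah) and a Wiener gain; this is the kind of conventional estimator the thesis replaces with its source-filter-based estimate, not the thesis's own method. The smoothing factor `alpha = 0.98` and the -15 dB floor are assumed typical values, and the noise power is taken as an input from some unspecified noise power estimator.

```python
from typing import List, Sequence


def decision_directed_wiener_gains(
    noisy_power: Sequence[Sequence[float]],
    noise_power: Sequence[Sequence[float]],
    alpha: float = 0.98,
    xi_min_db: float = -15.0,
) -> List[List[float]]:
    """Per-frame, per-bin Wiener gains driven by a decision-directed a priori SNR.

    noisy_power[l][k] is |Y(k, l)|^2 and noise_power[l][k] is the noise power
    estimate sigma_N^2(k, l) supplied by some noise power estimator.
    """
    xi_min = 10.0 ** (xi_min_db / 10.0)  # lower bound on the a priori SNR
    prev_clean_power = [0.0] * len(noisy_power[0])  # |S_hat(k, l-1)|^2
    gains: List[List[float]] = []
    for y_pow, n_pow in zip(noisy_power, noise_power):
        frame: List[float] = []
        for k, (yp, npw) in enumerate(zip(y_pow, n_pow)):
            gamma = yp / npw  # a posteriori SNR
            # Decision-directed estimate: previous clean estimate + ML term.
            xi = alpha * prev_clean_power[k] / npw + (1.0 - alpha) * max(gamma - 1.0, 0.0)
            xi = max(xi, xi_min)
            g = xi / (1.0 + xi)  # Wiener spectral weighting rule
            frame.append(g)
            prev_clean_power[k] = g * g * yp  # |S_hat|^2 = G^2 |Y|^2 for the next frame
        gains.append(frame)
    return gains
```

The clean speech estimate in each bin is then the gain times the noisy spectral coefficient; the thesis's contribution sits in the `xi` line, which it computes from the enhanced excitation and envelope instead.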

    WASIS - Bioacoustic species identification based on multiple feature extraction and classification algorithms

    Advisor: Claudia Maria Bauzer Medeiros. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação. Abstract: Automatic identification of animal species based on their sounds is one of the means to conduct research in bioacoustics. 
This research domain provides, for instance, ways to monitor rare and endangered species, to analyze changes in ecological communities, and to study the social meaning of animal calls in a behavioral context. Identification mechanisms are typically executed in two stages: feature extraction and classification. Both stages present challenges, in computer science as well as in bioacoustics. The choice of effective feature extraction and classification algorithms is a challenge in any audio recognition system, especially in bioacoustics. Considering the wide variety of animal groups studied, algorithms are tailored to specific groups. Classification techniques are also sensitive to the extracted features and to the conditions surrounding the recordings. As a result, most bioacoustic software is not extensible, which limits the kinds of recognition experiments that can be conducted. Given this scenario, this dissertation proposes a software architecture that accommodates multiple feature extraction, feature fusion, and classification algorithms to support scientists and the general public in identifying animal species from their recorded sounds. This architecture was implemented in the WASIS software, freely available on the Web. A number of algorithms were implemented, serving as the basis for a comparative study that recommends sets of feature extraction and classification algorithms for three animal groups. Master's degree in Computer Science (Mestre em Ciência da Computação). Grants: 132849/2015-1, 2013/02219-0; funding agencies: CNPQ, FAPES
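The two-stage pipeline with feature fusion can be illustrated with a small sketch. It assumes feature-level (early) fusion by z-scoring each descriptor set on the training data and concatenating the vectors, followed by a nearest-centroid classifier; these are stand-ins chosen for brevity, not the descriptors or classifiers actually implemented in WASIS, and the toy data and labels are hypothetical.

```python
from math import dist
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def fit_scaler(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-column mean and standard deviation, estimated on training data only."""
    cols = list(zip(*rows))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]  # avoid /0


def scale(rows: Sequence[Sequence[float]], mus: List[float], sds: List[float]) -> Matrix:
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def fuse(*blocks: Matrix) -> Matrix:
    """Feature-level fusion: concatenate each sample's descriptor vectors."""
    return [[x for part in parts for x in part] for parts in zip(*blocks)]


def fit_centroids(x: Matrix, labels: Sequence[str]) -> Dict[str, List[float]]:
    groups: Dict[str, Matrix] = {}
    for row, label in zip(x, labels):
        groups.setdefault(label, []).append(row)
    return {lab: [mean(c) for c in zip(*rows)] for lab, rows in groups.items()}


def predict(centroids: Dict[str, List[float]], row: List[float]) -> str:
    """Nearest-centroid rule with Euclidean distance."""
    return min(centroids, key=lambda lab: dist(centroids[lab], row))


# Hypothetical toy data: two descriptor sets on very different scales.
train_a = [[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.1]]
train_b = [[100.0], [110.0], [400.0], [390.0]]
labels = ["species_1", "species_1", "species_2", "species_2"]

scalers = [fit_scaler(train_a), fit_scaler(train_b)]
x_train = fuse(scale(train_a, *scalers[0]), scale(train_b, *scalers[1]))
centroids = fit_centroids(x_train, labels)

query = fuse(scale([[1.1, 1.9]], *scalers[0]), scale([[105.0]], *scalers[1]))[0]
prediction = predict(centroids, query)
```

Normalizing each descriptor set before concatenation keeps the large-valued second set from dominating the distance, which is one reason an extensible architecture needs to treat fusion as its own step.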

    Molecular characterization of the lipidome by mass spectrometry

    Cells, whether bacterial, fungal or mammalian, are all equipped with metabolic pathways capable of producing an assortment of structurally and functionally distinct lipid species. Although the structural diversity of lipids has been recognized and correlated with specific cellular phenomena and disease states, the molecular mechanisms that underpin this diversity remain poorly understood. In part, this is due to the lack of adequate analytical techniques capable of measuring the structural details of lipid species in a direct, comprehensive and quantitative manner. The aim of my thesis study was to establish methodology for the automated and quantitative analysis of molecular lipid species based on mass spectrometry. From this work a novel high-throughput methodology for lipidome analysis emerged. Its main assets were structure-specific mass analysis by powerful hybrid mass spectrometers with high mass resolution, automated and sensitive infusion of total lipid extracts by a nanoelectrospray robot, and automated spectral deconvolution by the dedicated Lipid Profiler software. Comprehensive characterization and quantification of molecular lipid species were achieved by spiking total lipid extracts with unique lipid standards, using selective ionization conditions for sample infusion, and performing structure-specific mass analysis by hybrid quadrupole time-of-flight and ion trap mass spectrometry. The analytical routine allowed the comprehensive characterization and quantification of molecular glycerophospholipid species, molecular diacylglycerol species, molecular sphingolipid species including ceramides, glycosphingolipids and inositol-containing sphingolipids, and sterol lipids including cholesterol. The performance of the methodology was validated by comparing its dynamic quantification range to that of established methodology based on triple quadrupole mass spectrometry. 
Furthermore, its efficacy for lipidomics projects was demonstrated by the successful quantitative deciphering of the lipid composition of T cell receptor signaling domains, mammalian tissues including heart, brain and red blood cells, and the yeast Saccharomyces cerevisiae.
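Quantification by spiking extracts with lipid standards typically rests on an intensity ratio between each analyte and a known amount of internal standard. The sketch below shows single-point internal-standard quantification under the simplifying assumption that analyte and standard ionize equally (response factor 1.0) and without isotope correction; it is not the deconvolution performed by Lipid Profiler, and the species names and intensities are hypothetical.

```python
from typing import Dict


def quantify_against_standard(
    intensities: Dict[str, float],
    standard: str,
    standard_amount_pmol: float,
    response_factor: float = 1.0,
) -> Dict[str, float]:
    """Amount of each analyte from its intensity ratio to a spiked internal standard.

    response_factor is the analyte's signal per pmol relative to the standard's;
    1.0 assumes both ionize equally (an assumption, not a measured value).
    """
    std_intensity = intensities[standard]
    if std_intensity <= 0.0:
        raise ValueError("internal standard not detected")
    return {
        name: (intensity / std_intensity) * standard_amount_pmol / response_factor
        for name, intensity in intensities.items()
        if name != standard
    }


# Hypothetical peak intensities from one spectrum; "IS" is the spiked standard.
amounts = quantify_against_standard(
    {"PC 34:1": 2.0e5, "PC 36:2": 1.0e5, "IS": 5.0e4}, standard="IS", standard_amount_pmol=10.0
)
# amounts == {"PC 34:1": 40.0, "PC 36:2": 20.0}
```

In practice one standard per lipid class is commonly used, so that the equal-response assumption only has to hold within a class.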

    Characterisation of xenometabolome signatures in complex biomatrices for enhanced human population phenotyping

    Metabolic phenotyping facilitates the analysis of low-molecular-weight compounds in complex biological samples, with the resulting metabolite profiles providing a window on endogenous processes and xenobiotic exposures. Accurate characterisation of the xenobiotic component of the metabolome (the xenometabolome) is particularly valuable when metabolic phenotyping is used in epidemiological and clinical population studies, where participants' exposure to xenobiotics is unknown or difficult to control or estimate. Additionally, as metabolic phenotyping has increasingly been incorporated into toxicology and drug metabolism research, phenotyping datasets may be exploited to study xenobiotic metabolism at the population level. This thesis describes novel analytical and data-driven strategies for broadening xenometabolome coverage to allow effective partitioning of endogenous and xenobiotic metabolome signatures. The data-driven strategy was multi-faceted, involving the generation of a reference database and the application of statistical methodologies. The database contains profiles of over 100 common xenobiotics (generated using established liquid chromatography-mass spectrometry methods) and provided the basis for an empirically derived screen for human urine and blood samples. The prevalence of these xenobiotics was explored in an exemplar phenotyping dataset (ALZ; n = 650; urine), with 31 xenobiotics detected in an initial screen. Statistics-based methods were tailored to extract xenobiotic-related signatures and were evaluated using drugs with well-characterised human metabolism. To complement the data-driven strategies for xenometabolome coverage, an analytical strategy was also developed. A dispersive solid-phase extraction sample preparation protocol for blood products was optimised, permitting efficient removal of lipids and proteins with minimal effect on low-molecular-weight metabolites. 
The suitability and reproducibility of this method were evaluated in two independent blood sample sets (AZstudy12, n = 171; MARS, n = 285). Finally, these analytical and statistical strategies were applied to two existing large-scale phenotyping study datasets, AIRWAVE (n = 3000 urine, n = 3000 plasma samples) and ALZ (n = 650 urine, n = 449 serum samples), and used to explore both xenobiotic and endogenous responses to triclosan and polyethylene glycol (PEG) exposure. Exposure to triclosan highlighted effects on sulfation-related pathways, whilst exposure to PEG highlighted a possible perturbation of the glutathione cycle. The analytical and statistical strategies described in this thesis allow a more comprehensive xenometabolome characterisation and have been used to uncover previously unreported relationships between xenobiotic and endogenous metabolism. Open Access
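A reference-database screen of LC-MS data is commonly implemented by matching detected features to reference entries on both m/z (within a ppm tolerance) and retention time. The sketch below assumes that matching scheme, with a 5 ppm and 0.2 min window chosen purely for illustration; the thesis's actual screening criteria are not given in the abstract, and the compound names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ReferenceEntry:
    name: str
    mz: float      # expected m/z of the measured ion
    rt_min: float  # expected retention time (minutes) under the same LC method


def ppm_error(observed_mz: float, expected_mz: float) -> float:
    return (observed_mz - expected_mz) / expected_mz * 1e6


def screen_features(
    features: Sequence[Tuple[float, float]],
    database: Sequence[ReferenceEntry],
    ppm_tol: float = 5.0,
    rt_tol_min: float = 0.2,
) -> List[Tuple[str, float, float]]:
    """Return (name, m/z, rt) for every feature matching a reference on both m/z and RT."""
    hits: List[Tuple[str, float, float]] = []
    for mz, rt in features:
        for ref in database:
            if abs(ppm_error(mz, ref.mz)) <= ppm_tol and abs(rt - ref.rt_min) <= rt_tol_min:
                hits.append((ref.name, mz, rt))
    return hits


# Hypothetical reference entries and detected (m/z, RT) features.
db = [ReferenceEntry("compound_A", 152.0706, 3.00), ReferenceEntry("compound_B", 289.1000, 5.00)]
hits = screen_features([(152.0710, 3.05), (300.0000, 5.00)], db)
```

Requiring agreement on retention time as well as mass is what makes an empirically derived reference (measured on the same method) more selective than a mass-only lookup.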

    Time and frequency domain algorithms for speech coding

    The promise of digital hardware economies (due to recent advances in VLSI technology) has focussed much attention on more complex and sophisticated speech coding algorithms, which offer improved quality at relatively low bit rates. This thesis describes the results (obtained from computer simulations) of research into various efficient time- and frequency-domain speech encoders operating at a transmission bit rate of 16 kbit/s. In the time domain, Adaptive Differential Pulse Code Modulation (ADPCM) systems employing both forward and backward adaptive prediction were examined. A number of algorithms were proposed and evaluated, including several variants of the Stochastic Approximation Predictor (SAP). A Backward Block Adaptive (BBA) predictor was also developed and found to outperform the conventional stochastic methods, even though its signal processing requirements are lower. A simplified Adaptive Predictive Coder (APC) employing a single-tap pitch predictor, considered next, provided a slight improvement in performance over ADPCM, but with appreciably greater complexity. The ultimate test of any speech coding system is the perceptual quality of the received speech. Recent research has indicated that this may be enhanced by suitably shaping the noise spectrum according to the theory of auditory masking. Various noise-shaping ADPCM configurations were examined, and it was demonstrated that a proposed pre-/post-filtering arrangement, which advantageously exploits the predictor-quantizer interaction, leads to the best subjective performance in both forward and backward prediction systems. Adaptive quantization is instrumental to the performance of ADPCM systems. Both the forward adaptive quantizer (AQF) and the backward adaptive quantizer with one-word memory (AQJ) were examined. 
In addition, a novel method of reducing quantization noise in ADPCM-AQJ coders, which applies a correction to the decoded speech samples, reduced the output noise across the spectrum, with considerable high-frequency noise suppression. More powerful (and inevitably more complex) frequency-domain speech coders, such as the Adaptive Transform Coder (ATC) and the Sub-band Coder (SBC), offer good-quality speech at 16 kbit/s. To reduce complexity and coding delay while retaining the advantages of sub-band coding, a novel transform-based split-band coder (TSBC) was developed and found to perform comparably to the SBC. Split-band schemes with a large number of bands carry a heavy side-information load that can impair coding accuracy; to avoid this without forgoing the efficiency of adaptive bit allocation, a method was also proposed that codes the sub-band signals with AQJs and vector-quantizes the bit allocation patterns. Finally, 'pipeline' methods of bit allocation and step size estimation (using the Fast Fourier Transform (FFT) of the input signal) were examined. Such methods, although less accurate, are nevertheless useful in limiting the coding delay associated with SBC schemes employing Quadrature Mirror Filters (QMF).
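The backward one-word-memory quantizer (AQJ) adapts its step size from the previous codeword alone, so the decoder can track the step without any side information. The sketch below combines a 2-bit Jayant-style quantizer with ADPCM; the step multipliers (0.8, 1.6), the step limits, and the fixed first-order predictor (coefficient 0.9) are illustrative assumptions, whereas the thesis uses adaptive predictors such as SAP and BBA.

```python
import math
from typing import List, Tuple

Code = Tuple[int, int]  # (sign, magnitude index) of a 2-bit codeword

# Assumed step-size multipliers for the inner and outer quantizer levels
# (illustrative values: shrink after small errors, grow after large ones).
MULTIPLIERS = (0.8, 1.6)
STEP_MIN, STEP_MAX = 1e-4, 10.0


def next_step(step: float, mag: int) -> float:
    """One-word memory: the new step depends only on the last codeword."""
    return min(max(step * MULTIPLIERS[mag], STEP_MIN), STEP_MAX)


def dequantize(code: Code, step: float) -> float:
    sign, mag = code
    return sign * (mag + 0.5) * step  # mid-rise levels +-0.5*step, +-1.5*step


def adpcm_encode(x: List[float], a: float = 0.9, step0: float = 0.1) -> Tuple[List[Code], List[float]]:
    """Encode x; also return the encoder's local reconstruction."""
    codes: List[Code] = []
    recon: List[float] = []
    prev, step = 0.0, step0
    for sample in x:
        pred = a * prev  # fixed first-order predictor (assumption)
        err = sample - pred
        code = (1 if err >= 0 else -1, 0 if abs(err) < step else 1)  # threshold at one step
        prev = pred + dequantize(code, step)
        codes.append(code)
        recon.append(prev)
        step = next_step(step, code[1])  # backward adaptation, no side information
    return codes, recon


def adpcm_decode(codes: List[Code], a: float = 0.9, step0: float = 0.1) -> List[float]:
    out: List[float] = []
    prev, step = 0.0, step0
    for code in codes:
        prev = a * prev + dequantize(code, step)
        out.append(prev)
        step = next_step(step, code[1])  # mirrors the encoder's step sequence
    return out


signal = [math.sin(2 * math.pi * n / 32) for n in range(200)]
codes, local = adpcm_encode(signal)
decoded = adpcm_decode(codes)
```

Because the encoder predicts from its own reconstruction rather than from the input, `decoded` matches `local` exactly, which is what makes backward adaptation possible.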

    Analysis and correction of the helium speech effect by autoregressive signal processing

    SIGLE; LD:D48902/84; BLDSC - British Library Document Supply Centre; GB; United Kingdom

    Implementation and Performance Evaluation of Acoustic Denoising Algorithms for UAV

    Unmanned Aerial Vehicles (UAVs) have become a popular alternative for wildlife monitoring and border surveillance applications. Eliminating the UAV's background noise and classifying the target audio signal effectively remain major challenges. The main goal of this thesis is to remove the UAV's background noise by means of acoustic denoising techniques. Existing denoising algorithms, such as the adaptive Least Mean Square (LMS) filter, wavelet denoising, time-frequency block thresholding, and the Wiener filter, were implemented and their performance evaluated. The denoising algorithms were evaluated in terms of average Signal-to-Noise Ratio (SNR), Segmental SNR (SSNR), Log Likelihood Ratio (LLR), and Log Spectral Distance (LSD). To evaluate the effect of the denoising algorithms on the classification of target audio, we implemented Support Vector Machine (SVM) and Naive Bayes classifiers. Simulation results demonstrate that the LMS and Discrete Wavelet Transform (DWT) denoising algorithms outperformed the other algorithms. Finally, we implemented the LMS and DWT algorithms on a DSP board for hardware evaluation. Experimental results showed that the LMS algorithm is more robust than DWT across various noise types when classifying target audio signals.
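An LMS denoiser is often used as an adaptive noise canceller: a reference input correlated with the noise is filtered to predict the noise in the primary input, and the prediction error is the cleaned signal. The sketch below assumes that two-input configuration (a primary microphone plus a noise-only reference), which the abstract does not specify, along with an 8-tap filter, step size 0.01, and a synthetic tone in broadband noise as hypothetical test data; it reports global SNR only, not SSNR, LLR, or LSD.

```python
import math
import random
from typing import List, Sequence


def lms_noise_canceller(
    primary: Sequence[float],
    reference: Sequence[float],
    taps: int = 8,
    mu: float = 0.01,
) -> List[float]:
    """Adaptive noise cancellation: the error signal e = d - y is the cleaned output."""
    w = [0.0] * taps
    buf = [0.0] * taps  # most recent reference samples, newest first
    cleaned: List[float] = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))  # noise estimate
        e = d - y
        w = [wi + 2.0 * mu * e * xi for wi, xi in zip(w, buf)]  # LMS update
        cleaned.append(e)
    return cleaned


def snr_db(clean: Sequence[float], estimate: Sequence[float]) -> float:
    signal = sum(c * c for c in clean)
    error = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    return 10.0 * math.log10(signal / error)


# Hypothetical set-up: a tone plus filtered broadband noise at the primary mic,
# and a reference mic that records the noise source alone.
random.seed(0)
n_samples = 5000
noise = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
target = [0.5 * math.sin(2 * math.pi * 0.01 * n) for n in range(n_samples)]
primary = [t + 0.8 * noise[n] + 0.3 * (noise[n - 1] if n else 0.0) for n, t in enumerate(target)]
cleaned = lms_noise_canceller(primary, noise)

half = n_samples // 2  # skip the adaptation transient
snr_before = snr_db(target[half:], primary[half:])
snr_after = snr_db(target[half:], cleaned[half:])
```

A smaller `mu` lowers the steady-state misadjustment at the cost of slower tracking, which matters when the UAV's rotor noise changes with flight conditions.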

    On the design of visual feedback for the rehabilitation of hearing-impaired speech

    Music Genre Classification Systems - A Computational Approach

    Models and analysis of vocal emissions for biomedical applications

    This book of proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003 in Firenze, Italy. The workshop is organised every two years and aims to stimulate contacts between specialists active in research and industrial development in the area of voice analysis for biomedical applications. The scope of the workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.