49 research outputs found

    Novel Pitch Detection Algorithm With Application to Speech Coding

    Get PDF
    This thesis introduces a novel method for accurate pitch detection and speech segmentation, named Multi-feature, Autocorrelation (ACR) and Wavelet Technique (MAWT). MAWT uses feature extraction, and ACR applied on Linear Predictive Coding (LPC) residuals, with a wavelet-based refinement step. MAWT opens the way for a unique approach to modeling: although speech is divided into segments, the success of voicing decisions is not crucial. Experiments demonstrate the superiority of MAWT in pitch period detection accuracy over existing methods, and illustrate its advantages for speech segmentation. These advantages are more pronounced for gain-varying and transitional speech, and under noisy conditions

    Novel multiscale methods for nonlinear speech analysis

    Get PDF
    Cette thèse présente une recherche exploratoire sur l'application du Formalisme Microcanonique Multiéchelles (FMM) à l'analyse de la parole. Dérivé de principes issus en physique statistique, le FMM permet une analyse géométrique précise de la dynamique non linéaire des signaux complexes. Il est fondé sur l'estimation des paramètres géométriques locaux (les exposants de singularité) qui quantifient le degré de prédictibilité à chaque point du signal. Si correctement définis est estimés, ils fournissent des informations précieuses sur la dynamique locale de signaux complexes. Nous démontrons le potentiel du FMM dans l'analyse de la parole en développant: un algorithme performant pour la segmentation phonétique, un nouveau codeur, un algorithme robuste pour la détection précise des instants de fermeture glottale, un algorithme rapide pour l analyse par prédiction linéaire parcimonieuse et une solution efficace pour l approximation multipulse du signal source d'excitation.This thesis presents an exploratory research on the application of a nonlinear multiscale formalism, called the Microcanonical Multiscale Formalism (the MMF), to the analysis of speech signals. Derived from principles in Statistical Physics, the MMF allows accurate analysis of the nonlinear dynamics of complex signals. It relies on the estimation of local geometrical parameters, the singularity exponents (SE), which quantify the degree of predictability at each point of the signal domain. When correctly defined and estimated, these exponents can provide valuable information about the local dynamics of complex signals and has been successfully used in many applications ranging from signal representation to inference and prediction.We show the relevance of the MMF to speech analysis and develop several applications to show the strength and potential of the formalism. Using the MMF, in this thesis we introduce: a novel and accurate text-independent phonetic segmentation algorithm, a novel waveform coder, a robust accurate algorithm for detection of the Glottal Closure Instants, a closed-form solution for the problem of sparse linear prediction analysis and finally, an efficient algorithm for estimation of the excitation source signal.BORDEAUX1-Bib.electronique (335229901) / SudocSudocFranceF

    Sparsity in Linear Predictive Coding of Speech

    Get PDF
    nrpages: 197status: publishe

    An investigation into glottal waveform based speech coding

    Get PDF
    Coding of voiced speech by extraction of the glottal waveform has shown promise in improving the efficiency of speech coding systems. This thesis describes an investigation into the performance of such a system. The effect of reverberation on the radiation impedance at the lips is shown to be negligible under normal conditions. Also, the accuracy of the Image Method for adding artificial reverberation to anechoic speech recordings is established. A new algorithm, Pre-emphasised Maximum Likelihood Epoch Detection (PMLED), for Glottal Closure Instant detection is proposed. The algorithm is tested on natural speech and is shown to be both accurate and robust. Two techniques for giottai waveform estimation, Closed Phase Inverse Filtering (CPIF) and Iterative Adaptive Inverse Filtering (IAIF), are compared. In tandem with an LF model fitting procedure, both techniques display a high degree of accuracy However, IAIF is found to be slightly more robust. Based on these results, a Glottal Excited Linear Predictive (GELP) coding system for voiced speech is proposed and tested. Using a differential LF parameter quantisation scheme, the system achieves speech quality similar to that of U S Federal Standard 1016 CELP at a lower mean bit rate while incurring no extra delay

    An Investigation of Orthogonal Wavelet Division Multiplexing Techniques as an Alternative to Orthogonal Frequency Division Multiplex Transmissions and Comparison of Wavelet Families and Their Children

    Get PDF
    Recently, issues surrounding wireless communications have risen to prominence because of the increase in the popularity of wireless applications. Bandwidth problems, and the difficulty of modulating signals across carriers, represent significant challenges. Every modulation scheme used to date has had limitations, and the use of the Discrete Fourier Transform in OFDM (Orthogonal Frequency Division Multiplex) is no exception. The restriction on further development of OFDM lies primarily within the type of transform it uses in the heart of its system, Fourier transform. OFDM suffers from sensitivity to Peak to Average Power Ratio, carrier frequency offset and wasting some bandwidth to guard successive OFDM symbols. The discovery of the wavelet transform has opened up a number of potential applications from image compression to watermarking and encryption. Very recently, work has been done to investigate the potential of using wavelet transforms within the communication space. This research will further investigate a recently proposed, innovative, modulation technique, Orthogonal Wavelet Division Multiplex, which utilises the wavelet transform opening a new avenue for an alternative modulation scheme with some interesting potential characteristics. Wavelet transform has many families and each of those families has children which each differ in filter length. This research consider comprehensively investigates the new modulation scheme, and proposes multi-level dynamic sub-banding as a tool to adapt variable signal bandwidths. Furthermore, all compactly supported wavelet families and their associated children of those families are investigated and evaluated against each other and compared with OFDM. The linear computational complexity of wavelet transform is less than the logarithmic complexity of Fourier in OFDM. The more important complexity is the operational complexity which is cost effectiveness, such as the time response of the system, the memory consumption and the number of iterative operations required for data processing. Those complexities are investigated for all available compactly supported wavelet families and their children and compared with OFDM. The evaluation reveals which wavelet families perform more effectively than OFDM, and for each wavelet family identifies which family children perform the best. Based on these results, it is concluded that the wavelet modulation scheme has some interesting advantages over OFDM, such as lower complexity and bandwidth conservation of up to 25%, due to the elimination of guard intervals and dynamic bandwidth allocation, which result in better cost effectiveness

    HMM-based speech synthesis using an acoustic glottal source model

    Get PDF
    Parametric speech synthesis has received increased attention in recent years following the development of statistical HMM-based speech synthesis. However, the speech produced using this method still does not sound as natural as human speech and there is limited parametric flexibility to replicate voice quality aspects, such as breathiness. The hypothesis of this thesis is that speech naturalness and voice quality can be more accurately replicated by a HMM-based speech synthesiser using an acoustic glottal source model, the Liljencrants-Fant (LF) model, to represent the source component of speech instead of the traditional impulse train. Two different analysis-synthesis methods were developed during this thesis, in order to integrate the LF-model into a baseline HMM-based speech synthesiser, which is based on the popular HTS system and uses the STRAIGHT vocoder. The first method, which is called Glottal Post-Filtering (GPF), consists of passing a chosen LF-model signal through a glottal post-filter to obtain the source signal and then generating speech, by passing this source signal through the spectral envelope filter. The system which uses the GPF method (HTS-GPF system) is similar to the baseline system, but it uses a different source signal instead of the impulse train used by STRAIGHT. The second method, called Glottal Spectral Separation (GSS), generates speech by passing the LF-model signal through the vocal tract filter. The major advantage of the synthesiser which incorporates the GSS method, named HTS-LF, is that the acoustic properties of the LF-model parameters are automatically learnt by the HMMs. In this thesis, an initial perceptual experiment was conducted to compare the LFmodel to the impulse train. The results showed that the LF-model was significantly better, both in terms of speech naturalness and replication of two basic voice qualities (breathy and tense). In a second perceptual evaluation, the HTS-LF system was better than the baseline system, although the difference between the two had been expected to be more significant. A third experiment was conducted to evaluate the HTS-GPF system and an improved HTS-LF system, in terms of speech naturalness, voice similarity and intelligibility. The results showed that the HTS-GPF system performed similarly to the baseline. However, the HTS-LF system was significantly outperformed by the baseline. Finally, acoustic measurements were performed on the synthetic speech to investigate the speech distortion in the HTS-LF system. The results indicated that a problem in replicating the rapid variations of the vocal tract filter parameters at transitions between voiced and unvoiced sounds is the most significant cause of speech distortion. This problem encourages future work to further improve the system

    Detecting the presence of sperm whales echolocation clicks in noisy environments

    Full text link
    Sperm whales (Physeter macrocephalus) navigate underwater with a series of impulsive, click-like sounds known as echolocation clicks. These clicks are characterized by a multipulse structure (MPS) that serves as a distinctive pattern. In this work, we use the stability of the MPS as a detection metric for recognizing and classifying the presence of clicks in noisy environments. To distinguish between noise transients and to handle simultaneous emissions from multiple sperm whales, our approach clusters a time series of MPS measures while removing potential clicks that do not fulfil the limits of inter-click interval, duration and spectrum. As a result, our approach can handle high noise transients and low signal-to-noise ratio. The performance of our detection approach is examined using three datasets: seven months of recordings from the Mediterranean Sea containing manually verified ambient noise; several days of manually labelled data collected from the Dominica Island containing approximately 40,000 clicks from multiple sperm whales; and a dataset from the Bahamas containing 1,203 labelled clicks from a single sperm whale. Comparing with the results of two benchmark detectors, a better trade-off between precision and recall is observed as well as a significant reduction in false detection rates, especially in noisy environments. To ensure reproducibility, we provide our database of labelled clicks along with our implementation code.Comment: 10 pages and 10 figure

    Field Programmable Gate Arrays Usage in Industrial Automation Systems

    Get PDF
    Tato disertační práce se zabývá využitím programovatelných hradlových polí (FPGA) v diagnostice měničů, využívajících spínaných IGBT tranzistorů. Je zaměřena na budiče těchto výkonových tranzistorů a jejich struktury. Přechodné jevy veličin, jako jsou IG, VGE, VCE během procesu přepínání (zapnutí, vypnutí), mohou poukazovat na degradaci IGBT. Pro měření a monitorování těchto veličin byla navržena nová architektura budiče IGBT. Rychlé měření a monitorování během přepínacího děje vyžaduje vysokou vzorkovací frekvenci. Proto jsou navrhovány paralelní vysokorychlostní AD převodníky (> 50 MSPS). Práce je zaměřena převážně na návrh zařízení s FPGA včetně hardware a software. Byla navržena nová deska plošných spojů s FPGA, která plní požadované funkce, jako je řízení IGBT pomocí vícenásobných paralelních koncových stupňů, monitorování a diagnostiku, a propojení s řídicí jednotkou měniče.This doctoral thesis deals with the usage of Field Programmable Gate Arrays (FPGAs) in a diagnosis of power inverters which use the IGBTs transistors as switching devices. It is focused on the IGBT gate drives and their structures. As long as the transient phenomena and other quantities such as IG, VGE, VCE shows the IGBT degradation during the switching process (turn-on, turn-off), a new IGBT gate driver architecture is proposed for measuring and monitoring these quantities. Quick measurements and monitoring during the IGBT switching process require high sampling frequencies. Therefore, high speed parallel ADC converters (> 50MSPS) are proposed. The thesis is focused on the FPGA design (hardware, software). A new FPGA board is designed for desired functions implementation such as IGBT driving using multiple stages, IGBT monitoring and diagnosis, and interfacing to inverter controller.
    corecore