48 research outputs found

    Robust Auditory-Based Speech Processing Using the Average Localized Synchrony Detection

    Get PDF
    In this paper, a new auditory-based speech processing system based on the biologically rooted property of average localized synchrony detection (ALSD) is proposed. The system detects periodicity in the speech signal at Bark-scaled frequencies while reducing the response's spurious peaks and its sensitivity to implementation mismatches, and hence presents a consistent and robust representation of the formants. The system is evaluated on its ability to extract formants while suppressing spurious peaks, and is compared with other auditory-based and traditional systems on vowel and consonant recognition tasks, both on clean speech from the TIMIT database and in the presence of noise. The results illustrate the advantage of the ALSD system in extracting the formants and reducing the spurious peaks. They also indicate the superiority of the synchrony measures over the mean-rate measure in the presence of noise.
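
    The paper's ALSD measure is not spelled out in this abstract, but its central ingredient, scoring periodicity at Bark-scaled analysis frequencies, is easy to illustrate. Below is a minimal Python sketch, assuming the Traunmüller approximation of the Bark scale and a crude normalised-autocorrelation synchrony score; the band layout and function names are illustrative, not taken from the paper.

    ```python
    import numpy as np

    def hz_to_bark(f_hz):
        """Traunmueller approximation of the Bark scale."""
        return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

    def bark_center_freqs(n_bands=18, f_min=100.0, f_max=4000.0):
        """Analysis frequencies equally spaced on the Bark scale."""
        barks = np.linspace(hz_to_bark(f_min), hz_to_bark(f_max), n_bands)
        return 1960.0 * (barks + 0.53) / (26.28 - barks)  # inverse of hz_to_bark

    def synchrony(frame, fs, f0):
        """Crude synchrony score: normalised autocorrelation of the frame
        at the lag corresponding to frequency f0."""
        lag = int(round(fs / f0))
        if lag < 1 or lag >= len(frame):
            return 0.0
        a, b = frame[:-lag], frame[lag:]
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    fs = 8000
    t = np.arange(0, 0.03, 1.0 / fs)
    frame = np.sin(2 * np.pi * 500 * t)   # a tone near a typical first formant
    scores = [synchrony(frame, fs, f) for f in bark_center_freqs()]
    print(max(scores))   # peaks for bands whose lag matches the 500 Hz period
    ```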

    A Causal Rhythm Grouping

    Get PDF

    Glottal-synchronous speech processing

    No full text
    Glottal-synchronous speech processing is a field of speech science in which the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length, which may fail to exploit the inherent periodic structure of voiced speech; glottal-synchronous frames have the potential to harness this structure. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the electroglottograph (EGG) signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and prosodic manipulation, where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment in real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.
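
    SIGMA and YAGA are the thesis's own algorithms and are not reproduced here. As a much simpler stand-in, GCIs are often approximated as the sharp extrema of the differentiated EGG (DEGG) signal; a minimal Python sketch of that idea follows, using a synthetic ramp whose abrupt cycle-boundary drop stands in for closure (real EGG sign conventions vary), with an illustrative threshold and spacing constraint.

    ```python
    import numpy as np

    def detect_gcis(egg, fs, f0_max=400.0):
        """Approximate GCIs as strong negative peaks of the differentiated
        EGG, at least one pitch period (at f0_max) apart."""
        degg = np.diff(egg)
        threshold = -3.0 * np.std(degg)   # illustrative threshold
        min_dist = int(fs / f0_max)
        gcis = []
        for n in np.where(degg < threshold)[0]:
            if not gcis or n - gcis[-1] >= min_dist:
                gcis.append(int(n))
        return np.array(gcis)

    # Synthetic test: a 100 Hz ramp whose abrupt drop marks each "closure".
    fs = 16000
    t = np.arange(0, 0.1, 1.0 / fs)
    egg = (t * 100.0) % 1.0
    print(detect_gcis(egg, fs))   # roughly every 160 samples
    ```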

    Methods for large-scale data analyses of regional language variation based on speech acoustics

    Get PDF

    Singing voice resynthesis using concatenative-based techniques

    Get PDF
    Doctoral thesis (Tese de Doutoramento) in Engenharia Informática, Faculdade de Engenharia, Universidade do Porto, 201

    Practical fault simulation on an earthing transformer using SFRA: A unique analysis approach towards simplifying SFRA results to assist with deformation diagnosis in Earthing Transformers

    Get PDF
    Earthing transformers are an integral part of power and distribution systems around the world, yet little consideration is given to their ongoing monitoring and maintenance. The failure of an earthing transformer can cause a multitude of issues, including compromised stability and safety of the electrical network. The necessity to maintain both safety and stability of electrical networks highlights valuable real-world applications for an SFRA earthing transformer testing toolkit. As a starting point, the project undertook a review of existing research along with an analysis of earthing transformer design principles. The research found that, because of an inherent design strategy, many ZN-wound earthing transformers have a unique failure type in common: axial displacement of the inner and outer windings. The second project stage involved physical simulation of an earthing transformer's axial winding displacement using SFRA as a diagnosis tool. Simulation results provided evidence that (for the given test subject) defect detection is possible using SFRA benchmarked comparisons. Analysis of benchmarked comparisons found deviation only at select resonances, with general spectral shape retention at all other points along the SFRA trace. The spectral consistency of benchmarked comparisons allowed the implementation of a speech processing technique known as Mel-Frequency Cepstral Coefficients (MFCC). An adaptation of the MFCC process introduced a way of encoding and distilling the SFRA trace data while exaggerating critical points of deviation. The third major project stage involved the development of code using MathWorks MATLAB as a platform to fulfil the data management and computational requirements of the adapted MFCC. Select variables were isolated throughout the code to ensure that the process was tunable on multiple levels for future optimisation. By selecting and mapping the appropriate resultant cepstral coefficients against each other, it was found that a meaningful representation of the SFRA trace can be presented graphically as a single point on a two-dimensional plot. Simulated transformer defect scenarios showed notable deviation on both the x and y axes when processed and plotted together. Analysis, processing and comparison of 28 different earthing transformer SFRA traces found possible real-world applications for a single-point spectrum classifier. The spectrum classifier was proposed as a substitute for pre-existing subjective analysis techniques, potentially building on the communal engineering toolbox for SFRA analysis.
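
    The thesis's MATLAB implementation is not shown above; the sketch below only illustrates, in Python, the general MFCC-style pipeline it adapts: smooth the trace with a triangular filterbank, log-compress, take a DCT, and keep a few cepstral coefficients as a compact signature. The filter spacing, coefficient choice and toy traces are assumptions, not the thesis's tuned values.

    ```python
    import numpy as np
    from scipy.fft import dct

    def triangular_filterbank(n_points, n_filters):
        """Overlapping triangular filters spanning the whole trace."""
        edges = np.linspace(0, n_points - 1, n_filters + 2)
        pts = np.arange(n_points)
        fb = np.zeros((n_filters, n_points))
        for i in range(n_filters):
            lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
            fb[i] = np.clip(np.minimum((pts - lo) / (mid - lo),
                                       (hi - pts) / (hi - mid)), 0.0, None)
        return fb

    def sfra_cepstrum(trace_db, n_filters=26, n_coeffs=8):
        """Filterbank-smooth an SFRA magnitude trace (dB), log-compress,
        then DCT: an MFCC-style encoding of the trace."""
        power = 10.0 ** (trace_db / 10.0)   # dB back to linear power
        banded = triangular_filterbank(len(trace_db), n_filters) @ power
        return dct(np.log(banded + 1e-12), norm='ortho')[:n_coeffs]

    # A healthy trace and a shifted-resonance trace become two 2-D points.
    freq = np.linspace(0, 6 * np.pi, 1000)
    healthy = -40.0 + 10.0 * np.sin(freq)
    defect = -40.0 + 10.0 * np.sin(freq * 1.05)   # toy resonance shift
    for trace in (healthy, defect):
        c = sfra_cepstrum(trace)
        print((c[1], c[2]))   # plot these coefficients against each other
    ```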

    Singing voice resynthesis using concatenative-based techniques

    Get PDF
    Dissertation submitted to the Faculdade de Engenharia da Universidade do Porto in partial fulfilment of the requirements for the degree of Doctor in Engenharia Informática. Singing has an important role in our lives, and although synthesizers have been trying to replicate every musical instrument for decades, it was only during the last nine years that commercial singing synthesizers started to appear, allowing music and text to be merged, i.e., singing. These solutions may produce realistic results in some situations, but they require time-consuming processes and experienced users. The goal of this research work is to develop, create or adapt techniques that allow the resynthesis of the singing voice, i.e., allow users to directly control a singing voice synthesizer using their own voice. The synthesizer should be able to replicate, as closely as possible, the same melody, the same phonetic sequence, and the same musical performance. Initially, some work was carried out on resynthesizing piano recordings with evolutionary approaches, using Genetic Algorithms, in which a population of individuals (candidate solutions) representing sequences of music notes evolves over time, trying to match an original audio stream. Later, the focus returned to the singing voice, exploring techniques such as Hidden Markov Models and Self-Organizing Map neural networks, among others. Finally, a concatenative Unit Selection approach was chosen as the core of a singing voice resynthesis system. By extracting energy, pitch and phonetic information (MFCC, LPC), and using it within a phonetic-similarity Viterbi-based Unit Selection system, a sequence of frames from an internal sound library is chosen to replicate the original audio performance. Although audio artifacts still exist, preventing its use in professional applications, the concept of a new audio tool was created that presents high potential for future work, not only for the singing voice but also in other musical and speech domains. This dissertation had the kind support of FCT (Portuguese Foundation for Science and Technology, an agency of the Portuguese Ministry for Science, Technology and Higher Education) under grant SFRH / BD / 30300 / 2006, and was linked with research project PTDC/SAU-BEB/104995/2008 (Assistive Real-Time Technology in Singing), whose objectives include the development of interactive technologies to help the teaching and learning of singing.
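
    The thesis system itself is not reproduced here, but the core concatenative step it describes, choosing one library frame per input frame by minimising a target cost (feature distance) plus a concatenation cost (discontinuity between successive chosen frames) with dynamic programming, can be sketched. A minimal Python sketch follows, assuming Euclidean costs over random stand-in feature vectors; the weighting and names are illustrative.

    ```python
    import numpy as np

    def viterbi_unit_selection(targets, library, w_concat=0.5):
        """For each target feature vector choose a library frame index,
        minimising target cost + weighted concatenation cost."""
        T, L = len(targets), len(library)
        # target_cost[t, l]: distance from target frame t to library frame l
        target_cost = np.linalg.norm(targets[:, None, :] - library[None, :, :], axis=2)
        # concat_cost[i, j]: discontinuity between library frames i and j
        concat_cost = np.linalg.norm(library[:, None, :] - library[None, :, :], axis=2)

        cost = target_cost[0].copy()
        back = np.zeros((T, L), dtype=int)
        for t in range(1, T):
            total = cost[:, None] + w_concat * concat_cost  # (prev, next)
            back[t] = np.argmin(total, axis=0)
            cost = total[back[t], np.arange(L)] + target_cost[t]

        path = [int(np.argmin(cost))]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t][path[-1]]))
        return path[::-1]

    rng = np.random.default_rng(0)
    library = rng.normal(size=(50, 13))   # 50 library frames, 13-dim features
    targets = rng.normal(size=(20, 13))   # 20 input frames to replicate
    print(viterbi_unit_selection(targets, library))
    ```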

    Automatic robust classification of speech using analytical feature techniques

    Get PDF
    This document is the report of the research carried out in the domain of automatic speech classification during a stay at the Sony CSL laboratory for the completion of a final-year degree project. The work explores the possibilities of the EDS system, developed at Sony CSL, for solving problems of recognising a small number of isolated words, independently of the speaker and in the presence of background noise. EDS automatically builds features for audio classification problems. It achieves this through the (functional) composition of mathematical and signal processing operators; for this reason these features are called analytical features, which the system builds specifically for each audio classification problem, presented in the form of a training and test database.
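
    EDS itself is Sony CSL's system and is not reproduced here; the sketch below only illustrates, in Python, the general idea of analytical features built by functional composition of mathematical and signal processing operators. The operator vocabulary and random composition strategy are assumptions for illustration, not the EDS search algorithm.

    ```python
    import random
    import numpy as np

    # A tiny vocabulary of (name, operator) pairs over 1-D signals.
    OPERATORS = [
        ("Derivative", lambda x: np.diff(x)),
        ("Abs",        lambda x: np.abs(x)),
        ("FftMag",     lambda x: np.abs(np.fft.rfft(x))),
        ("Log1p",      lambda x: np.log1p(np.abs(x))),
    ]
    REDUCERS = [("Mean", np.mean), ("Std", np.std), ("Max", np.max)]

    def random_analytical_feature(depth=3, seed=None):
        """Compose `depth` operators and a final reducer into a single
        scalar-valued feature; returns (readable_name, callable)."""
        rng = random.Random(seed)
        ops = [rng.choice(OPERATORS) for _ in range(depth)]
        red = rng.choice(REDUCERS)
        name = red[0] + "(" + "(".join(n for n, _ in ops) + "(x" + ")" * (depth + 1)
        def feature(x):
            for _, fn in reversed(ops):   # innermost operator applies first
                x = fn(x)
            return red[1](x)
        return name, feature

    name, feat = random_analytical_feature(seed=42)
    signal = np.sin(np.linspace(0, 20 * np.pi, 1600))
    print(name, "=", feat(signal))
    ```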

    New linear predictive methods for digital speech processing

    Get PDF
    Speech processing is needed whenever speech is to be compressed, synthesised or recognised by means of electrical equipment. Different types of phones, multimedia equipment and interfaces to various electronic devices all require digital speech processing. As an example, a GSM phone applies speech processing in its RPE-LTP encoder/decoder (ETSI, 1997). In this coder, 20 ms of speech is first analysed in the short-term prediction (STP) part, and then in the long-term prediction (LTP) part. Finally, speech compression is achieved in the RPE encoding part, where only 1/3 of the encoded samples are selected for transmission. This thesis presents modifications to one of the most widely applied techniques in digital speech processing, namely linear prediction (LP). During recent decades linear prediction has played an important role in telecommunications and other areas related to speech compression and recognition. In linear prediction, sample s(n) is predicted as a linear combination of its p previous samples, with the combination weights chosen to minimise the prediction error. This procedure in the time domain corresponds to modelling the spectral envelope of the speech spectrum in the frequency domain. How accurately the spectral envelope fits the speech spectrum depends strongly on the order of the resulting all-pole filter, which in turn determines the number of parameters required to define the model, and hence to be transmitted. Our study presents new predictive methods, modified from conventional linear prediction by taking the previous samples for the linear combination differently. This algorithmic development aims at new all-pole techniques that could represent speech spectra with fewer parameters.
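
    The thesis's modified predictors are its own contribution, but the baseline it modifies, conventional linear prediction, is standard. A minimal Python sketch of the autocorrelation (Levinson-Durbin) method follows, with an illustrative order of p = 10.

    ```python
    import numpy as np

    def lpc(s, p=10):
        """Conventional LP via the autocorrelation (Levinson-Durbin) method.
        Returns error-filter coefficients a (a[0] = 1) so that the predictor
        is s_hat(n) = -sum_{k=1..p} a[k] * s(n - k)."""
        n = len(s)
        r = np.array([np.dot(s[:n - k], s[k:]) for k in range(p + 1)])
        a = np.zeros(p + 1)
        a[0] = 1.0
        err = r[0]
        for i in range(1, p + 1):
            k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coeff
            a[1:i] += k * a[i - 1:0:-1]
            a[i] = k
            err *= 1.0 - k * k
        return a, err

    fs = 8000
    t = np.arange(0, 0.032, 1.0 / fs)
    s = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
    a, err = lpc(s, p=10)
    # Predict each sample from its p previous samples.
    s_hat = np.array([-np.dot(a[1:], s[n - 10:n][::-1]) for n in range(10, len(s))])
    print(np.max(np.abs(s[10:] - s_hat)))  # small: two sinusoids fit at order 10
    ```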

    A psychoacoustic engineering approach to machine sound source separation in reverberant environments

    Get PDF
    Reverberation continues to present a major problem for sound source separation algorithms, due to its corruption of many of the acoustical cues on which these algorithms rely. However, humans demonstrate a remarkable robustness to reverberation, and many of the underlying psychophysical and perceptual mechanisms are well documented. This thesis therefore considers the research question: can the reverberation performance of existing psychoacoustic engineering approaches to machine source separation be improved? The precedence effect is a perceptual mechanism that aids our ability to localise sounds in reverberant environments. Despite this, relatively little work has been done on incorporating the precedence effect into automated sound source separation. Consequently, a study was conducted that compared several computational precedence models and their impact on the performance of a baseline separation algorithm. The algorithm included a precedence model, which was replaced with the other precedence models during the investigation. The models were tested using a novel metric in a range of reverberant rooms and with a range of other mixture parameters. The metric, termed Ideal Binary Mask Ratio, is shown to be robust to the effects of reverberation and facilitates meaningful and direct comparison between algorithms across different acoustic conditions. Large differences between the performances of the models were observed. The results showed that a separation algorithm incorporating a model based on interaural coherence produces the greatest performance gain over the baseline algorithm. The results from the study also indicated that it may be necessary to adapt the precedence model to the acoustic conditions in which the model is utilised. This effect is analogous to the perceptual Clifton effect, which is a dynamic component of the precedence effect that appears to adapt precedence to a given acoustic environment in order to maximise its effectiveness. However, no work had been carried out on adapting a precedence model to the acoustic conditions under test: although the necessity for such a component has been suggested in the literature, neither its necessity nor its benefit had been formally validated. Consequently, a further study was conducted in which parameters of each of the previously compared precedence models were varied in each room, in order to identify whether, and to what extent, separation performance varied with these parameters. The results showed that the reverberation performance of existing psychoacoustic engineering approaches to machine source separation can be improved, yielding significant gains in separation performance.
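
    The Ideal Binary Mask Ratio metric is the thesis's own, and its exact definition is not given above. The sketch below only illustrates, in Python, the standard ingredient it builds on: the ideal binary mask, which keeps time-frequency cells where the target dominates the interference, plus a simple agreement score between masks as an illustrative stand-in for a mask-based metric.

    ```python
    import numpy as np
    from scipy.signal import stft

    def ideal_binary_mask(target, interference, fs, lc_db=0.0):
        """Keep time-frequency cells where the target's local SNR
        exceeds the local criterion lc_db."""
        _, _, T = stft(target, fs=fs, nperseg=512)
        _, _, I = stft(interference, fs=fs, nperseg=512)
        local_snr = 10.0 * np.log10((np.abs(T) ** 2 + 1e-12) /
                                    (np.abs(I) ** 2 + 1e-12))
        return local_snr > lc_db

    def mask_agreement(estimated, ideal):
        """Fraction of cells on which an estimated mask matches the ideal one."""
        return float(np.mean(estimated == ideal))

    fs = 16000
    t = np.arange(0, 0.5, 1.0 / fs)
    target = np.sin(2 * np.pi * 300 * t)
    noise = np.random.default_rng(1).normal(scale=0.3, size=len(t))
    ibm = ideal_binary_mask(target, noise, fs)
    print(mask_agreement(ibm, ibm))   # 1.0: a perfect estimate
    ```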