
    Audio Analysis/synthesis System

    A method and apparatus for the automatic analysis, synthesis and modification of audio signals, based on an overlap-add sinusoidal model, is disclosed. Automatic analysis of amplitude, frequency and phase parameters of the model is achieved using an analysis-by-synthesis procedure which incorporates successive approximation, yielding synthetic waveforms which are very good approximations to the original waveforms and are perceptually identical to the original sounds. A generalized overlap-add sinusoidal model is introduced which can modify audio signals without objectionable artifacts. In addition, a new approach to pitch-scale modification allows for the use of arbitrary spectral envelope estimates and addresses the problems of high-frequency loss and noise amplification encountered with prior art methods. The overlap-add synthesis method provides the ability to synthesize sounds with computational efficiency rivaling that of synthesis using the discrete short-time Fourier transform (DSTFT) while eliminating the modification artifacts associated with that method.
    Georgia Tech Research Corporation
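    The overlap-add sinusoidal model at the heart of this abstract can be sketched in a few lines: each frame is a sum of sinusoids defined by (amplitude, frequency, phase) triples, windowed and overlap-added into the output. This is a minimal generic illustration, not the patented analysis-by-synthesis procedure itself; all names and parameter choices are hypothetical.

```python
import numpy as np

def overlap_add_sinusoidal(frames_params, frame_len, hop):
    """Synthesise a signal by overlap-adding windowed sums of sinusoids.

    frames_params: one list of (amplitude, frequency, phase) triples per
    frame; frequencies are in cycles/sample for simplicity.
    """
    n_frames = len(frames_params)
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    # Hann windows at 50% overlap approximately sum to one
    window = np.hanning(frame_len)
    t = np.arange(frame_len)
    for i, params in enumerate(frames_params):
        frame = np.zeros(frame_len)
        for amp, freq, phase in params:
            frame += amp * np.cos(2 * np.pi * freq * t + phase)
        out[i * hop : i * hop + frame_len] += window * frame
    return out
```

    In a full analysis/synthesis system the triples would come from the analysis stage; here they are simply supplied by the caller.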

    On the quality of synthetic speech : evaluation and improvements


    Sparsity in Linear Predictive Coding of Speech


    Robust cepstral feature for bird sound classification

    Birds are excellent environmental indicators and may indicate the sustainability of an ecosystem; they may be used to provide provisioning, regulating and supporting services. Birdlife conservation research therefore always takes centre stage. Given the airborne nature of birds and the density of tropical forest, identifying birds by their sounds may be a better solution than visual identification. The goal of this study is to find the cepstral features best suited to classifying bird sounds accurately. Fifteen (15) endemic Bornean bird sounds were selected and segmented using an automated energy-based algorithm. Three (3) types of cepstral features were extracted: linear prediction cepstrum coefficients (LPCC), mel frequency cepstral coefficients (MFCC) and gammatone frequency cepstral coefficients (GTCC), each used separately for classification with a support vector machine (SVM). Comparison of the prediction results demonstrates that the model using GTCC features, with 93.3% accuracy, outperforms the models using MFCC and LPCC features, showing the robustness of GTCC for bird sound classification. The result is significant for the advancement of bird sound classification research, which has applications in areas such as eco-tourism and wildlife management.
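    The three feature families compared here differ mainly in their filterbank front end; once filterbank energies are available, the cepstral step is the same log-then-DCT recipe for MFCC and GTCC. A minimal numpy sketch of that shared step (the filterbank matrix — mel or gammatone — is assumed to be supplied; all names are illustrative):

```python
import numpy as np

def cepstral_coeffs(power_spectrum, filterbank, n_ceps=13):
    """Generic filterbank cepstral coefficients: filterbank energies ->
    log -> DCT-II, keeping the first n_ceps coefficients. A mel
    filterbank yields MFCC; a gammatone filterbank yields GTCC."""
    energies = filterbank @ power_spectrum          # (n_filters,)
    log_e = np.log(energies + 1e-10)                # floor avoids log(0)
    n_f = len(log_e)
    # Unnormalised DCT-II of the log filterbank energies
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_f)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_f))
    return dct @ log_e
```

    The resulting coefficient vectors per frame would then be fed to the SVM classifier.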

    Glottal-synchronous speech processing

    Glottal-synchronous speech processing is a field of speech science in which the pseudo-periodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech, which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the electroglottograph signal, with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and prosodic manipulation, where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment in real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.
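    As a rough illustration of what glottal-synchronous framing means in practice, the sketch below cuts a voiced signal into two-period frames spanning adjacent glottal cycles, the usual choice for PSOLA-style prosodic manipulation. This is a generic framing step assuming the GCIs have already been detected (e.g. by an algorithm such as SIGMA or YAGA, which are not reproduced here); the function name is hypothetical.

```python
import numpy as np

def glottal_synchronous_frames(signal, gcis):
    """Cut a voiced-speech signal into pitch-synchronous frames, each
    spanning two glottal cycles (from GCI[i-1] to GCI[i+1]).

    gcis: sorted sample indices of detected glottal closure instants.
    """
    frames = []
    for i in range(1, len(gcis) - 1):
        frames.append(signal[gcis[i - 1]:gcis[i + 1]])
    return frames
```

    Because the frame boundaries track the glottal cycle, frame lengths vary with the local pitch period instead of being fixed in advance.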

    New linear predictive methods for digital speech processing

    Speech processing is needed whenever speech is to be compressed, synthesised or recognised by means of electrical equipment. Different types of phones, multimedia equipment and interfaces to various electronic devices all require digital speech processing. As an example, a GSM phone applies speech processing in its RPE-LTP encoder/decoder (ETSI, 1997). In this coder, 20 ms of speech is first analysed in the short-term prediction (STP) part, and then in the long-term prediction (LTP) part. Finally, speech compression is achieved in the RPE encoding part, where only 1/3 of the encoded samples are selected to be transmitted. This thesis presents modifications to one of the most widely applied techniques in digital speech processing, namely linear prediction (LP). During recent decades linear prediction has played an important role in telecommunications and other areas related to speech compression and recognition. In linear prediction, sample s(n) is predicted from its p previous samples by forming a linear combination of those samples and minimising the prediction error. This procedure in the time domain corresponds to modelling the spectral envelope of the speech spectrum in the frequency domain. How well the spectral envelope fits the speech spectrum depends strongly on the order of the resulting all-pole filter. This, in turn, is usually related to the number of parameters required to define the model, and hence to be transmitted. Our study presents new predictive methods, modified from conventional linear prediction by selecting the previous samples for the linear combination differently. This algorithmic development aims at new all-pole techniques that could represent speech spectra with fewer parameters.
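    Conventional linear prediction as described here — predicting s(n) from its p previous samples and minimising the squared prediction error — is commonly solved with the autocorrelation method and the Levinson-Durbin recursion. A minimal sketch of that conventional baseline (illustrative only; the thesis's modified predictors are not reproduced):

```python
import numpy as np

def lpc(signal, order):
    """Linear prediction coefficients via the autocorrelation method
    and the Levinson-Durbin recursion.

    Returns (a, err): a[0] = 1 and a[1..p] are the predictor
    coefficients of the all-pole filter 1/A(z); err is the residual
    prediction-error energy.
    """
    n = len(signal)
    # Biased autocorrelation r[0..p]
    r = np.array([np.dot(signal[:n - k], signal[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                       # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]
        err *= (1.0 - k * k)
    return a, err
```

    The order p trades spectral-envelope accuracy against the number of parameters to transmit, which is exactly the trade-off the abstract targets.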

    Start-and-End Point Detection at the Input of Speech Recognition Application

    The objective of this project is the creation of an algorithm for Start-and-End point detection in a pre-recorded signal, intended for use at the input of a voice recognition application. Overall, the result of this work is an algorithm that can detect the beginning and end of a previously recorded signal, based on a voice activity detection algorithm previously developed by the Czech Technical University, Faculty of Electrical Engineering. Two main topics are studied in this project: detecting voice activity (the VAD algorithm) and determining the start and end points of the signal (Start-and-End point detection). The first step in building the final algorithm is to identify voice activity in the signal with the VAD algorithm; the Start-and-End point detection algorithm then locates the beginning and end of that activity and discards the silent portions of the signal. To demonstrate the operation of the algorithm, an application was created in MATLAB that graphically displays a previously recorded signal and its start and end points after the algorithms are applied. Finally, to provide more graphical results and add value to the project as a possible future application, digit recognition based on a DTW (Dynamic Time Warping) algorithm has been added.
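    A crude stand-in for the pipeline just described — short-time energy as the voice-activity cue, then the first and last active frames as the start and end points — can be sketched as follows. This is not the Czech Technical University's VAD algorithm; the threshold and all names are illustrative assumptions.

```python
import numpy as np

def endpoints(signal, frame_len=160, threshold_ratio=0.1):
    """Energy-based start-and-end point detection: mark frames whose
    short-time energy exceeds a fraction of the peak frame energy and
    return the first and last active sample indices (or None if the
    signal is all silence)."""
    n_frames = len(signal) // frame_len
    energy = np.array([np.sum(signal[i * frame_len:(i + 1) * frame_len] ** 2)
                       for i in range(n_frames)])
    active = np.where(energy > threshold_ratio * energy.max())[0]
    if active.size == 0:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len
```

    Trimming the signal to this interval before recognition (e.g. before the DTW digit matcher) removes the leading and trailing silences.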

    Hybrid techniques for speech coding
