56 research outputs found

    Linear predictive modelling of speech : constraints and line spectrum pair decomposition

    In an exploration of the spectral modelling of speech, this thesis presents theory and applications of constrained linear predictive (LP) models. Spectral models are essential in many applications of speech technology, such as speech coding, synthesis and recognition. At present, the prevailing approach in speech spectral modelling is linear prediction. In speech coding, spectral models obtained by LP are typically quantised using a polynomial transform called the Line Spectrum Pair (LSP) decomposition. An inherent drawback of conventional LP is its inability to include speech-specific a priori information in the modelling process. This thesis, in contrast, presents different constraints applied to LP models, which are then shown to have relevant properties with respect to the root loci of the model in its all-pole form. Namely, we show that LSP polynomials correspond to time-domain constraints that force the roots of the model to the unit circle. Furthermore, this result is used in the development of advanced spectral models of speech that are represented by stable all-pole filters. Moreover, the theoretical results also include a generic framework for constrained linear predictive models in matrix notation. For these models, we derive sufficient criteria for stability of their all-pole form. Such models can be used to include a priori information in the generation of any application-specific linear predictive model. As a side result, we present a matrix decomposition rule for Toeplitz and Hankel matrices. Reviewed.
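    The link between LP models, the LSP decomposition and the unit circle can be illustrated with a minimal sketch. It assumes the textbook autocorrelation method for LP and the standard LSP definition P(z) = A(z) + z^-(m+1) A(1/z), Q(z) = A(z) - z^-(m+1) A(1/z); the function names, the test signal and the model order are illustrative choices and are not taken from the thesis.

```python
import numpy as np


def lp_coefficients(x, order):
    """Autocorrelation-method LP; returns a with a[0] = 1, A(z) = sum a[k] z^-k."""
    x = np.asarray(x, dtype=float)
    # Autocorrelation lags 0..order of the finite-length signal
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a_tail = -r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a_tail = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a_tail))


def lsp_polynomials(a):
    """Coefficients of P(z) and Q(z) in powers z^0, z^-1, ..., z^-(m+1)."""
    a_ext = np.concatenate((a, [0.0]))
    p = a_ext + a_ext[::-1]  # symmetric polynomial
    q = a_ext - a_ext[::-1]  # antisymmetric polynomial
    return p, q


# Illustrative input: seeded white noise coloured by a short smoothing filter
rng = np.random.default_rng(0)
signal = np.convolve(rng.standard_normal(400), [1.0, 0.6, 0.2])

a = lp_coefficients(signal, order=10)
p, q = lsp_polynomials(a)

print("max |root of A|:", np.max(np.abs(np.roots(a))))  # inside the unit circle
print("|roots of P|:", np.round(np.abs(np.roots(p)), 6))  # on the unit circle
print("|roots of Q|:", np.round(np.abs(np.roots(q)), 6))  # on the unit circle
```

    Since A(z) = (P(z) + Q(z)) / 2, the predictor can be recovered from the two polynomials, which is why a codec can quantise the root angles of P and Q in place of the LP coefficients.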

    Speech coding: code-excited linear prediction

    This book provides a scientific understanding of the most central techniques used in speech coding, both for advanced students and for professionals with a background in speech, audio and/or digital signal processing. It provides a clear connection between the whys, hows and whats, thus enabling a clear view of the necessity, purpose and solutions provided by various tools, as well as their strengths and weaknesses in each respect. Equivalently, this book sheds light on the following perspectives for each technology presented. Objective: What do we want to achieve, and especially, why is this goal important? Resource (Information): What information is available and how can it be useful? Resource (Platform): What kind of platforms are we working with and what are their capabilities and restrictions? This includes computational, memory and acoustic properties and the transmission capacity of the devices used. The book goes on to address Solutions: Which solutions have been proposed and how can they be used to reach the stated goals? Strengths and weaknesses: In which ways do the solutions fulfil the objectives and where are they insufficient? Are resources used efficiently? This book concentrates solely on code-excited linear prediction and its derivatives, since mainstream speech codecs are based on linear prediction. It also concentrates exclusively on time-domain techniques, because frequency-domain tools are to a large extent common with audio codecs.

    End-to-End Optimization of Source Models for Speech and Audio Coding Using a Machine Learning Framework

    Speech coding is the most commonly used application of speech processing. Accumulated layers of improvements have, however, made codecs so complex that optimization of individual modules becomes increasingly difficult. This work introduces machine learning methodology to speech and audio coding, such that we can optimize quality in terms of overall entropy. We can then use conventional quantization, coding and perceptual models without modification, such that the codec adheres to conventional requirements on algorithmic complexity, latency and robustness to packet loss. Experiments demonstrate that end-to-end optimization of quantization accuracy of the spectral envelope can be used for a lossless reduction in bitrate of 0.4 kbit/s. Peer reviewed.

    Estimation of the Probability Distribution of Spectral Fine Structure in the Speech Source

    The efficiency of many speech processing methods relies on accurate modeling of the distribution of the signal spectrum, and a majority of prior works suggest that the spectral components follow the Laplace distribution. To improve the probability distribution models based on our knowledge of speech source modeling, we argue that the model should in fact be a multiplicative mixture model, including terms for voiced and unvoiced utterances. While prior works have applied Gaussian mixture models, we demonstrate that a mixture of generalized Gaussian models more accurately follows the observations. The proposed estimation method is based on measuring the ratio of $L_p$-norms between spectral bands. Such ratios follow the Beta distribution when the input signal is generalized Gaussian, whereby the estimated parameters can be used to determine the underlying parameters of the mixture of generalized Gaussian distributions. Peer reviewed.

    Development of a clinical research environment for the analysis of continuous speech (Kliinisen tutkimusympäristön kehittäminen jatkuvan puheen analysointiin)

    This thesis presents the methods and development of a clinical speech research environment, particularly for the analysis of continuous speech. The aim of the work was to replace the preceding analogue analysis environment with a fully digital system, making use of modern digital signal processing methods. The user interface of the system was to be implemented so that it is suitable for use in a hospital environment. The analysis of the speech signal has three stages: feature extraction, classification and analysis. The first two together essentially form a classical pattern recognition task. In feature extraction, the energy, fundamental frequency and stationarity of the speech are analysed. Based on these, the speech signal is classified into three classes: silence, voiced sound or unvoiced sound. Based on this classification, several measures related to voice quality are then computed from the speech, such as the mean fundamental frequency, the total speaking time and the total duration of voiced sounds. The target group of the voice analysis was people in professions whose main working tool is their own voice. The analysis therefore focused especially on measures related to vocal fatigue.
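    The three-class frame labelling described in this abstract (silence, voiced, unvoiced) can be sketched as follows. This is a hypothetical illustration, not the system from the thesis: it uses only frame energy and a normalised autocorrelation peak as a voicing cue, it omits the stationarity feature, and the function names, thresholds, frame length and pitch range are all assumed values.

```python
import numpy as np


def classify_frames(x, fs, frame_len=0.03, energy_thresh=1e-4,
                    voicing_thresh=0.5, f0_min=60.0, f0_max=400.0):
    """Label non-overlapping frames as 'silence', 'voiced' or 'unvoiced'."""
    x = np.asarray(x, dtype=float)
    n = int(frame_len * fs)
    lag_min = int(fs / f0_max)  # shortest pitch period considered, in samples
    lag_max = int(fs / f0_min)  # longest pitch period considered, in samples
    labels = []
    for start in range(0, len(x) - n + 1, n):
        frame = x[start:start + n]
        # Low mean power is treated as silence (assumed threshold)
        if np.mean(frame ** 2) < energy_thresh:
            labels.append("silence")
            continue
        frame = frame - np.mean(frame)
        ac = np.correlate(frame, frame, mode="full")[n - 1:]
        # A strong autocorrelation peak in the pitch range indicates voicing
        peak = np.max(ac[lag_min:lag_max + 1]) / ac[0]
        labels.append("voiced" if peak > voicing_thresh else "unvoiced")
    return labels


def total_time(labels, label, frame_len=0.03):
    """Total duration in seconds of frames carrying the given label."""
    return labels.count(label) * frame_len


# Synthetic one-second segments standing in for recorded speech
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
silence = np.zeros(fs)
voiced = 0.5 * np.sin(2 * np.pi * 150.0 * t)  # periodic, 150 Hz
unvoiced = 0.1 * rng.standard_normal(fs)      # noise-like

labels = classify_frames(np.concatenate((silence, voiced, unvoiced)), fs)
print("voiced time (s):", total_time(labels, "voiced"))
```

    Measures such as the total duration of voiced sounds then reduce to counting labelled frames, as `total_time` does here.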

    Speech Coding, Speech Interfaces and IoT - Opportunities and Challenges

    Invited paper; the abstract was reviewed but not the whole paper. Will be published online in early 2019. Recent speech and audio coding standards such as 3GPP Enhanced Voice Services match the foreseeable needs and requirements in the transmission of speech and audio when using current transmission infrastructure and applications. Trends in Internet-of-Things (IoT) technology and developments in personal digital assistants (PDAs), however, prompt us to consider future requirements for speech and audio codecs. The opportunities and challenges are here summarized in three concepts: collaboration, unification and privacy. First, an increasing number of devices will in the future be speech-operated, whereby the ability to direct voice commands to a specific device becomes essential. We therefore need methods which allow collaboration between devices, such that ambiguities can be resolved. Second, such collaboration can be achieved with a unified and standardized communication protocol between voice-operated devices. To achieve such collaboration protocols, we need to develop distributed speech coding technology for ad hoc IoT networks. Finally, collaboration will increase the demand for privacy protection in speech interfaces, and it is therefore likely that technologies for supporting privacy and generating trust will be in high demand. Non-peer reviewed.