5 research outputs found

    Automatic threshold determination for a local approach of change detection in long-term signal recordings

    CUSUM (cumulative sum) is a well-known method for detecting changes in a signal when the parameters of that signal are known. This paper presents an adaptation of CUSUM-based change detection algorithms to long-term signal recordings in which the various hypotheses contained in the signal are unknown. The starting point of the work was the dynamic cumulative sum (DCS) algorithm, previously developed for application to long-term electromyography (EMG) recordings. DCS has been improved in two ways. The first is a new procedure for estimating the distribution parameters, which ensures that the detectability property is respected. The second is the definition of two separate, automatically determined thresholds: the lower threshold stops the estimation process, while the upper threshold is applied to the detection function. The automatic determination of the thresholds is based on the Kullback-Leibler distance, which measures the distance between the detected segments (events). Tests on simulated data demonstrated the efficiency of these improvements to the DCS algorithm.
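
    For illustration, here is a minimal sketch of a classical CUSUM detector with a single upper alarm threshold; it is not the paper's DCS algorithm, and the dynamic parameter estimation, the lower stopping threshold, and the Kullback-Leibler-based threshold determination are not reproduced. The parameter names (mu0, mu1, sigma, h_upper) and the threshold value are assumptions.

```python
# Minimal CUSUM sketch (not the paper's DCS variant): detects an upward mean
# shift in a Gaussian signal via the cumulative log-likelihood ratio.
# mu0/mu1/sigma/h_upper are illustrative assumptions, not values from the paper.
import numpy as np

def cusum_detect(x, mu0, mu1, sigma, h_upper):
    """Return the index of the first alarm, or None if no change is detected."""
    g = 0.0
    for t, xt in enumerate(x):
        # Log-likelihood ratio of the post-change vs. pre-change Gaussian hypotheses.
        llr = ((xt - mu0) ** 2 - (xt - mu1) ** 2) / (2.0 * sigma ** 2)
        g = max(0.0, g + llr)   # cumulative sum, clipped at zero
        if g > h_upper:         # upper threshold applied to the detection function
            return t
    return None

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = np.concatenate([rng.normal(0.0, 1.0, 200),    # before the change
                             rng.normal(1.5, 1.0, 200)])   # after the change
    print(cusum_detect(signal, mu0=0.0, mu1=1.5, sigma=1.0, h_upper=8.0))
```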

    Video-assisted, robust feature extraction for real-time speech recognition [online]


    Speech-synchronized facial animation: phonetic context-dependent visemes for Brazilian Portuguese

    Advisors: Leo Pini Magalhães, Fabio Violaro. Doctoral thesis, Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação. Abstract: Computer facial animation refers to techniques for specifying and controlling the positioning, motion, and appearance of a synthetic face over time. Speech-synchronized facial animation addresses the control of a virtual face driven by the phonetic events of an utterance. Such control implies manipulating the virtual face in coordination and synchrony with the speech signal. The coordination is achieved by reproducing on the virtual face the visible articulatory movements necessary for speech production. The objective of the work is to study and propose a methodology for establishing representations of the visual articulatory patterns displayed on the face during speech, the so-called visemes. The proposed methodology identifies phonetic-context-dependent visemes that cope with perseverative and anticipatory coarticulation. Additionally, models of the motion of the temporomandibular joint and the lip tissue are derived from the geometric and temporal description of visemes established by the analysis of a Brazilian Portuguese linguistic corpus. Although the phonetic material used in the work is restricted to Brazilian Portuguese, the proposed methodology is general enough to be applied to other languages.
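
    As a rough illustration of the idea of phonetic-context-dependent visemes, the sketch below maps a phone sequence to visemes, letting triphone-context entries override a context-independent fallback to mimic anticipatory coarticulation. The phone symbols, viseme labels, and table entries are invented placeholders, not the viseme inventory established in the thesis.

```python
# Toy viseme lookup: context-dependent (triphone) entries take precedence over
# context-independent ones. All labels below are invented for illustration.
CONTEXT_INDEPENDENT = {"p": "bilabial_closure", "a": "open", "u": "rounded", "t": "alveolar"}
CONTEXT_DEPENDENT = {
    # /a/ between /p/ and /u/ acquires lip rounding from the upcoming /u/
    # (anticipatory coarticulation).
    ("p", "a", "u"): "open_rounded",
}

def phones_to_visemes(phones):
    """Map a phone sequence to visemes, preferring context-dependent entries."""
    visemes = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] if i > 0 else "#"
        right = phones[i + 1] if i + 1 < len(phones) else "#"
        visemes.append(CONTEXT_DEPENDENT.get((left, phone, right),
                                             CONTEXT_INDEPENDENT.get(phone, "neutral")))
    return visemes

print(phones_to_visemes(["p", "a", "u"]))  # ['bilabial_closure', 'open_rounded', 'rounded']
```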

    Adaptive audio-visual synthesis: automatic training methods for unit-selection-based audio-visual speech synthesis

    In this work, algorithms and methods were developed and applied that make video-realistic audio-visual synthesis possible. The generated audio-visual signal shows a talking head constructed from previously recorded video data and an underlying TTS system. The work is organised in three parts: statistical learning methods, concatenative speech synthesis, and video-realistic audio-visual synthesis. The speech synthesis system developed here concatenates natural speech units, an approach commonly known as unit-selection-based text-to-speech. The same selection-and-concatenation procedure is used for the visual synthesis, in this case drawing on natural video sequences. As statistical learning methods, mainly graph-based techniques are developed and applied; in particular, hidden Markov models and conditional random fields are used to select the appropriate speech representation units. For the visual synthesis, a prototype-based learning method is employed, widely known as the k-nearest-neighbour algorithm. Training the system requires an annotated speech corpus and an annotated video corpus. To evaluate the methods, video-realistic audio-visual synthesis software was developed that converts text input into the desired video sequence fully automatically; no step up to the signal output requires any manual intervention.
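
    As an illustration of the unit-selection principle described above, the sketch below picks one recorded unit per target by dynamic programming over a target cost and a concatenation cost. It is not the system from this work (which uses hidden Markov models and conditional random fields for the selection); the cost functions, unit names, and corpus are toy assumptions.

```python
# Toy unit selection: choose candidates[i][j] for each target i so that the sum
# of target costs and pairwise concatenation costs is minimal (Viterbi-style DP).
def select_units(targets, candidates, target_cost, concat_cost):
    """Return, for each target, the index of the selected candidate unit."""
    # best[i][j] = (cost of the best path ending in candidates[i][j], backpointer)
    best = [[(target_cost(targets[0], u), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            prev = min(range(len(candidates[i - 1])),
                       key=lambda k: best[i - 1][k][0] + concat_cost(candidates[i - 1][k], u))
            cost = (best[i - 1][prev][0] + concat_cost(candidates[i - 1][prev], u)
                    + target_cost(targets[i], u))
            row.append((cost, prev))
        best.append(row)
    # Backtrack from the cheapest final unit.
    j = min(range(len(candidates[-1])), key=lambda k: best[-1][k][0])
    path = [j]
    for i in range(len(targets) - 1, 0, -1):
        j = best[i][j][1]
        path.append(j)
    return list(reversed(path))

if __name__ == "__main__":
    targets = ["a", "b", "a"]                            # target phone sequence
    candidates = [["a1", "a2"], ["b1"], ["a1", "a2"]]    # recorded units per target
    tc = lambda t, u: 0.0 if u.startswith(t) else 1.0    # toy target cost
    cc = lambda u, v: 0.0 if u[-1] == v[-1] else 0.5     # toy concatenation cost
    print(select_units(targets, candidates, tc, cc))     # e.g. [0, 0, 0]
```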

    A system for video-based analysis of face motion during speech

    During face-to-face interaction, facial motion conveys information at various levels. These include a person's emotional condition, position in a discourse, and, while speaking, phonetic details about the speech sounds being produced. Trivially, the measurement of face motion is a prerequisite for any further analysis of its functional characteristics or information content. It is possible to make precise measurements of locations on the face using systems that track motion by means of active or passive markers placed directly on the face. Such systems, however, have the disadvantages of requiring specialised equipment, which restricts use outside the lab, and of being invasive in the sense that the markers have to be attached to the subject's face. To overcome these limitations, we developed a video-based system that measures face motion from standard video recordings by deforming the surface of an ellipsoidal mesh fitted to the face. The mesh is initialised manually for a reference frame and then projected onto subsequent video frames. Location changes between successive frames are determined adaptively for each mesh node, within a well-defined area around the node, using a two-dimensional cross-correlation analysis on a two-dimensional wavelet transform of the frames. Position parameters are propagated in three steps from a coarser mesh and a correspondingly higher scale of the wavelet transform to the final fine mesh and a lower scale of the wavelet transform. The sequential changes in position of the mesh nodes represent the facial motion. The method takes advantage of inherent constraints of the facial surface, which distinguishes it from more general image motion estimation methods, and it returns measurement points distributed globally over the facial surface, unlike feature-based methods.
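
    As a simplified illustration of the per-node tracking step, the sketch below estimates a single node's displacement between two frames by normalised cross-correlation of a local patch against a search window. The wavelet-domain analysis and the three-step coarse-to-fine propagation described in the abstract are omitted, and the patch and search sizes are assumptions.

```python
# Toy block matching by normalised cross-correlation (operates on raw pixel
# intensities; the paper's system works on wavelet-transformed frames and
# propagates estimates from coarse to fine meshes).
import numpy as np

def track_node(frame0, frame1, y, x, patch=7, search=5):
    """Return the (dy, dx) shift of the patch around (y, x) that best matches frame1."""
    r = patch // 2
    template = frame0[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    template -= template.mean()
    best_score, best_dyx = -np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = y + dy - r, x + dx - r
            if y0 < 0 or x0 < 0 or y0 + patch > frame1.shape[0] or x0 + patch > frame1.shape[1]:
                continue  # candidate patch falls outside the frame
            cand = frame1[y0:y0 + patch, x0:x0 + patch].astype(float)
            cand -= cand.mean()
            denom = np.linalg.norm(template) * np.linalg.norm(cand)
            score = (template * cand).sum() / denom if denom > 0 else -np.inf
            if score > best_score:
                best_score, best_dyx = score, (dy, dx)
    return best_dyx

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    f0 = rng.random((64, 64))
    f1 = np.roll(f0, shift=(2, 1), axis=(0, 1))  # content shifted down 2 px, right 1 px
    print(track_node(f0, f1, y=30, x=30))        # expected: (2, 1)
```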