
    Evidential Markov chains and trees with applications to non stationary processes segmentation

    Triplet Markov chains (TMCs) generalize pairwise Markov chains (PMCs), which in turn generalize hidden Markov chains (HMCs). Moreover, in an HMC the posterior distribution of the hidden process can be viewed as a particular case of the so-called Dempster's combination rule, applied to its prior Markov distribution p and a probability q defined from the observations. When we move to the theory-of-evidence setting by replacing p with a mass function m, the Dempster combination of m with q generalizes the conventional posterior distribution of the hidden process. Although this result is not necessarily a Markov distribution, it has recently been shown to be a TMC, which makes traditional restoration methods applicable. Further, these results remain valid when the Markov chains are replaced with Markov trees. We propose to extend these results to pairwise Markov trees, in which the distribution of the hidden process is not necessarily Markov. Finally, we show the practical interest of such a combination in the unsupervised segmentation of non-stationary hidden Markov chains, with application to unsupervised image segmentation.
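The combination rule at the heart of the abstract can be illustrated on a tiny frame of discernment. The sketch below is illustrative only (a two-class frame {a, b} with made-up masses, not the paper's chain-structured model): the evidential prior m may put mass on the whole frame to encode ignorance, and its Dempster combination with an observation-derived probability q generalizes the Bayesian posterior.

```python
# Dempster's rule of combination on a toy frame {a, b}.
# All numbers are illustrative, not taken from the paper.
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions whose focal sets are frozensets."""
    combined = {}
    conflict = 0.0
    for (s1, w1), (s2, w2) in product(m1.items(), m2.items()):
        inter = s1 & s2
        if inter:
            combined[inter] = combined.get(inter, 0.0) + w1 * w2
        else:
            conflict += w1 * w2          # mass sent to the empty set
    # Normalise by the non-conflicting mass (Dempster normalisation).
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

# Evidential "prior": mass on the whole frame {a, b} encodes ignorance.
m = {frozenset('a'): 0.5, frozenset('b'): 0.2, frozenset('ab'): 0.3}
# Probability q built from an observation (singleton focal sets only).
q = {frozenset('a'): 0.7, frozenset('b'): 0.3}

fused = dempster_combine(m, q)
```

When m itself is a probability (no mass on non-singletons), this combination reduces to the usual Bayesian posterior, which is the degenerate case the abstract starts from.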

    Arbres de Markov Triplet et théorie de l'évidence

    Triplet Markov chains (TMCs) generalize pairwise Markov chains (PMCs), which in turn generalize hidden Markov chains (HMCs). Moreover, in an HMC the posterior distribution of the hidden process, which is Markov, can be viewed as a Dempster-Shafer fusion (DS fusion) of its distribution p with a probability q defined from the observations. When we move to the theory-of-evidence setting by replacing p with a mass function M, its DS fusion with q generalizes the posterior probability. Although the result of this fusion is not necessarily a Markov chain, it has been shown to be a TMC, which allows the various processing tasks of interest. Moreover, analogous results remain valid when the various Markov chains are generalized to Markov trees. We propose to extend these results to pairwise Markov chains and trees, in which the distribution of the hidden process is not necessarily Markov. We also show the practical interest of the theory of evidence in the unsupervised segmentation of non-stationary Markov chains.

    Improving lightly supervised training for broadcast transcription

    This paper investigates improving lightly supervised acoustic model training for an archive of broadcast data. Standard lightly supervised training uses decoding hypotheses automatically derived with a biased language model. However, as the actual speech can deviate significantly from the original programme scripts that are supplied, the quality of standard lightly supervised hypotheses can be poor. To address this issue, word- and segment-level combination approaches are applied between the lightly supervised transcripts and the original programme scripts, yielding improved transcriptions. Experimental results show that systems trained on these improved transcriptions consistently outperform those trained only on the original lightly supervised decoding hypotheses. This is shown to be the case for both maximum likelihood and minimum phone error trained systems. The research leading to these results was supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology). This is the accepted manuscript version; the final version is available at http://www.isca-speech.org/archive/interspeech_2013/i13_2187.html
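The word-level combination idea can be illustrated with a toy sketch. This is not the paper's actual algorithm (which operates on decoder lattices and confidence scores); it simply shows the shape of the problem, assuming a plain alignment between the biased-LM decoding hypothesis and the programme script, keeping script words where the two sources agree and falling back to the hypothesis elsewhere.

```python
# Toy word-level combination of a decoding hypothesis with a programme
# script, via sequence alignment. Illustrative only.
from difflib import SequenceMatcher

def combine_transcripts(hypothesis, script):
    hyp, scr = hypothesis.split(), script.split()
    sm = SequenceMatcher(a=hyp, b=scr, autojunk=False)
    merged = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == 'equal':
            merged.extend(scr[j1:j2])   # both sources agree: keep script
        else:
            merged.extend(hyp[i1:i2])   # disagreement: trust the acoustics
    return ' '.join(merged)

hyp = "the cat sat on a mat"
scr = "the cat sat on the mat today"
print(combine_transcripts(hyp, scr))
```

Note how script material with no acoustic support ("today") is dropped, mirroring the motivation in the abstract: scripts can deviate from what was actually said.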

    Second-Order Belief Hidden Markov Models

    Hidden Markov Models (HMMs) are learning methods for pattern recognition. Probabilistic HMMs have been among the most widely used techniques based on the Bayesian model. First-order probabilistic HMMs have been adapted to the theory of belief functions, with Bayesian probabilities replaced by mass functions. In this paper, we present a second-order hidden Markov model using belief functions. Previous work on belief HMMs has focused on first-order HMMs; we extend it to the second-order model.
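For context, a second-order HMM conditions each state on its two predecessors, so the forward variable tracks a pair of states. The sketch below shows the ordinary probabilistic second-order forward recursion (the structure the belief-function version generalizes by swapping probabilities for mass functions); all parameters are made up for illustration.

```python
# Minimal probabilistic second-order HMM forward pass (illustrative).
import numpy as np

def forward_second_order(pi, A1, A2, B, obs):
    """pi[i]: initial state distribution; A1[i, j]: first transition;
    A2[i, j, k] = P(s_t=k | s_{t-2}=i, s_{t-1}=j); B[j, o]: emission.
    Returns the total likelihood P(o_1..T)."""
    # alpha[i, j] = P(o_1, o_2, s_1=i, s_2=j)
    alpha = (pi[:, None] * B[:, obs[0]][:, None]
             * A1 * B[None, :, obs[1]])
    for o in obs[2:]:
        # new_alpha[j, k] = sum_i alpha[i, j] * A2[i, j, k] * B[k, o]
        alpha = np.einsum('ij,ijk->jk', alpha, A2) * B[None, :, o]
    return alpha.sum()

rng = np.random.default_rng(0)
n_states, n_obs = 2, 3
pi = np.full(n_states, 0.5)
A1 = np.full((n_states, n_states), 0.5)
A2 = rng.dirichlet(np.ones(n_states), size=(n_states, n_states))
B = rng.dirichlet(np.ones(n_obs), size=n_states)
lik = forward_second_order(pi, A1, A2, B, [0, 2, 1, 1])
```

The belief-function variant replaces the sum-product steps with combinations of mass functions, but the pair-of-states bookkeeping is the same.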

    The MGB Challenge: Evaluating Multi-genre Broadcast Media Recognition

    This paper describes the Multi-Genre Broadcast (MGB) Challenge at ASRU 2015, an evaluation focused on speech recognition, speaker diarization, and "lightly supervised" alignment of BBC TV recordings. The challenge training data covered seven weeks of BBC TV output across four channels, amounting to about 1,600 hours of broadcast audio. In addition, several hundred million words of BBC subtitle text were provided for language modelling. A novel aspect of the evaluation was the exploration of speech recognition and speaker diarization in a longitudinal setting, i.e. recognition of several episodes of the same show, and speaker diarization across these episodes, linking speakers. The longitudinal tasks also offered systems the opportunity to make use of supplied metadata, including show title, genre tag, and date/time of transmission. This paper describes the task data and evaluation process used in the MGB challenge and summarises the results obtained.

    Automatic transcription of multi-genre media archives

    This paper describes some recent results of our collaborative work on developing a speech recognition system for the automatic transcription of media archives from the British Broadcasting Corporation (BBC). The material includes a wide diversity of shows with their associated metadata, which are highly diverse in terms of completeness, reliability and accuracy. First, we investigate how to improve lightly supervised acoustic training when timestamp information is inaccurate and when speech deviates significantly from the transcription, and how to perform evaluations when no reference transcripts are available. An automatic timestamp correction method, as well as word- and segment-level combination approaches between the lightly supervised transcripts and the original programme scripts, are presented which yield improved metadata. Experimental results show that systems trained using the improved metadata consistently outperform those trained with only the original lightly supervised decoding hypotheses. Secondly, we show that the recognition task may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we describe Multi-level Adaptive Networks, a novel technique for incorporating information from out-of-domain posterior features using deep neural networks. We show that it provides a substantial reduction in WER over other systems, including a PLP-based baseline, in-domain tandem features, and the best out-of-domain tandem features. This research was supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology). This paper was presented at the First Workshop on Speech, Language and Audio in Multimedia, August 22-23, 2013, Marseille. It was published in CEUR Workshop Proceedings at http://ceur-ws.org/Vol-1012/
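The tandem idea underlying this line of work can be sketched in a few lines: per-frame posterior features produced by a network (here a stand-in for the out-of-domain DNN) are appended to the standard acoustic features of the in-domain system. Dimensions, names, and the omitted decorrelation step are illustrative assumptions, not the paper's exact configuration.

```python
# Hedged sketch of tandem feature construction: concatenate per-frame
# log-posteriors with PLP acoustic features. Illustrative shapes only.
import numpy as np

def tandem_features(plp, posteriors, eps=1e-8):
    """plp: (T, d) acoustic features; posteriors: (T, n_phones) per-frame
    network outputs. A real system would typically decorrelate the
    log-posteriors (e.g. PCA) before concatenation; omitted here."""
    log_post = np.log(posteriors + eps)
    return np.concatenate([plp, log_post], axis=1)

rng = np.random.default_rng(1)
T, d, n_phones = 100, 13, 40
plp = rng.normal(size=(T, d))                      # stand-in PLP frames
post = rng.dirichlet(np.ones(n_phones), size=T)    # stand-in DNN outputs
feats = tandem_features(plp, post)                 # (T, d + n_phones)
```

The MLAN idea layers this: posterior features from an out-of-domain network feed, together with the acoustic features, into a further in-domain network whose outputs form the final tandem features.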

    Transcription of multi-genre media archives using out-of-domain data

    We describe our work on developing a speech recognition system for multi-genre media archives. The high diversity of the data makes this a challenging recognition task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we present Multi-level Adaptive Networks (MLAN), a novel technique for incorporating information from out-of-domain posterior features using deep neural networks. We show that it provides a substantial reduction in WER over other systems, with relative WER reductions of 15% over a PLP baseline, 9% over in-domain tandem features, and 8% over the best out-of-domain tandem features.
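Note that these figures are relative reductions, which is worth making concrete. Assuming, purely for illustration, a baseline of 30.0% WER, a 15% relative reduction corresponds to 30.0 × (1 − 0.15) = 25.5% absolute WER:

```python
# Relative WER reduction: (baseline - system) / baseline, in percent.
# The 30.0% baseline below is an illustrative assumption, not a figure
# from the paper.
def relative_reduction(baseline_wer, system_wer):
    return 100.0 * (baseline_wer - system_wer) / baseline_wer

print(relative_reduction(30.0, 25.5))
```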
