Evidential Markov chains and trees with applications to non stationary processes segmentation
Triplet Markov chains (TMC) generalize pairwise Markov chains (PMC), which in turn generalize hidden
Markov chains (HMC). On the other hand, in an HMC the posterior distribution of the hidden process can be viewed as a
particular case of the so-called "Dempster's combination rule" applied to its prior Markov distribution p and a probability q
defined from the observations. When we place ourselves in the theory-of-evidence context by replacing p with a mass
function m, the result of the Dempster's combination of m with q generalizes the conventional posterior distribution of the
hidden process. Although this result is not necessarily a Markov distribution, it has recently been shown to be a TMC,
which keeps traditional restoration methods applicable. Furthermore, these results remain valid when the Markov
chains are replaced with Markov trees. We propose to extend these results to pairwise Markov trees. We also show the
practical interest of such a combination in the unsupervised segmentation of non-stationary hidden Markov chains, with
application to unsupervised image segmentation.
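The combination at the heart of this abstract can be illustrated concretely. Below is a minimal sketch of Dempster's combination rule for two mass functions over a small frame of discernment; the frame, labels, and numbers are purely illustrative, not taken from the paper.

```python
# Minimal sketch of Dempster's rule of combination: a prior mass function m
# (with some mass on the whole frame, i.e. ignorance) is fused with a
# Bayesian mass function q built from an observation. Hypothetical example.

def dempster_combine(m1, m2):
    """Combine two mass functions, given as dicts frozenset -> mass."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb  # mass assigned to the empty set
    norm = 1.0 - conflict  # renormalise by the non-conflicting mass
    return {s: v / norm for s, v in combined.items()}

frame = {"water", "forest"}
# Prior mass m: half the mass committed to "water", half to total ignorance.
m = {frozenset({"water"}): 0.5, frozenset(frame): 0.5}
# Bayesian mass q defined from an observation likelihood.
q = {frozenset({"water"}): 0.8, frozenset({"forest"}): 0.2}
posterior = dempster_combine(m, q)
```

When m is itself a probability (no mass on non-singleton sets), this fusion reduces to the usual Bayesian posterior, which is the special case the abstract refers to.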
Triplet Markov trees and evidence theory
Triplet Markov chains (TMC) generalize pairwise Markov chains (PMC), which in turn generalize hidden Markov chains (HMC). Moreover, in an HMC the posterior distribution of the hidden process, which is Markovian, can be viewed as a Dempster-Shafer fusion (DS fusion) of its distribution p with a probability q defined from the observations. When we place ourselves in the theory-of-evidence context by replacing p with a mass function M, its DS fusion with q generalizes the posterior probability. Although the result of this fusion is not necessarily a Markov chain, it has been shown to be a TMC, which makes the various processings of interest possible. Moreover, analogous results remain valid when the various Markov chains are generalized to Markov trees. We propose to extend these results to pairwise Markov chains and trees, in which the distribution of the hidden process is not necessarily Markovian. We also show the practical interest of evidence theory in the unsupervised segmentation of non-stationary Markov chains.
Improving lightly supervised training for broadcast transcription
This paper investigates improving lightly supervised acoustic
model training for an archive of broadcast data. Standard
lightly supervised training uses automatically derived decoding
hypotheses using a biased language model. However, as the
actual speech can deviate significantly from the original programme
scripts that are supplied, the quality of standard lightly
supervised hypotheses can be poor. To address this issue, word
and segment level combination approaches are used between
the lightly supervised transcripts and the original programme
scripts which yield improved transcriptions. Experimental results
show that systems trained using these improved transcriptions
consistently outperform those trained using only the original
lightly supervised decoding hypotheses. This is shown to be
the case for both the maximum likelihood and minimum phone
error trained systems.
The research leading to these results was supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology). This is the accepted manuscript version. The final version is available at http://www.isca-speech.org/archive/interspeech_2013/i13_2187.html
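One simple way to picture the word-level combination described above is to align the decoding hypothesis against the programme script and keep the regions where the two agree. The sketch below uses Python's `difflib` for the alignment; it is a deliberate simplification of the paper's approach, and the fallback policy for disagreements is a placeholder.

```python
# Hedged illustration of word-level combination between a lightly supervised
# decoding hypothesis and the original programme script: agreeing regions are
# trusted, disagreements are marked by an agreement ratio that a real system
# could use for segment-level selection. Not the paper's actual algorithm.
from difflib import SequenceMatcher

def combine_transcripts(hypothesis, script):
    """Return (combined_words, agreement_ratio) for two word lists."""
    sm = SequenceMatcher(a=hypothesis, b=script, autojunk=False)
    combined, agreed = [], 0
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            combined.extend(hypothesis[i1:i2])
            agreed += i2 - i1
        else:
            # Disagreement: here we keep the decoder's words (it heard the
            # audio); a real system could score both sides instead.
            combined.extend(hypothesis[i1:i2])
    return combined, agreed / max(len(hypothesis), 1)

hyp = "the weather today is cold and wet".split()
scr = "the weather tomorrow is cold and windy".split()
words, ratio = combine_transcripts(hyp, scr)
```

Segments with a low agreement ratio are the ones where the script deviates most from the actual speech, which is exactly where standard lightly supervised hypotheses tend to be poor.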
A Pulse Model in Log-domain for a Uniform Synthesizer
The quality of the vocoder plays a crucial role in the performance of parametric speech synthesis systems. In order to improve vocoder quality, it is necessary to reconstruct as many of the perceived components of the speech signal as possible. In this paper, we first show that the noise component is currently not accurately modelled in the widely used STRAIGHT vocoder, thus limiting the voice range that can be covered and also limiting the overall quality. In order to motivate a new, alternative approach to this issue, we present a new synthesizer, which uses a uniform representation for voiced and unvoiced segments. This synthesizer also has the advantage of using a simple signal model compared to other approaches, thus offering a convenient and controlled alternative for future developments. Experiments analysing the synthesis quality of the noise component show improved speech reconstruction using the suggested synthesizer compared to STRAIGHT. Additionally, an analysis/resynthesis experiment shows that the suggested synthesizer solves some of the issues of another uniform vocoder, Harmonic Model plus Phase Distortion (HMPD). In text-to-speech synthesis, it outperforms HMPD and exhibits a similar, or only slightly worse, quality than STRAIGHT's, which is encouraging for a new vocoding approach.
This project has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 655764. The research for this paper was also partly supported by EPSRC grant EP/I031022/1 (Natural Speech Technology)
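The idea of a uniform representation for voiced and unvoiced segments can be sketched as a single excitation model: a deterministic pulse train mixed with noise, with a voicing parameter sweeping continuously between the two. The toy below illustrates only that idea; parameter names and the mixing rule are illustrative, not the described vocoder's actual signal model.

```python
# Toy sketch of a uniform voiced/unvoiced excitation: one representation for
# both cases, with `voicing` in [0, 1] blending a pulse train at f0 with
# Gaussian noise. Illustrative only, not the paper's pulse model.
import random

def excitation(n_samples, f0, fs, voicing, seed=0):
    """Pulse train at f0 (Hz) sampled at fs, mixed with noise."""
    rng = random.Random(seed)
    period = fs / f0  # samples per pitch period
    sig = []
    next_pulse = 0.0
    for n in range(n_samples):
        pulse = 0.0
        if n >= next_pulse:   # place a unit pulse once per pitch period
            pulse = 1.0
            next_pulse += period
        noise = rng.gauss(0.0, 0.3)
        sig.append(voicing * pulse + (1.0 - voicing) * noise)
    return sig

# Fully voiced 10 ms frame at 16 kHz with f0 = 100 Hz: exactly one pulse.
e = excitation(n_samples=160, f0=100.0, fs=16000, voicing=1.0)
```

Because the same generator covers voicing values from 0 (pure noise) to 1 (pure pulses), there is no hard switch between voiced and unvoiced processing, which is the property the abstract emphasizes.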
Speaker diarisation and longitudinal linking in multi-genre broadcast data
This paper presents a multi-stage speaker diarisation system with longitudinal linking developed on BBC multi-genre data for the 2015 Multi-Genre Broadcast (MGB) challenge. The basic speaker diarisation system draws on techniques from the Cambridge March 2005 system with a new deep neural network (DNN)-based speech/non-speech segmenter. A newly developed linking stage is then added to the basic diarisation output, aiming at the identification of speakers across multiple episodes of the same series. The longitudinal constraint imposes incremental processing of the episodes, where speaker labels for each episode can be obtained using only material from the episode in question and those broadcast earlier in time. The nature of the data, as well as the longitudinal linking constraint, positions this diarisation task as a new open research topic, and a particularly challenging one. Different linking clustering metrics are compared, and the lowest within-episode and cross-episode DER scores are achieved on the MGB challenge evaluation set.
This work is in part supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology). C. Zhang is also supported by a Cambridge International Scholarship from the Cambridge Commonwealth, European & International Trust. This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/ASRU.2015.740485
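The incremental longitudinal constraint can be sketched as follows: episodes are processed in broadcast order, and each new episode's speaker clusters may only be linked to speakers already seen. The distance function, embeddings, and threshold below are placeholders, not the challenge system's actual clustering metric.

```python
# Hedged sketch of incremental cross-episode speaker linking: a new local
# speaker is merged with the closest previously seen global speaker if the
# distance is under a threshold, otherwise a new global identity is created.

def link_episodes(episodes, distance, threshold):
    """episodes: list of {local_label: embedding}, in broadcast order.
    Returns, per episode, a dict local_label -> global speaker id."""
    global_speakers = {}   # global id -> representative embedding
    labels = []
    next_id = 0
    for ep in episodes:
        ep_labels = {}
        for local, emb in ep.items():
            # Compare only against speakers from this or earlier episodes.
            best, best_d = None, threshold
            for gid, rep in global_speakers.items():
                d = distance(emb, rep)
                if d < best_d:
                    best, best_d = gid, d
            if best is None:          # no close match: new global speaker
                best = next_id
                next_id += 1
                global_speakers[best] = emb
            ep_labels[local] = best
        labels.append(ep_labels)
    return labels

def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

eps = [{"spk1": (0.0, 0.0), "spk2": (5.0, 5.0)},
       {"spkA": (0.1, 0.0), "spkB": (9.0, 0.0)}]
linked = link_episodes(eps, euclid, threshold=1.0)
```

Note that labels for an episode never depend on later episodes, which is the causality property the longitudinal task requires.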
Second-Order Belief Hidden Markov Models
Hidden Markov Models (HMMs) are learning methods for pattern recognition.
Probabilistic HMMs have been among the most used techniques based on the
Bayesian model. First-order probabilistic HMMs were adapted to the theory of
belief functions such that Bayesian probabilities were replaced with mass
functions. In this paper, we present a second-order Hidden Markov Model using
belief functions. Previous work on belief HMMs has focused on
first-order HMMs; we extend it to the second-order model.
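The probabilistic model that the belief version generalizes conditions each transition on the two previous states, p(x_t | x_{t-1}, x_{t-2}), so the forward recursion runs over state pairs. A minimal sketch with made-up parameters follows; in the belief-function variant these probabilities would be replaced with mass functions combined by Dempster's rule.

```python
# Minimal forward recursion for a probabilistic second-order HMM. The state
# variable of the recursion is the pair (x_{t-1}, x_t). Parameters below are
# illustrative, not from the paper.

def forward_2nd_order(pi, trans1, trans2, emis, obs):
    """Likelihood p(o_1..T); alpha[(i, j)] = p(o_1..t, x_{t-1}=i, x_t=j).
    pi[i]: initial probs; trans1[i][j]: p(x_2=j | x_1=i);
    trans2[i][j][k]: p(x_t=k | x_{t-2}=i, x_{t-1}=j); emis[i][o]: p(o | i)."""
    n = len(pi)
    # t = 2: first-order initialisation over state pairs.
    alpha = {(i, j): pi[i] * trans1[i][j] * emis[i][obs[0]] * emis[j][obs[1]]
             for i in range(n) for j in range(n)}
    for t in range(2, len(obs)):
        # Second-order step: sum out the oldest state of the pair.
        alpha = {(j, k): sum(alpha[(i, j)] * trans2[i][j][k]
                             for i in range(n)) * emis[k][obs[t]]
                 for j in range(n) for k in range(n)}
    return sum(alpha.values())

pi = [0.5, 0.5]
trans1 = [[0.5, 0.5], [0.5, 0.5]]
trans2 = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
emis = [[0.9, 0.1], [0.1, 0.9]]
lik = forward_2nd_order(pi, trans1, trans2, emis, [0, 1, 0])
```

With all-uniform transitions as above, every length-3 state sequence has prior probability 0.125, so the likelihood factorises over time steps, which makes the example easy to check by hand.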
The MGB Challenge: Evaluating Multi-genre Broadcast Media Recognition
This paper describes the Multi-Genre Broadcast (MGB) Challenge at ASRU 2015, an evaluation focused on speech recognition, speaker diarization, and "lightly supervised" alignment of BBC TV recordings. The challenge training data covered the whole range of seven weeks of BBC TV output across four channels, resulting in about 1,600 hours of broadcast audio. In addition, several hundred million words of BBC subtitle text were provided for language modelling. A novel aspect of the evaluation was the exploration of speech recognition and speaker diarization in a longitudinal setting, i.e. recognition of several episodes of the same show, and speaker diarization across these episodes, linking speakers. The longitudinal tasks also offered the opportunity for systems to make use of supplied metadata including show title, genre tag, and date/time of transmission. This paper describes the task data and evaluation process used in the MGB challenge, and summarises the results obtained.
Automatic transcription of multi-genre media archives
This paper describes some recent results of our collaborative work on
developing a speech recognition system for the automatic transcription
of media archives from the British Broadcasting Corporation (BBC). The
material includes a wide diversity of shows with their associated
metadata. The latter are highly diverse in terms of completeness,
reliability and accuracy. First, we investigate how to improve lightly
supervised acoustic training, when timestamp information is inaccurate
and when speech deviates significantly from the transcription, and how
to perform evaluations when no reference transcripts are available.
An automatic timestamp correction method, as well as word- and
segment-level combination approaches between the lightly supervised
transcripts and the original programme scripts, are presented, yielding improved
metadata. Experimental results show that systems trained using the
improved metadata consistently outperform those trained with only the
original lightly supervised decoding hypotheses. Secondly, we show that
the recognition task may benefit from systems trained on a combination
of in-domain and out-of-domain data. Working with tandem HMMs, we
describe Multi-level Adaptive Networks, a novel technique for
incorporating information from out-of-domain posterior features using
deep neural networks. We show that it provides a substantial reduction in
WER over other systems, including a PLP-based baseline, in-domain tandem
features, and the best out-of-domain tandem features.
This research was supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology). This paper was presented at the First Workshop on Speech, Language and Audio in Multimedia, August 22-23, 2013, Marseille. It was published in CEUR Workshop Proceedings at http://ceur-ws.org/Vol-1012/
Transcription of multi-genre media archives using out-of-domain data
We describe our work on developing a speech recognition system for multi-genre media archives. The high diversity of the data makes this a challenging recognition task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we present Multi-level Adaptive Networks (MLAN), a novel technique for incorporating information from out-of-domain posterior features using deep neural networks. We show that it provides a substantial reduction in WER over other systems, with relative WER reductions of 15% over a PLP baseline, 9% over in-domain tandem features, and 8% over the best out-of-domain tandem features.
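The tandem construction underlying these systems can be illustrated in a few lines: frame-level posteriors from a network trained elsewhere are compressed and appended to the acoustic features. The "DNN" below is a stand-in softmax, not a real trained network, and the feature names are illustrative.

```python
# Schematic sketch of tandem feature construction: each acoustic frame is
# concatenated with log-posteriors produced by a (here: fake) out-of-domain
# network, giving the augmented features an HMM system would then model.
import math

def tandem_features(acoustic_frames, dnn):
    """Concatenate each frame with log-posteriors from the network."""
    out = []
    for frame in acoustic_frames:
        post = dnn(frame)                        # e.g. phone posteriors
        log_post = [math.log(p + 1e-10) for p in post]  # floor then compress
        out.append(list(frame) + log_post)
    return out

# Stand-in "DNN": a fixed softmax over the raw frame values.
def fake_dnn(frame):
    s = sum(math.exp(x) for x in frame)
    return [math.exp(x) / s for x in frame]

feats = tandem_features([(0.0, 0.0), (1.0, 3.0)], fake_dnn)
```

In an MLAN-style setup, the posteriors come from a network trained on out-of-domain data, so the appended dimensions carry information the in-domain features alone would not.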