The CereProc Blizzard Entry 2009: Some dumb algorithms that don't work

Abstract

Within unit selection systems there is a constant tension between data sparsity and quality. This limits the control possible in a unit selection system. The RP data used in Blizzard this year and last year is expressive and spoken in a spirited manner. Last year's entry focused on maintaining expressiveness; this year we focused on two simple algorithms to restrain and control this prosodic variation: 1) variable width valley floor pruning on duration and pitch (applied to the full database entry EH1), and 2) bulking of data with average HTS data (applied to the small database entry EH2). Results for both techniques were disappointing. The full database system achieved an MOS of around 2 (compared to 4 for a similar system attempting to emphasise variation in 2008), while the small database entry also achieved an MOS of 2 (compared to 3 for a similar system, but with a different voice, entered in 2007).

Index Terms: speech synthesis, unit selection.
