9 research outputs found

    17 ways to say yes: Toward nuanced tone of voice in AAC and speech technology

    People with complex communication needs who use speech-generating devices have very little expressive control over their tone of voice. Despite its importance in human interaction, however, tone of voice remains all but absent from AAC research and development. In this paper, we describe three interdisciplinary projects, past, present, and future: the critical design collection Six Speaking Chairs has provoked deeper discussion and inspired a social model of tone of voice; the speculative concept Speech Hedge illustrates challenges and opportunities in designing more expressive user interfaces; and the pilot project Tonetable could enable participatory research and seed a research network around tone of voice. We speculate that more radical interactions might expand the frontiers of AAC and disrupt speech technology as a whole.

    Stress and Accent Transmission in HMM-Based Syllable-Context Very Low Bit Rate Speech Coding

    In this paper, we propose a solution for reconstructing stress and accent contextual factors at the receiver of a very low bit-rate speech codec built on a recognition/synthesis architecture. In speech synthesis, accent and stress symbols are predicted from the text, which is not available at the receiver side of the speech codec. Therefore, signal-based symbols, generated as syllable-level log-average F0 and energy acoustic measures and quantized using scalar quantization, are used instead of accent and stress symbols for HMM-based speech synthesis. Results from incremental real-time speech synthesis confirm that a combination of F0- and energy-based symbols can replace their counterparts, the text-based binary accent and stress symbols developed for text-to-speech systems. The estimated transmission bit-rate overhead is about 14 bits per second per acoustic measure.
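    A minimal sketch of the syllable-level scalar quantization idea, assuming a uniform quantizer, a 3-bit allocation, and a 60-400 Hz F0 range (illustrative assumptions, not the paper's actual settings):

    ```python
    import numpy as np

    def scalar_quantize(values, n_bits, lo, hi):
        """Uniform scalar quantizer: map each value onto one of 2**n_bits
        evenly spaced levels in [lo, hi]; return codes and reconstructions."""
        levels = 2 ** n_bits
        step = (hi - lo) / levels
        idx = np.clip(((values - lo) / step).astype(int), 0, levels - 1)
        return idx, lo + (idx + 0.5) * step

    # Hypothetical syllable-level log-average F0 values (log-Hz) for one utterance.
    log_f0 = np.log(np.array([110.0, 140.0, 95.0, 180.0]))
    codes, recon = scalar_quantize(log_f0, n_bits=3, lo=np.log(60.0), hi=np.log(400.0))
    # At a few syllables per second, 3 bits per syllable-level measure stays in
    # the vicinity of the ~14 bits/second overhead reported above.
    ```

    The reconstruction error of such a quantizer is bounded by half the step size, which is why a log-domain range tight around plausible F0 values is preferable.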

    Phonological vocoding using artificial neural networks

    We investigate a vocoder based on artificial neural networks that uses a phonological speech representation. Speech decomposition relies on phonological encoders, realised as neural network classifiers trained for a particular language. Speech reconstruction uses a Deep Neural Network (DNN) to map phonological feature posteriors to speech parameters (line spectra and glottal signal parameters), followed by LPC resynthesis. This DNN is trained on a target voice without transcriptions, in a semi-supervised manner. Both the encoder and the decoder are based on neural networks, so vocoding is achieved with a simple, fast forward pass. We present an experiment with French vocoding and a target male voice trained on a 21-hour audiobook. We also show an application of the phonological vocoder to low bit-rate speech coding, where the transmitted phonological posteriors are pruned and quantized. With scalar quantization, the vocoder operates at 1 kbps, with potential for even lower bit-rates.
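    A toy sketch of the pruning-plus-scalar-quantization step; the keep-count, bit width, frame rate, and class count below are illustrative assumptions, not the configuration used in the paper:

    ```python
    import numpy as np

    def prune_and_quantize(post, keep, n_bits):
        """Zero all but the `keep` largest posteriors in a frame, then uniformly
        quantize each survivor in [0, 1] with n_bits; return the dequantized frame."""
        levels = 2 ** n_bits
        pruned = np.zeros_like(post)
        for i in np.argsort(post)[::-1][:keep]:
            code = min(int(post[i] * levels), levels - 1)
            pruned[i] = (code + 0.5) / levels
        return pruned

    post = np.array([0.10, 0.70, 0.05, 0.15])   # posteriors for 4 phonological classes
    q = prune_and_quantize(post, keep=2, n_bits=4)

    # Back-of-the-envelope bit rate: per kept posterior, send a class index
    # (log2 of the class count) plus the quantized value. E.g. 50 frames/s,
    # 2 kept posteriors, 24 classes (5 index bits) and 4 value bits gives
    # 50 * 2 * (5 + 4) = 900 bits/s, the same order as the 1 kbps above.
    ```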

    Reactive Statistical Mapping: Towards the Sketching of Performative Control with Data

    This paper presents the results of our participation in the ninth eNTERFACE workshop on multimodal user interfaces. Our target for this workshop was to bring technologies currently used in speech recognition and synthesis to a new level, i.e. to make them the core of a new HMM-based mapping system. We investigated the idea of statistical mapping, more precisely how to use Gaussian Mixture Models and Hidden Markov Models for realtime, reactive generation of new trajectories from input labels and for realtime regression in a continuous-to-continuous use case. As a result, we developed several proofs of concept, including an incremental speech synthesiser, software for exploring stylistic spaces for gait and facial motion in realtime, a reactive audiovisual laughter synthesiser, and a prototype demonstrating realtime reconstruction of lower-body gait motion strictly from upper-body motion, while preserving its stylistic properties. This project was also an opportunity to formalise HMM-based mapping, integrate several of these innovations into the Mage library, and explore the development of a realtime gesture recognition tool.
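    The GMM-based continuous-to-continuous regression mentioned above can be sketched as a conditional expectation under a joint GMM. This simplified version assumes scalar x and y with diagonal covariances, so each component contributes its y-mean weighted by the posterior of x; the full formulation also carries a per-component linear correction term.

    ```python
    import numpy as np

    def gmm_regress(x, weights, means, variances):
        """E[y | x] under a joint GMM over (x, y) with diagonal covariances:
        a posterior-weighted average of the component y-means."""
        mu_x, mu_y = means[:, 0], means[:, 1]
        var_x = variances[:, 0]
        log_resp = (np.log(weights)
                    - 0.5 * np.log(2.0 * np.pi * var_x)
                    - 0.5 * (x - mu_x) ** 2 / var_x)
        resp = np.exp(log_resp - log_resp.max())   # stabilised softmax over components
        resp /= resp.sum()
        return float(resp @ mu_y)

    # Two made-up components: x near 0 maps to y near 1, x near 10 to y near 5.
    weights = np.array([0.5, 0.5])
    means = np.array([[0.0, 1.0], [10.0, 5.0]])    # columns: (mu_x, mu_y)
    variances = np.ones((2, 2))
    ```

    Because only the component responsibilities depend on x, this estimator interpolates smoothly between component means, which is what makes it usable frame-by-frame in a realtime mapping loop.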

    Incremental Syllable-Context Phonetic Vocoding

    Full text link

    Reactive and Continuous Control of HMM-Based Speech Synthesis

    In this paper, we present a modified version of HTS, called performative HTS, or pHTS. The objective of pHTS is to enhance the controllability and reactivity of HTS. pHTS reduces the phonetic context used for training the models and generates the speech parameters within a 2-label window. Speech waveforms are generated on-the-fly, and the models can be reactively modified, impacting the synthesized speech with a delay of only one phoneme. It is shown that HTS and pHTS have comparable output quality. We use this new system to achieve reactive model interpolation and conduct a new test in which the articulation degree is modified within the sentence. Index Terms: speech synthesis, HTS, reactive control.
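    The 2-label window can be sketched as a stream that emits output for each label as soon as one label of right context is known, which is exactly the one-phoneme delay described above; here `generate` is a hypothetical stand-in for the actual HMM parameter generation step.

    ```python
    def incremental_synthesis(labels, generate):
        """Emit output for each label once its right-context label arrives,
        so the output stream lags the input stream by exactly one label."""
        out, window = [], []
        for lab in labels:
            window.append(lab)
            if len(window) == 2:
                out.append(generate(window[0], window[1]))
                window.pop(0)
        if window:                       # flush the last label (no right context)
            out.append(generate(window[0], None))
        return out

    # Stand-in for HMM-based parameter generation: just record the 2-label context.
    frames = incremental_synthesis(["a", "b", "c"], lambda lab, ctx: (lab, ctx))
    ```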
