4,257 research outputs found

Using morphology and phoneme history to improve grapheme-to-phoneme conversion

Author: Reichel Uwe D.
Schiel Florian
Publication date: 01/01/2005

A hardware spinal decoder

Author: Balakrishnan Hari
Fleming Kermin Elliott
Iannucci Peter A.
Perry Jonathan
Shah Devavrat
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2012

Spinal codes are a recently proposed capacity-achieving rateless code. While hardware encoding of spinal codes is straightforward, the design of an efficient, high-speed hardware decoder poses significant challenges. We present the first such decoder. By relaxing data dependencies inherent in the classic M-algorithm decoder, we obtain area and throughput competitive with 3GPP turbo codes as well as greatly reduced latency and complexity. The enabling architectural feature is a novel alpha-beta incremental approximate selection algorithm. We also present a method for obtaining hints which anticipate successful or failed decoding, permitting early termination and/or feedback-driven adaptation of the decoding parameters. We have validated our implementation in FPGA with on-air testing. Provisional hardware synthesis suggests that a near-capacity implementation of spinal codes can achieve a throughput of 12.5 Mbps in a 65 nm technology while using substantially less area than competitive 3GPP turbo code implementations. Funding: Irwin Mark Jacobs and Joan Klein Jacobs Presidential Fellowship; Intel Corporation (Fellowship); Claude E. Shannon Research Assistantship

CiteSeerX

DSpace@MIT

Crossref

Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding

Author: Asaei Afsaneh
Cernak Milos
Garner Philip N.
Lazaridis Alexandros
Publication venue: Idiap
Publication date: 19/04/2016

Most current very low bit rate (VLBR) speech coding systems use hidden Markov model (HMM) based speech recognition/synthesis techniques. This allows transmission of information (such as phonemes) segment by segment, which decreases the bit rate. However, an encoder based on phoneme speech recognition may create bursts of segmental errors. Segmental errors are further propagated to optional suprasegmental (such as syllable) information coding. Together with the errors of voicing detection in pitch parametrization, HMM-based speech coding creates speech discontinuities and unnatural speech sound artefacts. In this paper, we propose a novel VLBR speech coding framework based on neural networks (NNs) for end-to-end speech analysis and synthesis without HMMs. The speech coding framework relies on a phonological (sub-phonetic) representation of speech, and it is designed as a composition of deep and spiking NNs: a bank of phonological analysers at the transmitter, and a phonological synthesizer at the receiver, both realised as deep NNs, and a spiking NN as an incremental and robust encoder of syllable boundaries for coding of continuous fundamental frequency (F0). A combination of phonological features defines many more sound patterns than the phonetic features defined by HMM-based speech coders, and the finer analysis/synthesis code contributes to smoother encoded speech. Listeners significantly prefer the NN-based approach due to fewer discontinuities and speech artefacts in the encoded speech. A single forward pass is required during speech encoding and decoding. The proposed VLBR speech coding operates at a bit rate of approximately 360 bits/s.

Infoscience - École polytechnique fédérale de Lausanne

arXiv.org e-Print Archive

Continuous Interaction with a Virtual Human

Author: A Gravano
A Kendon
A Nijholt
AC Norwine
AH Anderson
AW Black
Bart van Straalen
C Goodwin
CC Lee
D Heylen
D Neiberg
D Reidsma
Daniel Neiberg
Dennis Reidsma
DT Fujimoto
E Kurtic
E Schegloff
F Eyben
G Skantze
H Sacks
H van Welbergen
Herwin van Welbergen
HH Clark
I de Kok
Iwan de Kok
J Allwood
J Edlund
J Gustafson
JB Bavelas
JC Carletta
Khiet Truong
KR Thórisson
M Heldner
M ter Maat
M Schröder
M Thiebaux
MB Walker
MF McKinneya
N Ward
P French
PT Brady
S Benus
S Duncan Jr
S Goldwater
S Kopp
Sathish Chandra Pammi
T Toda
V Manusov
Publication venue: University of Amsterdam
Publication date: 01/01/2010

Attentive Speaking and Active Listening require that a Virtual Human be capable of simultaneous perception/interpretation and production of communicative behavior. A Virtual Human should be able to signal its attitude and attention while it is listening to its interaction partner, and be able to attend to its interaction partner while it is speaking, modifying its communicative behavior on the fly based on what it perceives from its partner. This report presents the results of a four-week summer project that was part of eNTERFACE’10. The project resulted in progress on several aspects of continuous interaction, such as scheduling and interrupting multimodal behavior, automatic classification of listener responses, generation of response-eliciting behavior, and models for appropriate reactions to listener responses. A pilot user study was conducted with ten participants. In addition, the project yielded a number of deliverables that are released for public access.

Crossref

Springer - Publisher Connector

Publications at Bielefeld University

University of Twente Research Information