Search CORE

1,887 research outputs found

AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data

Author: Bian Yanyao
Chen Hangting
Jiang Jiayi
Li Xiang
Liu Mengyang
Luo Yi
Tian Jinchuan
Wang Shuai
Yu Jianwei
Publication venue
Publication date: 25/09/2023
Field of study

Recently, the utilization of extensive open-sourced text data has significantly advanced the performance of text-based large language models (LLMs). However, the use of in-the-wild large-scale speech data in the speech technology community remains constrained. One reason for this limitation is that a considerable amount of the publicly available speech data is compromised by background noise, speech overlapping, lack of speech segmentation information, missing speaker labels, and incomplete transcriptions, which can largely hinder their usefulness. On the other hand, human annotation of speech data is both time-consuming and costly. To address this issue, we introduce an automatic in-the-wild speech data preprocessing framework (AutoPrep) in this paper, which is designed to enhance speech quality, generate speaker labels, and produce transcriptions automatically. The proposed AutoPrep framework comprises six components: speech enhancement, speech segmentation, speaker clustering, target speech extraction, quality filtering and automatic speech recognition. Experiments conducted on the open-sourced WenetSpeech and our self-collected AutoPrepWild corpora demonstrate that the proposed AutoPrep framework can generate preprocessed data with similar DNSMOS and PDNSMOS scores compared to several open-sourced TTS datasets. The corresponding TTS system can achieve up to 0.68 in-domain speaker similarity

arXiv.org e-Print Archive

Lightly Supervised Discriminative Training of Grapheme Models for Improved Sentence-level Alignment of Speech and Text Data

Author: Bell Peter
King Simon
Stan Adriana
Yamagishi Junichi
Publication venue
Publication date: 01/08/2013
Field of study

Edinburgh Research Explorer

A conversational system for multi-session child-robot interaction with several games

Author: Athanasopoulos G
Beck A
Cañamero L
Cosi P
Cuayáhuitl H
Demiris Y
Enescu V
Hiolle A
Kiefer B
Kruijff-Korbayová I
Racioppa S
Ros R
Sahli H
Schröder M
Sommavilla G
Tesser F
Verhelst W
Wang W
Publication venue
Publication date: 24/09/2011
Field of study

Spiral - Imperial College Digital Repository

Continuous Interaction with a Virtual Human

Author: A Gravano
A Kendon
A Nijholt
AC Norwine
AH Anderson
AW Black
Bart van Straalen
C Goodwin
C Goodwin
CC Lee
D Heylen
D Heylen
D Neiberg
D Neiberg
D Reidsma
Daniel Neiberg
Dennis Reidsma
DT Fujimoto
E Kurtic
E Schegloff
F Eyben
G Skantze
H Sacks
H Welbergen van
H Welbergen van
Herwin van Welbergen
HH Clark
HH Clark
HH Clark
I Kok de
Iwan de Kok
J Allwood
J Edlund
J Gustafson
JB Bavelas
JB Bavelas
JC Carletta
Khiet Truong
KR Thórisson
M Heldner
M Maat ter
M Schröder
M Schröder
M Schröder
M Thiebaux
MB Walker
MF McKinneya
N Ward
N Ward
P French
PT Brady
S Benus
S Duncan Jr
S Goldwater
S Kopp
S Kopp
Sathish Chandra Pammi
T Toda
V Manusov
Publication venue: University of Amsterdam
Publication date: 01/01/2010
Field of study

Attentive Speaking and Active Listening require that a Virtual Human be capable of simultaneous perception/interpretation and production of communicative behavior. A Virtual Human should be able to signal its attitude and attention while it is listening to its interaction partner, and be able to attend to its interaction partner while it is speaking – and modify its communicative behavior on-the-fly based on what it perceives from its partner. This report presents the results of a four week summer project that was part of eNTERFACE’10. The project resulted in progress on several aspects of continuous interaction such as scheduling and interrupting multimodal behavior, automatic classification of listener responses, generation of response eliciting behavior, and models for appropriate reactions to listener responses. A pilot user study was conducted with ten participants. In addition, the project yielded a number of deliverables that are released for public access

Crossref

Springer - Publisher Connector

Publications at Bielefeld University

University of Twente Research Information