An Empirical Evaluation of Zero Resource Acoustic Unit Discovery
Acoustic unit discovery (AUD) is a process of automatically identifying a
categorical acoustic unit inventory from speech and producing corresponding
acoustic unit tokenizations. AUD provides an important avenue for unsupervised
acoustic model training in a zero-resource setting where expert-provided
linguistic knowledge and transcribed speech are unavailable. To further
facilitate the zero-resource AUD process, this paper demonstrates that
acoustic feature representations can be significantly improved by (i)
performing linear discriminant analysis (LDA) in an unsupervised self-trained
fashion, and (ii) leveraging resources of other languages through building a
multilingual bottleneck (BN) feature extractor to give effective cross-lingual
generalization. Moreover, we perform comprehensive evaluations of AUD efficacy
on multiple downstream speech applications, and their correlated performance
suggests that AUD can be evaluated with alternative language resources when
only a subset of these evaluation resources is available, as is typical in
zero-resource applications.
Comment: 5 pages, 1 figure; Accepted for publication at ICASSP 201
A Speaker Diarization System for Studying Peer-Led Team Learning Groups
Peer-led team learning (PLTL) is a model for teaching STEM courses where
small student groups meet periodically to collaboratively discuss coursework.
Automatic analysis of PLTL sessions would help education researchers gain
insight into how learning outcomes are affected by individual participation,
group behavior, team dynamics, etc. Speech and language technology can support
this goal, with speaker diarization laying the foundation for the analysis.
In this study, a new corpus called CRSS-PLTL is established, which
contains speech data from 5 PLTL teams over a semester (10 sessions per team
with 5 to 8 participants in each team). In CRSS-PLTL, every participant wears a
LENA device (portable audio recorder) that provides multiple audio recordings
of the event. Our proposed solution is unsupervised and combines a new online
speaker change detection algorithm, termed the G 3 algorithm, with
Hausdorff-distance-based clustering to provide improved detection accuracy.
We also exploit cross-channel information to refine our
diarization hypothesis. The proposed system provides good improvements in
diarization error rate (DER) over the baseline LIUM system. We also present
higher level analysis such as the number of conversational turns taken in a
session, and speaking-time duration (participation) for each speaker.
Comment: 5 Pages, 2 Figures, 2 Tables, Proceedings of INTERSPEECH 2016, San
Francisco, US
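The higher-level analysis mentioned in the abstract above (conversational turns and speaking time per speaker) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes diarization output is available as hypothetical (speaker, start, end) tuples in seconds, and it assumes a turn begins whenever the active speaker differs from the previous segment's speaker; the abstract does not give the paper's own definition of a turn.

```python
from collections import defaultdict


def participation_stats(segments):
    """Summarize diarization output per speaker.

    segments: list of (speaker, start_sec, end_sec) tuples (hypothetical format).
    Returns (speaking_time, turns), both dicts keyed by speaker label.
    """
    # Process segments in chronological order of their start time.
    ordered = sorted(segments, key=lambda seg: seg[1])
    speaking_time = defaultdict(float)
    turns = defaultdict(int)
    previous = None
    for speaker, start, end in ordered:
        speaking_time[speaker] += end - start
        # Assumed definition: a new turn starts when the speaker changes.
        if speaker != previous:
            turns[speaker] += 1
            previous = speaker
    return dict(speaking_time), dict(turns)


# Toy example with made-up segments, not data from the CRSS-PLTL corpus.
segments = [
    ("A", 0.0, 4.0),
    ("B", 4.0, 6.5),
    ("A", 6.5, 7.0),
    ("A", 7.5, 9.0),
    ("C", 9.0, 10.0),
]
speaking_time, turns = participation_stats(segments)
print(speaking_time)
print(turns)
```

Consecutive segments from the same speaker are merged into one turn here, so a short pause by speaker A does not count as a new turn; a different turn definition would change the counts.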