An Empirical Evaluation of Zero Resource Acoustic Unit Discovery
Acoustic unit discovery (AUD) is a process of automatically identifying a
categorical acoustic unit inventory from speech and producing corresponding
acoustic unit tokenizations. AUD provides an important avenue for unsupervised
acoustic model training in a zero resource setting where expert-provided
linguistic knowledge and transcribed speech are unavailable. To further
facilitate the zero-resource AUD process, we demonstrate in this paper that
acoustic feature representations can be significantly improved by (i)
performing linear discriminant analysis (LDA) in an unsupervised self-trained
fashion, and (ii) leveraging resources of other languages through building a
multilingual bottleneck (BN) feature extractor to give effective cross-lingual
generalization. Moreover, we perform comprehensive evaluations of AUD efficacy
on multiple downstream speech applications, and their correlated performance
suggests that AUD evaluations are feasible using alternative language
resources when only a subset of these evaluation resources is available, as in
typical zero resource applications. Comment: 5 pages, 1 figure; Accepted for publication at ICASSP 201
Optical tomography: Image improvement using mixed projection of parallel and fan beam modes
Mixed parallel and fan beam projection is a technique used to increase image quality. This research focuses on enhancing image quality in optical tomography. Image quality is quantified by the Peak Signal to Noise Ratio (PSNR) and the Normalized Mean Square Error (NMSE). The findings show that combining parallel and fan beam projection improves image quality by more than 10% in terms of PSNR and by more than 100% in terms of NMSE compared to a single parallel beam
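The two image quality measures named in this abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the function names, the flat-list image representation, the peak value of 255, and the particular NMSE definition (squared error divided by the energy of the reference image) are all assumptions, since the abstract does not give the formulas used.

```python
import math


def mse(reference, reconstructed):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(reference) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    squared = [(r - x) ** 2 for r, x in zip(reference, reconstructed)]
    return sum(squared) / len(squared)


def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` is an assumed maximum pixel value."""
    err = mse(reference, reconstructed)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / err)


def nmse(reference, reconstructed):
    """Assumed definition: sum of squared errors over the reference energy."""
    if len(reference) != len(reconstructed):
        raise ValueError("images must have the same number of pixels")
    error_energy = sum((r - x) ** 2 for r, x in zip(reference, reconstructed))
    reference_energy = sum(r ** 2 for r in reference)
    return error_energy / reference_energy
```

Under these definitions a higher PSNR and a lower NMSE both indicate a reconstruction closer to the reference image.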
Feature Trajectory Dynamic Time Warping for Clustering of Speech Segments
Dynamic time warping (DTW) can be used to compute the similarity between two
sequences of generally differing length. We propose a modification to DTW that
performs individual and independent pairwise alignment of feature trajectories.
The modified technique, termed feature trajectory dynamic time warping (FTDTW),
is applied as a similarity measure in the agglomerative hierarchical clustering
of speech segments. Experiments using MFCC and PLP parametrisations extracted
from TIMIT and from the Spoken Arabic Digit Dataset (SADD) show consistent and
statistically significant improvements in the quality of the resulting clusters
in terms of F-measure and normalised mutual information (NMI). Comment: 10 page
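The difference between classical DTW and a per-trajectory variant can be sketched as below. This is one reading of the abstract under stated assumptions, not the authors' implementation: the function names are hypothetical, sequences are assumed to be non-empty lists of equal-dimension feature vectors, and summing the per-dimension costs is an assumed way to combine them (the abstract does not say how the individual alignments are combined or normalised).

```python
import math


def dtw_cost(a, b, dist):
    """Accumulated cost of the best DTW alignment between sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # table[i][j] = minimal cost of aligning a[:i] with b[:j]
    table = [[inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(a[i - 1], b[j - 1])
            table[i][j] = local + min(
                table[i - 1][j],      # advance in a only
                table[i][j - 1],      # advance in b only
                table[i - 1][j - 1],  # advance in both
            )
    return table[n][m]


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def classical_dtw(seq_a, seq_b):
    """One alignment path shared by all feature dimensions."""
    return dtw_cost(seq_a, seq_b, euclidean)


def trajectory_dtw(seq_a, seq_b):
    """Each feature dimension is aligned independently; costs are summed (assumption)."""
    dims = len(seq_a[0])
    total = 0.0
    for k in range(dims):
        traj_a = [frame[k] for frame in seq_a]
        traj_b = [frame[k] for frame in seq_b]
        total += dtw_cost(traj_a, traj_b, lambda x, y: abs(x - y))
    return total
```

Either function could serve as the pairwise dissimilarity in agglomerative hierarchical clustering; in the per-trajectory variant each dimension is free to follow its own warping path.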
Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop
We summarize the accomplishments of a multi-disciplinary workshop exploring
the computational and scientific issues surrounding the discovery of linguistic
units (subwords and words) in a language without orthography. We study the
replacement of orthographic transcriptions by images and/or translated text in
a well-resourced language to help unsupervised discovery from raw speech. Comment: Accepted to ICASSP 201