Reducing Audible Spectral Discontinuities

Author: Klabbers Esther
Veldhuis Raymond
Publication venue: IEEE Computer Society Press
Publication date: 01/01/2001

In this paper, a common problem in diphone synthesis is discussed, viz. the occurrence of audible discontinuities at diphone boundaries. Informal observations show that spectral mismatch is most likely the cause of this phenomenon. We first set out to find an objective spectral measure for discontinuity. To this end, several spectral distance measures are related to the results of a listening experiment. Then, we studied the feasibility of extending the diphone database with context-sensitive diphones to reduce the occurrence of audible discontinuities. The number of additional diphones is limited by clustering consonant contexts that have a similar effect on the surrounding vowels, on the basis of the best-performing distance measure. A listening experiment has shown that the addition of these context-sensitive diphones significantly reduces the number of audible discontinuities.

Crossref

Pure OAI Repository

University of Twente Research Information

Experiments on Jet Flows and Jet Noise Far-Field Spectra and Directivity Patterns

Author: Kolpin Marc A.
Martuccelli John R.
Mollo-Christensen Erik
Publication date: 01/01/1964

Jet flows and jet noise far-field spectra and directivity patterns

NASA Technical Reports Server

Probabilistic Intra-Retinal Layer Segmentation in 3-D OCT Images Using Global Shape Regularization

Author: Rathke Fabian
Schmidt Stefan
Schnörr Christoph
Publication date: 01/01/2014

With the introduction of spectral-domain optical coherence tomography (OCT), resulting in a significant increase in acquisition speed, the fast and accurate segmentation of 3-D OCT scans has become ever more important. This paper presents a novel probabilistic approach that models the appearance of retinal layers as well as the global shape variations of layer boundaries. Given an OCT scan, the full posterior distribution over segmentations is approximately inferred using a variational method enabling efficient probabilistic inference in terms of computationally tractable model components: segmenting a full 3-D volume takes around a minute. Accurate segmentations demonstrate the benefit of using global shape regularization: we segmented 35 fovea-centered 3-D volumes with an average unsigned error of 2.46 ± 0.22 µm, as well as 80 normal and 66 glaucomatous 2-D circular scans with errors of 2.92 ± 0.53 µm and 4.09 ± 0.98 µm, respectively. Furthermore, we utilized the inferred posterior distribution to rate the quality of the segmentation, point out potentially erroneous regions, and discriminate normal from pathological scans. No pre- or postprocessing was required and we used the same set of parameters for all data sets, underlining the robustness and out-of-the-box nature of our approach. Comment: Accepted for publication in Medical Image Analysis (MIA), Elsevier.

arXiv.org e-Print Archive

CiteSeerX

Duration and spectral balance of intervocalic consonants: A case for efficient communication

Author: Aylett
Borsky
Byrd
Chennoukh
Clark
Cutler
de Jong
Dodge
Fougeron
Fourakis
Hanson
Jan P.H. van Santen
Jongman
Lieberman
Lindblom
O’Shaughnessy
R.J.J.H. van Son
Rietveld
Sluijter
Sproat
Tabain
Turk
Umeda
Van Santen
Van Son
Vitevitch
Wightman
Zue
Publication venue: Elsevier BV
Publication date: 01/01/2005

Crossref

International Migration, Integration and Social Cohesion online publications

Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation

Author: Adi Yossi
Keshet Joseph
Kreuk Felix
Publication date: 06/08/2020

We propose a self-supervised representation learning model for the task of unsupervised phoneme boundary detection. The model is a convolutional neural network that operates directly on the raw waveform. It is optimized to identify spectral changes in the signal using the Noise-Contrastive Estimation principle. At test time, a peak detection algorithm is applied over the model outputs to produce the final boundaries. As such, the proposed model is trained in a fully unsupervised manner, with no manual annotations in the form of target boundaries or phonetic transcriptions. We compare the proposed approach to several unsupervised baselines using both the TIMIT and Buckeye corpora. Results suggest that our approach surpasses the baseline models and reaches state-of-the-art performance on both data sets. Furthermore, we experimented with expanding the training set with additional examples from the LibriSpeech corpus. We evaluated the resulting model on distributions and languages that were not seen during the training phase (English, Hebrew and German) and showed that utilizing additional untranscribed data is beneficial for model performance. Comment: Interspeech 2020 paper.

arXiv.org e-Print Archive

Crossref