Reducing Audible Spectral Discontinuities
In this paper, a common problem in diphone synthesis is discussed, viz., the occurrence of audible discontinuities at diphone boundaries. Informal observations show that spectral mismatch is most likely the cause of this phenomenon. We first set out to find an objective spectral measure for discontinuity. To this end, several spectral distance measures are related to the results of a listening experiment. Then, we studied the feasibility of extending the diphone database with context-sensitive diphones to reduce the occurrence of audible discontinuities. The number of additional diphones is limited by clustering, on the basis of the best-performing distance measure, consonant contexts that have a similar effect on the surrounding vowels. A listening experiment has shown that the addition of these context-sensitive diphones significantly reduces the number of audible discontinuities.
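As an illustration of the kind of objective measure discussed above, the sketch below computes one classical candidate, the RMS log-spectral distance in dB, between the last frame of one diphone and the first frame of the next. The abstract does not say which measure performed best; the choice of this measure, the naive DFT, the 256-sample default frame length and the helper names (`log_spectral_distance`, `boundary_discontinuity`) are assumptions for illustration only.

```python
import cmath
import math


def log_magnitude_spectrum(frame, eps=1e-12):
    """One-sided log-magnitude spectrum (dB) of a frame via a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):  # bins 0 .. n/2 (one-sided spectrum)
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(20.0 * math.log10(abs(x_k) + eps))  # eps avoids log(0)
    return spectrum


def log_spectral_distance(frame_a, frame_b):
    """RMS difference (dB) between the log spectra of two equal-length frames."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same length")
    sa = log_magnitude_spectrum(frame_a)
    sb = log_magnitude_spectrum(frame_b)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(sa, sb)) / len(sa))


def boundary_discontinuity(left_diphone, right_diphone, frame_len=256):
    """Hypothetical join cost: distance between the last frame of the left
    unit and the first frame of the right unit (both given as sample lists)."""
    return log_spectral_distance(left_diphone[-frame_len:], right_diphone[:frame_len])
```

For an impulse frame, comparing it with a copy scaled by 2 gives about 6.02 dB (20 log10 2), because a uniform gain shifts every log-magnitude bin by the same amount; identical frames give 0 dB.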
Experiments on Jet Flows and Jet Noise Far-Field Spectra and Directivity Patterns
Jet flows and jet noise far-field spectra and directivity patterns.
Probabilistic Intra-Retinal Layer Segmentation in 3-D OCT Images Using Global Shape Regularization
With the introduction of spectral-domain optical coherence tomography (OCT),
which has significantly increased acquisition speed, the fast and accurate
segmentation of 3-D OCT scans has become ever more important. This paper
presents a novel probabilistic approach that models the appearance of retinal
layers as well as the global shape variations of layer boundaries. Given an OCT
scan, the full posterior distribution over segmentations is approximately
inferred using a variational method enabling efficient probabilistic inference
in terms of computationally tractable model components: Segmenting a full 3-D
volume takes around a minute. Accurate segmentations demonstrate the benefit of
using global shape regularization: We segmented 35 fovea-centered 3-D volumes
with an average unsigned error of 2.46 ± 0.22 μm as well as 80 normal
and 66 glaucomatous 2-D circular scans with errors of 2.92 ± 0.53 μm
and 4.09 ± 0.98 μm, respectively. Furthermore, we utilized the inferred
posterior distribution to rate the quality of the segmentation, point out
potentially erroneous regions and discriminate normal from pathological scans.
No pre- or postprocessing was required and we used the same set of parameters
for all data sets, underlining the robustness and out-of-the-box nature of our
approach.
Comment: Accepted for publication in Medical Image Analysis (MIA), Elsevier
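The error figures above are average unsigned boundary errors. A minimal sketch of that metric, plus one simple way to flag doubtful columns from posterior samples, might look as follows. It assumes boundaries are stored as per-A-scan depth positions in pixels, that the axial pixel spacing in μm is known, and that samples from the posterior are available per column; the function names and the standard-deviation threshold are hypothetical and are not the authors' quality measure.

```python
import statistics


def mean_unsigned_error_um(predicted, reference, axial_spacing_um):
    """Average |predicted - reference| boundary position, in micrometres.

    predicted, reference: lists of layer boundaries, each a list of
    per-column (A-scan) depth positions in pixels.
    """
    if len(predicted) != len(reference):
        raise ValueError("boundary count mismatch")
    diffs = []
    for pred_b, ref_b in zip(predicted, reference):
        if len(pred_b) != len(ref_b):
            raise ValueError("column count mismatch")
        diffs.extend(abs(p - r) for p, r in zip(pred_b, ref_b))
    # Convert the mean pixel error to micrometres using the axial spacing.
    return axial_spacing_um * sum(diffs) / len(diffs)


def flag_uncertain_columns(posterior_samples, max_std_px):
    """Hypothetical quality check: indices of columns whose sampled boundary
    positions have a population standard deviation above max_std_px."""
    return [
        col
        for col, samples in enumerate(posterior_samples)
        if statistics.pstdev(samples) > max_std_px
    ]
```

Using the posterior spread rather than a single point estimate is what lets such a check point to regions where the segmentation may be unreliable.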
Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation
We propose a self-supervised representation learning model for the task of
unsupervised phoneme boundary detection. The model is a convolutional neural
network that operates directly on the raw waveform. It is optimized to identify
spectral changes in the signal using the Noise-Contrastive Estimation
principle. At test time, a peak detection algorithm is applied over the model
outputs to produce the final boundaries. As such, the proposed model is trained
in a fully unsupervised manner, with no manual annotations in the form of either
target boundaries or phonetic transcriptions. We compare the proposed approach to
several unsupervised baselines using both TIMIT and Buckeye corpora. Results
suggest that our approach surpasses the baseline models and reaches
state-of-the-art performance on both data sets. Furthermore, we experimented
with expanding the training set with additional examples from the Librispeech
corpus. We evaluated the resulting model on distributions and languages that
were not seen during the training phase (English, Hebrew and German) and showed
that utilizing additional untranscribed data is beneficial for model
performance.
Comment: Interspeech 2020 paper
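The contrastive objective and the test-time pipeline (frame-wise model outputs followed by peak detection) can be sketched on precomputed frame embeddings. This sketch assumes that adjacent frames form the positive pair in the NCE-style loss, that the boundary score is the cosine dissimilarity between adjacent frames, and that a plain thresholded local-maximum picker stands in for the peak detection algorithm; the convolutional encoder on the raw waveform is omitted, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nce_loss(frames, t, distractor_ids):
    """NCE-style loss for frame t: frame t+1 is the positive,
    frames at distractor_ids are the negatives."""
    anchor = frames[t]
    pos = math.exp(cosine(anchor, frames[t + 1]))
    neg = sum(math.exp(cosine(anchor, frames[j])) for j in distractor_ids)
    return -math.log(pos / (pos + neg))


def boundary_scores(frames):
    """Dissimilarity between adjacent frames: 1 - cos(z_t, z_{t+1})."""
    return [1.0 - cosine(frames[t], frames[t + 1]) for t in range(len(frames) - 1)]


def detect_peaks(scores, threshold):
    """Indices of strict local maxima whose score exceeds threshold."""
    peaks = []
    for i, s in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("-inf")
        if s > threshold and s > left and s > right:
            peaks.append(i)
    return peaks


def segment(frames, threshold=0.5):
    """A peak at score index t marks a boundary between frames t and t+1."""
    return [t + 1 for t in detect_peaks(boundary_scores(frames), threshold)]
```

For example, with embeddings `[[1, 0], [1, 0], [0, 1], [0, 1]]` the adjacent-frame scores are `[0.0, 1.0, 0.0]` and `segment` places a single boundary at frame 2, where the representation changes.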