Estimating Speaking Rate by Means of Rhythmicity Parameters
In this paper we present a speech rate estimator based on so-called rhythmicity features derived from a modified version of the short-time energy envelope. To evaluate the new method, it is compared to a traditional speech rate estimator based on semi-automatic segmentation. Speech material from the Alcohol Language Corpus (ALC), covering intoxicated and sober speech in different speech styles, provides a statistically sound basis for testing. The proposed measure clearly correlates with the semi-automatically determined speech rate and seems to be robust across speech styles and speaker states.
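As a rough illustration of the starting point named in this abstract, the following sketch computes a plain short-time energy envelope. The abstract does not describe its modification of the envelope or the rhythmicity features, so neither is reproduced here; the function name, frame length, and hop size are assumptions chosen for the example.

```python
def short_time_energy(samples, frame_len, hop):
    """Sum of squared samples per frame (plain, unmodified envelope)."""
    energies = []
    # Slide a window of frame_len samples forward by hop samples.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


# Toy signal: quiet, loud, quiet (hypothetical values).
signal = [0.0, 0.1, 0.0, 1.0, -1.0, 1.0, 0.0, 0.1]
envelope = short_time_energy(signal, frame_len=2, hop=2)
```

Peaks in such an envelope roughly follow syllable nuclei, which is why energy envelopes are a common basis for speech rate measures.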
Finding the Most Uniform Changes in Vowel Polygon Caused by Psychological Stress
Vowel polygons, specifically their parameters, are chosen as the criterion for capturing differences between a speaker's normal state and speech under real psychological stress. All results were obtained experimentally with software created for vowel polygon analysis, applied to the ExamStress database. Six selected methods based on cross-correlation of different features were ranked by the coefficient of variation, and for each individual vowel polygon an efficiency coefficient marking the most significant and uniform differences between stressed and normal speech was calculated. The best method for observing these differences was the one that takes the mean of the cross-correlation values obtained for the difference-area value paired with the vector length and angle parameters. In general, the best stress-detection results are achieved by the /i/-/o/-/u/ and /a/-/i/-/o/ vowel triangles in formant planes that combine the fifth formant F5 with other formants.
A Speech Feature Vector based on its Maximum Phase Component
This paper examines the performance of a vowel classification scheme using a new form of feature vector
derived from a decomposition of the speech segment into Maximum Phase and Minimum Phase components.
Justification for this approach in terms of its perceptual relevance is first made, followed by a signal processing
scheme to obtain the components. The form for the feature vector is then discussed. Lastly, experimental work
compares the performance of this new feature vector under a variety of distortion conditions with the currently popular choice of Mel-Frequency Cepstral Coefficients.
Language Model Adaptation for Statistical Machine Translation with Structured Query Models
We explore unsupervised language model adaptation techniques for Statistical Machine Translation. The hypotheses from the machine translation output are converted into queries at different levels of representational power and used to extract similar sentences from a very large monolingual text collection. Specific language models are then built from the retrieved data and interpolated with a general background model. Experiments show significant improvements when translating with these adapted language models.
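The interpolation step mentioned in this abstract can be sketched as follows. The abstract does not say which interpolation scheme is used, so this example assumes simple linear interpolation of two toy unigram models; the function name, the weight `lam`, and all probabilities are hypothetical.

```python
def interpolate(p_adapted, p_background, lam):
    """Linear mix: lam * P_adapted(w) + (1 - lam) * P_background(w)."""
    vocab = set(p_adapted) | set(p_background)
    return {
        w: lam * p_adapted.get(w, 0.0) + (1 - lam) * p_background.get(w, 0.0)
        for w in vocab
    }


# Toy unigram distributions (hypothetical numbers, each sums to 1).
background = {"the": 0.5, "bank": 0.2, "river": 0.3}
adapted = {"the": 0.4, "bank": 0.5, "loan": 0.1}

mixed = interpolate(adapted, background, lam=0.3)
```

Because both inputs are proper distributions and the weights sum to one, the mixture is also a proper distribution. In practice the weight would typically be tuned on held-out data.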
Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing
Whether it be for results summarization, or the analysis of classifier
fusion, some means to compare different classifiers can often provide
illuminating insight into their behaviour, (dis)similarity or complementarity.
We propose a simple method to derive a 2D representation from detection scores
produced by an arbitrary set of binary classifiers in response to a common
dataset. Based upon rank correlations, our method facilitates a visual
comparison of classifiers with arbitrary scores and with close relation to
receiver operating characteristic (ROC) and detection error trade-off (DET)
analyses. While the approach is fully versatile and can be applied to any
detection task, we demonstrate the method using scores produced by automatic
speaker verification and voice anti-spoofing systems. The former are produced
by a Gaussian mixture model system trained with VoxCeleb data whereas the
latter stem from submissions to the ASVspoof 2019 challenge.
Comment: Accepted to Interspeech 2021. Example code available at
https://github.com/asvspoof-challenge/classifier-adjacenc
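To make the rank-correlation idea concrete, here is a minimal sketch that compares classifiers by the ordering of their scores on a common dataset. It is not the authors' implementation: the abstract does not name the rank correlation used, so Spearman's coefficient is assumed, the system names and scores are made up, and the projection of the resulting distances to 2D is omitted.

```python
def ranks(scores):
    """Average ranks (1-based); tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical detection scores from three systems on the same four trials.
scores = {
    "sys_a": [0.1, 0.4, 0.35, 0.8],
    "sys_b": [-2.0, 1.0, 0.5, 3.0],  # different scale, same ordering as sys_a
    "sys_c": [0.9, 0.2, 0.3, 0.1],   # reversed ordering
}
names = sorted(scores)
# Dissimilarity in [0, 2]: 0 for identical rankings, 2 for reversed ones.
dist = {(a, b): 1 - spearman(scores[a], scores[b]) for a in names for b in names}
```

Because only ranks matter, systems with arbitrary score scales can be compared directly, which matches the abstract's point about "classifiers with arbitrary scores".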
English-Latvian SMT: the challenge of translating into a free word order language
This paper presents a comparative study of two approaches to
statistical machine translation (SMT) and their application to
a task of English-to-Latvian translation, which is still an open
research line in the field of automatic translation.
We consider a state-of-the-art phrase-based SMT system and an
alternative N-gram-based SMT system. The major differences
between these two approaches lie in the distinct representations
of bilingual units, which are the components of the
bilingual model driving the translation process, and in the statistical
modeling of the translation context.
Because Latvian is a rather free word order language, it poses
additional difficulties for the translation process. We contrast
different reordering models and investigate how well they
deal with the word ordering issue.
Moving beyond automatic scores of translation quality
that are classically presented in MT research papers, we contribute
a manual error analysis of MT system output
that helps to shed light on advantages and disadvantages
of the SMT systems under consideration and identify the most
prominent source of errors typical for both SMT systems.
Postprint (published version)