51 research outputs found

    Estimating Speaking Rate by Means of Rhythmicity Parameters

    In this paper we present a speech rate estimator based on so-called rhythmicity features derived from a modified version of the short-time energy envelope. To evaluate the new method, it is compared with a traditional speech rate estimator based on semi-automatic segmentation. Speech material from the Alcohol Language Corpus (ALC), covering intoxicated and sober speech in different speech styles, provides a statistically sound basis for testing. The proposed measure clearly correlates with the semi-automatically determined speech rate and seems to be robust across speech styles and speaker states.

    Finding the Most Uniform Changes in Vowel Polygon Caused by Psychological Stress

    Vowel polygons, specifically their parameters, are used as the criterion for distinguishing a speaker's normal state from speech produced under real psychological stress. All results were obtained experimentally with purpose-built software for vowel polygon analysis, applied to the ExamStress database. Six selected methods based on cross-correlation of different features were ranked by the coefficient of variation, and for each individual vowel polygon an efficiency coefficient marking the most significant and uniform differences between stressed and normal speech was calculated. The best method for observing these differences was the one that takes the mean of the cross-correlation values obtained for the difference area value together with the vector length and angle parameter pairs. In general, the best stress-detection results are achieved by the /i/-/o/-/u/ and /a/-/i/-/o/ vowel triangles in formant planes that combine the fifth formant F5 with other formants.

    A Speech Feature Vector based on its Maximum Phase Component

    This paper examines the performance of a vowel classification scheme using a new form of feature vector derived from a decomposition of the speech segment into Maximum Phase and Minimum Phase components. The approach is first justified in terms of its perceptual relevance, followed by a signal processing scheme to obtain the components. The form of the feature vector is then discussed. Lastly, experimental work compares the performance of this new feature vector, under a variety of distortion conditions, with the currently popular choice of Mel-Frequency Cepstral Coefficients.

    A Parallel Recurrent Neural Network for Language Modeling with POS Tags

    Language Model Adaptation for Statistical Machine Translation with Structured Query Models

    We explore unsupervised language model adaptation techniques for Statistical Machine Translation. The hypotheses from the machine translation output are converted into queries at different levels of representation power and used to extract similar sentences from a very large monolingual text collection. Specific language models are then built from the retrieved data and interpolated with a general background model. Experiments show significant improvements when translating with these adapted language models.
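
The interpolation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes plain linear interpolation of two unigram models; the abstract does not state the model order, the interpolation scheme, or the weight, so the names `unigram_model`, `interpolate`, and `lam`, and the toy sentences, are all hypothetical.

```python
from collections import Counter


def unigram_model(sentences):
    """Maximum-likelihood unigram probabilities from whitespace-tokenized sentences."""
    counts = Counter(word for s in sentences for word in s.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(adapted, background, lam):
    """Linear interpolation: lam * P_adapted(w) + (1 - lam) * P_background(w).

    A word missing from one model contributes probability 0 from that model.
    """
    vocab = set(adapted) | set(background)
    return {
        w: lam * adapted.get(w, 0.0) + (1.0 - lam) * background.get(w, 0.0)
        for w in vocab
    }


# Toy data (assumption): a general corpus and sentences "retrieved" for one hypothesis.
background = unigram_model(["the cat sat", "the dog ran"])
adapted = unigram_model(["the court ruled"])

# lam is a hypothetical weight; in practice it would be tuned on held-out data.
mixed = interpolate(adapted, background, lam=0.5)
```

Because both inputs are proper distributions and the weights sum to one, the interpolated model is again a proper distribution over the merged vocabulary.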

    Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing

    Whether for results summarization or the analysis of classifier fusion, some means of comparing different classifiers can often provide illuminating insight into their behaviour, (dis)similarity or complementarity. We propose a simple method to derive a 2D representation from detection scores produced by an arbitrary set of binary classifiers in response to a common dataset. Based upon rank correlations, our method facilitates a visual comparison of classifiers with arbitrary scores and is closely related to receiver operating characteristic (ROC) and detection error trade-off (DET) analyses. While the approach is fully versatile and can be applied to any detection task, we demonstrate the method using scores produced by automatic speaker verification and voice anti-spoofing systems. The former are produced by a Gaussian mixture model system trained with VoxCeleb data, whereas the latter stem from submissions to the ASVspoof 2019 challenge. Comment: Accepted to Interspeech 2021. Example code available at https://github.com/asvspoof-challenge/classifier-adjacenc
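
As a rough illustration of comparing classifiers through rank correlations, the sketch below computes a pairwise distance matrix from detection scores. It is not the authors' implementation: the choice of Kendall's tau (tau-a, without tie correction), the mapping `(1 - tau) / 2`, the function names, and the toy scores are all assumptions, and the projection to 2D (for example by multidimensional scaling) is left out.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equally long score lists (no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def distance_matrix(scores):
    """Map each classifier pair to a distance in [0, 1]: 0 = same ranking, 1 = reversed."""
    names = sorted(scores)
    matrix = [
        [(1.0 - kendall_tau(scores[a], scores[b])) / 2.0 for b in names]
        for a in names
    ]
    return names, matrix


# Toy detection scores (assumption) from three classifiers on the same four trials.
scores = {
    "A": [0.1, 0.4, 0.35, 0.8],
    "B": [1.0, 4.0, 3.5, 8.0],   # different scale, same ranking as A
    "C": [0.9, 0.2, 0.5, 0.1],   # ranking reversed relative to A
}
names, dist = distance_matrix(scores)
```

Since only the ordering of the scores matters, classifiers A and B end up at distance 0 despite their different score scales, which is what makes a rank-based comparison applicable to classifiers with arbitrary scores.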

    English-Latvian SMT: the challenge of translating into a free word order language

    This paper presents a comparative study of two approaches to statistical machine translation (SMT) and their application to English-to-Latvian translation, which is still an open research line in the field of automatic translation. We consider a state-of-the-art phrase-based SMT system and an alternative N-gram-based SMT system. The major differences between these two approaches lie in the distinct representations of bilingual units, which are the components of the bilingual model driving the translation process, and in the statistical modeling of the translation context. Because Latvian has a rather free word order, it poses additional difficulties for the translation process. We contrast different reordering models and investigate how well they deal with the word ordering issue. Moving beyond the automatic scores of translation quality that are classically presented in MT research papers, we also present a manual error analysis of the MT systems' output that helps to shed light on the advantages and disadvantages of the SMT systems under consideration and identifies the most prominent sources of errors typical of both SMT systems. Postprint (published version)