6,194 research outputs found

    Automatic Segmentation of Punjabi Speech Signal using Group Delay

    This paper describes the concept of automatic segmentation of a continuous speech signal. The language used for segmentation is Punjabi, one of the most widely spoken languages. Like all other Indian languages, Punjabi is a syllabic language, so syllables are selected as the basic unit of segmentation. The traditional way of representing the speech signal is in terms of features derived from short-time Fourier analysis. It is difficult to compute the phase and to process the phase function directly from the FT phase. By processing the derivative of the FT phase, the information in the short-time FT phase function can be extracted. This paper describes the process of automatic segmentation of speech using the group delay technique. This includes segmentation of continuous Punjabi speech into syllable-like units by using the high-resolution properties of group delay. The group delay function is found to be a better representative of the short-time energy (STE) function for syllable boundary detection.
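
    The "derivative of the FT phase" mentioned above is the group delay. A minimal sketch of how it can be computed without unwrapping the phase is given here, using the standard identity tau = (X_R*Y_R + X_I*Y_I) / |X|^2, where Y is the transform of n*x[n]. The function names, the plain DFT, and the `eps` guard are illustrative assumptions; this is not the segmentation pipeline of the paper, which the abstract does not spell out.

```python
import cmath
import math


def dft(x, n_fft):
    """Plain O(N^2) DFT of x, zero-padded to n_fft points."""
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(len(x)))
        for k in range(n_fft)
    ]


def group_delay(x, n_fft=None, eps=1e-12):
    """Group delay (in samples) at each DFT bin, without phase unwrapping.

    Uses tau = (X_R*Y_R + X_I*Y_I) / |X|^2, where Y is the DFT of n*x[n].
    eps (an assumption of this sketch) guards against division by zero.
    """
    n_fft = n_fft or len(x)
    X = dft(x, n_fft)
    Y = dft([n * v for n, v in enumerate(x)], n_fft)
    return [
        (a.real * b.real + a.imag * b.imag) / max(abs(a) ** 2, eps)
        for a, b in zip(X, Y)
    ]


# A unit impulse delayed by 3 samples has a constant group delay of 3 samples.
delayed_impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
tau = group_delay(delayed_impulse)
```

    The delayed impulse is a convenient sanity check because its phase is exactly linear in frequency, so every bin should report the same delay.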

    Mandarin Singing Voice Synthesis Based on Harmonic Plus Noise Model and Singing Expression Analysis

    The purpose of this study is to investigate how humans interpret musical scores expressively, and then design machines that sing like humans. We consider six factors that have a strong influence on the expression of human singing. The factors are related to the acoustic, phonetic, and musical features of a real singing signal. Given real singing voices recorded following the MIDI scores and lyrics, our analysis module can extract the expression parameters from the real singing signals semi-automatically. The expression parameters are used to control the singing voice synthesis (SVS) system for Mandarin Chinese, which is based on the harmonic plus noise model (HNM). The results of perceptual experiments show that integrating the expression factors into the SVS system yields a notable improvement in perceptual naturalness, clearness, and expressiveness. By one-to-one mapping of the real singing signal and expression controls to the synthesizer, our SVS system can simulate the interpretation of a real singer with the timbre of a speaker. Comment: 8 pages, technical report

    Estimating Speaking Rate by Means of Rhythmicity Parameters

    In this paper we present a speech rate estimator based on so-called rhythmicity features derived from a modified version of the short-time energy envelope. To evaluate the new method, it is compared to a traditional speech rate estimator on the basis of semi-automatic segmentation. Speech material from the Alcohol Language Corpus (ALC) covering intoxicated and sober speech of different speech styles provides a statistically sound foundation to test upon. The proposed measure clearly correlates with the semi-automatically determined speech rate and seems to be robust across speech styles and speaker states.
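
    For orientation, a short-time energy envelope and a crude peak-counting rate estimate can be sketched as follows. This is only a toy baseline under stated assumptions: the frame length, hop, relative threshold, 1000 Hz sample rate, and the helper `toy_signal` are all hypothetical, and the rhythmicity features of the paper itself are not described in the abstract and are not reproduced here.

```python
def short_time_energy(signal, frame_len, hop):
    """Sum of squared samples in each frame of frame_len samples, stepped by hop."""
    return [
        sum(s * s for s in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def count_energy_peaks(envelope, rel_threshold=0.5):
    """Count local maxima that exceed rel_threshold * max(envelope).

    A plateau of equal values is counted once (at its first frame).
    """
    if not envelope:
        return 0
    floor = rel_threshold * max(envelope)
    return sum(
        1
        for i in range(1, len(envelope) - 1)
        if envelope[i] > floor
        and envelope[i] > envelope[i - 1]
        and envelope[i] >= envelope[i + 1]
    )


def toy_signal(n_bursts, burst_len=100, gap_len=100):
    """Hypothetical test signal: unit-amplitude bursts separated by silence."""
    signal = []
    for _ in range(n_bursts):
        signal += [0.0] * gap_len + [1.0] * burst_len
    return signal + [0.0] * gap_len


sample_rate = 1000  # Hz, assumed for this toy example
signal = toy_signal(4)
envelope = short_time_energy(signal, frame_len=50, hop=25)
peaks = count_energy_peaks(envelope)
rate = peaks / (len(signal) / sample_rate)  # energy peaks per second
```

    On real speech the envelope would normally be smoothed before peak picking; the rectangular bursts here just keep the example deterministic.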

    Acoustic correlates of linguistic rhythm: Perspectives

    The empirical grounding of a typology of languages' rhythm is again a hot issue. The currently popular approach is based on the durations of vocalic and intervocalic intervals and their variability. Despite some successes, many questions remain. The main findings still need to be generalised to much larger corpora including many more languages. But a straightforward continuation of the current work faces many difficulties. Perspectives are outlined for future work, including proposals for the cross-linguistic control of speech rate, improvements on the statistical analyses, and prospects raised by automatic speech processing.
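
    The abstract does not name specific measures, but interval-based rhythm metrics commonly used in this literature include %V (proportion of vocalic duration), Delta-C (standard deviation of consonantal interval durations), and the normalised pairwise variability index (nPVI). The sketch below assumes those three, treats intervocalic intervals as consonantal intervals, and uses the population standard deviation; the example durations are made up for illustration.

```python
import statistics


def percent_v(vocalic, consonantal):
    """%V: share of total duration taken up by vocalic intervals, in percent."""
    return 100.0 * sum(vocalic) / (sum(vocalic) + sum(consonantal))


def delta(durations):
    """Delta-V or Delta-C: standard deviation of interval durations.

    Population standard deviation is an assumption of this sketch.
    """
    return statistics.pstdev(durations)


def npvi(durations):
    """Normalised pairwise variability index over successive intervals."""
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * sum(terms) / len(terms)


# Made-up interval durations in milliseconds for one utterance.
vocalic_ms = [80, 120, 100]
consonantal_ms = [60, 60, 80]

pv = percent_v(vocalic_ms, consonantal_ms)
dc = delta(consonantal_ms)
vocalic_npvi = npvi(vocalic_ms)
```

    Because nPVI normalises each difference by the local mean duration, it is less sensitive to overall speech rate than the raw standard deviations, which is one reason rate control is raised as an open issue.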

    BAStat: New Statistical Resources at the Bavarian Archive for Speech Signals

    A new type of language resource, 'BAStat', has been released by the Bavarian Archive for Speech Signals. In contrast to primary resources like speech and text corpora, BAStat comprises statistical estimates based on a number of primary resources: first and second order occurrence probability of phones, syllables and words, duration statistics, probabilities of pronunciation variants of words and probabilities of context information. Unlike other statistical speech resources, BAStat is based solely on recordings of conversational German and therefore models spoken language. It consists of 7-bit ASCII tables and matrices to maximize inter-operability between different platforms and can be downloaded from the BAS website. This paper gives a detailed description of the empirical basis, the contained data types, some interesting interpretations and a brief comparison to the text-based statistical resource CELEX.
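
    As an illustration of what first and second order occurrence probabilities could look like, the sketch below estimates unigram probabilities and conditional bigram probabilities P(b | a) from phone sequences by relative frequency. Reading "second order" as a conditional bigram probability is an assumption, as are the function name and the toy transcriptions; the actual BAStat estimation procedure and file layout are not given in the abstract.

```python
from collections import Counter


def occurrence_probabilities(sequences):
    """Relative-frequency estimates from a list of unit sequences.

    Returns (first_order, second_order):
      first_order[u]       = count(u) / total number of units
      second_order[(a, b)] = count(a followed by b) / count(a followed by anything)
    """
    unigrams = Counter()
    bigrams = Counter()
    for seq in sequences:
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))

    total = sum(unigrams.values())
    first_order = {u: c / total for u, c in unigrams.items()}

    # Count how often each unit occurs as the left element of a bigram.
    left_counts = Counter()
    for (a, _), c in bigrams.items():
        left_counts[a] += c
    second_order = {(a, b): c / left_counts[a] for (a, b), c in bigrams.items()}
    return first_order, second_order


# Toy phone sequences (hypothetical transcriptions, not BAStat data).
phones = [["h", "a", "l", "o"], ["h", "a", "t"]]
first_order, second_order = occurrence_probabilities(phones)
```

    The same function works unchanged for syllables or words, since it only assumes hashable units in ordered sequences.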

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR.