
    Strategies for analyzing tone languages

    This paper outlines a method of auditory and acoustic analysis for determining the tonemes of a language starting from scratch, drawing on the author’s experience of recording and analyzing tone languages of north-east India. The methodology is applied to a preliminary analysis of tone in the Thang dialect of Khiamniungan, a virtually undocumented language of extreme eastern Nagaland and adjacent areas of the Sagaing Division, Myanmar (Burma). Following a discussion of strategies for ensuring that data appropriate for tonal analysis will be recorded, the practical demonstration begins with a description of how tone categories can be established according to their syllable type in the preliminary auditory analysis. The paper then uses these data to describe a method of acoustic analysis that ultimately permits the representation of pitch shapes as a function of absolute mean duration. The analysis of grammatical tones, floating tones and tone sandhi is exemplified with Mongsen Ao data, and a description of a perception test demonstrates how this can be used to corroborate the auditory and acoustic analysis of a tone system. This paper is in the series How to Study a Tone Language, edited by Steven Bird and Larry Hyman (National Foreign Language Resource Center).
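    The abstract describes representing pitch shapes as a function of absolute mean duration. A minimal sketch of one way this can be computed, assuming F0 tracks have already been extracted per token (e.g. with a pitch tracker such as Praat): each token's contour is time-normalized to a fixed number of points, the contours of a tone category are averaged point by point, and the time axis is then rescaled by the category's mean duration. All function and variable names below are hypothetical, not taken from the paper.

    ```python
    import numpy as np

    def mean_pitch_shape(f0_tracks, durations, n_points=30):
        """Average the F0 contours of one tone category.

        f0_tracks: list of 1-D arrays of F0 samples (Hz), one per token
        durations: list of token durations in seconds
        Returns (time_axis_s, mean_f0): the averaged pitch shape plotted
        against the category's absolute mean duration.
        """
        norm = []
        for track in f0_tracks:
            # Resample each contour onto n_points equally spaced positions
            # so tokens of different lengths can be averaged point by point.
            src = np.linspace(0.0, 1.0, len(track))
            dst = np.linspace(0.0, 1.0, n_points)
            norm.append(np.interp(dst, src, track))
        mean_f0 = np.mean(norm, axis=0)
        # Rescale the normalized time axis by the absolute mean duration,
        # giving pitch shape as a function of mean duration.
        time_axis = np.linspace(0.0, float(np.mean(durations)), n_points)
        return time_axis, mean_f0

    # Toy example: three tokens of a falling tone, different lengths
    tracks = [np.linspace(220.0, 180.0, n) for n in (25, 32, 40)]
    t, f0 = mean_pitch_shape(tracks, durations=[0.18, 0.22, 0.20])
    ```

    The resulting (t, f0) pairs for each tone category can then be plotted on a common axis to compare contour shapes and durations across categories.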

    Motivic Pattern Classification of Music Audio Signals Combining Residual and LSTM Networks

    Motivic pattern classification from music audio recordings is a challenging task, especially in the case of a cappella flamenco cantes, which are characterized by complex melodic variations, pitch instability, timbre changes, extreme vibrato oscillations, microtonal ornamentations, and noisy recording conditions. Convolutional Neural Networks (CNN) have proven to be very effective algorithms in image classification. Recent work in large-scale audio classification has shown that CNN architectures, originally developed for image problems, can be applied successfully to audio event recognition and classification with little or no modification to the networks. In this paper, CNN architectures are tested on a more nuanced problem: flamenco cantes intra-style classification using small motivic patterns. A new architecture is proposed that uses the advantages of residual CNNs as feature extractors, and a bidirectional LSTM layer to exploit the sequential nature of musical audio data. We present a full end-to-end pipeline for audio music classification that includes a sequential pattern mining technique and a contour simplification method to extract relevant motifs from audio recordings. Mel-spectrograms of the extracted motifs are then used as the input for the different architectures tested. We investigate the usefulness of motivic patterns for the automatic classification of music recordings and the effect of the length of the audio and corpus size on the overall classification accuracy. Results show a relative accuracy improvement of up to 20.4% when CNN architectures are trained using acoustic representations from motivic patterns.
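    The pipeline above includes a contour simplification step for extracting relevant motifs. The abstract does not name the algorithm, so as an illustrative stand-in (not necessarily the authors' method), here is Douglas–Peucker simplification applied to a (time, pitch) contour: it reduces a melodic contour to its salient turning points, which is the usual purpose of such a step.

    ```python
    def simplify_contour(points, tolerance):
        """Douglas-Peucker simplification of a (time, pitch) polyline.

        points: list of (t, f0) tuples; tolerance: max allowed deviation.
        Keeps only the points needed to reproduce the melodic shape
        within the given tolerance.
        """
        if len(points) < 3:
            return list(points)
        (x1, y1), (x2, y2) = points[0], points[-1]
        # Perpendicular distance of every interior point to the chord
        # joining the first and last points.
        dx, dy = x2 - x1, y2 - y1
        norm = (dx * dx + dy * dy) ** 0.5 or 1.0
        dists = [abs(dy * (x - x1) - dx * (y - y1)) / norm
                 for x, y in points[1:-1]]
        i = max(range(len(dists)), key=dists.__getitem__) + 1
        if dists[i - 1] > tolerance:
            # Farthest point is significant: keep it and recurse on both halves.
            left = simplify_contour(points[: i + 1], tolerance)
            right = simplify_contour(points[i:], tolerance)
            return left[:-1] + right
        # All interior points lie close to the chord: drop them.
        return [points[0], points[-1]]

    # A contour with one clear peak: simplification keeps the turning point.
    contour = [(0, 100), (1, 102), (2, 120), (3, 101), (4, 100)]
    simplified = simplify_contour(contour, tolerance=5.0)
    ```

    In a motif-extraction setting, the simplified contour would then be cut into short patterns whose mel-spectrograms feed the classifier.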

    Hierarchical methods for large population speaker identification using telephone speech

    This study focuses on speaker identification (SID). Several problems, such as acoustic noise, channel noise, speaker variability, and the large population of known speakers within the system, limit good SID performance. The SID system extracts speaker-specific features from the digitised speech signal for accurate identification. These feature sets are clustered to form the speaker template known as a speaker model. As the number of speakers enrolled in the system grows, more models accumulate and inter-speaker confusion results. This study proposes hierarchical methods that aim to split the large population of enrolled speakers into smaller groups of model databases to minimise inter-speaker confusion.
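    The hierarchical idea above can be sketched in a few lines: cluster the enrolled speaker models into groups, then identify in two stages — find the nearest group first, and score only that group's speakers. This is a minimal numpy illustration with hypothetical details (k-means over one mean vector per speaker, Euclidean distance), not the study's actual modelling.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def kmeans(models, k, iters=20):
        """Plain k-means over speaker model vectors (one vector per speaker)."""
        centroids = models[rng.choice(len(models), size=k, replace=False)]
        for _ in range(iters):
            # Assign each speaker model to its nearest group centroid.
            labels = np.argmin(
                np.linalg.norm(models[:, None] - centroids[None], axis=2),
                axis=1)
            for j in range(k):
                if np.any(labels == j):
                    centroids[j] = models[labels == j].mean(axis=0)
        return centroids, labels

    def identify(test_vec, models, centroids, labels):
        """Two-stage search: nearest group first, then nearest speaker in it."""
        g = np.argmin(np.linalg.norm(centroids - test_vec, axis=1))
        members = np.flatnonzero(labels == g)  # only this group is scored
        best = members[np.argmin(
            np.linalg.norm(models[members] - test_vec, axis=1))]
        return int(best)

    # Toy enrollment: 40 speakers, 8-dim "models" in 4 well-separated groups
    models = np.concatenate(
        [rng.normal(loc=10.0 * c, scale=0.1, size=(10, 8)) for c in range(4)])
    centroids, labels = kmeans(models, k=4)
    probe = models[17] + rng.normal(scale=0.01, size=8)
    speaker = identify(probe, models, centroids, labels)
    ```

    The pay-off is that each identification scores roughly population/k speaker models instead of all of them, which is the confusion- and cost-reduction the abstract is after.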

    More than words: Recognizing speech of people with Parkinson's disease

    Parkinson’s disease (PD) is the fastest-growing neurological disorder in the world, with approximately 10 million people currently living with the diagnosis. Hypokinetic dysarthria (HD) is one of the symptoms that appear in the early stages of the disease progression. The main aim of this dissertation is to gain insights into listeners’ impressions of dysarthric speech and to uncover acoustic correlates of those impressions. We do this by exploring two sides of communication: speech production of people with PD, and listeners’ recognition of speech of people with PD. Therefore, the studies in this dissertation approach the topic of speech changes in PD from both the speakers’ side, via acoustic analysis of speech, and the listeners’ side, via experiments exploring the influence of expertise and language background on recognition of speech of people with PD. Moreover, to obtain a more comprehensive picture of these perspectives, the studies of this dissertation are multifaceted, explore cross-linguistic aspects of dysarthric speech recognition and include both cross-sectional and longitudinal designs. The results demonstrate that listeners’ ability to recognize speech of people with PD as unhealthy is rooted in the acoustic changes in speech, not in its content. Listeners’ experience in the fields of speech and language therapy or speech sciences affects dysarthric speech recognition. The results also suggest that tracking speech parameters is a useful tool for monitoring the progression and/or development of dysarthria and objectively evaluating long-term effects of speech therapy.
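    The abstract suggests tracking speech parameters over time to monitor dysarthria. As a hedged illustration of what such tracking can involve, the sketch below computes two acoustic measures commonly reported for hypokinetic dysarthria — reduced F0 variability (monopitch) and a rough frame-level analogue of jitter — from a voiced-frame pitch track. The measures and names are illustrative assumptions; real work would extract F0 with a dedicated tool (e.g. Praat) and use clinically validated parameters.

    ```python
    import numpy as np

    def monopitch_measures(f0_hz):
        """Summarize F0 variability from a voiced-frame pitch track.

        f0_hz: 1-D array of F0 estimates (Hz) for voiced frames only.
        Returns (f0_sd_semitones, perturbation) where:
          - f0_sd_semitones: SD of F0 on a semitone scale; low values are
            a commonly reported acoustic correlate of monopitch in HD.
          - perturbation: mean absolute frame-to-frame F0 change divided
            by mean F0, a crude frame-level analogue of jitter.
        """
        f0 = np.asarray(f0_hz, dtype=float)
        # Express F0 relative to the speaker's mean, in semitones, so the
        # measure is comparable across speakers with different pitch ranges.
        semitones = 12.0 * np.log2(f0 / f0.mean())
        f0_sd_semitones = float(semitones.std())
        perturbation = float(np.abs(np.diff(f0)).mean() / f0.mean())
        return f0_sd_semitones, perturbation

    # Toy comparison: a nearly flat contour vs. a clearly modulated one
    flat = 120.0 + 0.5 * np.sin(np.linspace(0.0, 6.28, 200))
    varied = 120.0 + 15.0 * np.sin(np.linspace(0.0, 6.28, 200))
    sd_flat, _ = monopitch_measures(flat)
    sd_varied, _ = monopitch_measures(varied)
    ```

    Tracked longitudinally, a declining F0 SD would be the kind of objective signal the dissertation proposes for monitoring dysarthria progression or therapy effects.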

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and the methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in applications able to operate in real-world environments, like mobile communication services and smart homes.