1,539 research outputs found

    Exploiting Contextual Information for Prosodic Event Detection Using Auto-Context

    Prosody and prosodic boundaries carry significant linguistic and paralinguistic information and are important aspects of speech. In the field of prosodic event detection, many local acoustic features have been investigated; however, contextual information has not yet been thoroughly exploited. The most difficult aspect of this lies in learning the long-distance contextual dependencies effectively and efficiently. To address this problem, we introduce the use of an algorithm called auto-context. In this algorithm, a classifier is first trained on a set of local acoustic features, after which the generated probabilities are used along with the local features as contextual information to train new classifiers. By iteratively using updated probabilities as the contextual information, the algorithm can accurately model contextual dependencies and improve classification ability. The advantages of this method include its flexible structure and its ability to capture contextual relationships. When using the auto-context algorithm based on support vector machines, we can improve the detection accuracy by about 3% and the F-score by more than 7% on both two-way and four-way pitch accent detection in combination with the acoustic context. For boundary detection, the accuracy improvement is about 1% and the F-score improvement reaches 12%. The new algorithm outperforms conditional random fields, especially on boundary detection in terms of F-score. It also outperforms an n-gram language model on the task of pitch accent detection.
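
    The iterative scheme described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: a small logistic-regression classifier stands in for the support vector machine, the data are treated as a single sequence, and the function names (`train_logreg`, `add_context`, `auto_context`), the context offsets, and the padding value 0.5 are all hypothetical choices.

```python
import math


def predict_proba(w, b, x):
    """Probability of the positive class for one feature vector."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clip to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit logistic regression by batch gradient descent.

    Assumption: this stands in for the SVM named in the abstract.
    """
    n, n_feat = len(X), len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(n_feat):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def add_context(X, probs, offsets=(-2, -1, 1, 2)):
    """Append neighbours' probabilities to each local feature vector.

    Positions outside the sequence are padded with 0.5 (hypothetical choice).
    """
    out = []
    for t, x in enumerate(X):
        ctx = [probs[t + o] if 0 <= t + o < len(probs) else 0.5
               for o in offsets]
        out.append(list(x) + ctx)
    return out


def auto_context(X, y, n_iter=3):
    """Train a local classifier, then retrain n_iter times with context."""
    w, b = train_logreg(X, y)  # stage 0: local features only
    probs = [predict_proba(w, b, x) for x in X]
    models = [(w, b)]
    for _ in range(n_iter):
        Xc = add_context(X, probs)  # local features + current probabilities
        w, b = train_logreg(Xc, y)
        probs = [predict_proba(w, b, x) for x in Xc]  # updated context
        models.append((w, b))
    return models, probs


if __name__ == "__main__":
    # Toy one-dimensional "acoustic" feature with alternating labels.
    X = [[0.1], [0.9], [0.2], [0.8], [0.15], [0.85]]
    y = [0, 1, 0, 1, 0, 1]
    models, probs = auto_context(X, y, n_iter=2)
    print(len(models), [round(p, 2) for p in probs])
```

    Each later stage sees the previous stage's probabilities for neighbouring positions, so widening `offsets` is one way to expose longer-range context to the classifier in this sketch.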

    P3b reflects periodicity in linguistic sequences

    Temporal predictability is thought to affect stimulus processing by facilitating the allocation of attentional resources. Recent studies have shown that periodicity of a tonal sequence results in a decreased peak latency and a larger amplitude of the P3b compared with temporally random, i.e., aperiodic sequences. We investigated whether this applies also to sequences of linguistic stimuli (syllables), although speech is usually aperiodic. We compared aperiodic syllable sequences with two temporally regular conditions. In one condition, the interval between syllable onsets was fixed, whereas in a second condition the interval between the syllables’ perceptual centers (p-centers) was kept constant. Event-related potentials were assessed in 30 adults who were instructed to detect irregularities in the stimulus sequences. We found larger P3b amplitudes for both temporally predictable conditions as compared to the aperiodic condition and a shorter P3b latency in the p-center condition than in both other conditions. These findings demonstrate that even in acoustically more complex sequences such as syllable streams, temporal predictability facilitates the processing of deviant stimuli. Furthermore, we provide the first electrophysiological evidence for the relevance of the p-center concept in linguistic stimulus processing.

    Saliency or template? ERP evidence for long-term representation of word stress

    The present study investigated the event-related brain potential (ERP) correlates of word stress processing. Previous results showed that the violation of a legal stress pattern elicited two consecutive Mismatch Negativity (MMN) components synchronized to the changes on the first and second syllable. The aim of the present study was to test whether ERPs reflect only the detection of salient features present on the syllables, or whether they reflect the activation of long-term stress-related representations. We examined ERPs elicited by pseudowords with no lexical representation in two conditions: one in which the standard had a legal stress pattern and the deviant an illegal one, and one in which the standard had an illegal stress pattern and the deviant a legal one. We found that the deviant having an illegal stress pattern elicited two consecutive MMN components, whereas the deviant having a legal stress pattern did not elicit MMN. Moreover, pseudowords with a legal stress pattern elicited the same ERP responses irrespective of their role in the oddball sequence, i.e., whether they were standards or deviants. The results suggest that stress pattern changes are processed relying on long-term representation of word stress. To account for these results, we propose that the processing of stress cues is based on language-specific, pre-lexical stress templates.

    Obtaining prominence judgments from naïve listeners – Influence of rating scales, linguistic levels and normalisation

    A frequently replicated finding is that higher frequency words tend to be shorter and contain more strongly reduced vowels. However, little is known about potential differences in the articulatory gestures for high vs. low frequency words. The present study made use of electromagnetic articulography to investigate the production of two German vowels, [i] and [a], embedded in high and low frequency words. We found that word frequency differently affected the production of [i] and [a] at the temporal as well as the gestural level. Higher frequency of use predicted greater acoustic durations for long vowels; reduced durations for short vowels; articulatory trajectories with greater tongue height for [i] and more pronounced downward articulatory trajectories for [a]. These results show that the phonological contrast between short and long vowels is learned better with experience, and challenge both the Smooth Signal Redundancy Hypothesis and current theories of German phonology

    Hierarchical Representation and Estimation of Prosody using Continuous Wavelet Transform

    Prominences and boundaries are the essential constituents of prosodic structure in speech. They provide a means to chunk the speech stream into linguistically relevant units by providing them with relative saliences and demarcating them within utterance structures. Prominences and boundaries have been widely used both in basic research on prosody and in text-to-speech synthesis. However, there are no representation schemes that would provide for both estimating and modelling them in a unified fashion. Here we present an unsupervised unified account for estimating and representing prosodic prominences and boundaries using a scale-space analysis based on continuous wavelet transform. The methods are evaluated and compared to earlier work using the Boston University Radio News corpus. The results show that the proposed method is comparable with the best published supervised annotation methods. Peer reviewed

    Fluency-related Temporal Features and Syllable Prominence as Prosodic Proficiency Predictors for Learners of English with Different Language Backgrounds

    Prosodic features are important in achieving intelligibility, comprehensibility, and fluency in a second or foreign language (L2). However, research on the assessment of prosody as part of oral proficiency remains scarce. Moreover, the acoustic analysis of L2 prosody has often focused on fluency-related temporal measures, neglecting language-dependent stress features that can be quantified in terms of syllable prominence. Introducing the evaluation of prominence-related measures can be of use in developing both teaching and assessment of L2 speaking skills. In this study we compare temporal measures and syllable prominence estimates as predictors of prosodic proficiency in non-native speakers of English with respect to the speaker's native language (L1). The predictive power of temporal and prominence measures was evaluated for utterance-sized samples produced by language learners from four different L1 backgrounds: Czech, Slovak, Polish, and Hungarian. Firstly, the speech samples were assessed using the revised Common European Framework of Reference scale for prosodic features. The assessed speech samples were then analyzed to derive articulation rate and three fluency measures. Syllable-level prominence was estimated by a continuous wavelet transform analysis using combinations of F0, energy, and syllable duration. The results show that the temporal measures serve as reliable predictors of prosodic proficiency in the L2, with prominence measures providing a small but significant improvement to prosodic proficiency predictions. The predictive power of the individual measures varies both quantitatively and qualitatively depending on the L1 of the speaker. We conclude that the possible effects of the speaker's L1 on the production of L2 prosody in terms of temporal features as well as syllable prominence deserve more attention in applied research and developing teaching and assessment methods for spoken L2. Peer reviewed

    The acoustic basis of lexical stress perception

    Peer reviewed

    Infants segment words from songs - an EEG study

    Children’s songs are omnipresent and highly attractive stimuli in infants’ input. Previous work suggests that infants process linguistic–phonetic information from simplified sung melodies. The present study investigated whether infants learn words from ecologically valid children’s songs. Testing 40 Dutch-learning 10-month-olds in a familiarization-then-test electroencephalography (EEG) paradigm, this study asked whether infants can segment repeated target words embedded in songs during familiarization and subsequently recognize those words in continuous speech in the test phase. To replicate previous speech work and compare segmentation across modalities, infants participated in both song and speech sessions. Results showed a positive event-related potential (ERP) familiarity effect to the final compared to the first target occurrences during both song and speech familiarization. No evidence was found for word recognition in the test phase following either song or speech. Comparisons across the stimuli of the present and a comparable previous study suggested that acoustic prominence and speech rate may have contributed to the polarity of the ERP familiarity effect and its absence in the test phase. Overall, the present study provides evidence that 10-month-old infants can segment words embedded in songs, and it raises questions about the acoustic and other factors that enable or hinder infant word segmentation from songs and speech