3,751 research outputs found

    Tone classification of syllable-segmented Thai speech based on multilayer perceptron

    Thai is a monosyllabic and tonal language. Thai makes use of tone to convey lexical information about the meaning of a syllable. Thai has five distinctive tones and each tone is well represented by a single F0 contour pattern. In general, a Thai syllable with a different tone has a different lexical meaning. Thus, to completely recognize a spoken Thai syllable, a speech recognition system has not only to recognize a base syllable but also to correctly identify a tone. Hence, tone classification of Thai speech is an essential part of a Thai speech recognition system.
    In this study, a tone classification system for syllable-segmented Thai speech, which incorporates the effects of tonal coarticulation, stress and intonation, was developed. Automatic syllable segmentation, which segments the training and test utterances into syllable units, was also developed. Acoustic features, including fundamental frequency (F0), duration, and energy, extracted from the syllable being processed and its neighboring syllables, were used as the main discriminating features. A multilayer perceptron (MLP) trained by the backpropagation method was employed to classify these features. The proposed system was evaluated on 920 test utterances spoken by five male and three female Thai speakers who also uttered the training speech. The proposed system achieved an average accuracy rate of 91.36%.
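
As a rough illustration of the feature description above, the following sketch assembles one input vector from the F0 contour, duration, and energy of a syllable and its two neighbors. The function names, the five-point F0 resampling, the dictionary keys, and the zero-padding at utterance edges are all assumptions made for this example; the abstract does not specify how the authors built their MLP input.

```python
from typing import Dict, List, Sequence


def resample(contour: Sequence[float], n_points: int) -> List[float]:
    """Linearly interpolate a contour to n_points samples (n_points >= 2)."""
    if len(contour) == 1:
        return [float(contour[0])] * n_points
    out = []
    for i in range(n_points):
        pos = i * (len(contour) - 1) / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[hi] * frac)
    return out


def syllable_features(syl: Dict, n_points: int = 5) -> List[float]:
    """Fixed-length features for one syllable: resampled F0, duration, energy."""
    return resample(syl["f0"], n_points) + [syl["duration_s"], syl["energy"]]


def context_vector(syllables: List[Dict], index: int, n_points: int = 5) -> List[float]:
    """Concatenate features of the previous, current, and next syllable.

    Missing neighbors at utterance edges are zero-padded (an assumption).
    """
    width = n_points + 2
    vec: List[float] = []
    for j in (index - 1, index, index + 1):
        if 0 <= j < len(syllables):
            vec += syllable_features(syllables[j], n_points)
        else:
            vec += [0.0] * width
    return vec


# Hypothetical two-syllable utterance (values are made up for illustration).
utterance = [
    {"f0": [120.0, 130.0, 140.0], "duration_s": 0.25, "energy": 0.8},
    {"f0": [150.0, 110.0], "duration_s": 0.30, "energy": 0.6},
]
x = context_vector(utterance, 0)
```

A vector like `x` would then be fed to a classifier with five outputs, one per Thai tone.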

    Language experience enhances early cortical pitch-dependent responses

    Pitch processing at cortical and subcortical stages of processing is shaped by language experience. We recently demonstrated that specific components of the cortical pitch response (CPR) index the more rapidly-changing portions of the high rising Tone 2 of Mandarin Chinese, in addition to marking pitch onset and sound offset. In this study, we examine how language experience (Mandarin vs. English) shapes the processing of different temporal attributes of pitch reflected in the CPR components using stimuli representative of within-category variants of Tone 2. Results showed that the magnitude of CPR components (Na–Pb and Pb–Nb) and the correlation between these two components and pitch acceleration were stronger for the Chinese listeners than for the English listeners for stimuli that fell within the range of Tone 2 citation forms. Discriminant function analysis revealed that the Na–Pb component was more than twice as important as Pb–Nb in grouping listeners by language affiliation. In addition, a stronger stimulus-dependent, rightward asymmetry was observed for the Chinese group at the temporal, but not frontal, electrode sites. This finding may reflect selective recruitment of experience-dependent, pitch-specific mechanisms in right auditory cortex to extract more complex, time-varying pitch patterns. Taken together, these findings suggest that long-term language experience shapes early sensory level processing of pitch in the auditory cortex, and that the sensitivity of the CPR may vary depending on the relative linguistic importance of specific temporal attributes of dynamic pitch.

    Prosodic detail in Neapolitan Italian

    Recent findings on phonetic detail have been taken as supporting exemplar-based approaches to prosody. Through four experiments on both production and perception of both melodic and temporal detail in Neapolitan Italian, we show that prosodic detail is not incompatible with abstractionist approaches either. Specifically, we suggest that the exploration of prosodic detail leads to a refined understanding of the relationships between the richly specified and continuously varying phonetic information on one side, and coarse phonologically structured contrasts on the other, thus offering insights on how pragmatic information is conveyed by prosody.

    Are tones aligned with articulatory events? Evidence from Italian and French

    Tonal alignment work has suggested that the temporal location of tonal targets relative to segmental "anchors" might be governed by principles of synchrony and stability (Arvaniti et al. 1998, Ladd et al. 1999, inter alia). However, a number of discrepancies have emerged in the cross-linguistic study of alignment. For instance, despite some regularities in the alignment of L targets (Caspers and van Heuven 1993; Prieto et al. 1995), the alignment of H targets appears to be quite controversial. In fact, it is sometimes difficult to find definite segmental landmarks to which such targets might be aligned. Also, most of the alignment proposals so far inherently assume that if some anchors for tonal alignment do exist they must be acoustic in nature. A plausible alternative would be to assume that such anchors are primarily articulatory, which would explain why in some cases the underlying regularities would be masked. Hence, we adopt a new experimental paradigm for alignment research in which articulatory measures are performed simultaneously with acoustic measures. In order to test the constant alignment hypothesis, a preliminary study (D'Imperio et al. 2003) was conducted in which various latency measures, both acoustically and articulatorily based, were analyzed. Specifically, the kinematics of OPTOTRAK markers attached to the speaker's upper and lower lips was tracked over time during the production of the corpus sentences. The melodic target considered is the H tone of LH nuclear rises in Neapolitan Italian. In this variety, yes/no question LH rises are systematically later than (narrow focus) statement LH rises (D'Imperio 2000, 2001, 2002; D'Imperio and House 1997). In order to test the hypothesis of constant anchoring of H targets, the materials were produced with two different rates of speech, i.e. normal and fast.
Summarizing the results, H targets of nuclear rises in Neapolitan statements and questions appear to be more closely phased with the articulatory dimension of between-lip distance than with two of the most commonly employed acoustic segmental landmarks for tonal alignment (i.e., onset and offset of stressed vowel). Statement H tones are phased with maximum between-lip distance within the stressed syllable. Note that this location does not correspond to any identifiable segmental boundary, acoustic event or phonological unit, and does not overlap with RMS peak amplitude. In fact, RMS peaks were generally much earlier than articulatory peaks, hence further away from H peaks. This calls for the collection and analysis of more articulatory data (especially jaw and tongue movements) to shed light on tonal alignment issues.
In a second study, a French corpus was collected on the basis of the alignment contrast found by Welby (2003, in press). Welby's results show that listeners use the alignment of the initial rise (LHi) in French Accentual Phrases as a cue to speech segmentation. Specifically, listeners exploit the presence of an early rise to demarcate the beginning of a content word. In the present study, a corpus was built with a set of utterances displaying this specific alignment contrast. The kinematics of 10 pellets (8 on the face and tongue, 2 references) was tracked over time using an electromagnetometer (EMA, Carstens). The phasing of several articulatory events relative to the L and H parts of the early rise was examined. The preliminary results seem to point to some kind of fine alignment specification for the L and H targets. Specifically, we hypothesize that tonal target commands of Neapolitan as well as French rises are phased with commands of the supralaryngeal articulator involved in producing the segments to which the tone is associated.
Regarding the word segmentation issue for French, it is important to study alignment in a diachronic perspective, since we know of cases of speech segmentation errors that can lead to lexical reinterpretation and change (l'abondance "abundance" > la bondance, from Welby 2003). We also take these results to suggest that not all rises align in the same way with the associated syllable. Though the role of articulatory constraints is important, the exact phasing properties of prosodic events are language-specific. Since prosody has recently become the realm of investigation of the Task Dynamics program (Byrd and Saltzman 2003), our alignment work will be cast under such a perspective.

    An end-to-end machine learning system for harmonic analysis of music

    We present a new system for simultaneous estimation of keys, chords, and bass notes from music audio. It makes use of a novel chromagram representation of audio that takes perception of loudness into account. Furthermore, it is fully based on machine learning (instead of expert knowledge), such that it is potentially applicable to a wider range of genres as long as training data is available. As compared to other models, the proposed system is fast and memory efficient, while achieving state-of-the-art performance.
    Comment: MIREX report and preparation of journal submission
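
To make the idea of a chromagram concrete, the sketch below folds spectral peaks into the 12 pitch classes and applies a simple logarithmic compression to each peak's power before summing. The log compression is only a stand-in chosen for this illustration and is not the loudness model of the paper; the function names and the (frequency, power) peak format are likewise assumptions.

```python
import math
from typing import List, Sequence, Tuple


def pitch_class(freq_hz: float, a4_hz: float = 440.0) -> int:
    """Map a frequency to a pitch class 0..11 (0 = C) via the nearest MIDI note."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi % 12


def chroma_frame(peaks: Sequence[Tuple[float, float]]) -> List[float]:
    """Fold (frequency, power) peaks of one frame into a normalized 12-bin chroma.

    Power is log-compressed before summing; this is an assumed, simplified
    proxy for perceived loudness.
    """
    chroma = [0.0] * 12
    for freq_hz, power in peaks:
        chroma[pitch_class(freq_hz)] += math.log10(1.0 + power)
    total = sum(chroma)
    return [c / total for c in chroma] if total > 0 else chroma


# Hypothetical frame with two A's (440 Hz, 880 Hz) and one C (about 261.63 Hz).
frame = chroma_frame([(440.0, 9.0), (880.0, 9.0), (261.63, 9.0)])
```

Octave-related peaks land in the same bin, which is what lets a sequence of such frames serve as input to key and chord estimation.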