
    Temporal malleability to auditory feedback perturbation is modulated by rhythmic abilities and auditory acuity

    Auditory feedback perturbation studies have indicated a link between feedback and feedforward mechanisms in speech production when participants compensate for applied shifts. In spectral perturbation studies, speakers with a higher perceptual auditory acuity typically compensate more than individuals with lower acuity. However, the reaction to feedback perturbation is unlikely to be merely a matter of perceptual acuity but is also affected by the prediction and production of precise motor action. This interplay between prediction, perception, and motor execution seems to be crucial for the timing of speech and non-speech motor actions. In this study, to examine the relationship between responses to temporally perturbed auditory feedback and rhythmic abilities, we tested 45 adult speakers with a temporal auditory feedback perturbation paradigm on the one hand and with rhythm perception and production tasks on the other. The perturbation tasks temporally stretched and compressed segments (onset + vowel or vowel + coda) in fluent speech in real time. This technique sheds light on the temporal representation and the production flexibility of timing mechanisms in fluent speech with respect to the structure of the syllable. The perception tasks contained staircase paradigms capturing duration discrimination abilities and beat-alignment judgments. The rhythm production tasks consisted of finger tapping tasks taken from the BAASTA tapping battery and additional speech tapping tasks. We found that both auditory acuity and motor stability in finger tapping affected responses to temporal auditory feedback perturbation. In general, speakers with higher auditory acuity and higher motor variability compensated more. However, we observed a different weighting of auditory acuity and motor stability depending on the prosodic structure of the perturbed sequence and on whether the response was purely online or adaptive. These findings shed light on the interplay of phonological structure with feedback and feedforward integration for timing mechanisms in speech.
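    As a rough illustration of the kind of "staircase paradigm" for duration discrimination mentioned above, the sketch below runs a standard 2-down/1-up adaptive track against a simulated listener. The step size, starting difference, reversal criterion, and simulated-listener model are all illustrative assumptions, not the authors' actual parameters or procedure.

```python
# Minimal 2-down/1-up staircase sketch for a duration-discrimination threshold.
# All parameter values and the simulated listener are assumptions for illustration.
import random

def run_staircase(simulated_threshold_ms=20.0, start_delta_ms=80.0,
                  step_ms=8.0, max_reversals=8):
    delta = start_delta_ms          # current duration difference (ms)
    correct_streak = 0
    reversals = []
    last_direction = None

    while len(reversals) < max_reversals:
        # Hypothetical listener: more likely correct when delta is well above threshold.
        p_correct = 0.5 + 0.5 * min(1.0, delta / (2 * simulated_threshold_ms))
        correct = random.random() < p_correct

        if correct:
            correct_streak += 1
            if correct_streak < 2:
                continue
            correct_streak = 0
            direction = "down"                 # 2 correct -> harder (smaller delta)
            delta = max(1.0, delta - step_ms)
        else:
            correct_streak = 0
            direction = "up"                   # 1 wrong -> easier (larger delta)
            delta += step_ms

        if last_direction and direction != last_direction:
            reversals.append(delta)            # record value at each direction reversal
        last_direction = direction

    # Threshold estimate: mean delta over the last reversals (~70.7%-correct point).
    return sum(reversals[-6:]) / len(reversals[-6:])

print(f"Estimated duration-discrimination threshold: {run_staircase():.1f} ms")
```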

    A system for adaptive high-variability segmental perceptual training: Implementation, effectiveness, transfer

    Many aspects of L2 phonological perception are difficult to acquire without instruction. These difficulties with perception may also be related to intelligibility in production. Instruction on perception contrasts is more likely to be successful with the use of phonetically variable input made available through computer-assisted pronunciation training. However, few computer-assisted programs have demonstrated flexibility in diagnosing and treating individual learner problems or have made effective use of linguistic resources such as corpora for creating training materials. This study introduces a system for segmental perceptual training that combines corpus-based word frequency lists, high-variability phonetic input, and text-to-speech technology to automatically create discrimination and identification perception exercises customized for individual learners. The effectiveness of the system is evaluated in an experiment with pre- and post-test design, involving 32 adult Russian-speaking learners of English as a foreign language. The participants’ perceptual gains were found to transfer to novel voices, but not to untrained words. Potential factors underlying the absence of word-level transfer are discussed. The results of the training model provide an example for replication in language teaching and research settings.
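    The sketch below shows one plausible way such discrimination exercises could be assembled from a frequency-ranked word list with multi-voice (high-variability) synthesis, in the spirit of the system described above. The toy minimal pairs, the voice labels, and the synthesize() placeholder are assumptions for illustration; the paper's actual pipeline and resources are not shown here.

```python
# Illustrative ABX-exercise generator: minimal pairs for one L2-difficult contrast,
# rendered by different (placeholder) TTS voices for high-variability input.
import random

MINIMAL_PAIRS = [("ship", "sheep"), ("live", "leave"), ("sit", "seat")]  # toy list
VOICES = ["voice_a", "voice_b", "voice_c"]   # stand-ins for distinct TTS voices

def synthesize(word, voice):
    """Placeholder for a text-to-speech call returning an audio file path."""
    return f"{voice}_{word}.wav"

def make_abx_trials(pairs, voices, n_trials=6, seed=0):
    """Build ABX discrimination trials, one voice per token."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        a, b = rng.choice(pairs)
        x_word = rng.choice([a, b])
        v1, v2, v3 = rng.sample(voices, 3)
        trials.append({
            "A": synthesize(a, v1),
            "B": synthesize(b, v2),
            "X": synthesize(x_word, v3),
            "answer": "A" if x_word == a else "B",
        })
    return trials

for trial in make_abx_trials(MINIMAL_PAIRS, VOICES):
    print(trial)
```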

    The Sound Manifesto

    Computing practice today depends on visual output to drive almost all user interaction. Other senses, such as audition, may be totally neglected, or used tangentially, or used in highly restricted specialized ways. We have excellent audio rendering through D-A conversion, but we lack rich general facilities for modeling and manipulating sound comparable in quality and flexibility to graphics. We need co-ordinated research in several disciplines to improve the use of sound as an interactive information channel. Incremental and separate improvements in synthesis, analysis, speech processing, audiology, acoustics, music, etc. will not alone produce the radical progress that we seek in sonic practice. We also need to create a new central topic of study in digital audio research. The new topic will assimilate the contributions of different disciplines on a common foundation. The key central concept that we lack is sound as a general-purpose information channel. We must investigate the structure of this information channel, which is driven by the co-operative development of auditory perception and physical sound production. Particular audible encodings, such as speech and music, illuminate sonic information by example, but they are no more sufficient for a characterization than typography is sufficient for a characterization of visual information.
    Comment: To appear in the conference on Critical Technologies for the Future of Computing, part of SPIE's International Symposium on Optical Science and Technology, 30 July to 4 August 2000, San Diego, CA.

    Effects of Palatal Expansion on Speech Production

    Introduction: Rapid palatal expanders (RPEs) are a commonly used orthodontic adjunct for the treatment of posterior crossbites. RPEs are cemented to bilateral posterior teeth across the palate and thus may interfere with proper tongue movement and linguopalatal contact. The purpose of this study was to identify the specific role RPEs play in speech sound production for the child and early adolescent orthodontic patient. Materials and Methods: RPEs were treatment planned for patients seeking orthodontics at Marquette University. Speech recordings were made using a phonetically balanced reading passage (“The Caterpillar”) at 3 time points: 1) before RPE placement; 2) immediately after cementation; and 3) 10-14 days post appliance delivery. Measures of vocal tract resonance (formant center frequencies) were obtained for vowels and measures of noise distribution (spectral moments) were obtained for consonants. Two-way repeated measures ANOVA was used along with post-hoc tests for statistical analysis. Results: For the vowel /i/, the first formant increased and the second formant decreased, indicating a more inferior and posterior tongue position. For /e/, only the second formant decreased, resulting in a more posterior tongue position. The formants did not return to baseline within the two-week study period. For the consonants /s/, //, /t/, and /k/, a significant shift from high to low frequencies indicated distortion upon appliance placement. Of these, only /t/ fully returned to baseline during the study period. Conclusion: Numerous phonemes were distorted upon RPE placement, which indicated altered speech sound production. For most phonemes, it takes longer than two weeks for speech to return to baseline, if at all. Clinically, the results of this study will help with pre-treatment and interdisciplinary counseling for orthodontic patients receiving palatal expanders.
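    For readers unfamiliar with spectral moments, the sketch below computes the first two moments (spectral mean and spectral standard deviation) of a consonant noise segment from its power spectrum, which is the general kind of measurement named above. The file name, window choice, and single-channel assumption are illustrative, not the study's analysis settings.

```python
# Spectral-moment sketch: spectral mean (centroid) and SD of a windowed segment.
import numpy as np
from scipy.io import wavfile
from scipy.signal import get_window

def spectral_moments(samples, sr):
    """Return spectral mean (Hz) and spectral SD (Hz) of a mono segment."""
    windowed = samples * get_window("hamming", len(samples))
    power = np.abs(np.fft.rfft(windowed)) ** 2            # power spectrum
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sr)
    p = power / power.sum()                                # treat as a distribution
    mean = np.sum(freqs * p)                               # 1st moment: centroid
    sd = np.sqrt(np.sum(((freqs - mean) ** 2) * p))        # 2nd moment: spread
    return mean, sd

sr, audio = wavfile.read("s_token.wav")                    # hypothetical excised /s/ token
m1, m2 = spectral_moments(audio.astype(np.float64), sr)
print(f"Spectral mean: {m1:.0f} Hz, spectral SD: {m2:.0f} Hz")
```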

    The DayOne project: how far can a robot develop in 24 hours?

    What could a robot learn in one day? This paper describes the DayOne project, an endeavor to build an epigenetic robot that can bootstrap from a very rudimentary state to relatively sophisticated perception of objects and activities in a matter of hours. The project is inspired by the astonishing rapidity with which many animals, such as foals and lambs, adapt to their surroundings on the first day of their life. While such plasticity may not be a sufficient basis for long-term cognitive development, it may be at least necessary, and may share underlying infrastructure with it. This paper suggests that a sufficiently flexible perceptual system begins to look and act like it contains cognitive structures.

    Contextual Influences on Phonetic Categorization in Developmental Populations

    A major goal of research in the domain of speech perception has been to describe how listeners recover individual consonants and vowels from the speech stream. A major challenge to this task is explaining how this is possible given the extreme variability present in the speech signal. In healthy adults, findings have repeatedly demonstrated that the key to a healthy perceptual processing system is the dynamic adjustment of phonetic boundaries to accommodate contextual influences in speech production. The current study seeks to determine whether school-aged children demonstrate the same functional plasticity for systematic variation. Collectively, we found that older children (8-10 years of age) demonstrated boundary flexibility similar to adults. For younger children (5-7 years of age), the results were less definitive, which indicates that the paradigm may not be appropriate for young school-aged children. The results of the current work add to our knowledge of language processing in three ways. First, the results indicate that the modified paradigm successfully measured categorical processing in healthy adults and typically-developing older children. Second, the results provide evidence in support of modifications to the paradigm, such as discrimination paradigms and imaging paradigms, to further assess the effects of context on the perceptual systems of younger children. Finally, the results point to specific considerations for informing the locus of language impairment in children, particularly for the specific language impairment population.
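    To make the notion of a phonetic boundary concrete, the sketch below fits a logistic function to identification responses along a hypothetical voice-onset-time continuum and reads off the 50% crossover as the category boundary. The continuum steps and response proportions are made up for illustration; the study's paradigm and stimuli are not reproduced here.

```python
# Hypothetical boundary estimate: logistic fit to identification data on a VOT continuum.
import numpy as np
from scipy.optimize import curve_fit

vot_ms = np.array([0, 10, 20, 30, 40, 50, 60], dtype=float)     # continuum steps (ms)
prop_pa = np.array([0.02, 0.05, 0.15, 0.55, 0.88, 0.97, 0.99])  # made-up "pa" response rates

def logistic(x, boundary, slope):
    return 1.0 / (1.0 + np.exp(-slope * (x - boundary)))

(boundary, slope), _ = curve_fit(logistic, vot_ms, prop_pa, p0=[30.0, 0.2])
print(f"Estimated /ba/-/pa/ boundary: {boundary:.1f} ms VOT (slope {slope:.2f})")
```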

    Deep attractor network for single-microphone speaker separation

    Despite the overwhelming success of deep learning in various speech processing tasks, the problem of separating simultaneous speakers in a mixture remains challenging. Two major difficulties in such systems are the arbitrary source permutation and the unknown number of sources in the mixture. We propose a novel deep learning framework for single-channel speech separation that creates attractor points in a high-dimensional embedding space of the acoustic signals, which pull together the time-frequency bins corresponding to each source. Attractor points in this study are created by finding the centroids of the sources in the embedding space, which are subsequently used to determine the similarity of each bin in the mixture to each source. The network is then trained to minimize the reconstruction error of each source by optimizing the embeddings. The proposed model differs from prior works in that it implements end-to-end training and does not depend on the number of sources in the mixture. Two strategies are explored at test time, K-means and fixed attractor points, where the latter requires no post-processing and can be implemented in real time. We evaluated our system on the Wall Street Journal dataset and show a 5.49% improvement over the previous state-of-the-art methods.
    Comment: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
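    The sketch below illustrates the attractor idea at inference time, assuming an embedding network has already mapped each time-frequency bin to a D-dimensional vector: attractors are taken as K-means centroids in embedding space, and each bin's soft mask is its softmax similarity to each attractor. The shapes, the random stand-in embeddings, and the use of scikit-learn K-means are assumptions for illustration, not the paper's implementation.

```python
# Attractor-style masking sketch: centroids in embedding space -> soft masks per bin.
import numpy as np
from sklearn.cluster import KMeans

def attractor_masks(embeddings, n_sources=2):
    """embeddings: (T*F, D) array, one vector per time-frequency bin.
    Returns soft masks of shape (n_sources, T*F)."""
    # Attractor points: centroids of the bins in embedding space.
    kmeans = KMeans(n_clusters=n_sources, n_init=10, random_state=0).fit(embeddings)
    attractors = kmeans.cluster_centers_                        # (n_sources, D)

    # Similarity of every bin to every attractor, turned into softmax masks.
    scores = embeddings @ attractors.T                          # (T*F, n_sources)
    scores -= scores.max(axis=1, keepdims=True)                 # numerical stability
    masks = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return masks.T

# Toy usage with random embeddings standing in for network output.
rng = np.random.default_rng(0)
emb = rng.normal(size=(129 * 100, 20))        # 129 freq bins x 100 frames, D=20
masks = attractor_masks(emb, n_sources=2)
print(masks.shape, masks.sum(axis=0)[:5])     # masks sum to 1 for each bin
```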
