139 research outputs found

    Artificial Vocal Learning guided by Phoneme Recognition and Visual Information

    This paper introduces a paradigm shift regarding vocal learning simulations, in which the communicative function of speech acquisition determines the learning process and intelligibility is considered the primary measure of learning success. Thereby, a novel approach for artificial vocal learning is presented that utilizes deep neural network-based phoneme recognition in order to calculate the speech acquisition objective function. This function guides a learning framework that involves the state-of-the-art articulatory speech synthesizer VocalTractLab as the motor-to-acoustic forward model. In this way, an extensive set of German phonemes, including most of the consonants and all stressed vowels, was produced successfully. The synthetic phonemes were rated as highly intelligible by human listeners. Furthermore, it is shown that visual speech information, such as lip and jaw movements, can be extracted from video recordings and be incorporated into the learning framework as an additional loss component during the optimization process. It was observed that this visual loss did not increase the overall intelligibility of phonemes. Instead, the visual loss acted as a regularization mechanism that facilitated the finding of more biologically plausible solutions in the articulatory domain

    Exploration strategies for articulatory synthesis of complex syllable onsets

    High-quality articulatory speech synthesis has many potential applications in speech science and technology. However, developing appropriate mappings from linguistic specification to articulatory gestures is difficult and time consuming. In this paper we construct an optimisation-based framework as a first step towards learning these mappings without manual intervention. We demonstrate the production of CCV syllables and discuss the quality of the articulatory gestures with reference to coarticulation

    Self-Supervised Solution to the Control Problem of Articulatory Synthesis

    Given an articulatory-to-acoustic forward model, it is a priori unknown how its motor control must be operated to achieve a desired acoustic result. This control problem is a fundamental issue of articulatory speech synthesis and the cradle of acoustic-to-articulatory inversion, a discipline which attempts to address the issue by means of various methods. This work presents an end-to-end solution to the articulatory control problem, in which synthetic motor trajectories of Monte-Carlo-generated artificial speech are linked to input modalities (such as natural speech recordings or phoneme sequence input) via speaker-independent latent representations of a vector-quantized variational autoencoder. The proposed method is self-supervised and thus, in principle, synthesizer and speaker model independent

    Proximal tibial dimensions in a formalin-fixed neonatal cadaver sample : an intraosseous infusion approach

    DATA AVAILABILITY: The quantitative and qualitative data used to support the findings of this study are included within the article, and additional data may be requested from the corresponding author. PURPOSE: Methods to administer intramedullary medication and fluid infusion in both adults and children date to the early twentieth century. Studies have shown that intraosseous access in the proximal tibia is ideal for resuscitation efforts, as fewer critical structures are at risk and blood flow to the lower limbs is not compromised. Insertion of a needle in children younger than 5 years does, however, carry a risk of damage to the epiphyseal growth plate. Therefore, the aim of this study was to determine the ideal intraosseous insertion site distal to the epiphyseal growth plate in neonates. METHODS: The sample consisted of both the left and right sides of 15 formalin-fixed neonatal cadavers. The dimensions measured on the superior surface of each section were the anteromedial border, cortical thickness, and medullary space. RESULTS: The most desirable location to gain vascular access is at 10 mm inferior to the tibial tuberosity. CONCLUSION: The smallest cortical thickness (1.32 mm), the largest medullary space (4.50 mm), and the largest anteromedial surface (7.72 mm) were observed at 10 mm inferior to the tibial tuberosity. It is imperative that health care professionals are familiar with the osteological sites that could be safely used for an intraosseous infusion procedure. https://link.springer.com/journal/276

    Modelling English diphthongs with dynamic articulatory targets

    The nature of English diphthongs has been much disputed. By now, the most influential account argues that diphthongs are phoneme entities rather than vowel combinations. However, mixed results have been reported regarding whether the rate of formant transition is the most reliable attribute in the perception and production of diphthongs. Here, we used computational modelling to explore the underlying forms of diphthongs. We tested the assumption that diphthongs have dynamic articulatory targets by training an articulatory synthesiser with a three-dimensional (3D) vocal tract model to learn English words. An automatic phoneme recogniser was constructed to guide the learning of the diphthongs. Listening experiments by native listeners indicated that the model succeeded in learning highly intelligible diphthongs, providing support for the dynamic target assumption. The modelling approach paves a new way for validating hypotheses of speech perception and production

    Simulating vocal learning of spoken language: Beyond imitation

    Computational approaches have an important role to play in understanding the complex process of speech acquisition, in general, and have recently been popular in studies of vocal learning in particular. In this article we suggest that two significant problems associated with imitative vocal learning of spoken language, the speaker normalisation and phonological correspondence problems, can be addressed by linguistically grounded auditory perception. In particular, we show how the articulation of consonant-vowel syllables may be learnt from auditory percepts that can represent either individual utterances by speakers with different vocal tract characteristics or ideal phonetic realisations. The result is an optimisation-based implementation of vocal exploration – incorporating semantic, auditory, and articulatory signals – that can serve as a basis for simulating vocal learning beyond imitation

    Supporting Self-Management of Cardiovascular Diseases Through Remote Monitoring Technologies: Metaethnography Review of Frameworks, Models, and Theories Used in Research and Development

    Background: Electronic health (eHealth) is a rapidly evolving field informed by multiple scientific disciplines. Because of this, the use of different terms and concepts to explain the same phenomena and the lack of standardization in reporting interventions often leave a gap that hinders knowledge accumulation. Interventions focused on self-management support of cardiovascular diseases through the use of remote monitoring technologies are a cross-disciplinary area potentially affected by this gap. A review of the underlying frameworks, models, and theories that have informed projects at this crossroad could advance future research and development efforts. Objective: This research aimed to identify and compare underlying approaches that have informed interventions focused on self-management support of cardiovascular diseases through the use of remote monitoring technologies. The objective was to achieve an understanding of the distinct approaches by highlighting common or conflicting principles, guidelines, and methods. Methods: The metaethnography approach was used to review and synthesize researchers' reports on how they applied frameworks, models, and theories in their projects. Literature was systematically searched in 7 databases: Scopus, Web of Science, EMBASE, CINAHL, PsycINFO, Association for Computing Machinery Digital Library, and Cochrane Library. Included studies were thoroughly read and coded to extract data for the synthesis. Studies were mainly related by the key ingredients of the underlying approaches they applied. The key ingredients were finally translated across studies and synthesized into thematic clusters. Results: Of 1224 initial results, 17 articles were included. The articles described research and development of 10 different projects. Frameworks, models, and theories (n=43) applied by the projects were identified. Key ingredients (n=293) of the included articles were mapped to the following themes of eHealth development: (1) it is a participatory process; (2) it creates new infrastructures for improving health care, health, and well-being; (3) it is intertwined with implementation; (4) it integrates theory, evidence, and participatory approaches for persuasive design; (5) it requires continuous evaluation cycles; (6) it targets behavior change; (7) it targets technology adoption; and (8) it targets health-related outcomes. Conclusions: The findings of this review support and exemplify the numerous possibilities in the use of frameworks, models, and theories to guide research and development of eHealth. Participatory, user-centered design, and integration with empirical evidence and theoretical modeling were widely identified principles in the literature. In contrast, less attention has been given to the integration of implementation in the development process and to supporting novel eHealth-based health care infrastructures. To better integrate theory and evidence, holistic approaches can combine patient-centered studies with consolidated knowledge from expert-based approaches