157 research outputs found
Artificial Vocal Learning guided by Phoneme Recognition and Visual Information
This paper introduces a paradigm shift regarding vocal learning simulations, in which the communicative function of speech acquisition determines the learning process and intelligibility is considered the primary measure of learning success. Thereby, a novel approach for artificial vocal learning is presented that utilizes deep neural network-based phoneme recognition in order to calculate the speech acquisition objective function. This function guides a learning framework that involves the state-of-the-art articulatory speech synthesizer VocalTractLab as the motor-to-acoustic forward model. In this way, an extensive set of German phonemes, including most of the consonants and all stressed vowels, was produced successfully. The synthetic phonemes were rated as highly intelligible by human listeners. Furthermore, it is shown that visual speech information, such as lip and jaw movements, can be extracted from video recordings and be incorporated into the learning framework as an additional loss component during the optimization process. It was observed that this visual loss did not increase the overall intelligibility of phonemes. Instead, the visual loss acted as a regularization mechanism that facilitated the finding of more biologically plausible solutions in the articulatory domain
Exploration strategies for articulatory synthesis of complex syllable onsets
High-quality articulatory speech synthesis has many potential applications in speech science and technology. However, developing appropriate mappings from linguistic specification to articulatory gestures is difficult and time consuming. In this paper we construct an optimisation-based framework as a first step towards learning these mappings without manual intervention. We demonstrate the production of CCV syllables and discuss the quality of the articulatory gestures with reference to coarticulation
Self-Supervised Solution to the Control Problem of Articulatory Synthesis
Given an articulatory-to-acoustic forward model, it is a priori
unknown how its motor control must be operated to achieve a
desired acoustic result. This control problem is a fundamental
issue of articulatory speech synthesis and the cradle of acoustic-to-articulatory inversion, a discipline which attempts to address
the issue by means of various methods. This work presents
an end-to-end solution to the articulatory control problem, in
which synthetic motor trajectories of Monte-Carlo-generated
artificial speech are linked to input modalities (such as natural speech recordings or phoneme sequence input) via speaker-independent latent representations of a vector-quantized variational autoencoder. The proposed method is self-supervised and
thus, in principle, synthesizer and speaker model independent
Modelling English diphthongs with dynamic articulatory targets
The nature of English diphthongs has been much disputed. By
now, the most influential account argues that diphthongs are
phoneme entities rather than vowel combinations. However,
mixed results have been reported regarding whether the rate of
formant transition is the most reliable attribute in the perception
and production of diphthongs. Here, we used computational
modelling to explore the underlying forms of diphthongs. We
tested the assumption that diphthongs have dynamic
articulatory targets by training an articulatory synthesiser with
a three-dimensional (3D) vocal tract model to learn English
words. An automatic phoneme recogniser was constructed to
guide the learning of the diphthongs. Listening experiments by
native listeners indicated that the model succeeded in learning
highly intelligible diphthongs, providing support for the
dynamic target assumption. The modelling approach paves a
new way for validating hypotheses of speech perception and
production
Simulating vocal learning of spoken language: Beyond imitation
Computational approaches have an important role to play in understanding the complex process of speech acquisition, in general, and have recently been popular in studies of vocal learning in particular. In this article we suggest that two significant problems associated with imitative vocal learning of spoken language, the speaker normalisation and phonological correspondence problems, can be addressed by linguistically grounded auditory perception. In particular, we show how the articulation of consonant-vowel syllables may be learnt from auditory percepts that can represent either individual utterances by speakers with different vocal tract characteristics or ideal phonetic realisations. The result is an optimisation-based implementation of vocal exploration – incorporating semantic, auditory, and articulatory signals – that can serve as a basis for simulating vocal learning beyond imitation
Proximal tibial dimensions in a formalin-fixed neonatal cadaver sample : an intraosseous infusion approach
DATA AVAILABILITY : The quantitative and qualitative data used to support the findings of this study are included within the article, and additional data may be requested from the corresponding author.
PURPOSE : Methods to administer intramedullary medication and fluid infusion in both adults and children date to the early twentieth century. Studies have shown that intraosseous access in the proximal tibia is ideal for resuscitation efforts because fewer critical structures are at risk and the blood flow to the lower limbs is not compromised. However, insertion of a needle in children younger than 5 years carries a risk of damage to the epiphyseal growth plate. Therefore, the aim of this study was to determine the ideal intraosseous insertion site distal to the epiphyseal growth plate in neonates.
METHODS : The sample consisted of the left and right sides of 15 formalin-fixed neonatal cadavers. The anteromedial border, cortical thickness, and medullary space were measured on the superior surface of each section.
RESULTS : The most desirable location to gain vascular access is at 10 mm inferior to the tibial tuberosity.
CONCLUSION : The smallest cortical thickness (1.32 mm), the largest medullary space (4.50 mm), and the largest anteromedial surface (7.72 mm) were observed at 10 mm inferior to the tibial tuberosity. It is imperative that health care professionals are familiar with the osteological sites that could be safely used for an intraosseous infusion procedure.
https://link.springer.com/journal/276
Supporting Self-Management of Cardiovascular Diseases Through Remote Monitoring Technologies: Metaethnography Review of Frameworks, Models, and Theories Used in Research and Development
Background: Electronic health (eHealth) is a rapidly evolving field informed by multiple scientific disciplines. Because of this, the use of different terms and concepts to explain the same phenomena and lack of standardization in reporting interventions often leaves a gap that hinders knowledge accumulation. Interventions focused on self-management support of cardiovascular diseases through the use of remote monitoring technologies are a cross-disciplinary area potentially affected by this gap. A review of the underlying frameworks, models, and theories that have informed projects at this crossroad could advance future research and development efforts. Objective: This research aimed to identify and compare underlying approaches that have informed interventions focused on self-management support of cardiovascular diseases through the use of remote monitoring technologies. The objective was to achieve an understanding of the distinct approaches by highlighting common or conflicting principles, guidelines, and methods. Methods: The metaethnography approach was used to review and synthesize researchers' reports on how they applied frameworks, models, and theories in their projects. Literature was systematically searched in 7 databases: Scopus, Web of Science, EMBASE, CINAHL, PsycINFO, Association for Computing Machinery Digital Library, and Cochrane Library. Included studies were thoroughly read and coded to extract data for the synthesis. Studies were mainly related by the key ingredients of the underlying approaches they applied. The key ingredients were finally translated across studies and synthesized into thematic clusters. Results: Of 1224 initial results, 17 articles were included. The articles described research and development of 10 different projects. Frameworks, models, and theories (n=43) applied by the projects were identified. 
Key ingredients (n=293) of the included articles were mapped to the following themes of eHealth development: (1) it is a participatory process; (2) it creates new infrastructures for improving health care, health, and well-being; (3) it is intertwined with implementation; (4) it integrates theory, evidence, and participatory approaches for persuasive design; (5) it requires continuous evaluation cycles; (6) it targets behavior change; (7) it targets technology adoption; and (8) it targets health-related outcomes. Conclusions: The findings of this review support and exemplify the numerous possibilities in the use of frameworks, models, and theories to guide research and development of eHealth. Participatory, user-centered design, and integration with empirical evidence and theoretical modeling were widely identified principles in the literature. On the contrary, less attention has been given to the integration of implementation in the development process and supporting novel eHealth-based health care infrastructures. To better integrate theory and evidence, holistic approaches can combine patient-centered studies with consolidated knowledge from expert-based approaches
- …