194 research outputs found

    Let the agents do the talking: On the influence of vocal tract anatomy on speech during ontogeny


    Decoding ECoG signal into 3D hand translation using deep learning

    Motor brain-computer interfaces (BCIs) are a promising technology that may enable motor-impaired people to interact with their environment. Designing accurate, real-time BCIs is crucial to make such devices useful, safe, and easy for patients to use in a real-life environment. Electrocorticography (ECoG)-based BCIs emerge as a good compromise between the invasiveness of the recording device and the spatial and temporal resolution of the recorded signal. However, most ECoG signal decoders used to predict continuous hand movements are linear models. These models have limited representational capacity and may fail to capture the relationship between the ECoG signal and continuous hand movements. Deep learning (DL) models, which are state-of-the-art in many problems, could be a solution to better capture this relationship. In this study, we tested several DL-based architectures to predict imagined 3D continuous hand translation using time-frequency features extracted from ECoG signals. The dataset used in the analysis is part of a long-term clinical trial (ClinicalTrials.gov identifier: NCT02550522) and was acquired during a closed-loop experiment with a tetraplegic subject. The proposed architectures include multilayer perceptrons (MLPs), convolutional neural networks (CNNs), and long short-term memory networks (LSTMs). The accuracy of the DL-based and multilinear models was compared offline using cosine similarity. Our results show that CNN-based architectures outperform the current state-of-the-art multilinear model. The best architecture exploited the spatial correlation between neighboring electrodes with a CNN and benefited from the sequential character of the desired hand trajectory by using LSTMs. Overall, DL increased the average cosine similarity over the multilinear model by up to 60%, from 0.189 to 0.302 for the left hand and from 0.157 to 0.249 for the right hand.
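
    As a concrete illustration of the decoding setup described above, the following is a minimal PyTorch sketch of a CNN+LSTM decoder that maps windows of time-frequency ECoG features to a 3D hand translation and is scored offline with cosine similarity. All shapes, layer sizes, and hyperparameters are illustrative assumptions, not the architecture from the paper.

    import torch
    import torch.nn as nn

    class EcogCnnLstmDecoder(nn.Module):
        def __init__(self, n_channels=64, n_bands=10, hidden=128):
            super().__init__()
            # Convolution along the electrode axis mixes the frequency bands
            # and captures correlation between neighboring electrodes.
            self.spatial = nn.Sequential(
                nn.Conv1d(n_bands, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool1d(16),
            )
            # The LSTM exploits the sequential character of the hand trajectory.
            self.lstm = nn.LSTM(32 * 16, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 3)  # 3D hand translation (x, y, z)

        def forward(self, x):
            # x: (batch, time, electrodes, frequency bands)
            b, t, c, f = x.shape
            z = x.reshape(b * t, c, f).transpose(1, 2)  # (b*t, bands, electrodes)
            z = self.spatial(z).reshape(b, t, -1)       # (b, t, features)
            out, _ = self.lstm(z)
            return self.head(out[:, -1])                # translation at window end

    def mean_cosine_similarity(pred, target):
        """Offline metric from the abstract: average cosine similarity
        between predicted and true 3D translation vectors."""
        return nn.functional.cosine_similarity(pred, target, dim=-1).mean()

    # Smoke test with random data in place of real ECoG features.
    model = EcogCnnLstmDecoder()
    x = torch.randn(8, 20, 64, 10)   # 8 windows, 20 time steps each
    y = torch.randn(8, 3)
    print(mean_cosine_similarity(model(x), y).item())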

    Artificial Intelligence in Oral Health

    This Special Issue is intended to lay the foundation for AI applications in oral health, including general dentistry, periodontology, implantology, oral surgery, oral radiology, orthodontics, and prosthodontics, among others.

    The character strengths of class clowns

    Class clowns traditionally were studied as a type concept and identified via sociometric procedures. In the present study a variable-centered approach was favored, and class clown behaviors were studied in the context of character strengths, orientations to happiness, and satisfaction with life. A sample of 672 Swiss children and adolescents completed an 18-item self-report instrument depicting class clown behaviors. A hierarchical model of class clown behaviors was developed, distinguishing a general factor and the four positively correlated dimensions of "identified as a class clown," "comic talent," "disruptive rule-breaker," and "subversive joker." Analysis of the general factor showed that class clowns were primarily male and tended to be seen as class clowns by the teacher. Analyses of the 24 character strengths of the VIA-Youth (Park and Peterson, 2006) showed that class clowns were high in humor and leadership, and low in strengths like prudence, self-regulation, modesty, honesty, fairness, perseverance, and love of learning. An inspection of signature strengths revealed that 75% of class clowns had humor as a signature strength. Furthermore, class clown behaviors were generally shown by students high in the life of pleasure but low in the life of engagement. The four dimensions yielded different character strengths profiles. While all dimensions of class clown behaviors were low in temperance strengths, the factors "identified as the class clown" and "comic talent" were correlated with leadership strengths, and the two negative factors ("disruptive rule-breaker," "subversive joker") were low in other-directed strengths. The disruptive rule-breaking class clown was additionally low in intellectual strengths. While humor predicted life satisfaction, class clowning tended to go along with diminished satisfaction with life. It is concluded that different types of class clowns need to be kept apart and need different attention from teachers.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the strongly felt need to share know-how, objectives, and results between areas that had until then seemed quite distinct, such as bioengineering, medicine, and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the newborn to the adult and elderly. Over the years the initial topics have grown and spread into other fields of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis.

    Computer lipreading via hybrid deep neural network hidden Markov models

    Constructing a viable lipreading system is a challenge because it is claimed that only 30% of the information in speech production is visible on the lips. Nevertheless, in small-vocabulary tasks there have been several reports of high accuracies. However, investigation of larger-vocabulary tasks is rare. This work examines constructing a large-vocabulary lipreading system using an approach based on Deep Neural Network Hidden Markov Models (DNN-HMMs). We present the historical development of computer lipreading technology and the state-of-the-art results in small- and large-vocabulary tasks. In preliminary experiments, we evaluate the performance of lipreading and audiovisual speech recognition on small-vocabulary data sets. We then concentrate on the improvement of lipreading systems at a more substantial vocabulary size with a multi-speaker data set. We tackle the problem of lipreading an unseen speaker. We investigate the effect of employing several steps to pre-process visual features. Moreover, we examine the contribution of language modelling in a lipreading system, where we use longer n-grams to recognise visual speech. Our lipreading system is constructed on the 6000-word-vocabulary TCD-TIMIT audiovisual speech corpus. The results show that visual-only speech recognition can reach about 60% word accuracy on large vocabularies. We achieved a mean of 59.42%, measured via three-fold cross-validation on the speaker-independent setting of the TCD-TIMIT corpus, using deep autoencoder features and DNN-HMM models. This is the best word accuracy of a lipreading system in a large-vocabulary task reported on the TCD-TIMIT corpus. In the final part of the thesis, we examine how the DNN-HMM model improves lipreading performance. We also give an insight into lipreading by providing a feature visualisation. Finally, we present an analysis of lipreading results and suggestions for future development.
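
    The thesis pairs deep autoencoder features with DNN-HMM models; as a hedged sketch of the feature-extraction half, the following PyTorch autoencoder compresses flattened mouth-region frames into a low-dimensional bottleneck whose activations could serve as per-frame observations for an HMM stage. The input size, layer widths, and bottleneck dimension are assumptions for illustration, not the thesis configuration.

    import torch
    import torch.nn as nn

    class DeepAutoencoder(nn.Module):
        def __init__(self, in_dim=32 * 32, bottleneck=50):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(in_dim, 512), nn.ReLU(),
                nn.Linear(512, 128), nn.ReLU(),
                nn.Linear(128, bottleneck),      # compact per-frame lip feature
            )
            self.decoder = nn.Sequential(
                nn.Linear(bottleneck, 128), nn.ReLU(),
                nn.Linear(128, 512), nn.ReLU(),
                nn.Linear(512, in_dim),
            )

        def forward(self, x):
            code = self.encoder(x)
            return self.decoder(code), code

    # Train by reconstruction; afterwards, the bottleneck codes would be
    # handed to the HMM stage as visual observations.
    model = DeepAutoencoder()
    frames = torch.rand(16, 32 * 32)             # 16 flattened mouth frames
    recon, feats = model(frames)
    loss = nn.functional.mse_loss(recon, frames)
    loss.backward()
    print(feats.shape)                           # torch.Size([16, 50])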

    EEG-based Brain-Computer Interfaces (BCIs): A Survey of Recent Studies on Signal Sensing Technologies and Computational Intelligence Approaches and Their Applications.

    Brain-computer interfaces (BCIs) enhance the capability of human brain activity to interact with the environment. Recent advancements in technology and machine learning algorithms have increased interest in electroencephalography (EEG)-based BCI applications. EEG-based intelligent BCI systems can facilitate continuous monitoring of fluctuations in human cognitive states under monotonous tasks, which benefits both people in need of healthcare support and researchers across domains. In this review, we survey the recent literature on EEG signal sensing technologies and computational intelligence approaches in BCI applications, filling gaps in the systematic summaries of the past five years. Specifically, we first review the current status of BCIs and of signal sensing technologies for collecting reliable EEG signals. Then, we present state-of-the-art computational intelligence techniques, including fuzzy models and transfer learning in machine learning and deep learning algorithms, for detecting, monitoring, and maintaining human cognitive states and task performance in prevalent applications. Finally, we present a couple of innovative BCI-inspired healthcare applications and discuss future research directions in EEG-based BCI research.
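
    As one concrete example of the kind of pipeline such surveys cover, here is a minimal sketch that extracts classic band-power features from EEG epochs with Welch's method and trains a linear classifier. The channel count, frequency bands, and synthetic data are assumptions for illustration, not a pipeline from the review.

    import numpy as np
    from scipy.signal import welch
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    FS = 250  # sampling rate in Hz
    BANDS = {"theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

    def band_powers(epoch):
        """epoch: (channels, samples) -> flat vector of per-channel band powers."""
        freqs, psd = welch(epoch, fs=FS, nperseg=FS)
        feats = []
        for lo, hi in BANDS.values():
            mask = (freqs >= lo) & (freqs < hi)
            feats.append(psd[:, mask].mean(axis=1))  # mean power in the band
        return np.concatenate(feats)

    # Synthetic two-class data: 100 epochs, 8 channels, 2 s each.
    rng = np.random.default_rng(0)
    X = np.stack([band_powers(rng.standard_normal((8, 2 * FS))) for _ in range(100)])
    y = rng.integers(0, 2, size=100)

    clf = LinearDiscriminantAnalysis().fit(X[:80], y[:80])
    print("held-out accuracy:", clf.score(X[80:], y[80:]))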

    A head model with anatomical structure for facial modelling and animation

    In this dissertation, I describe a virtual head model with anatomical structure. The model is animated in a physics-based manner by means of muscle contractions that in turn cause skin deformations; the simulation is efficient enough to achieve real-time frame rates on current PC hardware. In my approach, the construction of head models is eased by deriving new models from a prototype, employing a deformation method that reshapes the complete virtual head structure. Without additional modeling tasks, this results in an immediately animatable model. The general deformation method allows for several applications, such as adaptation to individual scan data for the creation of animated head models of real persons. The basis for the deformation method is a set of facial feature points, which leads to other interesting uses when this set is chosen according to an anthropometric standard set of facial landmarks: I present algorithms for the simulation of human head growth and for the reconstruction of a face from a skull.
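
    To make the physics-based animation idea concrete, below is a minimal numpy sketch in which a "muscle" kinematically displaces an attachment vertex and a mass-spring skin layer relaxes in response. The geometry, spring constants, and explicit Euler integration are illustrative assumptions, not the dissertation's solver.

    import numpy as np

    def step_skin(pos, vel, rest, anchors, k=40.0, damping=4.0, dt=0.005):
        """Advance skin vertices one explicit Euler step; springs connect
        neighboring vertices with rest lengths taken from `rest`, and
        `anchors` holds muscle-driven target positions for pinned nodes."""
        force = -damping * vel
        for i in range(len(pos) - 1):
            d = pos[i + 1] - pos[i]
            length = np.linalg.norm(d)
            rest_len = np.linalg.norm(rest[i + 1] - rest[i])
            if length > 1e-9:
                f = k * (length - rest_len) * d / length  # Hooke's law
                force[i] += f
                force[i + 1] -= f
        vel = vel + dt * force
        pos = pos + dt * vel
        for i, target in anchors.items():
            pos[i] = target      # attachment points follow the muscle
            vel[i] = 0.0
        return pos, vel

    # Tiny demo: 5 skin vertices along a line; a "muscle" lifts vertex 0
    # while vertex 4 stays fixed; interior vertices deform toward the pull.
    rest = np.stack([np.linspace(0.0, 1.0, 5), np.zeros(5)], axis=1)
    pos, vel = rest.copy(), np.zeros_like(rest)
    for _ in range(400):
        pos, vel = step_skin(pos, vel, rest, {0: np.array([0.0, 0.3]), 4: rest[4]})
    print(pos.round(3))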

    Predictive Articulatory speech synthesis Utilizing Lexical Embeddings (PAULE)

    The Predictive Articulatory speech synthesis Utilizing Lexical Embeddings (PAULE) model is a new control model for the VocalTractLab (VTL) [15] speech synthesizer, a simulator of the human speech system. It is capable of synthesizing single words in the German language. The speech synthesis can be based on a target semantic vector, which encodes the word meaning, or on target acoustics, i.e., a recorded word token. The audio file may contain recordings of arbitrary speakers, while the resynthesis always uses the VTL's default speaker; synthesis quality varies with the word meaning and the audio file. VTL is controlled by 30 parameters, which have to be estimated for each time point during the production of a word, i.e., roughly every 2.5 milliseconds. The time series of these 30 control parameters (cps) of the VTL are the control parameter trajectories (cp-trajectories). The high dimensionality of the cp-trajectories in combination with non-linear interactions leads to a many-to-one mapping problem, where many sets of cp-trajectories produce highly similar synthesized audio. PAULE resolves this many-to-one mapping problem by anticipating the effects of cp-trajectories and minimizing a semantic and acoustic error between this anticipation and a targeted meaning and acoustics, together with the condition that the articulation be as stationary as possible and executed with a force as constant as possible. The minimized error is not the actual error resulting from synthesis with the VTL, but the error generated by the predictions of a forward model; although it is not the actual error, it can be used to improve the actual articulation. To keep the predictive model aligned with the actual acoustics, PAULE listens to itself in an outer loop. PAULE has three central design features that distinguish it from other control models. First, PAULE does not use any symbolic units: neither motor primitives, articulatory targets, nor gestural scores on the movement side, nor any phone or syllable representation on the acoustic side. PAULE thereby shows that spoken words can be modeled without a symbolic level of description, suggesting that spoken language may rest on a fundamentally different level of processing than written language. Second, PAULE is a learning model that accumulates experience with articulated words; as a consequence, it will not find a global optimum for the inverse kinematic optimization task it has to solve, but a local optimum conditioned on its past experience. Third, PAULE uses gradient-based internal prediction errors of a predictive forward model, implemented with artificial neural networks, to plan cp-trajectories for a given semantic or acoustic target. Thus, PAULE is an error-driven model that takes its previous experience into account. Pilot study results indicate that PAULE is able to minimize the semantic and acoustic error in the resynthesized audio. This allows PAULE to find cp-trajectories that a classification model recognizes as the correct word with an accuracy of 60%, close to the 63% accuracy obtained for human recordings. Furthermore, PAULE seems to model vowel-to-vowel anticipatory coarticulation correctly in terms of formant shifts and can be compared to human electromagnetic articulography (EMA) recordings in a straightforward way. With PAULE it is also possible to condition on already executed past cp-trajectories and to continue the cp-trajectories smoothly from the current state. As a side effect of developing PAULE, large amounts of training data for the VTL can be created through an automated segment-based approach. PAULE can neither synthesize whole sentences nor take somatosensory feedback into account yet; preliminary work on both exists and is to be integrated into future versions, together with a thorough evaluation.
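
    The gradient-based planning idea can be sketched compactly: optimize cp-trajectories by backpropagating an acoustic error through a frozen neural forward model, plus a smoothness term in the spirit of PAULE's stationarity condition. The forward model below is an untrained stand-in GRU and the targets are random; only the 30 control parameters mirror the description above, so this is an illustration of the planning scheme, not PAULE itself.

    import torch
    import torch.nn as nn

    N_CP, T, N_ACOUSTIC = 30, 100, 60   # control params, time steps, features

    # Frozen predictive forward model: cp-trajectories -> predicted acoustics.
    forward_model = nn.GRU(N_CP, N_ACOUSTIC, batch_first=True)
    for p in forward_model.parameters():
        p.requires_grad_(False)

    target_acoustics = torch.randn(1, T, N_ACOUSTIC)  # stand-in for a target word
    cp = torch.zeros(1, T, N_CP, requires_grad=True)  # planned trajectories
    opt = torch.optim.Adam([cp], lr=0.01)

    for step in range(200):
        opt.zero_grad()
        pred, _ = forward_model(cp)
        acoustic_err = nn.functional.mse_loss(pred, target_acoustics)
        # Penalize fast changes so trajectories stay as stationary as possible.
        smooth_err = (cp[:, 1:] - cp[:, :-1]).pow(2).mean()
        loss = acoustic_err + 0.1 * smooth_err
        loss.backward()   # internal prediction error drives the plan
        opt.step()

    print(f"final planning loss: {loss.item():.4f}")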

    Integrating Socially Assistive Robots into Language Tutoring Systems. A Computational Model for Scaffolding Young Children's Foreign Language Learning

    Schodde T. Integrating Socially Assistive Robots into Language Tutoring Systems. A Computational Model for Scaffolding Young Children's Foreign Language Learning. Bielefeld: Universität Bielefeld; 2019. Language education is a global and important issue nowadays, especially for young children, since their later educational success builds on it. But learning a language is a complex task that is known to work best in social interaction, and thus personalized sessions tailored to the individual knowledge and needs of each child are required for teachers to support them optimally. However, this is often costly in time and personnel resources, which is one reason why research over the past decades has investigated the benefits of Intelligent Tutoring Systems (ITSs). Although ITSs can provide individualized one-on-one tutoring interactions, they often lack social support. This dissertation provides new insights into how a Socially Assistive Robot (SAR) can be employed as part of an ITS, forming a so-called "Socially Assistive Robot Tutoring System" (SARTS), to provide social support and to personalize and scaffold foreign language learning for young children aged 4-6 years. As the basis for the SARTS, a novel approach called A-BKT is presented, which allows the tutoring interaction to be adapted autonomously to each child's individual knowledge and needs. The corresponding evaluation studies show that the A-BKT model can significantly increase students' learning gains and maintain higher engagement during the tutoring interaction. This is partly due to the model's ability to simulate the influence of potential actions on all dimensions of the learning interaction, i.e., the children's learning progress (cognitive learning), affective state and engagement (affective learning), and believed knowledge acquisition (perceived learning). This is particularly important since all dimensions are strongly interconnected and influence each other; for example, low engagement can cause poor learning results even though the learner is already quite proficient. However, this also makes it necessary not only to focus on the learner's cognitive learning but to support all dimensions equally with appropriate scaffolding actions. Therefore, an extensive literature review, observational video recordings, and expert interviews were conducted to find appropriate actions applicable by a SARTS to support each learning dimension. The subsequent evaluation study confirms that the developed scaffolding techniques are able to support young children's learning process, either by re-engaging them or by providing transparency that supports their perception of the learning process and reduces uncertainty. Finally, based on educated guesses derived from the previous studies, all identified strategies are integrated into the A-BKT model. The resulting model, called ProTM, is evaluated by simulating different learner types, which highlights its ability to adapt the tutoring interactions autonomously based on the learner's answers and displayed disengagement cues. In summary, this dissertation yields new insights into the field of SARTS for personalized foreign language learning interactions for young children, while also raising important new questions to be studied in the future.
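
    A-BKT builds on Bayesian Knowledge Tracing; as a point of reference, here is a minimal sketch of the standard BKT posterior update for a single skill. The parameter values are illustrative assumptions, and the affective and engagement dimensions that distinguish A-BKT are omitted.

    P_INIT, P_LEARN = 0.2, 0.15   # prior mastery, learning rate per step
    P_SLIP, P_GUESS = 0.1, 0.25   # slip and guess probabilities

    def bkt_update(p_known, correct):
        """Posterior probability that the skill is mastered after one answer."""
        if correct:
            evidence = p_known * (1 - P_SLIP)
            posterior = evidence / (evidence + (1 - p_known) * P_GUESS)
        else:
            evidence = p_known * P_SLIP
            posterior = evidence / (evidence + (1 - p_known) * (1 - P_GUESS))
        # Account for learning that happens during the step itself.
        return posterior + (1 - posterior) * P_LEARN

    # Trace mastery over a sequence of answers for one vocabulary item.
    p = P_INIT
    for answer in [False, True, True, False, True]:
        p = bkt_update(p, answer)
        print(f"answered {'correctly' if answer else 'wrongly'}: P(mastered) = {p:.3f}")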