130 research outputs found

    Augev Method and an Innovative Use of Vocal Spectroscopy in Evaluating and Monitoring the Rehabilitation Path of Subjects Showing Severe Communication Pathologies

    Get PDF
    A strongly connotative element of developmental disorders (DS) is the total or partial impairment of verbal communication and, more generally, of social interaction. The method of Vocal-verb self-management (Augev) is a systemic organicistic method able to intervene in problems regarding verbal, spoken and written language development successfully. This study intends to demonstrate that it is possible to objectify these progresses through a spectrographic examination of vocal signals, which detects voice phonetic-acoustic parameters. This survey allows an objective evaluation of how effective an educational-rehabilitation intervention is. This study was performed on a population of 40 subjects (34 males and 6 females) diagnosed with developmental disorders (DS), specifically with a diagnosis of the autism spectrum disorders according to the DSM-5. The 40 subjects were treated in “la Comunicazione” centers, whose headquarters are near Bari, Brindisi and Rome. The results demonstrate a statistical significance in a correlation among the observed variables: supervisory status, attention, general dynamic coordination, understanding and execution of orders, performing simple unshielded rhythmic beats, word rhythm, oral praxies, phono-articulatory praxies, pronunciation of vowels, execution of graphemes, visual perception, acoustic perception, proprioceptive sensitivity, selective attention, short-term memory, segmental coordination, performance of simple rhythmic beatings, word rhythm, voice setting, intonation of sounds within a fifth, vowel pronunciation, consonant pronunciation, graphematic decoding, syllabic decoding, pronunciation of caudate syllables, coding of final syllable consonant, lexical decoding, phoneme-grapheme conversion, homographic grapheme decoding, homogeneous grapheme decoding, graphic stroke

    The influence of dynamics and speech on understanding humanoid facial expressions

    Get PDF
    Human communication relies mostly on nonverbal signals expressed through body language. Facial expressions, in particular, convey emotional information that allows people involved in social interactions to mutually judge the emotional states and to adjust its behavior appropriately. First studies aimed at investigating the recognition of facial expressions were based on static stimuli. However, facial expressions are rarely static, especially in everyday social interactions. Therefore, it has been hypothesized that the dynamics inherent in a facial expression could be fundamental in understanding its meaning. In addition, it has been demonstrated that nonlinguistic and linguistic information can contribute to reinforce the meaning of a facial expression making it easier to be recognized. Nevertheless, few studies have been performed on realistic humanoid robots. This experimental work aimed at demonstrating the human-like expressive capability of a humanoid robot by examining whether the effect of motion and vocal content influenced the perception of its facial expressions. The first part of the experiment aimed at studying the recognition capability of two kinds of stimuli related to the six basic expressions (i.e. anger, disgust, fear, happiness, sadness, and surprise): static stimuli, that is, photographs, and dynamic stimuli, that is, video recordings. The second and third parts were focused on comparing the same six basic expressions performed by a virtual avatar and by a physical robot under three different conditions: (1) muted facial expressions, (2) facial expressions with nonlinguistic vocalizations, and (3) facial expressions with an emotionally neutral verbal sentence. The results show that static stimuli performed by a human being and by the robot were more ambiguous than the corresponding dynamic stimuli on which motion and vocalization were associated. This hypothesis has been also investigated with a 3-dimensional replica of the physical robot demonstrating that even in case of a virtual avatar, dynamic and vocalization improve the emotional conveying capability

    Prosody generation for text-to-speech synthesis

    Get PDF
    The absence of convincing intonation makes current parametric speech synthesis systems sound dull and lifeless, even when trained on expressive speech data. Typically, these systems use regression techniques to predict the fundamental frequency (F0) frame-by-frame. This approach leads to overlysmooth pitch contours and fails to construct an appropriate prosodic structure across the full utterance. In order to capture and reproduce larger-scale pitch patterns, we propose a template-based approach for automatic F0 generation, where per-syllable pitch-contour templates (from a small, automatically learned set) are predicted by a recurrent neural network (RNN). The use of syllable templates mitigates the over-smoothing problem and is able to reproduce pitch patterns observed in the data. The use of an RNN, paired with connectionist temporal classification (CTC), enables the prediction of structure in the pitch contour spanning the entire utterance. This novel F0 prediction system is used alongside separate LSTMs for predicting phone durations and the other acoustic features, to construct a complete text-to-speech system. Later, we investigate the benefits of including long-range dependencies in duration prediction at frame-level using uni-directional recurrent neural networks. Since prosody is a supra-segmental property, we consider an alternate approach to intonation generation which exploits long-term dependencies of F0 by effective modelling of linguistic features using recurrent neural networks. For this purpose, we propose a hierarchical encoder-decoder and multi-resolution parallel encoder where the encoder takes word and higher level linguistic features at the input and upsamples them to phone-level through a series of hidden layers and is integrated into a Hybrid system which is then submitted to Blizzard challenge workshop. We then highlight some of the issues in current approaches and a plan for future directions of investigation is outlined along with on-going work

    Models and analysis of vocal emissions for biomedical applications

    Get PDF
    This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies

    A Study of Non-Linguistic Utterances for Social Human-Robot Interaction

    Get PDF
    The world of animation has painted an inspiring image of what the robots of the future could be. Taking the robots R2D2 and C3PO from the Star Wars films as representative examples, these robots are portrayed as being more than just machines, rather, they are presented as intelligent and capable social peers, exhibiting many of the traits that people have also. These robots have the ability to interact with people, understand us, and even relate to us in very personal ways through a wide repertoire of social cues. As robotic technologies continue to make their way into society at large, there is a growing trend toward making social robots. The field of Human-Robot Interaction concerns itself with studying, developing and realising these socially capable machines, equipping them with a very rich variety of capabilities that allow them to interact with people in natural and intuitive ways, ranging from the use of natural language, body language and facial gestures, to more unique ways such as expression through colours and abstract sounds. This thesis studies the use of abstract, expressive sounds, like those used iconically by the robot R2D2. These are termed Non-Linguistic Utterances (NLUs) and are a means of communication which has a rich history in film and animation. However, very little is understood about how such expressive sounds may be utilised by social robots, and how people respond to these. This work presents a series of experiments aimed at understanding how NLUs can be utilised by a social robot in order to convey affective meaning to people both young and old, and what factors impact on the production and perception of NLUs. Firstly, it is shown that not all robots should use NLUs. The morphology of the robot matters. People perceive NLUs differently across different robots, and not always in a desired manner. Next it is shown that people readily project affective meaning onto NLUs though not in a coherent manner. Furthermore, people's affective inferences are not subtle, rather they are drawn to well established, basic affect prototypes. Moreover, it is shown that the valence of the situation in which an NLU is made, overrides the initial valence of the NLU itself: situational context biases how people perceive utterances made by a robot, and through this, coherence between people in their affective inferences is found to increase. Finally, it is uncovered that NLUs are best not used as a replacement to natural language (as they are by R2D2), rather, people show a preference for them being used alongside natural language where they can play a supportive role by providing essential social cues
    corecore