130 research outputs found
Augev Method and an Innovative Use of Vocal Spectroscopy in Evaluating and Monitoring the Rehabilitation Path of Subjects Showing Severe Communication Pathologies
A strongly connotative element of developmental disorders (DS) is the total
or partial impairment of verbal communication and, more generally, of social
interaction. The method of Vocal-verb self-management (Augev) is a systemic organicistic method able to intervene in problems regarding verbal, spoken
and written language development successfully. This study intends to demonstrate that it is possible to objectify these progresses through a spectrographic examination of vocal signals, which detects voice phonetic-acoustic
parameters. This survey allows an objective evaluation of how effective an
educational-rehabilitation intervention is. This study was performed on a
population of 40 subjects (34 males and 6 females) diagnosed with developmental disorders (DS), specifically with a diagnosis of the autism spectrum
disorders according to the DSM-5. The 40 subjects were treated in “la Comunicazione” centers, whose headquarters are near Bari, Brindisi and Rome.
The results demonstrate a statistical significance in a correlation among the
observed variables: supervisory status, attention, general dynamic coordination, understanding and execution of orders, performing simple unshielded
rhythmic beats, word rhythm, oral praxies, phono-articulatory praxies, pronunciation of vowels, execution of graphemes, visual perception, acoustic
perception, proprioceptive sensitivity, selective attention, short-term memory, segmental coordination, performance of simple rhythmic beatings, word
rhythm, voice setting, intonation of sounds within a fifth, vowel pronunciation, consonant pronunciation, graphematic decoding, syllabic decoding,
pronunciation of caudate syllables, coding of final syllable consonant, lexical decoding, phoneme-grapheme conversion, homographic grapheme decoding,
homogeneous grapheme decoding, graphic stroke
The influence of dynamics and speech on understanding humanoid facial expressions
Human communication relies mostly on nonverbal signals expressed through body language. Facial expressions, in particular, convey emotional information that allows people involved in social interactions to mutually judge the emotional states and to adjust its behavior appropriately. First studies aimed at investigating the recognition of facial expressions were based on static stimuli. However, facial expressions are rarely static, especially in everyday social interactions. Therefore, it has been hypothesized that the dynamics inherent in a facial expression could be fundamental in understanding its meaning. In addition, it has been demonstrated that nonlinguistic and linguistic information can contribute to reinforce the meaning of a facial expression making it easier to be recognized. Nevertheless, few studies have been performed on realistic humanoid robots. This experimental work aimed at demonstrating the human-like expressive capability of a humanoid robot by examining whether the effect of motion and vocal content influenced the perception of its facial expressions. The first part of the experiment aimed at studying the recognition capability of two kinds of stimuli related to the six basic expressions (i.e. anger, disgust, fear, happiness, sadness, and surprise): static stimuli, that is, photographs, and dynamic stimuli, that is, video recordings. The second and third parts were focused on comparing the same six basic expressions performed by a virtual avatar and by a physical robot under three different conditions: (1) muted facial expressions, (2) facial expressions with nonlinguistic vocalizations, and (3) facial expressions with an emotionally neutral verbal sentence. The results show that static stimuli performed by a human being and by the robot were more ambiguous than the corresponding dynamic stimuli on which motion and vocalization were associated. This hypothesis has been also investigated with a 3-dimensional replica of the physical robot demonstrating that even in case of a virtual avatar, dynamic and vocalization improve the emotional conveying capability
Prosody generation for text-to-speech synthesis
The absence of convincing intonation makes current parametric speech
synthesis systems sound dull and lifeless, even when trained on expressive
speech data. Typically, these systems use regression techniques to predict the
fundamental frequency (F0) frame-by-frame. This approach leads to overlysmooth
pitch contours and fails to construct an appropriate prosodic structure
across the full utterance. In order to capture and reproduce larger-scale
pitch patterns, we propose a template-based approach for automatic F0 generation,
where per-syllable pitch-contour templates (from a small, automatically
learned set) are predicted by a recurrent neural network (RNN). The use of
syllable templates mitigates the over-smoothing problem and is able to reproduce
pitch patterns observed in the data. The use of an RNN, paired with connectionist
temporal classification (CTC), enables the prediction of structure in
the pitch contour spanning the entire utterance. This novel F0 prediction system
is used alongside separate LSTMs for predicting phone durations and the
other acoustic features, to construct a complete text-to-speech system. Later,
we investigate the benefits of including long-range dependencies in duration
prediction at frame-level using uni-directional recurrent neural networks.
Since prosody is a supra-segmental property, we consider an alternate approach
to intonation generation which exploits long-term dependencies of
F0 by effective modelling of linguistic features using recurrent neural networks.
For this purpose, we propose a hierarchical encoder-decoder and
multi-resolution parallel encoder where the encoder takes word and higher
level linguistic features at the input and upsamples them to phone-level
through a series of hidden layers and is integrated into a Hybrid system which
is then submitted to Blizzard challenge workshop. We then highlight some of
the issues in current approaches and a plan for future directions of investigation
is outlined along with on-going work
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies
A Study of Non-Linguistic Utterances for Social Human-Robot Interaction
The world of animation has painted an inspiring image of what the robots of the future could be. Taking the robots R2D2 and C3PO from the Star Wars films as representative examples, these robots are portrayed as being more than just machines, rather, they are presented as intelligent and capable social peers, exhibiting many of the traits that people have also. These robots have the ability to interact with people, understand us, and even relate to us in very personal ways through a wide repertoire of social cues.
As robotic technologies continue to make their way into society at large, there is a growing trend toward making social robots. The field of Human-Robot Interaction concerns itself with studying, developing and realising these socially capable machines, equipping them with a very rich variety of capabilities that allow them to interact with people in natural and intuitive ways, ranging from the use of natural language, body language and facial gestures, to more unique ways such as expression through colours and abstract sounds.
This thesis studies the use of abstract, expressive sounds, like those used iconically by the robot R2D2. These are termed Non-Linguistic Utterances (NLUs) and are a means of communication which has a rich history in film and animation. However, very little is understood about how such expressive sounds may be utilised by social robots, and how people respond to these.
This work presents a series of experiments aimed at understanding how NLUs can be utilised by a social robot in order to convey affective meaning to people both young and old, and what factors impact on the production and perception of NLUs. Firstly, it is shown that not all robots should use NLUs. The morphology of the robot matters. People perceive NLUs differently across different robots, and not always in a desired manner. Next it is shown that people readily project affective meaning onto NLUs though not in a coherent manner. Furthermore, people's affective inferences are not subtle, rather they are drawn to well established, basic affect prototypes. Moreover, it is shown that the valence of the situation in which an NLU is made, overrides the initial valence of the NLU itself: situational context biases how people perceive utterances made by a robot, and through this, coherence between people in their affective inferences is found to increase. Finally, it is uncovered that NLUs are best not used as a replacement to natural language (as they are by R2D2), rather, people show a preference for them being used alongside natural language where they can play a supportive role by providing essential social cues
- …