    Relations between music and speech from the perspectives of dynamics, timbre and pitch

    Despite the vast amount of scholarly effort to compare music and speech from a wide range of perspectives, some of the most fundamental aspects of music and speech remain unexplored. This PhD thesis tackles three aspects essential to understanding the relations between music and speech: dynamics, timbre and pitch. In terms of dynamics, previous research has relied on perception experiments in which dynamics is represented by acoustic intensity, with little attention to the fact that dynamics is an important mechanism of motor movement in both music performance and speech production. Therefore, the first study of this thesis compared the dynamics of music and speech using production experiments with a focus on motor movements: finger force in affective piano performance was used as an index of music dynamics, and articulatory effort in affective Mandarin speech was used as an index of speech dynamics. The results showed both similarities and differences between the two domains. With regard to timbre, there has been a long-held observation that the timbre of musical instruments mimics the human voice, particularly in conveying emotions. However, little research has empirically investigated the emotional connotations of the timbre of isolated sounds of musical instruments in relation to affective human speech. Hence, the second study explored this issue using behavioral and ERP methods. The results largely supported previous observations, although some fundamental differences also emerged. In terms of pitch, some studies have suggested that music may be closely related to speech with regard to pitch prominence and expectation patterns. Nevertheless, the functional differences of pitch in music and speech could also imply that speech does not necessarily follow the same pitch patterns as music in conveying prominence and expectation. So far there is little empirical evidence to either support or refute these observations, so the third study examined this issue. The results showed that the differences outweighed the similarities between music and speech in terms of pitch prominence and expectation. In conclusion, from three perspectives essential to music and speech, this thesis sheds new light on the overlapping yet distinct relations between the two domains.

    Characterizing first and second language rhythm in English using spectral coherence between temporal envelope and mouth opening-closing movements

    This study investigated rhythmic differences between first and second language English, using recordings from 19 native speakers of American English and an equal number of native speakers of Mandarin. Speech rhythm was viewed through MacNeilage's frame/content theory. The spectral coherence between the temporal envelope and the mouth opening and closing kinematics was computed to operationalize the rhythmic frame. The spectral centroid, spread, rolloff, flatness, and entropy were calculated to reveal the frequency distribution patterns in the coherence. Using a binary logistic regression model, these measures were collectively found to be effective in characterizing rhythmic differences between the native and non-native groups (A′ = 0.71 and B″D = −0.06). Specifically, the native group was significantly higher than the non-native group in spectral centroid and spread, whereas it was significantly lower than its non-native counterpart in spectral flatness and entropy. The two groups did not differ significantly in spectral rolloff. Possible explanations for these results, as well as the efficacy of employing this coherence measure in speech rhythm research in general, were discussed.
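
    The spectral-shape measures named above (centroid, spread, rolloff, flatness, entropy) are standard descriptors that can be computed directly from a coherence spectrum. A minimal sketch under common textbook definitions is given below; the frequency grid and coherence values are invented placeholders, and the exact formulas used in the study may differ.

```python
import numpy as np

def spectral_shape_measures(freqs, coherence, rolloff_frac=0.85):
    """Centroid, spread, rolloff, flatness, and entropy of a coherence
    spectrum, using common definitions (assumed, not taken from the paper)."""
    f = np.asarray(freqs, dtype=float)
    c = np.asarray(coherence, dtype=float)
    p = c / c.sum()                                   # treat as a distribution

    centroid = np.sum(f * p)                          # weighted mean frequency
    spread = np.sqrt(np.sum((f - centroid) ** 2 * p)) # weighted std around it

    cum = np.cumsum(c)
    rolloff = f[np.searchsorted(cum, rolloff_frac * cum[-1])]

    flatness = np.exp(np.mean(np.log(c + 1e-12))) / np.mean(c)   # geo/arith mean
    entropy = -np.sum(p * np.log2(p + 1e-12)) / np.log2(len(p))  # normalized

    return {"centroid": centroid, "spread": spread, "rolloff": rolloff,
            "flatness": flatness, "entropy": entropy}

# Invented coherence spectrum peaking around the syllabic rate (~4 Hz).
freqs = np.linspace(0.5, 16.0, 64)
coherence = np.exp(-((freqs - 4.0) ** 2) / 8.0) + 0.05
print(spectral_shape_measures(freqs, coherence))
```

    A classifier such as the binary logistic regression mentioned in the abstract would then take these five measures per speaker as predictors of group membership.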

    The phonetics of speech breathing: pauses, physiology, acoustics, and perception

    Speech is made up of a continuous stream of speech sounds interrupted by pauses and breathing. As phoneticians are primarily interested in describing the segments of the speech stream, pauses and breathing are often neglected in phonetic studies, even though they are vital for speech. The present work contributes to a more detailed view of both pausing and speech breathing, with a special focus on the latter and the resulting breath noises, investigating their acoustic, physiological, and perceptual aspects. We present an overview of how a selection of corpora annotate pauses and pause-internal particles, as well as a recording setup that can be used for further studies on speech breathing. For pauses, this work emphasized their optionality and variability under different tempos, as well as the temporal composition of silence and breath noise in breath pauses. For breath noises, we first focused on acoustic and physiological characteristics: we explored the alignment of the onsets and offsets of audible breath noises with the start and end of expansion of both the rib cage and the abdomen. Further, we found similarities between speech breath noises and aspiration phases of /k/, and found that breath noises may be produced with a more open and slightly more fronted place of articulation than realizations of schwa. We found positive correlations between acoustic and physiological parameters, suggesting that when speakers inhale faster, the resulting breath noises are more intense and produced more anteriorly in the mouth. Inspecting the entire spectrum of speech breath noises, we observed relatively flat spectra with several weak peaks. These peaks largely overlapped with resonances reported for inhalations produced with a central vocal tract configuration. We used 3D-printed vocal tract models representing four vowels and four fricatives to simulate in- and exhalations by reversing airflow direction. Airflow direction did not have a general effect across all models, but only affected those with high-tongue configurations, as opposed to more open ones. We then compared inhalations produced with the schwa model to human inhalations in an attempt to approximate the vocal tract configuration in speech breathing. There were some similarities; however, several complexities of human speech breathing that are not captured in the models complicated the comparison. In two perception studies, we investigated how much information listeners could extract auditorily from breath noises. First, we tested the categorization of breath noises into six types based on airflow direction and airway usage (e.g., oral inhalation); around two thirds of all answers were correct. Second, we investigated how well breath noises could be used to discriminate between speakers and to extract coarse information on speaker characteristics such as age (old/young) and sex (female/male). Listeners were able to distinguish whether two breath noises came from the same or different speakers in around two thirds of all cases. From a single breath noise, classification of sex was successful in around 64% of cases, while for age it was around 50%, suggesting that sex is more perceivable than age in breath noises. Deutsche Forschungsgemeinschaft (DFG) – Projektnummer 418659027: "Pause-internal phonetic particles in speech communication".
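
    The description of breath-noise spectra as relatively flat with several weak peaks can be illustrated with a long-term average spectrum and simple peak picking. The sketch below is only a hedged illustration: it uses synthetic noise as a stand-in for a recorded inhalation, and the sampling rate, Welch parameters, and prominence threshold are assumptions rather than the settings used in the thesis.

```python
import numpy as np
from scipy.signal import welch, find_peaks

# Synthetic stand-in for a recorded inhalation noise: broadband noise with
# two gentle resonances. In practice this would be an audio excerpt.
fs = 16000                                   # assumed sampling rate
rng = np.random.default_rng(0)
t = np.arange(fs) / fs                       # 1 s of signal
noise = rng.standard_normal(fs)
noise += 0.3 * np.sin(2 * np.pi * 900 * t) + 0.2 * np.sin(2 * np.pi * 1800 * t)

# Long-term average spectrum via Welch's method.
freqs, psd = welch(noise, fs=fs, nperseg=1024)
psd_db = 10 * np.log10(psd + 1e-12)

# Pick peaks that rise only modestly above the otherwise flat spectrum.
peaks, _ = find_peaks(psd_db, prominence=3.0)
print("peak frequencies (Hz):", freqs[peaks].round())
```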

    What can autism teach us about the role of sensorimotor systems in higher cognition? New clues from studies on language, action semantics, and abstract emotional concept processing.

    Within the neurocognitive literature there is much debate about the role of the motor system in language, social communication and conceptual processing. We suggest here that autism spectrum conditions (ASC) may afford an excellent test case for investigating and evaluating contemporary neurocognitive models, most notably a neurobiological theory of action perception integration in which widely distributed cell assemblies linking neurons in action and perceptual brain regions act as the building blocks of many higher cognitive functions. We review the literature on functional motor abnormalities in ASC, followed by a discussion of their neural correlates and of aberrancies in language development, explaining how these might arise with reference to the typical formation of cell assemblies linking action and perceptual brain regions. This model gives rise to clear hypotheses regarding language comprehension, and we highlight a recent set of studies reporting differences in brain activation and behaviour in the processing of action-related and abstract-emotional concepts in individuals with ASC. At the neuroanatomical level, we discuss structural differences in long-distance frontotemporal and frontoparietal connections in ASC of a kind that would compromise information transfer between sensory and motor regions. This neurobiological model of action perception integration may shed light on the cognitive and social-interactive symptoms of ASC itself, building on and extending earlier proposals linking autistic symptomatology to motor disorder and dysfunction in action perception integration. Further investigating the contribution of motor dysfunction to higher cognitive and social impairment, we suggest, is timely and promising, as it may advance both neurocognitive theory and the development of new clinical interventions for this population and others characterised by early and pervasive motor disruption.

    A psycho-philosophical investigation of the perception of emotional meaning in the performance of solo singing (19th century German lied repertoire).

    The research in this thesis is a contribution to the study of expression in music performance. It is primarily concerned with the study of emotional expressiveness in the performance of singing. The thesis focuses on the Western Art Music canon and on solo singing in particular (within the nineteenth-century German Lied repertoire) involving verbal and/or dramatic action. Though always considered within its musical context, the emotionally expressive character of singing, together with the words and/or a dramatic context (usually with an explicit narrative), is treated as a complex yet integrated whole. Two different approaches are adopted: (1) investigating the production of the facial movements and vocal sounds directly involved in the expression of emotion in singing; and (2) exploring how the performed elements are perceived, recognised and experienced by the audience. The movements and sounds involved in the expression of emotion in singing, and the relation between them, are interpreted, compared and analysed through an examination of studies by authors whose research has been concerned with the detection and analysis of emotion in everyday life and in music performance. Comparing and combining visual and acoustical expressive elements of the performance of singing, the research in this thesis investigates the relative effects of one on the other and seeks to determine through empirical work which are crucial to the production and perception of emotional meaning in singing. These studies range from conventional experiments in which the data are analysed statistically to individual subjective reports. The conventional experiments examine particular effects, whereas the subjective reports are used to address the more diverse properties of the performance. The experiments range from descriptive to quantitative measurements of the expressive parameters of emotion, where 'ecological validity' has been preserved by using realistic performance data. Two professional singers were filmed, videotaped and recorded in performance. Each singer performed in five different emotional conditions (anger, fear, happiness, sadness, and neutral). The performances (videotaped or real-time) were shown to fifteen audience members, who judged the facial and vocal expressions of emotion of each performer. Audience accuracy for the performed emotions was measured by comparing each performer's intended expression with the audience's recognition of emotional meaning. Results showed a high rate of decoding accuracy of the performers' intended emotional (facial and vocal) expressiveness. A variety of empirical techniques was used, including tracking and the 'point-light' technique (Experiments I and II), verbal and non-verbal semantic differentials (Experiments II and III), interviews with the performers and the audience (Experiments III and IV), and 'ecologically valid' real-time performance assessment (Experiment IV). Experiment I, using the point-light technique together with digitised computer tracking and measurement, demonstrated that it is possible to differentiate emotional facial expressiveness in singing with a purely quantitative measurement technique (that is, without recourse to the subjectivity of the observer).
    Experiment II, using the point-light technique and semantic differentials (with descriptive emotional terms), demonstrated that kinematics alone provide enough information to distinguish between different expressive manners in the performer's facial behaviour when singing with different emotional meanings. Experiment III (videotaped performances with emotional content), using semantic differentials and interviews with the performers, showed a high degree of consistency in the expressive elements (acoustical and visual) across repeated performances within the different emotional conditions. Experiment IV (real-time performance), also using semantic differentials and interviews with the performers and the audience, showed the same high degree of consistency across repeated performances, and additionally revealed that musical structure and the performer's intention to enhance emotional meaning are important co-determinants of the communication process. Chapter 1 presents the theoretical background on music, performance and emotion, Chapter 2 the theoretical-practical background, and Chapter 3 the empirical background. A final aim of the studies has been to elaborate an analytical tool of expression that provides singers, singing teachers and students of singing with reliable audience feedback on their capacity to communicate emotional meaning whilst singing. Since the author is both a performer and a teacher, it was essential for the investigation to integrate theory and practice. Therefore, a significant focus was also on the development of a recital, which aimed to stimulate debate about the theoretical and empirical results of this thesis. Indeed, though the performers showed only partial knowledge of the expressive devices they used, and though all the experiments (videotaped or real-time) showed a high rate of audience decoding accuracy of the performers' intended emotional expressions, Experiment IV revealed that the audience's recognition of the performed emotional meaning would increase significantly if the performer, given access to the audience's cognitive feedback, were able to check and improve the accuracy and consistency of the expressive cues used in the performance. All of this experimental research is explored in Chapter 4. Chapter 5, the final chapter, presents a summary of all the empirical results. The thesis concludes with a brief discussion of further possibilities for research in the area and with the practical and theoretical implications to be drawn from this investigation.
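
    Decoding accuracy of the kind reported here is obtained by comparing each performer's intended emotion with the audience's judgments, typically summarized as overall and per-emotion accuracy or as a confusion matrix. The sketch below shows one plausible way to tabulate such data; the labels and responses are invented and do not reproduce the thesis data.

```python
from collections import Counter

EMOTIONS = ["anger", "fear", "happiness", "sadness", "neutral"]

def decoding_accuracy(intended, perceived):
    """Overall and per-emotion proportion of audience responses that match
    the performer's intended emotion."""
    overall = sum(i == p for i, p in zip(intended, perceived)) / len(intended)
    per_emotion = {}
    for emo in EMOTIONS:
        pairs = [(i, p) for i, p in zip(intended, perceived) if i == emo]
        if pairs:
            per_emotion[emo] = sum(i == p for i, p in pairs) / len(pairs)
    return overall, per_emotion

# Invented data: 15 listeners judging one performance intended as "sadness".
intended = ["sadness"] * 15
perceived = ["sadness"] * 11 + ["neutral"] * 3 + ["fear"]
print(decoding_accuracy(intended, perceived))
print(Counter(perceived))  # simple confusion profile for this stimulus
```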

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the keenly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the newborn to the adult and elderly. Over the years the initial topics have grown and spread into other fields of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis.

    Models and Analysis of Vocal Emissions for Biomedical Applications

    The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the keenly felt need to share know-how, objectives and results between areas that until then had seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial topics have grown and spread into other areas of research, such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty years of uninterrupted and successful research in the field of voice analysis.

    Laterality and Babble: Does asymmetry in lip opening during babble indicate increasing left hemisphere dominance as babies gain articulatory experience?

    Speech and language are supported by task-dependent neural networks that are predominantly lateralised to the left hemisphere of the brain, whilst emotion is supported by predominantly right-hemispheric networks. This is reflected in the asymmetry of lip openings during speech and facial expressions in adults. One cross-sectional orofacial asymmetry study found an analogous distinction between 5-12-month-old babies’ lip openings during reduplicated babble and during positively valenced emotional facial expressions, and this has been interpreted as evidence for the hypothesis that babble is fundamentally linguistic in nature (Holowka & Petitto, 2002). However, a similar distinction is also observed in the orofacial behaviours of some non-human primates. Differential hemispheric specialisation for emotional and vocal communicative functions may therefore be an ancient trait, long predating human language. Additionally, characterising babble as babies’ immature attempts at language marginalises the critical role of endogenously motivated vocal exploration and may assume a degree of goal-directedness in infant behaviour around the time of babble emergence for which we have little other supporting evidence. This thesis explores laterality longitudinally in the babble, positive facial expressions, and other vocalisations of eight 5-12-month-olds. Singleton and variegated babble are captured as well as reduplicated babble, and an alternative method for analysing orofacial asymmetry – hemimouth measurement – is used. Overall, Holowka and Petitto’s between-category distinction was replicated. However, babble was found to show right laterality at emergence and to become left-lateralised gradually over developmental time. An interaction effect of utterance complexity was also observed: bisyllabic babbles showed a significant leftward shift over developmental time, whilst monosyllabic and polysyllabic babbles did not. Furthermore, hemimouth measurement revealed a degree of real-time variability in the laterality of babble not previously observed. An alternative theory of the underlying nature of babble – the Old Parts, New Machine hypothesis – is proposed.
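
    Orofacial asymmetry of the kind measured here is often summarized with a laterality index computed from right and left hemimouth apertures. The sketch below assumes the common (R − L)/(R + L) form and invented aperture values; the exact hemimouth metric used in the thesis may differ.

```python
import numpy as np

def laterality_index(right, left):
    """Frame-by-frame laterality index from right and left hemimouth
    apertures: positive values indicate a larger right-side opening
    (conventionally read as left-hemisphere involvement), negative values
    a larger left-side opening."""
    right = np.asarray(right, dtype=float)
    left = np.asarray(left, dtype=float)
    return (right - left) / (right + left)

# Invented hemimouth apertures (arbitrary units) across frames of one babble.
right = np.array([4.1, 4.4, 4.8, 4.6, 4.2])
left = np.array([3.9, 4.0, 4.1, 4.3, 4.4])
li = laterality_index(right, left)
print("per-frame LI:", li.round(3), "mean LI:", li.mean().round(3))
```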

    The phylogenetic origin and mechanism of sound symbolism - the role of action-perception circuits

    As opposed to the classic Saussurean view of the arbitrary relationship between linguistic form and meaning, non-arbitrariness is a pervasive feature of human language. Sound symbolism, namely the intrinsic relationship between meaningless speech sounds and visual shapes, is a typical case of non-arbitrariness. A demonstration of sound symbolism is the “maluma-takete” effect, in which immanent links are observed between meaningless ‘round’ or ‘sharp’ speech sounds (e.g., maluma vs. takete) and round or sharp abstract visual shapes, respectively. An extensive body of empirical work suggests that these mappings are shared by humans and play a distinct role in the emergence and acquisition of language. However, important questions remain open regarding the origins and mechanism of sound symbolic processing. Those questions are addressed in the present work. The first part of this dissertation focuses on the validation of sound symbolic effects in a forced-choice task, and on the interaction of sound symbolism with two crossmodal mappings shared by humans. To address this question, human subjects were tested with a forced-choice task on sound symbolic mappings crossed with two crossmodal audiovisual mappings (pitch-shape and pitch-spatial position). Subjects performed significantly above chance only for the sound symbolic associations and not for the other two mappings. Sound symbolic effects were thus replicated, while the other two crossmodal mappings, which involve low-level audiovisual properties such as pitch and spatial position, did not emerge. The second issue examined in the present dissertation is the phylogenetic origin of sound symbolic associations. Human subjects and a group of touchscreen-trained great apes were tested with a forced-choice task on sound symbolic mappings. Only humans were able to process and/or infer the links between meaningless speech sounds and abstract shapes. These results reveal, for the first time, the specificity of humans’ sound symbolic ability, which can be related to neurobiological findings on the distinct development and connectivity of the human language network. The last part of the dissertation investigates whether action knowledge and knowledge of the perceptual outputs of our actions can provide a possible explanation of sound symbolic mappings. In a series of experiments, human subjects performed sound symbolic mappings, as well as mappings of the sounds of ‘round’ or ‘sharp’ hand actions onto the shapes produced by those hand actions. In addition, the auditory and visual stimuli of both conditions were crossed. Subjects detected congruencies significantly for all mappings and, most importantly, a positive correlation was observed in their performance across conditions. Physical acoustic and visual similarities between the audiovisual by-products of our hand actions and the sound symbolic pseudowords and shapes indicate that the link between meaningless speech sounds and abstract visual shapes is grounded in action knowledge. From a neurobiological perspective, the link between actions and their audiovisual by-products is also in accordance with distributed action-perception circuits in the human brain. Action-perception circuits, supported by the human neuroanatomical connectivity between auditory, visual, and motor cortices, emerge under associative learning and carry the perceptual and motor knowledge of our actions. These findings give a novel explanation of how symbolic communication is linked to our sensorimotor experiences. To sum up, the present dissertation (i) validates the presence of sound symbolic effects in a forced-choice task, (ii) shows that sound symbolic ability is specific to humans, and (iii) shows that action knowledge can provide the mechanistic glue for mapping meaningless speech sounds to abstract shapes. Overall, the present work contributes to a better understanding of the phylogenetic origins and mechanism of sound symbolic ability in humans.
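
    Above-chance performance in a two-alternative forced-choice task of the kind described here is commonly assessed with a binomial test against the 0.5 chance level. The sketch below (SciPy 1.7+) uses invented counts and is not the analysis pipeline of the dissertation.

```python
from scipy.stats import binomtest

# Invented counts: 38 congruent choices out of 48 two-alternative trials.
n_correct, n_trials = 38, 48
result = binomtest(n_correct, n_trials, p=0.5, alternative="greater")
print(f"proportion correct = {n_correct / n_trials:.2f}, p = {result.pvalue:.4f}")
```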
