136 research outputs found

    A review of affective computing: From unimodal analysis to multimodal fusion

    Affective computing is an emerging interdisciplinary research field bringing together researchers and practitioners from various fields, ranging from artificial intelligence and natural language processing to the cognitive and social sciences. With the proliferation of videos posted online (e.g., on YouTube, Facebook, Twitter) for product reviews, movie reviews, political views, and more, affective computing research has increasingly evolved from conventional unimodal analysis to more complex forms of multimodal analysis. This is the primary motivation behind our first-of-its-kind comprehensive literature review of the diverse field of affective computing. Furthermore, existing literature surveys lack a detailed discussion of the state of the art in multimodal affect analysis frameworks, which this review aims to address. Multimodality is defined by the presence of more than one modality or channel, e.g., visual, audio, text, gestures, and eye gaze. In this paper, we focus mainly on the use of audio, visual and text information for multimodal affect analysis, since around 90% of the relevant literature appears to cover these three modalities. Following an overview of different techniques for unimodal affect analysis, we outline existing methods for fusing information from different modalities. As part of this review, we carry out an extensive study of different categories of state-of-the-art fusion techniques, followed by a critical analysis of potential performance improvements with multimodal analysis compared to unimodal analysis. A comprehensive overview of these two complementary fields aims to give readers the building blocks to better understand this challenging and exciting research field.

    From Being to Becoming: Protests, Festivals, and Musical Mediations of Igorot Indigeneity

    Case studies that highlight the complex musical lives of Igorots, a minority group from Northern Philippines, remain sparse in ethnomusicological studies on Philippine indigenous music. Due largely to colonial racial logics and postcolonial nationalism, scholarship on Igorot music has been driven by essentialism and an attachment to cultural purity; it refuses consideration of indigenous people as agents who engage contemporary realities. My dissertation confronts these issues by illuminating conflicting expressions of Igorotness demonstrated through past and present discourse and the case studies of two Igorot groups who performed in protests and festivals in the Philippines in 2017 and 2018. Compelled by clashing politics, diverse audiences, internal community frictions, and subjective desires, members of both groups grappled with their identities through musical performances in public and intimate settings. From their enactments, Igorotness emerged as at once commemorative, politically pointed, unconstrained by “tradition,” and radically transformed. Adapting postcolonial analysis and theories on indigeneity, performance, and practice through historical critique and ethnography, I demonstrate that Igorotness is less a fixed category of difference than it is a field where identity is constantly contested. This work challenges dominant scholarship by disrupting canonical expressions of indigenous musical identity. It attends to musical performance as a tool for dialogically engaging various forms of Igorot self-awareness, and pieces together discrepant narratives to reveal a wide-ranging sense of human dynamism. I foreground Igorots’ intricate trajectories and struggles for self-determination as seen in their musical lives. This dissertation’s chapters evoke dialectic tension, rupture, and continual emergence—each succeeding narrative unsettles those before it and carves out new possibilities for representation. 
Chapter One examines selected writings and scholarly-artistic movements from the Spanish and US colonial eras to the early twenty-first century. I investigate the influence of colonial and postcolonial cultural politics on the knowledge production of Cordillera music while outlining epistemic shifts in a gradual overcoming of essentialism. Then, I discuss contemporary Igorot musical practices, beginning with Igorot protest music, its hybridity, and historical and ideological footings in Igorot knowledge and Philippine leftist politics in Chapter Two. Chapter Three complicates this narrative, focusing on the cultural ensemble Dap-ayan ti Kultura iti Kordilyera (DKK) to reveal the vulnerability of Igorot protest musical practices to misreadings and disapproval by varied audiences. Analyzing opposing performance strategies that DKK employed in response to these issues, I demonstrate how both overt and unconventionally oblique references to Igorot activism strengthen political legitimacy. In Chapter Four, I turn to musical displays in state-sponsored indigenous community festivals. Tracing the practice’s evolution from tactical exercises that supported US imperial control to celebrations of official self-governance, I portray festival performances as symbols of continuity and resistance that serve to reclaim Igorot heritage. Chapter Five unveils how festivals counteract grassroots notions of division, difference, and autonomy, and constrain Igorot self-expression. Delving into the experiences of delegates from the municipality of Sagada and an intimate, impromptu musical moment that affirmed their syncretic realities, I dismantle idealizations of festivals as spaces for Igorot empowerment.
PhD. Music: Musicology. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/169788/1/lddece_1.pd

    The Perception of Emotion from Acoustic Cues in Natural Speech

    Knowledge of human perception of emotional speech is imperative for the development of emotion in speech recognition systems and emotional speech synthesis. Owing to the fact that there is a growing trend towards research on spontaneous, real-life data, the aim of the present thesis is to examine human perception of emotion in naturalistic speech. Although there are many available emotional speech corpora, most contain simulated expressions. Therefore, there remains a compelling need to obtain naturalistic speech corpora that are appropriate and freely available for research. In that regard, our initial aim was to acquire suitable naturalistic material and examine its emotional content based on listener perceptions. A web-based listening tool was developed to accumulate ratings based on large-scale listening groups. The emotional content present in the speech material was demonstrated by performing perception tests on conveyed levels of Activation and Evaluation. As a result, labels were determined that signified the emotional content, and thus contribute to the construction of a naturalistic emotional speech corpus. In line with the literature, the ratings obtained from the perception tests suggested that Evaluation (or hedonic valence) is not identified as reliably as Activation is. Emotional valence can be conveyed through both semantic and prosodic information, for which the meaning of one may serve to facilitate, modify, or conflict with the meaning of the other—particularly with naturalistic speech. The subsequent experiments aimed to investigate this concept by comparing ratings from perception tests of non-verbal speech with verbal speech. The method used to render non-verbal speech was low-pass filtering, and for this, suitable filtering conditions were determined by carrying out preliminary perception tests. The results suggested that nonverbal naturalistic speech provides sufficiently discernible levels of Activation and Evaluation. 
It appears that the perception of Activation and Evaluation is affected by low-pass filtering, but that the effect is relatively small. Moreover, the results suggest that there is a similar trend in agreement levels between verbal and non-verbal speech. To date, it remains difficult to determine unique acoustical patterns for hedonic valence of emotion, which may be due to inadequate labels or the incorrect selection of acoustic parameters. This study has implications for the labelling of emotional speech data and the determination of salient acoustic correlates of emotion.

    Turn-Taking in Human Communicative Interaction

    The core use of language is in face-to-face conversation. This is characterized by rapid turn-taking. This turn-taking poses a number of central puzzles for the psychology of language. Consider, for example, that in large corpora the gap between turns is on the order of 100 to 300 ms, but the latencies involved in language production require minimally between 600 ms (for a single word) and 1500 ms (for a simple sentence). This implies that participants in conversation are predicting the ends of the incoming turn and preparing in advance. But how is this done? What aspects of this prediction are done when? What happens when the prediction is wrong? What stops participants coming in too early? If the system is running on prediction, why is there consistently a mode of 100 to 300 ms in response time? The timing puzzle raises further puzzles: it seems that comprehension must run parallel with the preparation for production, but it has been presumed that there are strict cognitive limitations on more than one central process running at a time. How is this bottleneck overcome? Far from being 'easy' as some psychologists have suggested, conversation may be one of the most demanding cognitive tasks in our everyday lives. Further questions naturally arise: how do children learn to master this demanding task, and what is the developmental trajectory in this domain? Research shows that aspects of turn-taking such as its timing are remarkably stable across languages and cultures, but the word order of languages varies enormously. How then does prediction of the incoming turn work when the verb (often the informational nugget in a clause) is at the end? Conversely, how can production work fast enough in languages that have the verb at the beginning, thereby requiring early planning of the whole clause? What happens when one changes modality, as in sign languages -- with the loss of channel constraints is turn-taking much freer? 
And what about face-to-face communication amongst hearing individuals -- do gestures, gaze, and other body behaviors facilitate turn-taking? One can also ask the phylogenetic question: how did such a system evolve? There seem to be parallels (analogies) in duetting bird species, and in a variety of monkey species, but there is little evidence of anything like this among the great apes. All this constitutes a neglected set of problems at the heart of the psychology of language and of the language sciences. This research topic welcomes contributions from right across the board, for example from psycholinguists, developmental psychologists, students of dialogue and conversation analysis, linguists interested in the use of language, phoneticians, corpus analysts and comparative ethologists or psychologists. We welcome contributions of all sorts, for example original research papers, opinion pieces, and reviews of work in subfields that may not be fully understood in other subfields.

    Ecocinema Theory and Practice 2

    This second volume builds on the initial groundwork laid by Ecocinema Theory and Practice by examining the ways in which ecocritical cinema studies have matured and proliferated over the last decade, opening whole new areas of study and research. Featuring fourteen new essays organized into three sections around the themes of cinematic materialities, discourses, and communities, the volume explores a variety of topics within ecocinema studies from examining specific national and indigenous film contexts to discussing ecojustice, environmental production studies, film festivals, and political ecology. The breadth of the contributions exemplifies how ecocinema scholars worldwide have sought to overcome the historical legacy of binary thinking and intellectual norms and are working to champion new ecocritical, intersectional, decolonial, queer, feminist, Indigenous, vitalist, and other emergent theories and cinematic practices. The collection also demonstrates the unique ways that cinema studies scholarship is actively addressing environmental injustice and the climate crisis. This book is an invaluable resource for students and scholars of ecocritical film and media studies, production studies, cultural studies, and environmental studies. https://cupola.gettysburg.edu/books/1181/thumbnail.jp

    'Thinking-through-Complicity' with Te Iwi o Ngāti Hauiti: Towards a Critical Use of Participatory Video for Research

    This thesis explores some of the seductions and dangers of participatory video for research (PVR) involving Indigenous Māori and Pākehā research partners. The project within which PVR was used focused on exploring relationships between place, identity and social cohesion within ‘remote’ rural communities. It involved about 15 members of the Potaka whānau of Te Iwi o Ngāti Hauiti in the central Rangitīkei district of the North Island, Aotearoa New Zealand. A small group of iwi members, myself and an audiovisual specialist and trainer negotiated the project’s focus, process and ethics during 1998. A different group of iwi members were then trained in video production and community research methods later that year and supported to produce their own productions, and carry out video research interviews with other iwi members. The entire process of negotiation, training and collaborative research was filmed for archival and research purposes with everyone’s consent, and several collaborative publications and presentations have been produced since 1999. The discursive space opened up by Ngāti Hauiti’s engagement with, and use of, video provides an opportunity to attend to the ‘cultural mediations’ that occurred throughout the research partnership and to inquire into the possible ‘empire building effects’ of visual technologies within participatory research more generally. The focus on PVR within a Māori context also prompts questions about the visual’s transformative potential within geographic research, and the implications of working through the use of a visual medium for rethinking disciplinary practices and knowledges, particularly when working cross-culturally. In the thesis, I first review the evolution and attendant challenges associated with both the use of participation and video within research contexts. 
I trace their similar origins in modernist attempts to ‘know’ and ‘empower’ marginalised others, and highlight the ongoing marginalisation of Indigenous perspectives within mainstream debates. I then engage with conceptualisations of complicity and develop an analytical framework that expands on current discursive and ideological discussions to also attend to its material, embodied and spatial dimensions. Using this framework and a complementary autoethnographic and ‘hyper-self-reflexive’ approach, I track aspects of my own power, complicity and desire within my research practice in the PVR project during the period 1998-2001. This approach involves the development of a particular reading position to focus on critical incidents of my research practice and a means of grappling productively with the polyvalent nature of my audiovisual and other information sources. I discuss these critical incidents within three processes associated with the research: facilitation, production and reception, attending to the complex and multifaceted interplay of audiovisual texts, their producers and their audiences throughout. Such a thesis is expedient given that the powerful and often uncritical rhetoric that besets participatory research and development is fast taking hold within geography. It is also timely given the proliferation of affordable and accessible audiovisual technology and its increasing use within geography and other social sciences. As geographers respond to calls to embrace more visual, tactile and other methods, this thesis offers possibilities for the repoliticisation of participatory discourse within social geography, through a more considered engagement with participatory action research, Indigenous research practices and audiovisual media such as video. I offer cautionary insights into the ‘power-full’ effects of these ways of working.

    The pragmatics of monologue: interaction in video blogs

    This study reports an in-depth pragmatic analysis of spoken monologues as they appear in video blogs (vlogs). Vlogs are videos of a person talking into the camera, which are edited and subsequently uploaded to video sharing websites such as YouTube, where they appear in a highly multimodal environment. Using methods situated in the fields of Conversation Analysis, Interactional Sociolinguistics and multimodality studies, the research presented investigates a wide range of speaker strategies. These strategies are studied with regard to their form, frequency of occurrence, how they are adapted to the specific context of language use and, where possible, what effect this has on vlog viewers. The strategies and phenomena under investigation are the openings and closings of monologues; repetition and involvement strategies; pointing gestures and video-comment coherence in the virtual online space. Comparison with another monologic genre, the TED Talk, reveals that the monologue setting itself is an influential variable that naturally shapes a vlogger’s or lecturer’s way of speaking compared to conversational settings. However, more specifically, the comparison also shows that the contextual differences between a stage monologue at a TED conference and a camera monologue as part of a vlog can be significant in terms of their influence on the interaction that takes place.
