21 research outputs found

    Making modality: transmodal composing in a digital media studio.

    The multiple media that exist for communication have historically been theorized as possessing different available means for persuasion and meaning-making. The exigence of these means has been the object of theoretical debate that ranges across cultural studies, language studies, semiology, and philosophies of the mind. This dissertation contributes to such debates by sharing the results of an ethnographically informed study of multimedia composing in a digital media studio. Drawing from Cultural Historical Activity Theory and theories of enactive perception, I analyze the organizational and infrastructural design of a media studio as well as the activity of composer/designers working in said studio. Throughout this analysis I find that implicit in the organization and infrastructure of the media studio is an ethos of conceptualizing communication technology as a legitimizing force. Such an ethos is troubled by my analysis of composer/designers working in the studio, whose activities do not seek outside legitimization but instead contribute to the media milieu. Following these analyses, I conclude that media’s means for persuasion and meaning-making emerge from local practices of communication and design. Finally, I provide a framework for studying the emergence of such means.

    Multimodal Sentiment Analysis Based on Deep Learning: Recent Progress

    Multimodal sentiment analysis is an important research topic in the field of NLP, aiming to analyze speakers' sentiment tendencies through features extracted from textual, visual, and acoustic modalities. Its main methods are based on machine learning and deep learning. Machine learning-based methods rely heavily on labeled data, whereas deep learning-based methods can overcome this shortcoming and capture the in-depth semantic information and modal characteristics of the data, as well as the interactive information between multimodal data. In this paper, we survey the deep learning-based methods, including fusion of text and image and fusion of text, image, audio, and video. Specifically, we discuss the main problems of these methods and the future directions. Finally, we review the work of multimodal sentiment analysis in conversation.

    Multi-form Visualisation: An approach to acousmatic composition

    This practice-based doctoral research addresses a critical issue in acousmatic composition: the journey from the immaterial world of sonic imagination to the realisation of musical sound. This was an exploratory journey, where my personal sensibility for visual arts practice met my curiosity and profound interest in acousmatic music. Methodologically, the project approached acousmatic composition as an organic process, intertwining visual sensibilities and musical domains by offering a critical approach to the listening experience and to my compositional practice. A key metaphor used is that of the blank page as a space for multi-form visualisation, where gestures derived from sketching and other visual stimuli are used as guides and catalysts for the realisation of sound. In this approach, a process of deliberately blurring boundaries between real and imaginary realms affords a space to daydream, to be moved by sounds, the flow of mental images, virtual sensations, and memory-images that one can associate with traces, dots, shapes or textures. This parallel allows me to find my way within the sonic realm, shaping sound materials and sequences that progressively define a musical structure. This space, which has no proper physical existence, invites sonic and visual perception and imagination to confront, destroy and renew one another, directing the music’s emergence through a feedback loop between the visual and the aural. A key conceptual tool in this practice is the notion of sensory qualia and a blend of phenomenological and ecological views of sound and bodily centered, internally registered responses. By focusing on qualitative sensations derived from drawing, painting and sensations of motion in the natural world, parallels with the sonic imagination are stimulated. The graphical expression of gestures deployed in space and time becomes a space of boundless, imaginative reflection of the composer’s sonic conceptions and expectations.

    Individual and Collaborative Semiotic Work in Document Design

    This article examines the concepts of agency, transformation and transduction in the context of document design. These concepts have been previously used to describe communicative actions and sign-making among individuals: whereas agency focuses on the individual’s capabilities as a sign-maker, transformation and transduction describe how individuals transform meanings within one mode of communication or from one mode to another. Organizational communication, however, is rarely an individual effort, particularly in corporate settings: producing multimodal documents that communicate on behalf of entire organizations, such as annual reports, constitutes a collaborative effort involving a variety of specialists, such as concept planners, copywriters and graphic designers. In the age of increasing specialization, this kind of collaborative semiotic work raises questions about agency, transduction and transformation. In this context, the concepts of agency and transmodality, which emphasize the individual, appear to have reduced explanatory power. This leads to the central question of this article, that is, how can the collaborative design process be captured and how does it affect the multimodal structure of annual reports? By analyzing an annual report published by Finnair and interviewing its designers, this article aims to illuminate the design process and its consequences for the document in question.

    Inconsistent Matters: A Knowledge-guided Dual-consistency Network for Multi-modal Rumor Detection

    Rumor spreaders are increasingly utilizing multimedia content to attract the attention and trust of news consumers. Though quite a few rumor detection models have exploited multi-modal data, they seldom consider the inconsistent semantics between images and texts, and rarely spot the inconsistency between the post contents and background knowledge. In addition, they commonly assume the completeness of multiple modalities and thus are incapable of handling missing modalities in real-life scenarios. Motivated by the intuition that rumors in social media are more likely to have inconsistent semantics, a novel Knowledge-guided Dual-consistency Network is proposed to detect rumors with multimedia contents. It uses two consistency detection subnetworks to capture the inconsistency at the cross-modal level and the content-knowledge level simultaneously. It also enables robust multi-modal representation learning under different missing visual modality conditions, using a special token to discriminate between posts with visual modality and posts without visual modality. Extensive experiments on three public real-world multimedia datasets demonstrate that our framework can outperform the state-of-the-art baselines under both complete and incomplete modality conditions. Our code is available at https://github.com/MengzSun/KDCN

    Early Sydney punk : methods in visual ethnography

    This thesis explores the recollections of participants who were part of a cohort associated with a small punk venue known as the Grand Hotel, which operated at Railway Square, Sydney, between 1977 and 1979. While Australia’s first-wave moment has been increasingly recognised within a growing body of literature on punk, it has been considered almost exclusively in a music context. This study emphasises the sociality of punk subculture, which has been largely absent from the record. The thesis comprises a creative component based on a series of video-recorded interviews, and a written exegesis. The video production, titled Distorted: Reflections on early Sydney punk, was developed through methods drawn from ethnography and other qualitative methodologies. The work presents discussion on a range of social, personal and political concerns of late 1970s Sydney through the reflections of participants. As such, it is a visual ethnography with a research focus on the past and on memory as articulated in a present setting. The written component of the thesis discusses aspects of cultural studies and subcultural theory in relation to punk as experienced in a post-colonial space, which is framed within an analysis of anthropologically-oriented ethnography. The text then discusses in detail the methodological underpinnings of the research. It is here that I advance an approach to audiovisual production which utilises computer-assisted data analysis software within an analytical and conceptual framework drawn from grounded theory and narrative analysis.

    Learning from Audio, Vision and Language Modalities for Affect Recognition Tasks

    The world around us as well as our responses to worldly events are multimodal in nature. For intelligent machines to integrate seamlessly into our world, it is imperative that they can process and derive useful information from multimodal signals. Such capabilities can be provided to machines by employing multimodal learning algorithms that consider both the individual characteristics of unimodal signals and the complementariness provided by multimodal signals. Based on the number of modalities available during the training and testing phases, learning algorithms fall into three categories: unimodal trained and unimodal tested, multimodal trained and multimodal tested, and multimodal trained and unimodal tested algorithms. This thesis provides three contributions, one for each category, and focuses on three modalities that are important for human-human and human-machine communication, namely, audio (paralinguistic speech), vision (facial expressions) and language (linguistic speech) signals. For several applications, either due to hardware limitations or deployment specifications, unimodal trained and tested systems suffice. Our first contribution, for the unimodal trained and unimodal tested category, is an end-to-end deep neural network that uses raw speech signals as input for a computational paralinguistic task, namely, verbal conflict intensity estimation. Our model, which uses a convolutional recurrent architecture equipped with an attention mechanism to focus on task-relevant instances of the input speech signal, eliminates the need for task-specific metadata or domain-knowledge-based manual refinement of hand-crafted generic features. The second contribution, for the multimodal trained and multimodal tested category, is a multimodal fusion framework that exploits both cross (inter) and intra-modal interactions for categorical emotion recognition from audiovisual clips. We explore the effectiveness of two types of attention mechanisms, namely, intra and cross-modal attention, by creating two versions of our fusion framework. In many applications, multimodal signals might be available during the model training phase, yet we cannot expect all modality signals to be available during the testing phase. Our third contribution addresses this situation: we propose a framework for cross-modal learning in which paired audio-visual instances are used during training to develop test-time stand-alone unimodal models.

    Triangular relationships between commerce, politics and hip-hop : a study of the role of hip-hop in influencing the socio-economic and political landscape in contemporary society

    Get PDF
    A PhD thesis submitted to the Anthropology Department, Faculty of Humanities, University of the Witwatersrand. This study will argue (i) that the evolution of hip-hop arises out of the need by young people to give expression and meaning to their day-to-day socio-political and economic struggles and the harsh realities of urban life, (ii) that hip-hop has become the audible and dominant voice of reason and a platform that allows youth to address their plight, as active citizens, and (iii) that, as a music expression, the hip-hop narrative can be used as an unsolicited yet resourceful civic perception survey to gauge the temperature and the mood of society at a point in time. My research question is premised on the argument that the youth looks at society and their immediate surroundings through the lens of rap music and the hip-hop culture. It presupposes that it is this hip-hop lens that has become the projector through which the youth views and analyses society and then invites the world to peep through, to confirm and be witnesses to what they see. It is not the purpose of this research to argue how much influence hip-hop has on young people, but instead to look at how youth is using hip-hop to express their discontent and what the various sites are where their relentless desire for a better life is being crafted and articulated. In my investigation, I have argued that it is at these social sites that open or discreet creative expressions are produced/created by the hip-hop generation as the subordinate group and directed to those perceived to be the gatekeepers to their aspirations and their rites of passage. In my investigation I have explored how, out of indignation and desire, the hip-hop generation has employed creative ways to highlight and vent their frustration at a system that seems to derail their aspirations. This is the story of hip-hop where Watkins (2005) argues that the youth have crafted "a vision of their world that is insightful, optimistic and tenaciously critical of the institutions and circumstances that restrict their ability to impact on the world around them" (p. 81). With regard to hip-hop in South Africa, critical questions and a central thesis of this paper begin to emerge as to whether hip-hop, as an artistic expression and a seemingly dominant youth culture, has found long-hidden voices through which young people now engage with this art form to address and reflect on their socio-economic and political conditions as active citizens in search of a meaningful social contract. By investigating the triangular relationship between commerce, politics and hip-hop, this study looks at how creative, adaptive people with unrealised potential, who find themselves trapped by illusion and exploitation (realistic or perceived), always try to find a meaning to make sense of their worlds.