Making modality: transmodal composing in a digital media studio.
The multiple media that exist for communication have historically been theorized as possessing different available means for persuasion and meaning-making. The exigence of these means has been the object of theoretical debate that ranges across cultural studies, language studies, semiology, and philosophies of the mind. This dissertation contributes to such debates by sharing the results of an ethnographically informed study of multimedia composing in a digital media studio. Drawing from Cultural Historical Activity Theory and theories of enactive perception, I analyze the organizational and infrastructural design of a media studio as well as the activity of composer/designers working in said studio. Throughout this analysis I find that implicit in the organization and infrastructure of the media studio is an ethos of conceptualizing communication technology as a legitimizing force. Such an ethos is troubled by my analysis of composer/designers working in the studio, whose activities do not seek outside legitimization but instead contribute to the media milieu. Following these analyses, I conclude that media’s means for persuasion and meaning-making emerge from local practices of communication and design. Finally, I provide a framework for studying the emergence of such means
Multimodal Sentiment Analysis Based on Deep Learning: Recent Progress
Multimodal sentiment analysis is an important research topic in the field of NLP, aiming to analyze speakers' sentiment tendencies through features extracted from textual, visual, and acoustic modalities. Its main methods are based on machine learning and deep learning. Machine learning-based methods rely heavily on labeled data. But deep learning-based methods can overcome this shortcoming and capture the in-depth semantic information and modal characteristics of the data, as well as the interactive information between multimodal data. In this paper, we survey the deep learning-based methods, including fusion of text and image and fusion of text, image, audio, and video. Specifically, we discuss the main problems of these methods and the future directions. Finally, we review the work of multimodal sentiment analysis in conversation
Multi-form Visualisation: An approach to acousmatic composition
This practice-based doctoral research addresses a critical issue in
acousmatic composition: the journey from the immaterial world of sonic
imagination to the realisation of musical sound. This was an exploratory journey,
where my personal sensibility for visual arts practice met my curiosity and
profound interest in acousmatic music. Methodologically, the project approached
acousmatic composition as an organic process, intertwining visual sensibilities
and musical domains by offering a critical approach to the listening experience
and to my compositional practice. A key metaphor used is that of the blank page
as a space for multi-form visualisation, where gestures derived from sketching
and other visual stimuli are used as guides and catalysts for the realisation of
sound. In this approach, a process of deliberately blurring boundaries between
real and imaginary realms affords a space to daydream, to be moved by sounds,
the flow of mental images, virtual sensations, and memory-images that one can
associate with traces, dots, shapes or textures. This parallel allows me to find my
way within the sonic realm, shaping sound materials and sequences that
progressively define a musical structure. This space, which has no proper
physical existence, invites sonic and visual perception and imagination to
confront, destroy and renew one another, directing the music’s emergence
through a feedback loop between the visual and the aural. A key conceptual tool
in this practice is the notion of sensory qualia and a blend of phenomenological
and ecological views of sound and bodily centered, internally registered
responses. By focusing on qualitative sensations derived from drawing, painting
and sensations of motion in the natural world, parallels with the sonic imagination
are stimulated. The graphical expression of gestures deployed in space and time
becomes a space of boundless, imaginative reflection of the composer’s sonic
conceptions and expectations
Individual and Collaborative Semiotic Work in Document Design
This article examines the concepts of agency, transformation and transduction in the context of document design. These concepts have been previously used to describe communicative actions and sign-making among individuals: whereas agency focuses on the individual’s capabilities as a sign-maker, transformation and transduction describe how individuals transform meanings within one mode of communication or from one mode to another. Organizational communication, however, is rarely an individual effort, particularly in corporate settings: producing multimodal documents that communicate on behalf of entire organizations, such as annual reports, constitutes a collaborative effort involving a variety of specialists, such as concept planners, copywriters and graphic designers. In the age of increasing specialization, this kind of collaborative semiotic work raises questions about agency, transduction and transformation. In this context, the concepts of agency and transmodality, which emphasize the individual, appear to have reduced explanatory power. This leads to the central question of this article, that is, how can the collaborative design process be captured and how does it affect the multimodal structure of annual reports? By analyzing an annual report published by Finnair and interviewing its designers, this article aims to illuminate the design process and its consequences for the document in question
Inconsistent Matters: A Knowledge-guided Dual-consistency Network for Multi-modal Rumor Detection
Rumor spreaders are increasingly utilizing multimedia content to attract the
attention and trust of news consumers. Though quite a few rumor detection
models have exploited the multi-modal data, they seldom consider the
inconsistent semantics between images and texts, and rarely spot the
inconsistency among the post contents and background knowledge. In addition,
they commonly assume the completeness of multiple modalities and thus are
incapable of handling missing modalities in real-life scenarios.
Motivated by the intuition that rumors in social media are more likely to have
inconsistent semantics, a novel Knowledge-guided Dual-consistency Network is
proposed to detect rumors with multimedia contents. It uses two consistency
detection subnetworks to capture the inconsistency at the cross-modal level and
the content-knowledge level simultaneously. It also enables robust multi-modal
representation learning under different missing visual modality conditions,
using a special token to discriminate between posts with visual modality and
posts without visual modality. Extensive experiments on three public real-world
multimedia datasets demonstrate that our framework can outperform the
state-of-the-art baselines under both complete and incomplete modality
conditions. Our codes are available at https://github.com/MengzSun/KDCN
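The missing-modality idea in this abstract can be pictured with a minimal sketch. It assumes, hypothetically, that each post has a text feature vector and an optional image feature vector, and that a fixed placeholder vector plays the role of the "special token". The names `NO_IMAGE_TOKEN` and `build_input` are illustrative only and are not taken from the KDCN code; in a trained model the token would normally be learned rather than fixed.

```python
from typing import List, Optional

# Hypothetical placeholder standing in for the "special token" that marks a
# post without a visual modality (assumed fixed here; a real model would
# typically learn it).
NO_IMAGE_TOKEN: List[float] = [0.0, 0.0, 0.0]


def build_input(text_vec: List[float], image_vec: Optional[List[float]]) -> List[float]:
    """Concatenate text features, visual features (or the placeholder),
    and a flag that says whether an image was actually present."""
    has_image = image_vec is not None
    visual = image_vec if has_image else NO_IMAGE_TOKEN
    if len(visual) != len(NO_IMAGE_TOKEN):
        raise ValueError("image vector must match the placeholder length")
    return list(text_vec) + list(visual) + [1.0 if has_image else 0.0]


with_image = build_input([0.2, 0.7], [0.5, 0.1, 0.9])
without_image = build_input([0.2, 0.7], None)
print(with_image)
print(without_image)
```

Both kinds of post end up with inputs of the same length, so a single downstream classifier could, under these assumptions, consume complete and incomplete posts alike.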
Early Sydney punk : methods in visual ethnography
This thesis explores the recollections of participants who were part of a cohort associated with a small punk venue known as the Grand Hotel, which operated at Railway Square, Sydney, between 1977 and 1979. While Australia’s first-wave moment has been increasingly recognised within a growing body of literature on punk, it has been considered almost exclusively in a music context. This study emphasises the sociality of punk subculture which has been largely absent from the record. The thesis comprises a creative component based on a series of video-recorded interviews, and a written exegesis. The video production, titled Distorted: Reflections on early Sydney punk, was developed through methods drawn from ethnography and other qualitative methodologies. The work presents discussion on a range of social, personal and political concerns of late 1970s Sydney through the reflections of participants. As such, it is a visual ethnography with a research focus on the past and on memory as articulated in a present setting. The written component of the thesis discusses aspects of cultural studies and subcultural theory in relation to punk as experienced in a post-colonial space, which is framed within an analysis of anthropologically-oriented ethnography. The text then discusses in detail the methodological underpinnings of the research. It is here that I advance an approach to audiovisual production which utilises computer assisted data analysis software within an analytical and conceptual framework drawn from grounded theory and narrative analysis
Learning from Audio, Vision and Language Modalities for Affect Recognition Tasks
The world around us as well as our responses to worldly events are multimodal in nature. For intelligent machines to integrate seamlessly into our world, it is imperative that they can process and derive useful information from multimodal signals. Such capabilities can be provided to machines by employing multimodal learning algorithms that consider both the individual characteristics of unimodal signals as well as the complementariness provided by multimodal signals. Based on the number of modalities available during the training and testing phases, learning algorithms can be of three categories: unimodal trained and unimodal tested, multimodal trained and multimodal tested, and multimodal trained and unimodal tested algorithms. This thesis provides three contributions, one for each category and focuses on three modalities that are important for human-human and human-machine communication, namely, audio (paralinguistic speech), vision (facial expressions) and language (linguistic speech) signals. For several applications, either due to hardware limitations or deployment specifications, unimodal trained and tested systems suffice. Our first contribution, for the unimodal trained and unimodal tested category, is an end-to-end deep neural network that uses raw speech signals as input for a computational paralinguistic task, namely, verbal conflict intensity estimation. Our model, which uses a convolutional recurrent architecture equipped with attention mechanism to focus on task-relevant instances of the input speech signal, eliminates the need for task-specific meta data or domain knowledge based manual refinement of hand-crafted generic features. The second contribution, for the multimodal trained and multimodal tested category, is a multimodal fusion framework that exploits both cross (inter) and intra-modal interactions for categorical emotion recognition from audiovisual clips. 
We explore the effectiveness of two types of attention mechanisms, namely, intra and cross-modal attention by creating two versions of our fusion framework. In many applications, multimodal signals might be available during model training phase, yet we cannot expect the availability of all modality signals during testing phase. Our third contribution addresses this situation wherein we propose a framework for cross-modal learning where paired audio-visual instances are used during training to develop test-time stand-alone unimodal models
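The difference between intra-modal and cross-modal attention mentioned in this abstract can be illustrated with a generic scaled dot-product attention. This is a sketch under stated assumptions, not the thesis's architecture: the helper names (`softmax`, `attend`) and the toy `audio` and `video` feature sequences are hypothetical, and no learned projections are included.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query gets a weighted mix of the values."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        dim = len(values[0])
        outputs.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)])
    return outputs


# Toy feature sequences (assumed values, two audio frames and three video frames).
audio = [[1.0, 0.0], [0.0, 1.0]]
video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

intra = attend(audio, audio, audio)  # intra-modal: audio attends to audio
cross = attend(audio, video, video)  # cross-modal: audio queries, video keys/values
print(intra)
print(cross)
```

The only change between the two calls is where the keys and values come from: the same modality as the queries, or the other one.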
The Electroacoustic and its Double: Duality and Dramaturgy in Live Performance
Live electroacoustic performance juxtaposes and superimposes two main elements: the real, present and physical, against the simulated and disembodied. In this sense, it is a liminal form which negotiates two different worlds on stage. In this dissertation I will address some central aspects of performance that have been reshaped and problematised by the use of the electroacoustic medium in a live context.
I will investigate in particular three main dualities: the performer's body / electroacoustic sound; physical space / electroacoustic space; and performance / audience. I will also discuss a generalised duality common to all three: presence / absence. Rather than regarding these dualities as indicators of discontinuity, I will suggest that they can help develop a continuum of connections and relationships between performance elements. These connections can be designed as part of the composition process.
By investigating these dualities, this research addresses the main elements of the live event. The central guiding principle here is that the live electroacoustic mode is a performance discipline, and therefore requires a dramaturgical approach that takes into consideration the elements of the live event: performer, audience and use of space. I will suggest that such an approach should guide the creative process, starting at the initial composition stages, through rehearsal and the actual performance
Triangular relationships between commerce, politics and hip-hop : a study of the role of hip-hop in influencing the socio-economic and political landscape in contemporary society
A PhD thesis submitted to the Anthropology Department,
Faculty of Humanities, University of the
Witwatersrand. This study will argue (i) that the evolution of hip-hop arises out of the
need by young people to give expression and meaning to their day-to-day
socio-political and economic struggles and the harsh realities of urban life,
and (ii) that hip-hop has become the audible and dominant voice of reason
and a platform that allows youth to address their plight, as active citizens, and
(iii) that, as a music expression, the hip-hop narrative can be used as an
unsolicited yet resourceful civic perception survey to gauge the temperature
and the mood of society at a point in time.
My research question is premised on the argument that the youth looks at
society and their immediate surroundings through the lens of rap music and
the hip-hop culture. It presupposes that it is this hip-hop lens that has become
the projector through which the youth views and analyses society and then
invites the world to peep through, to confirm and be witnesses to what they
see.
It is not the purpose of this research to argue how much influence hip-hop has
on young people, but instead to look at how youth is using hip-hop to express
their discontent and what the various sites are where their relentless desire for
a better life is being crafted and articulated. In my investigation, I have argued
that it is at these social sites that open or discreet creative expressions are
produced/created by the hip-hop generation as the subordinate group and
directed to those perceived to be the gatekeepers to their aspirations and their
rites of passage. In my investigation I have explored how, out of indignation
and desire, the hip-hop generation has employed creative ways to highlight
and vent their frustration at a system that seems to derail their aspirations.
This is the story of hip-hop where Watkins (2005) argues that the youth have
crafted "a vision of their world that is insightful, optimistic and tenaciously
critical of the institutions and circumstances that restrict their ability to impact
on the world around them" (p. 81).
With regard to hip-hop in South Africa, critical questions and a central thesis to
this paper begin to emerge as to whether hip-hop, as an artistic expression
and a seemingly dominant youth culture, has found long-hidden voices
through which young people now engage with this art form to address and
reflect on their socio-economic and political conditions as active citizens in
search of a meaningful social contract.
By investigating the triangular relationship between commerce, politics and
hip-hop, this study looks at how creative, adaptive people with unrealised
potential, who find themselves trapped by illusion and exploitation (realistic or
perceived), always try to find a meaning to make sense of their worlds.