47 research outputs found

    Data-Driven Audio Feature Space Clustering for Automatic Sound Recognition in Radio Broadcast News

    This is an Open Access article published by World Scientific Publishing Company. It is distributed under the terms of the Creative Commons Attribution 4.0 (CC-BY) License. Further distribution of this work is permitted, provided the original work is properly cited. T. Theodorou, I. Mpoas, A. Lazaridis, N. Fakotakis, 'Data-Driven Audio Feature Space Clustering for Automatic Sound Recognition in Radio Broadcast News', International Journal on Artificial Intelligence Tools, Vol. 26 (2), April 2017, 1750005 (13 pages), DOI: 10.1142/S021821301750005. © The Author(s). In this paper we describe an automatic sound recognition scheme for radio broadcast news based on principal component clustering with respect to the discrimination ability of the principal components. Specifically, streams of broadcast news transmissions, labeled by audio event, are decomposed using a large set of audio descriptors and projected into the principal component space. A data-driven algorithm clusters the components according to their relevance. The component subspaces are then used by a sound type classifier. With this methodology, the k-nearest neighbor and the artificial intelligent network classifiers provide good results. Discarding unnecessary dimensions also works in favor of the outcome, as it hardly deteriorates the effectiveness of the algorithms. Peer reviewed.

    SEWA DB: A rich database for audio-visual emotion and sentiment research in the wild

    Natural human-computer interaction and audio-visual human behaviour sensing systems that achieve robust performance in the wild are needed more than ever as digital devices become an increasingly indispensable part of our lives. Accurately annotated real-world data are the crux in devising such systems. However, existing databases usually consider controlled settings, low demographic variability, and a single task. In this paper, we introduce the SEWA database of more than 2000 minutes of audio-visual data of 398 people coming from six cultures, 50% female, and uniformly spanning the age range of 18 to 65 years old. Subjects were recorded in two different contexts: while watching adverts and while discussing adverts in a video chat. The database includes rich annotations of the recordings in terms of facial landmarks, facial action units (FAU), various vocalisations, mirroring, and continuously valued valence, arousal, liking, agreement, and prototypic examples of (dis)liking. This database aims to be an extremely valuable resource for researchers in affective computing and automatic human sensing and is expected to push forward the research in human behaviour analysis, including cultural studies. Along with the database, we provide extensive baseline experiments for automatic FAU detection and automatic valence, arousal and (dis)liking intensity estimation.

    Interactive Technologies for the Public Sphere Toward a Theory of Critical Creative Technology

    Digital media cultural practices continue to address the social, cultural and aesthetic contexts of the global information economy, perhaps better called ecology, by inventing new methods and genres that encourage interactive engagement, collaboration, exploration and learning. The theoretical framework for creative critical technology evolved from the confluence of the arts, human computer interaction, and critical theories of technology. Molding this nascent theoretical framework from these seemingly disparate disciplines was a reflexive process where the influence of each component on each other spiraled into the theory and practice as illustrated through the Constructed Narratives project. Research that evolves from an arts perspective encourages experimental processes of making as a method for defining research principles. The traditional reductionist approach to research requires that all confounding variables are eliminated or silenced using methods of statistics. However, that noise in the data, those confounding variables, provide the rich context, media, and processes by which creative practices thrive. As research in the arts gains recognition for its contributions of new knowledge, the traditional reductive practice in search of general principles will be respectfully joined by methodologies for defining living principles that celebrate and build from the confounding variables, the data noise. The movement to develop research methodologies from the noisy edges of human interaction has been explored in the research and practices of ludic design and ambiguity (Gaver, 2003); affective gap (Sengers et al., 2005b; 2006); embodied interaction (Dourish, 2001); the felt life (McCarthy & Wright, 2004); and reflective HCI (Dourish, et al., 2004).
The theory of critical creative technology examines the relationships between critical theories of technology, society and aesthetics, information technologies and contemporary practices in interaction design and creative digital media. The theory of critical creative technology is aligned with theories and practices in social navigation (Dourish, 1999) and community-based interactive systems (Stathis, 1999) in the development of smart appliances and network systems that support people in engaging in social activities, promoting communication and enhancing the potential for learning in a community-based environment. The theory of critical creative technology amends these community-based and collaborative design theories by emphasizing methods to facilitate face-to-face dialogical interaction when the exchange of ideas, observations, dreams, concerns, and celebrations may be silenced by societal norms about how to engage others in public spaces. The Constructed Narratives project is an experiment in the design of a critical creative technology that emphasizes the collaborative construction of new knowledge about one's lived world through computer-supported collaborative play (CSCP). To construct is to creatively invent one's world by engaging in creative decision-making, problem solving and acts of negotiation. The metaphor of construction is used to demonstrate how a simple artefact - a building block - can provide an interactive platform to support discourse between collaborating participants. The technical goal for this project was the development of a software and hardware platform for the design of critical creative technology applications that can process a dynamic flow of logistical and profile data from multiple users to be used in applications that facilitate dialogue between people in a real-time playful interactive experience.

    Recent Advances in Social Data and Artificial Intelligence 2019

    The importance and usefulness of subjects and topics involving social data and artificial intelligence are becoming widely recognized. This book contains invited review, expository, and original research articles dealing with, and presenting state-of-the-art accounts of, the recent advances in the subjects of social data and artificial intelligence, and potentially their links to Cyberspace.

    Proceedings of the 7th Sound and Music Computing Conference

    Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010

    Robust Multi-stream Keyword and Non-linguistic Vocalization Detection for Computationally Intelligent Virtual Agents

    Systems for keyword and non-linguistic vocalization detection in conversational agent applications need to be robust with respect to background noise and different speaking styles. Focussing on the Sensitive Artificial Listener (SAL) scenario, which involves spontaneous, emotionally colored speech, this paper proposes a multi-stream model that applies the principle of Long Short-Term Memory to generate context-sensitive phoneme predictions which can be used for keyword detection. Further, we investigate the incorporation of noisy training material in order to create noise-robust acoustic models. We show that both strategies can improve recognition performance when evaluated on spontaneous human-machine conversations as contained in the SEMAINE database.