
    Integrating Voice-Based Machine Learning Technology into Complex Home Environments

    To demonstrate the value of machine-learning-based smart health technologies, researchers have to deploy their solutions into complex real-world environments with real participants. This gives rise to many, often unexpected, challenges in creating technology in a lab environment that will still work when deployed in real homes. In other words, as in more mature disciplines, we need solutions for what can be done at development time to increase success at deployment time. To illustrate an approach and solutions, we use the example of an ongoing project: a pipeline of voice-based machine learning components that detects anger and verbal conflict among participants. For anonymity, we call it the XYZ system. XYZ is a smart health technology because, by notifying participants of their anger, it encourages them to better manage their emotions; being able to recognize one's emotions is the first step to better managing one's anger. XYZ was deployed in six homes for four months each and monitors the emotions of the caregiver of a dementia patient. In this paper we demonstrate some of the steps that must be accomplished during the development stage to increase deployment-time success, and show where continued work is still necessary. Note that the complexity of the environments arises both from the physical world and from complex human behavior.

    Composing affect: reflection on configurations of body, sound and technology in contemporary South African performance

    This thesis engages with experiential performance modes through the lenses of phenomenology and affect theory. Because experiential performance relies by definition on personal, subjective ‘experience’, specific responses cannot be anticipated. However, by attempting to compose ‘affect’, a performance has the potential to ‘move’ an attendant towards response. Deleuze and Guattari define ‘affect’ as “an ability to affect and be affected….a prepersonal intensity corresponding to the passage from one experiential state of the body to another and implying an augmentation or diminution in that body’s capacity to act” (1987: xvi). One current strategy for manifesting affect in performance seems to be the ways in which different configurations of body, sound and technology are employed. The body is the means through which sound is received or ‘experienced’ in the phenomenological sense, but it can also act as a source for sonic material. The body is furthermore the means by which sonic technology is manipulated. It is the complex, reverberating relationships between body, sound and technology, and their potential for eliciting affective transformation, which is the focus of my enquiry. In the first chapter I unpack the roles of the natural phenomena, body and sound, and their complex relationships to affect. The chapter serves as a philosophical basis for the rest of the investigation, and draws largely on works by philosophers Susan Kozel, Maurice Merleau-Ponty, Brian Massumi, Gilles Deleuze and Félix Guattari and sound theorists Don Ihde, Marshall McLuhan, Brandon LaBelle and Frances Dyson. In the remaining three chapters I discuss current South African theatre works that employ the strategy of placing emphasis on sound, sonic technology, and its relationship to the human body. These works are my own piece herTz (2014), Jaco Bouwer’s pieces Samsa-masjien (2014) and Na-aap (2013), and First Physical Theatre Company’s Everyday Falling (2010). While they range from plays to physical theatre performances to performative experiments, they all place specific emphasis on sonic devices, drawing attention to sound by revealing microphones, speakers, midi boards, etc. to the attendants, and including the generation and manipulation of sound in the action of the performance.

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work on improving the robustness of speech output.

    Sound archaeology: terminology, Palaeolithic cave art and the soundscape

    This article focuses on the ways that terminology describing the study of music and sound within archaeology has changed over time, and how this reflects developing methodologies, exploring the expectations and issues raised by the use of differing kinds of language to define and describe such work. It begins with a discussion of music archaeology, addressing the problems of using the term ‘music’ in an archaeological context. It continues with an examination of archaeoacoustics and acoustics, and an emphasis on sound rather than music. This leads on to a study of sound archaeology and soundscapes, pointing out that it is important to consider the complete acoustic ecology of an archaeological site in order to identify its affordances, those possibilities offered by invariant acoustic properties. Using a case study from northern Spain, the paper suggests that all of these methodological approaches have merit, and that a project benefits from their integration.

    Best Practices for Noise-Based Augmentation to Improve the Performance of Deployable Speech-Based Emotion Recognition Systems

    Speech emotion recognition is an important component of any human-centered system. But the speech characteristics produced and perceived by a person can be influenced by a multitude of factors, both desirable, such as emotion, and undesirable, such as noise. To train robust emotion recognition models, we need a large yet realistic data distribution, but emotion datasets are often small and hence are augmented with noise. Noise augmentation often makes one important assumption: that the prediction label should remain the same in the presence or absence of noise, which is true for automatic speech recognition but not necessarily true for perception-based tasks. In this paper we make three novel contributions. We validate through crowdsourcing that the presence of noise does change the annotation label and hence may alter the original ground-truth label. We then show how disregarding this knowledge and assuming consistency in ground-truth labels propagates to downstream evaluation of ML models, both for performance evaluation and robustness testing. We end the paper with a set of recommendations for noise augmentation in speech emotion recognition datasets.
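The augmentation step the abstract critiques, mixing noise into clean speech at a chosen signal-to-noise ratio, can be sketched as follows. This is a minimal illustration of the generic technique, not the authors' code; the function name and interface are assumptions.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Additively mix noise into speech at a target SNR (in dB)."""
    # Tile or trim the noise so it matches the speech length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    # Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

The paper's point is that applying this transform while keeping the original emotion label assumes the noise does not change how listeners perceive the utterance, an assumption their crowdsourcing study calls into question.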

    Traveling Yellow Peril: Race, Gender, and Empire in Japan's English Teaching Industry

    Contemporary U.S. white migrants working in Japan long-term as English teachers find themselves in an increasingly precarious labor market. When reacting to industry flexibilization, the U.S. men I interviewed during two years of fieldwork in Nagoya regularly invoked Filipina competition as an impending threat to their livelihoods. Anxieties coalesced around the question of whether racialized postcolonial subjects can fully inhabit the category of "native English teacher." This essay combines Asian American, postcolonial, and transnational American Studies perspectives to situate these "nativist" logics within a historical trajectory of anti-Asian labor backlash in the United States and "benevolent assimilation" policies in the Philippines. These histories reappear within Japan's neoliberal labor regimes to position Filipina migrants as a feminized "yellow peril" menace to hegemonic white masculinities abroad. Extending Homi Bhabha's theories, the essay demonstrates how Filipina "colonial mimicry" undermines the embodied, linguistic authority of white "native" English teachers and becomes a discursive conduit for the transplantation into Japan of the "white male victim" figure commonly seen in domestic U.S. culture wars.

    SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

    Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging due to the difficulty in extracting behavioral cues such as target locations, their speaking activity and head/body pose, owing to crowdedness and the presence of extreme occlusions. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under poster presentation and cocktail party contexts presenting difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) to alleviate these problems, we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising microphone, accelerometer, Bluetooth and infrared sensors. In addition to raw data, we also provide annotations concerning individuals' personality as well as their position, head and body orientation, and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.

    On Distant Speech Recognition for Home Automation

    The official version of this draft is available from Springer via http://dx.doi.org/10.1007/978-3-319-16226-3_7. In the framework of Ambient Assisted Living, home automation may be a solution for helping elderly people living alone at home. This study is part of the Sweet-Home project, which aims at developing a new home automation system based on voice command to improve the support and well-being of people in loss of autonomy. The goal of the study is vocal order recognition, with a focus on two aspects: distant speech recognition and sentence spotting. Several ASR techniques were evaluated on a realistic corpus acquired in a 4-room flat equipped with microphones set in the ceiling. This distant-speech French corpus was recorded with 21 speakers who acted out scenarios of activities of daily living. Techniques acting at the decoding stage, such as our novel approach called the Driven Decoding Algorithm (DDA), gave better speech recognition results than the baseline and other approaches. This solution, which uses the two best SNR channels and a priori knowledge (voice commands and distress sentences), has demonstrated an increase in recognition rate without introducing false alarms.
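The channel-selection step mentioned above, picking the two microphone channels with the best SNR before decoding, can be sketched as follows. This is an illustrative sketch only: the SNR estimate (signal power over a per-channel noise-floor power) and all names are assumptions, not the Sweet-Home implementation.

```python
import numpy as np

def pick_best_channels(channels: list, noise_floors: list, k: int = 2) -> list:
    """Rank microphone channels by estimated SNR and return the top-k indices.

    channels: per-microphone sample arrays; noise_floors: per-channel
    noise power estimates (e.g., measured during silence).
    """
    snrs = []
    for sig, floor in zip(channels, noise_floors):
        p_sig = np.mean(np.asarray(sig) ** 2)
        # Guard against zero power before taking the log.
        snr_db = 10 * np.log10(max(p_sig, 1e-12) / max(floor, 1e-12))
        snrs.append(snr_db)
    # Indices of the k channels with the highest estimated SNR, best first.
    order = np.argsort(snrs)[::-1][:k]
    return [int(i) for i in order]
```

The selected channels would then feed the decoding stage, where knowledge of the expected voice commands and distress sentences can drive the search, as in the DDA approach described above.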