
    Integrating Voice-Based Machine Learning Technology into Complex Home Environments

    To demonstrate the value of machine-learning-based smart health technologies, researchers have to deploy their solutions into complex real-world environments with real participants. This gives rise to many, often unexpected, challenges in creating technology in a lab environment that will still work when deployed in real homes. In other words, as in more mature disciplines, we need solutions for what can be done at development time to increase success at deployment time. To illustrate an approach and solutions, we use the example of an ongoing project: a pipeline of voice-based machine learning components that detects anger and verbal conflict among participants. For anonymity, we call it the XYZ system. XYZ is a smart health technology because, by notifying participants of their anger, it encourages them to better manage their emotions; being able to recognize one's emotions is the first step to better managing one's anger. XYZ was deployed in six homes for four months each and monitors the emotions of the caregiver of a dementia patient. In this paper we demonstrate some of the steps that must be accomplished during the development stage to increase deployment-time success, and show where continued work is still necessary. Note that the complexity of the environments arises both from the physical world and from complex human behavior.

    Composing affect: reflection on configurations of body, sound and technology in contemporary South African performance

    This thesis engages with experiential performance modes through the lenses of phenomenology and affect theory. Because experiential performance relies by definition on personal, subjective ‘experience’, specific responses cannot be anticipated. However, by attempting to compose ‘affect’, a performance has the potential to ‘move’ an attendant towards response. Deleuze and Guattari define ‘affect’ as “an ability to affect and be affected….a prepersonal intensity corresponding to the passage from one experiential state of the body to another and implying an augmentation or diminution in that body’s capacity to act” (1987: xvi). One current strategy for manifesting affect in performance seems to be the ways in which different configurations of body, sound and technology are employed. The body is the means through which sound is received or ‘experienced’ in the phenomenological sense, but it can also act as a source for sonic material. The body is furthermore the means by which sonic technology is manipulated. It is the complex, reverberating relationships between body, sound and technology, and their potential for eliciting affective transformation, which is the focus of my enquiry. In the first chapter I unpack the roles of the natural phenomena, body and sound, and their complex relationships to affect. The chapter serves as a philosophical basis for the rest of the investigation, and draws largely on works by philosophers Susan Kozel, Maurice Merleau-Ponty, Brian Massumi, Gilles Deleuze and Félix Guattari and sound theorists Don Ihde, Marshall McLuhan, Brandon LaBelle and Frances Dyson. In the remaining three chapters I discuss current South African theatre works that employ the strategy of placing emphasis on sound, sonic technology, and its relationship to the human body. These works are my own piece herTz (2014), Jaco Bouwer’s pieces Samsa-masjien (2014) and Na-aap (2013), and First Physical Theatre Company’s Everyday Falling (2010). While they range from plays to physical theatre performances to performative experiments, they all place specific emphasis on sonic devices, drawing attention to sound by revealing microphones, speakers, midi boards, etc. to the attendants, and including the generation and manipulation of sound in the action of the performance.

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised - at least for some listeners - by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work on improving the robustness of speech output.

    Sound archaeology: terminology, Palaeolithic cave art and the soundscape

    This article focuses on the ways that terminology describing the study of music and sound within archaeology has changed over time, and how this reflects developing methodologies, exploring the expectations and issues raised by the use of differing kinds of language to define and describe such work. It begins with a discussion of music archaeology, addressing the problems of using the term ‘music’ in an archaeological context. It continues with an examination of archaeoacoustics and acoustics, and an emphasis on sound rather than music. This leads on to a study of sound archaeology and soundscapes, pointing out that it is important to consider the complete acoustic ecology of an archaeological site in order to identify its affordances, those possibilities offered by invariant acoustic properties. Using a case study from northern Spain, the paper suggests that all of these methodological approaches have merit, and that a project benefits from their integration.

    Best Practices for Noise-Based Augmentation to Improve the Performance of Deployable Speech-Based Emotion Recognition Systems

    Speech emotion recognition is an important component of any human-centered system. But the speech characteristics produced and perceived by a person can be influenced by a multitude of factors, both desirable, such as emotion, and undesirable, such as noise. To train robust emotion recognition models, we need a large yet realistic data distribution, but emotion datasets are often small and hence are augmented with noise. Noise augmentation often makes one important assumption: that the prediction label should remain the same in the presence or absence of noise, which is true for automatic speech recognition but not necessarily true for perception-based tasks. In this paper we make three novel contributions. We validate through crowdsourcing that the presence of noise does change the annotation label and hence may alter the original ground-truth label. We then show how disregarding this knowledge and assuming consistency in ground-truth labels propagates to downstream evaluation of ML models, both for performance evaluation and robustness testing. We end the paper with a set of recommendations for noise augmentation in speech emotion recognition datasets.
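The augmentation step the abstract critiques, mixing noise into clean speech at a chosen signal-to-noise ratio, can be sketched as follows. This is a minimal illustration of the generic technique, not the authors' code; the function name and interface are assumptions.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Additively mix noise into speech at a target SNR (in dB)."""
    # Tile or trim the noise so it matches the speech length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    # Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

The paper's point is that applying this transform while keeping the original emotion label assumes the noise does not change how listeners perceive the utterance, an assumption their crowdsourcing study calls into question.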

    Traveling Yellow Peril: Race, Gender, and Empire in Japan's English Teaching Industry

    Contemporary U.S. white migrants working in Japan long-term as English teachers find themselves in an increasingly precarious labor market. When reacting to industry flexibilization, the U.S. men I interviewed during two years of fieldwork in Nagoya regularly invoked Filipina competition as an impending threat to their livelihoods. Anxieties coalesced around the question of whether racialized postcolonial subjects can fully inhabit the category of "native English teacher." This essay combines Asian American, postcolonial, and transnational American Studies perspectives to situate these "nativist" logics within a historical trajectory of anti-Asian labor backlash in the United States and "benevolent assimilation" policies in the Philippines. These histories reappear within Japan's neoliberal labor regimes to position Filipina migrants as a feminized "yellow peril" menace to hegemonic white masculinities abroad. Extending Homi Bhabha's theories, the essay demonstrates how Filipina "colonial mimicry" undermines the embodied, linguistic authority of white "native" English teachers and becomes a discursive conduit for the transplantation into Japan of the "white male victim" figure commonly seen in domestic U.S. culture wars.

    SALSA: A Novel Dataset for Multimodal Group Behavior Analysis

    Studying free-standing conversational groups (FCGs) in unstructured social settings (e.g., a cocktail party) is gratifying due to the wealth of information available at the group (mining social networks) and individual (recognizing native behavioral and personality traits) levels. However, analyzing social scenes involving FCGs is also highly challenging due to the difficulty in extracting behavioral cues such as target locations, their speaking activity and head/body pose, owing to crowdedness and the presence of extreme occlusions. To this end, we propose SALSA, a novel dataset facilitating multimodal and Synergetic sociAL Scene Analysis, and make two main contributions to research on automated social interaction analysis: (1) SALSA records social interactions among 18 participants in a natural, indoor environment for over 60 minutes, under poster presentation and cocktail party contexts presenting difficulties in the form of low-resolution images, lighting variations, numerous occlusions, reverberations and interfering sound sources; (2) to alleviate these problems, we facilitate multimodal analysis by recording the social interplay using four static surveillance cameras and sociometric badges worn by each participant, comprising microphone, accelerometer, Bluetooth and infrared sensors. In addition to raw data, we also provide annotations concerning individuals' personality as well as their position, head and body orientation, and F-formation information over the entire event duration. Through extensive experiments with state-of-the-art approaches, we show (a) the limitations of current methods and (b) how the recorded multiple cues synergetically aid automatic analysis of social interactions. SALSA is available at http://tev.fbk.eu/salsa.

    On Distant Speech Recognition for Home Automation

    The official version of this draft is available from Springer via http://dx.doi.org/10.1007/978-3-319-16226-3_7. In the framework of Ambient Assisted Living, home automation may be a solution for helping elderly people living alone at home. This study is part of the Sweet-Home project, which aims at developing a new home automation system based on voice command to improve the support and well-being of people in loss of autonomy. The goal of the study is vocal order recognition, with a focus on two aspects: distant speech recognition and sentence spotting. Several ASR techniques were evaluated on a realistic corpus acquired in a 4-room flat equipped with microphones set in the ceiling. This distant-speech French corpus was recorded with 21 speakers who acted out scenarios of activities of daily living. Techniques acting at the decoding stage, such as our novel approach called the Driven Decoding Algorithm (DDA), gave better speech recognition results than the baseline and other approaches. This solution, which uses the two best SNR channels and a priori knowledge (voice commands and distress sentences), has demonstrated an increase in recognition rate without introducing false alarms.
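The channel-selection step mentioned above, picking the two microphone channels with the best SNR before decoding, can be sketched as follows. This is an illustrative sketch only: the SNR estimate (signal power over a per-channel noise-floor power) and all names are assumptions, not the Sweet-Home implementation.

```python
import numpy as np

def pick_best_channels(channels: list, noise_floors: list, k: int = 2) -> list:
    """Rank microphone channels by estimated SNR and return the top-k indices.

    channels: per-microphone sample arrays; noise_floors: per-channel
    noise power estimates (e.g., measured during silence).
    """
    snrs = []
    for sig, floor in zip(channels, noise_floors):
        p_sig = np.mean(np.asarray(sig) ** 2)
        # Guard against zero power before taking the log.
        snr_db = 10 * np.log10(max(p_sig, 1e-12) / max(floor, 1e-12))
        snrs.append(snr_db)
    # Indices of the k channels with the highest estimated SNR, best first.
    order = np.argsort(snrs)[::-1][:k]
    return [int(i) for i in order]
```

The selected channels would then feed the decoding stage, where knowledge of the expected voice commands and distress sentences can drive the search, as in the DDA approach described above.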