
    A multi-modal perception based assistive robotic system for the elderly

    Edited by Giovanni Maria Farinella, Takeo Kanade, Marco Leo, Gerard G. Medioni, and Mohan Trivedi. In this paper, we present a multi-modal perception based framework to realize a non-intrusive domestic assistive robotic system. It is non-intrusive in that it starts interacting with a user only when it detects the user's intention to do so. All of the robot's actions are based on multi-modal perception, which includes user detection from RGB-D data, detection of the user's intention for interaction from RGB-D and audio data, and communication via user-distance-mediated speech recognition. The use of multi-modal cues in the different stages of robotic activity leads to successful robotic runs (94% success rate). Each perceptual component is systematically evaluated using an appropriate dataset and evaluation metrics. Finally, the complete system is fully integrated on the PR2 robotic platform and validated through system sanity-check runs and user studies with 17 volunteer elderly participants.
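    A minimal sketch of the non-intrusive interaction policy described in this abstract: interaction is gated on a fused intention score and speech recognition is adapted to user distance. The detector interface, the 0.7 threshold and the distance cutoff are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a non-intrusive interaction gate that fuses
# RGB-D based user detection with an audio/gaze intention score.
# The Perception fields, threshold and distance cutoff are assumptions.
from dataclasses import dataclass

@dataclass
class Perception:
    user_detected: bool      # from an RGB-D person detector
    intention_score: float   # fused RGB-D (pose/gaze) + audio cue, in [0, 1]
    user_distance_m: float   # used to mediate speech recognition

def should_engage(p: Perception, threshold: float = 0.7) -> bool:
    """Start interaction only when a user is present and shows intent."""
    return p.user_detected and p.intention_score >= threshold

def asr_mode(distance_m: float) -> str:
    """Distance-mediated speech recognition: switch acoustic setup by range."""
    return "near-field" if distance_m < 1.5 else "far-field"

if __name__ == "__main__":
    p = Perception(user_detected=True, intention_score=0.82, user_distance_m=1.2)
    if should_engage(p):
        print("Engaging user with", asr_mode(p.user_distance_m), "ASR")
```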

    Automated Semantic Understanding of Human Emotions in Writing and Speech

    Affective Human Computer Interaction (A-HCI) will be critical to the success of new technologies that will be prevalent in the 21st century. If cell phones and the internet are any indication, there will be continued rapid development of automated assistive systems that help humans live better, more productive lives. These will not be just passive systems such as cell phones, but active assistive systems such as robot aides used in hospitals, homes, entertainment rooms, offices, and other work environments. Such systems will need to be able to properly deduce a human's emotional state before they determine how best to interact with people. This dissertation explores and extends the body of knowledge related to affective HCI. New semantic methodologies are developed and studied for reliable and accurate detection of human emotional states and magnitudes in written and spoken language, and for mapping emotional states and magnitudes to 3-D facial expression outputs. The automatic detection of affect in language is based on natural language processing and machine learning approaches. Two affect corpora were developed to perform this analysis. Emotion classification is performed at the sentence level using a step-wise approach which incorporates sentiment flow and sentiment composition features. For emotion magnitude estimation, a regression model was developed to predict the evolving emotional magnitude of actors. Emotional magnitudes at any point during a story or conversation are determined by 1) the previous emotional state magnitude; 2) new text and speech inputs that might act upon that state; and 3) information about the context the actors are in. Acoustic features are also used to capture additional information from the speech signal. Evaluation of the automatic understanding of affect is performed by testing the model on a testing subset of the newly extended corpus. To visualize actor emotions as perceived by the system, a methodology was also developed to map predicted emotion class magnitudes to 3-D facial parameters using vertex-level mesh morphing. The developed sentence-level emotion state detection approach achieved classification accuracies as high as 71% for the neutral vs. emotion classification task on a test corpus of children's stories. After class re-sampling, the step-wise classification methodology achieved accuracies in the 56% to 84% range for each emotion class and polarity on a test subset of a medical drama corpus. For emotion magnitude prediction, the developed recurrent (prior-state feedback) regression model using both text-based and acoustic-based features achieved correlation coefficients in the range of 0.69 to 0.80. This prediction function was modeled using a non-linear approach based on Support Vector Regression (SVR) and performed better than approaches based on Linear Regression or Artificial Neural Networks.
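    A minimal sketch of the recurrent (prior-state feedback) SVR idea described above, assuming scikit-learn and synthetic stand-in features; the feature sets, dimensions and hyperparameters are illustrative, not the dissertation's actual configuration.

```python
# Illustrative prior-state-feedback regression for emotion magnitude using SVR.
# Features and the target sequence are synthetic; real text/acoustic features
# from the dissertation are not reproduced here.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
T = 200
text_feat = rng.normal(size=(T, 5))      # stand-in for text-based features
acoustic_feat = rng.normal(size=(T, 3))  # stand-in for acoustic features
magnitude = np.cumsum(rng.normal(scale=0.1, size=T))  # toy magnitude sequence

# Training inputs include the previous magnitude (prior-state feedback).
X = np.hstack([text_feat[1:], acoustic_feat[1:], magnitude[:-1, None]])
y = magnitude[1:]
model = SVR(kernel="rbf", C=1.0).fit(X, y)

# At prediction time the model is rolled forward on its own previous output.
state = magnitude[0]
preds = []
for t in range(1, T):
    x_t = np.hstack([text_feat[t], acoustic_feat[t], [state]])
    state = model.predict(x_t[None, :])[0]
    preds.append(state)
print("correlation:", np.corrcoef(preds, y)[0, 1])
```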

    A practical guide for generating unsupervised, spectrogram-based latent space representations of animal vocalizations

    © The Author(s), 2022. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Thomas, M., Jensen, F. H., Averly, B., Demartsev, V., Manser, M. B., Sainburg, T., Roch, M. A., & Strandburg-Peshkin, A. A practical guide for generating unsupervised, spectrogram-based latent space representations of animal vocalizations. The Journal of Animal Ecology, 91(8), (2022): 1567–1581, https://doi.org/10.1111/1365-2656.13754. 1. Background: The manual detection, analysis and classification of animal vocalizations in acoustic recordings is laborious and requires expert knowledge. Hence, there is a need for objective, generalizable methods that detect underlying patterns in these data, categorize sounds into distinct groups and quantify similarities between them. Among all computational methods that have been proposed to accomplish this, neighbourhood-based dimensionality reduction of spectrograms to produce a latent space representation of calls stands out for its conceptual simplicity and effectiveness. 2. Goal of the study/what was done: Using a dataset of manually annotated meerkat Suricata suricatta vocalizations, we demonstrate how this method can be used to obtain meaningful latent space representations that reflect the established taxonomy of call types. We analyse strengths and weaknesses of the proposed approach, give recommendations for its usage and show application examples, such as the classification of ambiguous calls and the detection of mislabelled calls. 3. What this means: All analyses are accompanied by example code to help researchers realize the potential of this method for the study of animal vocalizations. This work was supported by HFSP Research Grant RGP0051/2019 to ASP, MBM and MAR, and funded by the Deutsche Forschungsgemeinschaft (DFG) under Germany's Excellence Strategy (EXC-2117-422037984). ASP received additional funding from the Gips-Schüle Stiftung, the Zukunftskolleg at the University of Konstanz and the Max-Planck-Institute of Animal Behaviour. VD was funded by the Minerva Stiftung and the Alexander von Humboldt Foundation.
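    The kind of neighbourhood-based dimensionality reduction of spectrograms described above can be sketched as follows, assuming librosa for spectrogram computation and umap-learn for the embedding; file names, padding length and UMAP parameters are illustrative only and are not the paper's exact settings.

```python
# Rough sketch: spectrogram-based latent space for a set of call clips.
# Assumes librosa and umap-learn are installed; file names and parameter
# values below are placeholders.
import numpy as np
import librosa
import umap

def call_to_spectrogram(path, sr=16000, n_mels=64, max_frames=128):
    """Load a call, compute a log-mel spectrogram, pad/trim to a fixed size."""
    y, _ = librosa.load(path, sr=sr)
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    S = librosa.power_to_db(S, ref=np.max)
    S = librosa.util.fix_length(S, size=max_frames, axis=1)
    return S.flatten()

paths = ["call_001.wav", "call_002.wav", "call_003.wav"]  # placeholder files
X = np.stack([call_to_spectrogram(p) for p in paths])

# Neighbourhood-based dimensionality reduction to a 2-D latent space.
embedding = umap.UMAP(n_neighbors=2, min_dist=0.1, random_state=0).fit_transform(X)
print(embedding.shape)  # (n_calls, 2)
```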

    Recent Advances in Signal Processing

    Signal processing is a critical task in the majority of new technological inventions and challenges, across a variety of applications in both science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian, and have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand: image processing, speech processing, communication systems, time-series analysis, and educational packages. The book has the advantage of providing a collection of applications that are completely independent and self-contained, so the interested reader can choose any chapter and skip to another without losing continuity.

    Non-verbal communication in instant messaging: conveying emotion through voice interfaces

    Instant messaging has become a keystone of personal human communication, with the largest application, WhatsApp, currently serving over 2 billion people. Plenty of research confirms that people use non-verbal communication in computer-mediated communication, allowing for emotional communication at a distance. At the same time, virtual personal assistants, such as the Google Assistant and Apple Siri, are continuously expanding their market share. Recently, they have added support for voice-based instant messaging, which includes reading instant messages aloud. When instant messages are synthesised, the digital non-verbal communication traits they include may be lost or omitted. This study aims to explore the impact of text-to-speech conversion of instant messages by virtual personal assistants on the recognition of non-verbal cues by the receiving party. Secondly, the research aims to explore and test methods for carrying the non-verbal communication traits of instant messages into speech synthesis, through the inclusion of spatial arrays (emojis) and the modification of synthetic voice prosody. Sentiment analysis and emotion detection are explored and applied to extract emotional data from instant messages, which can be used to modify speech synthesis characteristics, such as pitch and speech rate, to mimic human paralanguage and vocal non-verbal communication to convey emotion.
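    One way the sentiment-to-prosody mapping described above could look is sketched below using standard SSML prosody attributes; the sentiment score source and the pitch/rate ranges are illustrative assumptions, not the study's actual parameters.

```python
# Illustrative mapping from a message sentiment score to SSML prosody.
# The score is assumed to be in [-1, 1] (e.g. from a sentiment model);
# the pitch and rate ranges below are arbitrary choices for this sketch.
from xml.sax.saxutils import escape

def prosody_ssml(message: str, sentiment: float) -> str:
    """Wrap a message in an SSML <prosody> tag tuned by sentiment."""
    sentiment = max(-1.0, min(1.0, sentiment))
    pitch_pct = round(20 * sentiment)        # -20% .. +20% pitch shift
    rate_pct = round(100 + 15 * sentiment)   # 85% .. 115% speaking rate
    return (
        f'<speak><prosody pitch="{pitch_pct:+d}%" rate="{rate_pct}%">'
        f"{escape(message)}</prosody></speak>"
    )

print(prosody_ssml("Great news, see you soon! 😊", sentiment=0.8))
print(prosody_ssml("I can't make it today.", sentiment=-0.4))
```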

    A configurable vector processor for accelerating speech coding algorithms

    The growing demand for voice-over-packet (VoIP) services and multimedia-rich applications has made the efficient, real-time implementation of low-bit-rate speech coders on embedded VLSI platforms increasingly important. Such speech coders are designed to substantially reduce bandwidth requirements, thus enabling dense multichannel gateways in a small form factor. This, however, comes at a high computational cost, which mandates the use of very high-performance embedded processors. This thesis investigates the potential acceleration of two major ITU-T speech coding algorithms, namely G.729A and G.723.1, through their efficient implementation on a configurable, extensible vector embedded CPU architecture. New scalar and vector ISAs were introduced, resulting in up to an 80% reduction in the dynamic instruction count of both workloads. These instructions were subsequently encapsulated into a parametric, hybrid SISD (scalar processor)–SIMD (vector) processor. This work presents the research and implementation of the vector datapath of this vector coprocessor, which is tightly coupled to a SPARC-V8-compliant CPU, the optimization and simulation methodologies employed, and the use of Electronic System Level (ESL) techniques to rapidly design SIMD datapaths.
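    For context on why a vector/SIMD ISA pays off here, the sketch below shows the kind of multiply-accumulate (MAC) kernels that dominate CELP coders such as G.729A and G.723.1, expressed with NumPy array operations purely to mimic the data parallelism; it is not the thesis's ISA or codec code.

```python
# Illustrative MAC-heavy kernels typical of CELP speech coders.
# NumPy stands in for vector/SIMD hardware; sizes below are placeholders.
import numpy as np

def autocorrelation(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Autocorrelation terms used in LPC analysis: r[k] = sum_n x[n] * x[n-k]."""
    return np.array([np.dot(x[k:], x[:len(x) - k]) for k in range(max_lag + 1)])

def codebook_search(target: np.ndarray, codebook: np.ndarray) -> int:
    """Pick the codevector maximizing correlation^2 / energy (simplified)."""
    corr = codebook @ target                      # one MAC-heavy dot product per row
    energy = np.sum(codebook * codebook, axis=1)  # per-codevector energy
    return int(np.argmax(corr * corr / energy))

frame = np.random.default_rng(1).normal(size=80)     # one 10 ms frame at 8 kHz
cb = np.random.default_rng(2).normal(size=(128, 80))  # toy fixed codebook
print(autocorrelation(frame, max_lag=10))
print("best codevector:", codebook_search(frame, cb))
```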