Search CORE

63 research outputs found

Fast vocabulary acquisition in an NMF-based self-learning vocal user interface

Author: Gemmeke Jort F.
Ons Bart
Van hamme Hugo
Publication venue: The Authors. Published by Elsevier Ltd.
Publication date
Field of study

AbstractIn command-and-control applications, a vocal user interface (VUI) is useful for handsfree control of various devices, especially for people with a physical disability. The spoken utterances are usually restricted to a predefined list of phrases or to a restricted grammar, and the acoustic models work well for normal speech. While some state-of-the-art methods allow for user adaptation of the predefined acoustic models and lexicons, we pursue a fully adaptive VUI by learning both vocabulary and acoustics directly from interaction examples. A learning curve usually has a steep rise in the beginning and an asymptotic ceiling at the end. To limit tutoring time and to guarantee good performance in the long run, the word learning rate of the VUI should be fast and the learning curve should level off at a high accuracy. In order to deal with these performance indicators, we propose a multi-level VUI architecture and we investigate the effectiveness of alternative processing schemes. In the low-level layer, we explore the use of MIDA features (Mutual Information Discrimination Analysis) against conventional MFCC features. In the mid-level layer, we enhance the acoustic representation by means of phone posteriorgrams and clustering procedures. In the high-level layer, we use the NMF (Non-negative Matrix Factorization) procedure which has been demonstrated to be an effective approach for word learning. We evaluate and discuss the performance and the feasibility of our approach in a realistic experimental setting of the VUI-user learning context

Elsevier - Publisher Connector

The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE)

Author
Publication venue: Joint Conference on Language Evolution (JCoLE)
Publication date: 01/01/2022
Field of study

MPG.PuRe

A knowledge-based approach to automatic detection of equipment alarm sounds in a neonatal intensive care unit environment

Author: Jancovic Peter
Kokuer Munevver
Muñoz Mahamud Blanca
Nadeu Camprubí Climent
Peiró Lilja Alexandre Cristian
Raboshchuk Ganna
Riverola de Veciana Ana
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

A large number of alarm sounds triggered by biomedical equipment occur frequently in the noisy environment of a neonatal intensive care unit (NICU) and play a key role in providing healthcare. In this paper, our work on the development of an automatic system for detection of acoustic alarms in that difficult environment is presented. Such automatic detection system is needed for the investigation of how a preterm infant reacts to auditory stimuli of the NICU environment and for an improved real-time patient monitoring. The approach presented in this paper consists of using the available knowledge about each alarm class in the design of the detection system. The information about the frequency structure is used in the feature extraction stage and the time structure knowledge is incorporated at the post-processing stage. Several alternative methods are compared for feature extraction, modelling and post-processing. The detection performance is evaluated with real data recorded in the NICU of the hospital, and by using both frame-level and period-level metrics. The experimental results show that the inclusion of both spectral and temporal information allows to improve the baseline detection performance by more than 60%Peer ReviewedPostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC

University of Birmingham Research Portal

Error Signals from the Brain: 7th Mismatch Negativity Conference

Author: Bendixen Alexandra
Friederici Angela D.
Grimm Sabine
Gunter Thomas C.
Kotz Sonja A.
Müller Dagmar
Roeber Urte
Rübsamen Rudolf
Schröger Erich
Steinberg Johanna
Weise Annekathrin
Wetzel Nicole
Widmann Andreas
Publication venue
Publication date: 28/02/2019
Field of study

The 7th Mismatch Negativity Conference presents the state of the art in methods, theory, and application (basic and clinical research) of the MMN (and related error signals of the brain). Moreover, there will be two pre-conference workshops: one on the design of MMN studies and the analysis and interpretation of MMN data, and one on the visual MMN (with 20 presentations). There will be more than 40 presentations on hot topics of MMN grouped into thirteen symposia, and about 130 poster presentations. Keynote lectures by Kimmo Alho, Angela D. Friederici, and Israel Nelken will round off the program by covering topics related to and beyond MMN

Qucosa - Publikationsserver der Universität Leipzig

Combining Long Short-Term Memory and Dynamic Bayesian Networks for Incremental Emotion-Sensitive Artificial Listening

Author: Eyben F
Rigoll G
Schuller B
Woellmer M
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/10/2010
Field of study

OPUS Augsburg

Crossref

Spiral - Imperial College Digital Repository

Detecting autism, emotions and social signals using AdaBoost

Author: Busa-Fekete Róbert
Gosztolya Gábor
Tóth László
Publication venue: Interspeech
Publication date: 01/01/2013
Field of study

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

Robust speech recognition with spectrogram factorisation

Author: Hurmalainen Antti
Publication venue: Tampere University of Technology
Publication date: 01/01/2014
Field of study

Communication by speech is intrinsic for humans. Since the breakthrough of mobile devices and wireless communication, digital transmission of speech has become ubiquitous. Similarly distribution and storage of audio and video data has increased rapidly. However, despite being technically capable to record and process audio signals, only a fraction of digital systems and services are actually able to work with spoken input, that is, to operate on the lexical content of speech. One persistent obstacle for practical deployment of automatic speech recognition systems is inadequate robustness against noise and other interferences, which regularly corrupt signals recorded in real-world environments. Speech and diverse noises are both complex signals, which are not trivially separable. Despite decades of research and a multitude of different approaches, the problem has not been solved to a sufficient extent. Especially the mathematically ill-posed problem of separating multiple sources from a single-channel input requires advanced models and algorithms to be solvable. One promising path is using a composite model of long-context atoms to represent a mixture of non-stationary sources based on their spectro-temporal behaviour. Algorithms derived from the family of non-negative matrix factorisations have been applied to such problems to separate and recognise individual sources like speech. This thesis describes a set of tools developed for non-negative modelling of audio spectrograms, especially involving speech and real-world noise sources. An overview is provided to the complete framework starting from model and feature definitions, advancing to factorisation algorithms, and finally describing different routes for separation, enhancement, and recognition tasks. Current issues and their potential solutions are discussed both theoretically and from a practical point of view. The included publications describe factorisation-based recognition systems, which have been evaluated on publicly available speech corpora in order to determine the efficiency of various separation and recognition algorithms. Several variants and system combinations that have been proposed in literature are also discussed. The work covers a broad span of factorisation-based system components, which together aim at providing a practically viable solution to robust processing and recognition of speech in everyday situations

Trepo - Institutional Repository of Tampere University

Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge

Author: Altun
Anton Batliner
Armstrong
Atal
Athanaselis
Batliner
Batliner
Batliner
Bellman
Bengio
Björn Schuller
Boersma
Cheveigne
Cowie
Cowie
Daubechies
Davis
de Gelder
de Gelder
Devillers
Devillers
Dino Seppi
Erickson
Eyben
Eysenck
Fehr
Ferguson
Fernandez
Fillenbaum
Fleiss
Frick
Fukunaga
Gigerenzer
Grimm
Harnad
Hermansky
Hess
Hyvärinen
Johnstone
Jolliffe
Kharat
Kim
Lee
Lee
Lizhong
Lovins
Makhoul
Martin
Matos
Morrison
Morrison
Morrison
Nasoz
Nickerson
Noll
Nwe
Nöth
Pachet
Pantic
Pernegger
Picard
Porter
Pudil
Rabiner
Rosch
Rozeboom
Russell
Sachs
Said
Salzberg
Sato
Scherer
Schröder
Shaver
Stefan Steidl
tenBosch
Vlasenko
Witten
Wolpert
Wu
Wöllmer
Zeng
Zeng
Zeng
Zwicker
Publication venue: 'Elsevier BV'
Publication date: 01/11/2011
Field of study

More than a decade has passed since research on automatic recognition of emotion from speech has become a new field of research in line with its 'big brothers' speech and speaker recognition. This article attempts to provide a short overview on where we are today, how we got there and what this can reveal us on where to go next and how we could arrive there. In a first part, we address the basic phenomenon reflecting the last fifteen years, commenting on databases, modelling and annotation, the unit of analysis and prototypicality. We then shift to automatic processing including discussions on features, classification, robustness, evaluation, and implementation and system integration. From there we go to the first comparative challenge on emotion recognition from speech-the INTERSPEECH 2009 Emotion Challenge, organised by (part of) the authors, including the description of the Challenge's database, Sub-Challenges, participants and their approaches, the winners, and the fusion of results to the actual learnt lessons before we finally address the ever-lasting problems and future promising attempts. (C) 2011 Elsevier B.V. All rights reserved.Schuller B., Batliner A., Steidl S., Seppi D., ''Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge'', Speech communication, vol. 53, no. 9-10, pp. 1062-1087, November 2011.status: publishe

Lirias

OPUS Augsburg

Crossref

Spiral - Imperial College Digital Repository

Models and analysis of vocal emissions for biomedical applications

Author
Publication venue: 'Firenze University Press'
Publication date: 31/05/2022
Field of study

This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies

Directory of Open Access Books (DOAB)

Models and Analysis of Vocal Emissions for Biomedical Applications

Author
Publication venue: 'Firenze University Press'
Publication date: 31/05/2022
Field of study

The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the particularly felt need of sharing know-how, objectives and results between areas that until then seemed quite distinct such as bioengineering, medicine and singing. MAVEBA deals with all aspects concerning the study of the human voice with applications ranging from the newborn to the adult and elderly. Over the years the initial issues have grown and spread also in other fields of research such as occupational voice disorders, neurology, rehabilitation, image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis

Directory of Open Access Books (DOAB)