Search CORE

1,200 research outputs found

Recommended from our members

Real-time decoding of question-and-answer speech dialogue using human cortical activity.

Author: Chang Edward F
Leonard Matthew K
Makin Joseph G
Moses David A
Publication venue: eScholarship, University of California
Publication date: 01/07/2019
Field of study

Natural communication often occurs in dialogue, differentially engaging auditory and sensorimotor brain regions during listening and speaking. However, previous attempts to decode speech directly from the human brain typically consider listening or speaking tasks in isolation. Here, human participants listened to questions and responded aloud with answers while we used high-density electrocorticography (ECoG) recordings to detect when they heard or said an utterance and to then decode the utterance's identity. Because certain answers were only plausible responses to certain questions, we could dynamically update the prior probabilities of each answer using the decoded question likelihoods as context. We decode produced and perceived utterances with accuracy rates as high as 61% and 76%, respectively (chance is 7% and 20%). Contextual integration of decoded question likelihoods significantly improves answer decoding. These results demonstrate real-time decoding of speech in an interactive, conversational setting, which has important implications for patients who are unable to communicate

eScholarship - University of California

Continuous Action Recognition Based on Sequence Alignment

Author: Cech Jan
Evangelidis Georgios
Horaud Radu
Kulkarni Kaustubh
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/06/2014
Field of study

Continuous action recognition is more challenging than isolated recognition because classification and segmentation must be simultaneously carried out. We build on the well known dynamic time warping (DTW) framework and devise a novel visual alignment technique, namely dynamic frame warping (DFW), which performs isolated recognition based on per-frame representation of videos, and on aligning a test sequence with a model sequence. Moreover, we propose two extensions which enable to perform recognition concomitant with segmentation, namely one-pass DFW and two-pass DFW. These two methods have their roots in the domain of continuous recognition of speech and, to the best of our knowledge, their extension to continuous visual action recognition has been overlooked. We test and illustrate the proposed techniques with a recently released dataset (RAVEL) and with two public-domain datasets widely used in action recognition (Hollywood-1 and Hollywood-2). We also compare the performances of the proposed isolated and continuous recognition algorithms with several recently published methods

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Reactive Statistical Mapping: Towards the Sketching of Performative Control with Data

Author: Astrinaki Maria
Babacan Onur
Barbulescu Adela
Cakmak Huseyin
Dall Rasmus
d’Alessandro Nicolas
Hu Qiong
Hueber Thomas
Huguenin Victor
Kalaycı Emine Sümeyye
Moinet Alexis
Parfait Valentin
Ravet Thierry
Tilmanne Joëlle
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/07/2013
Field of study

Part 1: Fundamental IssuesInternational audienceThis paper presents the results of our participation to the ninth eNTERFACE workshop on multimodal user interfaces. Our target for this workshop was to bring some technologies currently used in speech recognition and synthesis to a new level, i.e. being the core of a new HMM-based mapping system. The idea of statistical mapping has been investigated, more precisely how to use Gaussian Mixture Models and Hidden Markov Models for realtime and reactive generation of new trajectories from inputted labels and for realtime regression in a continuous-to-continuous use case. As a result, we have developed several proofs of concept, including an incremental speech synthesiser, a software for exploring stylistic spaces for gait and facial motion in realtime, a reactive audiovisual laughter and a prototype demonstrating the realtime reconstruction of lower body gait motion strictly from upper body motion, with conservation of the stylistic properties. This project has been the opportunity to formalise HMM-based mapping, integrate various of these innovations into the Mage library and explore the development of a realtime gesture recognition tool

CiteSeerX

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Deep Learning in Cardiology

Author: Bizopoulos Paschalis
Koutsouris Dimitrios
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 03/02/2021
Field of study

The medical field is creating large amount of data that physicians are unable to decipher and use efficiently. Moreover, rule-based expert systems are inefficient in solving complicated medical tasks or for creating insights using big data. Deep learning has emerged as a more accurate and effective technology in a wide range of medical problems such as diagnosis, prediction and intervention. Deep learning is a representation learning method that consists of layers that transform the data non-linearly, thus, revealing hierarchical relationships and structures. In this review we survey deep learning application papers that use structured data, signal and imaging modalities from cardiology. We discuss the advantages and limitations of applying deep learning in cardiology that also apply in medicine in general, while proposing certain directions as the most viable for clinical use.Comment: 27 pages, 2 figures, 10 table

arXiv.org e-Print Archive

Articulatory features for conversational speech recognition

Author: Metze Florian
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2005
Field of study

KITopen

Detecting autism, emotions and social signals using AdaBoost

Author: Busa-Fekete Róbert
Gosztolya Gábor
Tóth László
Publication venue: Interspeech
Publication date: 01/01/2013
Field of study

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

Learning deep models from synthetic data for extracting dolphin whistle contours

Author: Cholewiak Danielle
Fleishman Erica
Gillespie Douglas Michael
Helble Tyler
Klinck Holger
Li Pu
Liu Xiaobai
Nosal Eva-Marie
Palmer Kaitlin
Roch Marie
Shiu Yu
Publication venue: IEEE Computer Society
Publication date: 21/04/2020
Field of study

We present a learning-based method for extracting whistles of toothed whales (Odontoceti) in hydrophone recordings. Our method represents audio signals as time-frequency spectrograms and decomposes each spectrogram into a set of time-frequency patches. A deep neural network learns archetypical patterns (e.g., crossings, frequency modulated sweeps) from the spectrogram patches and predicts time-frequency peaks that are associated with whistles. We also developed a comprehensive method to synthesize training samples from background environments and train the network with minimal human annotation effort. We applied the proposed learn-from-synthesis method to a subset of the public Detection, Classification, Localization, and Density Estimation (DCLDE) 2011 workshop data to extract whistle confidence maps, which we then processed with an existing contour extractor to produce whistle annotations. The F1-score of our best synthesis method was 0.158 greater than our baseline whistle extraction algorithm (~25% improvement) when applied to common dolphin (Delphinus spp.) and bottlenose dolphin (Tursiops truncatus) whistles.Postprin

arXiv.org e-Print Archive

University of St. Andrews - Pure

St Andrews Research Repository

Recommended from our members

Detection and Classification of Acoustic Scenes and Events

Author: Benetos E.
Giannoulis D.
Lagrange M.
Plumbley M. D.
Stowell D.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015
Field of study

For intelligent systems to make best use of the audio modality, it is important that they can recognize not just speech and music, which have been researched as specific tasks, but also general sounds in everyday environments. To stimulate research in this field we conducted a public research challenge: the IEEE Audio and Acoustic Signal Processing Technical Committee challenge on Detection and Classification of Acoustic Scenes and Events (DCASE). In this paper, we report on the state of the art in automatically classifying audio scenes, and automatically detecting and classifying audio events. We survey prior work as well as the state of the art represented by the submissions to the challenge from various research groups. We also provide detail on the organization of the challenge, so that our experience as challenge hosts may be useful to those organizing challenges in similar domains. We created new audio datasets and baseline systems for the challenge; these, as well as some submitted systems, are publicly available under open licenses, to serve as benchmarks for further research in general-purpose machine listening

City Research Online

Crossref

Queen Mary Research Online

Surrey Research Insight

On the development of an automatic voice pleasantness classiﬁcation and intensity estimation system

Author: Braga Daniela
Coelho Luís
Garcia-Mateo Carmen
Sales-Dias Miguel
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013
Field of study

In the last few years, the number of systems and devices that use voice based interaction has grown signiﬁcantly. For a continued use of these systems, the interface must be reliable and pleasant in order to provide an optimal user experience. However there are currently very few studies that try to evaluate how pleasant is a voice from a perceptual point of view when the ﬁnal application is a speech based interface. In this paper we present an objective deﬁnition for voice pleasantness based on the composition of a representative feature subset and a new automatic voice pleasantness classiﬁcation and intensity estimation system. Our study is based on a database composed by European Portuguese female voices but the methodology can be extended to male voices or to other languages. In the objective performance evaluation the system achieved a 9.1% error rate for voice pleasantness classiﬁcation and a 15.7% error rate for voice pleasantness intensity estimation.Work partially supported by ERDF funds, the Spanish Government (TEC2009-14094-C04-04), and Xunta de Galicia (CN2011/019, 2009/062

Repositório Científico do Instituto Politécnico do Porto

Repositório Institucional do ISCTE-IUL

The Photosynthesiser - A methodology for mapping environmental conditions, pivotal to the speed of photosynthesis in plants, through sonification.

Author: Brand Oliver
Publication venue: 'University of Plymouth'
Publication date: 01/01/2020
Field of study

Traditionally, the close inspection of data requires visual guidance in the form of displays depicting numeric or graphical representations over time. Sonification offers a way to convey this data through auditory means, relinquishing the need for constant display monitoring. To enable horticulturists to continue to move and work around their environment a proposed sonification mapping system for the key environmental conditions, vital for optimum levels of photosynthesis, has been developed. The outcome of this research was to provide a monitoring system that was both musical and meaningful with regards to data fluctuations and most importantly, could be interpreted by a wide demographic of listeners. A literature review provides an underpinning to both the scientific and artistic merits of sonification whilst a practice-based model was used to develop appropriate musical timbres, offering a natural instrumentation through physical modelling synthesis. Key questions around which musical factors can be used to trigger specific emotions and which of these emotions do we associate with an environment that offers a higher rate or low rate of photosynthesis for plants are explored. Through literary research as well as the deployment and analysis of surveys, a list of musical parameters was identified and a mapping framework designed. To analyse the success of the design, an audio installation was constructed within grounds at the Eden Project. The environmental data of both biomes, tropical and Mediterranean, were sonified into two musical streams and visitors surveyed through quantitative and qualitative methods in an experiment to see if they could correctly associate the music to the biome. The results provided 90% accuracy in the correct identification. It is theorised through this research that the mapping framework designed can be used in the sonification of climate conditions and communicate key traits within each environment

Plymouth Electronic Archive and Research Library