EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals

Author: Janke Matthias
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2016

The general objective of this work is the design, implementation, improvement and evaluation of a system that takes surface electromyographic (EMG) signals and directly synthesizes audible speech output: EMG-to-speech.

KITopen

High-Dimensional Regression with Gaussian Mixtures and Partially-Latent Response Variables

Author: Deleforge Antoine
Forbes Florence
Horaud Radu
Publication venue: Springer Science and Business Media LLC
Publication date: 20/12/2013

In this work we address the problem of approximating high-dimensional data with a low-dimensional representation. We make the following contributions. We propose an inverse regression method that exchanges the roles of input and response, such that the low-dimensional variable becomes the regressor, and that is tractable. We introduce a mixture of locally-linear probabilistic mappings that starts by estimating the parameters of the inverse regression and then infers closed-form solutions for the forward parameters of the high-dimensional regression problem of interest. Moreover, we introduce a partially-latent paradigm, in which the vector-valued response variable is composed of both observed and latent entries, making it possible to deal with data contaminated by experimental artifacts that cannot be explained with noise models. The proposed probabilistic formulation can be viewed as a latent-variable augmentation of regression. We devise expectation-maximization (EM) procedures based on a data augmentation strategy that facilitates the maximum-likelihood search over the model parameters. We propose two augmentation schemes and describe in detail the associated EM inference procedures, which may well be viewed as generalizations of a number of EM regression, dimension reduction, and factor analysis algorithms. The proposed framework is validated with both synthetic and real data. We provide experimental evidence that our method outperforms several existing regression techniques.
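
The step "estimate the inverse regression, then obtain the forward parameters in closed form" can be illustrated with a minimal NumPy sketch. It assumes fully observed responses (no latent entries), component parameters that are already known (the EM fitting is omitted), and Gaussian components with inverse model x ~ N(c, Gamma), y | x ~ N(A x + b, Sigma), where x is the low-dimensional variable and y the high-dimensional one. The function names are hypothetical and this is not the authors' code; the forward conditional follows from standard Gaussian conditioning.

```python
import numpy as np

def inverse_to_forward(c, Gamma, A, b, Sigma):
    """Turn inverse-regression parameters of one component into forward ones.

    Assumed inverse model:  x ~ N(c, Gamma),  y | x ~ N(A x + b, Sigma)
    Returned forward model: x | y ~ N(A_star y + b_star, Sigma_star)
    """
    Gamma_inv = np.linalg.inv(Gamma)
    Sigma_inv = np.linalg.inv(Sigma)
    # Posterior covariance of x given y (information-form update).
    Sigma_star = np.linalg.inv(Gamma_inv + A.T @ Sigma_inv @ A)
    A_star = Sigma_star @ A.T @ Sigma_inv
    b_star = Sigma_star @ (Gamma_inv @ c - A.T @ Sigma_inv @ b)
    return A_star, b_star, Sigma_star

def gaussian_logpdf(y, mean, cov):
    """Log-density of a multivariate normal distribution at y."""
    diff = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (y.shape[0] * np.log(2 * np.pi) + logdet
                   + diff @ np.linalg.solve(cov, diff))

def forward_predict(y, weights, components):
    """E[x | y] under a mixture of locally-linear inverse mappings."""
    log_resp, means = [], []
    for pi_k, (c, Gamma, A, b, Sigma) in zip(weights, components):
        # Marginal of y under component k: N(A c + b, Sigma + A Gamma A^T).
        marg_cov = Sigma + A @ Gamma @ A.T
        log_resp.append(np.log(pi_k) + gaussian_logpdf(y, A @ c + b, marg_cov))
        A_star, b_star, _ = inverse_to_forward(c, Gamma, A, b, Sigma)
        means.append(A_star @ y + b_star)
    log_resp = np.array(log_resp)
    resp = np.exp(log_resp - log_resp.max())  # numerically stable softmax
    resp /= resp.sum()
    return sum(r * m for r, m in zip(resp, means))

# Usage with one hypothetical component: 1-D x, 3-D y.
c = np.array([0.5])
Gamma = np.array([[2.0]])
A = np.array([[1.0], [3.0], [-2.0]])
b = np.array([0.1, 0.0, 1.0])
Sigma = np.diag([0.5, 1.0, 0.2])
y = np.array([1.0, 2.0, -1.0])
print(forward_predict(y, [1.0], [(c, Gamma, A, b, Sigma)]))
```

Only small matrices are inverted on the low-dimensional side of `Sigma_star`. This is the practical appeal of regressing in the inverse direction: the number of parameters scales with the low-dimensional regressor rather than with the high-dimensional input.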

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Electronic Relaxation Dynamics of UV-Photoexcited 2-Aminopurine–Thymine Base Pairs in Watson-Crick and Hoogsteen Conformations

Author: Bohnsack Mats
Böhnke Hendrik
Ingle Rebecca
Marroux Hugo
Orr-Ewing Andrew
Roettger Katharina
Schwalb Nina
Temps Friedrich
Publication venue: American Chemical Society (ACS)
Publication date: 26/03/2019

The fluorescent analogue 2-aminopurine (2AP) of the canonical nucleobase adenine (6-aminopurine) base-pairs with thymine (T) without disrupting the helical structure of DNA. It is therefore frequently used in molecular biology to probe DNA and RNA structures and conformational dynamics. However, a detailed understanding of the processes responsible for its fluorescence quenching remains largely elusive on a fundamental level. Although attempts have been made to ascribe decreased excited-state lifetimes to intrastrand charge-transfer and stacking interactions, possible influences of dynamic interstrand H-bonding have been widely ignored. Here, we investigate the electronic relaxation of UV-excited 2AP·T in Watson-Crick (WC) and Hoogsteen (HS) conformations. Whereas the WC conformation features slowed-down, monomer-like electronic relaxation with τ ≈ 1.6 ns toward ground-state recovery and triplet formation, 2AP·T in the HS motif deactivates faster, with τ ≈ 70 ps. As recent research has revealed abundant transient interstrand H-bonding in the Hoogsteen motif for duplex DNA, the established model for dynamic fluorescence quenching may need to be revised in light of our results. The underlying supramolecular photophysical mechanisms are discussed in terms of a proposed excited-state double-proton transfer as an efficient deactivation channel for recovery of the HS species in the electronic ground state.
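
To put the two lifetimes on a common scale, the short calculation below converts them to rate constants. It assumes that each decay is a single exponential, which is a simplification made here for illustration; the study's own kinetic model may involve more components. The ratio 1.6 ns / 70 ps is about 23, which means the HS species deactivates roughly 23 times faster than the WC species.

```python
import math

TAU_WC_S = 1.6e-9   # Watson-Crick lifetime quoted in the abstract (~1.6 ns)
TAU_HS_S = 70e-12   # Hoogsteen lifetime quoted in the abstract (~70 ps)

def rate_constant(tau_s):
    """Total decay rate constant k = 1/tau (s^-1), assuming single-exponential decay."""
    return 1.0 / tau_s

def surviving_fraction(t_s, tau_s):
    """Fraction of the excited-state population remaining at time t: exp(-t/tau)."""
    return math.exp(-t_s / tau_s)

speedup = rate_constant(TAU_HS_S) / rate_constant(TAU_WC_S)
print(f"k_WC = {rate_constant(TAU_WC_S):.3g} s^-1")
print(f"k_HS = {rate_constant(TAU_HS_S):.3g} s^-1")
print(f"HS deactivation is about {speedup:.0f}x faster")
t = 500e-12  # an arbitrary probe time of 500 ps
print(f"left after 500 ps: WC {surviving_fraction(t, TAU_WC_S):.2f}, "
      f"HS {surviving_fraction(t, TAU_HS_S):.1e}")
```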

Infoscience - École polytechnique fédérale de Lausanne

Explore Bristol Research

FigShare

Text-Independent Voice Conversion

Author: Sündermann David
Publication venue: Universität der Bundeswehr München, Fakultät für Elektrotechnik und Informationstechnik
Publication date: 01/01/2008

This thesis deals with text-independent solutions for voice conversion. It first introduces the use of vocal tract length normalization (VTLN) for voice conversion. The presented variants of VTLN make it easy to change speaker characteristics by means of a few trainable parameters. Furthermore, it is shown how VTLN can be expressed in the time domain, which strongly reduces the computational cost while maintaining high speech quality. The second text-independent voice conversion paradigm is residual prediction. In particular, two proposed techniques, residual smoothing and the application of unit selection, result in substantial improvements in both speech quality and voice similarity. To apply the well-studied linear transformation paradigm to text-independent voice conversion, two text-independent speech alignment techniques are introduced. One is based on automatic segmentation and mapping of artificial phonetic classes, and the other is a completely data-driven approach using unit selection. The latter achieves performance very similar to the conventional text-dependent approach in terms of speech quality and similarity. It is also successfully applied to cross-language voice conversion. The investigations in this thesis are based on several corpora in three languages: English, Spanish, and German. Results are also presented from the multilingual voice conversion evaluation carried out within the international speech-to-speech translation project TC-Star.
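
As an illustration of how VTLN changes speaker characteristics through a single trainable parameter, here is a minimal frequency-domain sketch. It uses a common piecewise-linear warping function with warping factor `alpha`. The knee position (0.875 of the band edge) and the function names are assumptions made for this example. The thesis's own VTLN variants, and its time-domain formulation, are not reproduced here.

```python
import numpy as np

def warp_frequency(f, alpha, f_max, knee=0.875):
    """Piecewise-linear VTLN warping: scale by alpha below a knee frequency,
    then connect linearly to (f_max, f_max) so the band edge is preserved."""
    f = np.asarray(f, dtype=float)
    # Place the knee so that alpha * f0 stays inside [0, f_max] when alpha > 1.
    f0 = knee * f_max / max(alpha, 1.0)
    upper_slope = (f_max - alpha * f0) / (f_max - f0)
    return np.where(f <= f0, alpha * f, alpha * f0 + upper_slope * (f - f0))

def warp_spectrum(spectrum, freqs, alpha):
    """Move spectral content at frequency f to warp_frequency(f).

    Each output bin at frequency g samples the input at w^{-1}(g). The inverse
    is obtained by interpolation because the warping is monotonically increasing.
    """
    f_max = freqs[-1]
    warped_grid = warp_frequency(freqs, alpha, f_max)
    source_freqs = np.interp(freqs, warped_grid, freqs)  # w^{-1}(g)
    return np.interp(source_freqs, freqs, spectrum)

# Usage: a single spectral peak at 1000 Hz, 31.25 Hz bins up to 8 kHz.
freqs = np.linspace(0.0, 8000.0, 257)
spectrum = np.zeros_like(freqs)
spectrum[32] = 1.0  # freqs[32] == 1000 Hz
warped = warp_spectrum(spectrum, freqs, alpha=1.2)
print(freqs[np.argmax(warped)])  # close to 1.2 * 1000 Hz
```

With `alpha > 1`, formant-like peaks move upward, roughly imitating a shorter vocal tract. With `alpha < 1`, they move downward. Fitting `alpha` between a source and a target speaker is the "few trainable parameters" idea in its simplest form.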

Universität der Bundeswehr München: AtheneForschung

Deep learning for audio-visual speaker diarization

Author: Βαρθολομαίος Αργύριος Σ.
Publication date: 01/01/2017

University of Thessaly Institutional Repository

Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview

Author: Bell Peter
Fainberg Joachim
Klejch Ondrej
Li Jinyu
Renals Steve
Swietojanski Pawel
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2020

We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural network systems and end-to-end neural network systems, with a focus on speaker adaptation, domain adaptation, and accent adaptation. The overview characterizes adaptation algorithms as based on embeddings, model parameter adaptation, or data augmentation. We present a meta-analysis of the performance of speech recognition adaptation algorithms, based on relative error rate reductions as reported in the literature. Comment: Submitted to IEEE Open Journal of Signal Processing. 30 pages, 27 figures.
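
The meta-analysis compares methods by relative error rate reduction (RERR) rather than by absolute word error rates. RERR normalises each improvement by its own baseline, so results reported on different tasks become comparable. The sketch below computes RERR and summarises several results with a median. The (baseline, adapted) pairs are made-up numbers, and the median is just one possible aggregate; the overview's own aggregation may differ.

```python
from statistics import median

def relative_error_rate_reduction(baseline_wer, adapted_wer):
    """RERR = (baseline - adapted) / baseline, as a fraction of the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer

# Hypothetical (baseline WER %, adapted WER %) pairs from three imaginary papers.
reported = [(10.0, 9.0), (20.0, 15.0), (8.0, 7.6)]
rerrs = [relative_error_rate_reduction(b, a) for b, a in reported]
print([round(r, 3) for r in rerrs])
print("median RERR:", round(median(rerrs), 3))
```

Dropping from 20% to 15% WER and dropping from 8% to 6% both count as a 25% relative reduction. This is why relative figures are the usual basis for comparing adaptation methods across different corpora.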

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer


Signal acquisition challenges in mobile systems

Author: Kim Youngchun
Publication date: 21/08/2018

In recent decades, the advent of mobile computing has changed human lives by providing information that was not previously available. The mobile computing platform opens a new door to a connected world in which various forms of hand-held and wearable systems are ubiquitous. A single mobile device plays multiple roles and shapes human lives towards a better future. In these systems, sensor-based data acquisition plays an essential role in generating and providing useful information. An increasing number of sensors are embedded in a single device to process various signal modalities. In practice, more than 30 data converters are required in designing a mobile system, and these data-converting blocks are among the most power-hungry components in battery-operated systems. Because of the increased variety of sensors, mobile systems face several obstacles. For example, the increased number of sensors increases system power consumption during operation. The increased power consumption directly reduces operation time because mobile systems are powered by a limited energy source. Moreover, the increased amount of information gives rise to bandwidth problems in communication due to the larger volume of data transmission. The design also requires a larger silicon die area so that multiple signal paths can be placed without cross-channel interference. System design therefore faces the challenge of resolving constraints such as power consumption, bandwidth usage, storage space, and design complexity. To overcome these obstacles, this dissertation investigates efficient data acquisition and processing methods. Specifically, it considers the problems of energy-efficient sampling and binary event detection.
This dissertation begins by presenting a new signal sampling scheme that enables higher-precision signal conversion in compressed-sensing-based signal acquisition. The proposed scheme is based on the popular successive-approximation-register (SAR) analog-to-digital converter (ADC) architecture and employs a modified compressive sensing technique to increase its resolution. A circuit-level architecture for implementing the proposed scheme with a SAR ADC is discussed. A non-uniform quantization scheme is also proposed, which improves data quality after acquisition. The proposed scheme is expected to be used for medium- or high-frequency data conversion. Second, the possibility of using fewer ADCs than channels is studied by leveraging sparse-signal representation and blind source separation (BSS) techniques. In particular, this dissertation examines the problem of using a single ADC or quantizer system to digitize multi-channel inputs. Mixing and de-mixing strategies are studied extensively for sampling frequency-sparse signals, and the proposed multi-channel architecture can be easily implemented using today's analog/mixed-signal circuits. The third part of this dissertation investigates a binary hypothesis testing problem. In mobile devices such as smartphones and tablet PCs, a major portion of energy is consumed by user interfaces (LCD display and touch input processing). For accurate detection and a better user interface, energy-efficient sensing and detection schemes are needed to manage multiple sensor inputs. A highly efficient detection scheme is presented that detects binary events reliably with a fraction of the energy required by conventional energy detection.
Electrical and Computer Engineering
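
For readers unfamiliar with the two baselines named above, the sketch below shows a textbook successive-approximation conversion and a conventional energy detector. It models an ideal SAR ADC as a binary search performed by a comparator against an internal DAC. The energy detector is a plain mean-energy threshold test. Neither function reproduces the dissertation's compressive-sensing SAR scheme or its proposed low-energy detector. The names, and the ideal-comparator and ideal-DAC assumptions, are illustrative only.

```python
def sar_adc(v_in, v_ref, n_bits):
    """Ideal n-bit SAR conversion of v_in in [0, v_ref) by binary search.

    One bit is decided per cycle, MSB first: the bit is kept if the trial
    DAC voltage does not exceed the input (an ideal comparator is assumed).
    """
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        if trial * v_ref / (1 << n_bits) <= v_in:
            code = trial
    return code

def energy_detect(samples, threshold):
    """Conventional energy detector: decide 'event present' (H1) when the
    mean sample energy exceeds the threshold, else 'no event' (H0)."""
    return sum(s * s for s in samples) / len(samples) > threshold

print(sar_adc(0.3, 1.0, 8))                   # 8-bit code for 0.3 V with a 1 V reference
print(energy_detect([1.0, -1.0], 0.5))        # strong signal -> H1
print(energy_detect([0.0, 0.1, -0.1], 0.5))   # weak signal -> H0
```

An N-bit SAR conversion takes N comparator cycles. The energy detector has to process every sample it is given. Both costs are what more efficient acquisition and detection schemes aim to reduce.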

Texas ScholarWorks

Investigating deep neural structures and their interpretability in the domain of voice conversion

Author: Broughton S.J.
Jalal M.A.
Moore R.K.
Publication venue: International Speech Communication Association
Publication date: 22/02/2021

Generative Adversarial Networks (GANs) are machine learning models designed to create synthetic data. Voice Conversion (VC) is a subset of voice translation that involves translating the paralinguistic features of a source speaker to those of a target speaker while preserving the linguistic information. The aim of non-parallel conditional GANs for VC is to translate an acoustic speech feature sequence from one domain to another without the use of paired data. In the study reported here, we investigated the interpretability of state-of-the-art implementations of non-parallel GANs in the domain of VC. We show that the learned representations in the repeating layers of a particular GAN architecture remain close to their original randomly initialised parameters, demonstrating that it is the number of repeating layers that is more responsible for the quality of the output. We also analysed the learned representations of a model trained on one particular dataset when it was used for transfer learning on another dataset. This also showed high levels of similarity in the repeating layers. Together, these results provide new insight into how the learned representations of deep generative networks change during learning and into the importance of the number of layers, which would help build better GAN-based speech conversion models.
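
One simple way to quantify whether a layer's parameters "remain close to their original random initialisation" is to compute the cosine similarity between its initial and trained weights. The sketch below shows this measure on synthetic weights. The measure, the layer names and the simulated "training" are all assumptions made for illustration. The study may have used a different similarity analysis.

```python
import numpy as np

def cosine_similarity(w_init, w_trained):
    """Cosine similarity between two weight tensors after flattening."""
    a, b = np.ravel(w_init), np.ravel(w_trained)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def layer_drift_report(initial, trained):
    """Per-layer similarity to initialisation (1.0 means unchanged direction)."""
    return {name: cosine_similarity(initial[name], trained[name]) for name in initial}

# Synthetic stand-ins for checkpoints; the layer names are hypothetical.
rng = np.random.default_rng(0)
initial = {
    "res_block_1": rng.standard_normal((64, 64)),
    "output": rng.standard_normal((64, 80)),
}
trained = {
    # Barely moved from initialisation (small perturbation).
    "res_block_1": initial["res_block_1"] + 0.01 * rng.standard_normal((64, 64)),
    # Essentially re-learned (unrelated random matrix).
    "output": rng.standard_normal((64, 80)),
}
for name, sim in layer_drift_report(initial, trained).items():
    print(f"{name}: {sim:.3f}")
```

A similarity near 1 for the repeating blocks, compared with values near 0 for layers that were genuinely re-learned, is the kind of pattern the abstract describes.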

arXiv.org e-Print Archive

White Rose Research Online