1,474 research outputs found
EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals
The general objective of this work is the design, implementation, improvement and evaluation of a system that uses surface electromyographic (EMG) signals and directly synthesizes an audible speech output: EMG-to-speech
High-Dimensional Regression with Gaussian Mixtures and Partially-Latent Response Variables
In this work we address the problem of approximating high-dimensional data
with a low-dimensional representation. We make the following contributions. We
propose an inverse regression method which exchanges the roles of input and
response, such that the low-dimensional variable becomes the regressor, and
which is tractable. We introduce a mixture of locally-linear probabilistic
mapping model that starts with estimating the parameters of inverse regression,
and follows with inferring closed-form solutions for the forward parameters of
the high-dimensional regression problem of interest. Moreover, we introduce a
partially-latent paradigm, such that the vector-valued response variable is
composed of both observed and latent entries, thus being able to deal with data
contaminated by experimental artifacts that cannot be explained with noise
models. The proposed probabilistic formulation could be viewed as a
latent-variable augmentation of regression. We devise expectation-maximization
(EM) procedures based on a data augmentation strategy which facilitates the
maximum-likelihood search over the model parameters. We propose two
augmentation schemes and we describe in detail the associated EM inference
procedures that may well be viewed as generalizations of a number of EM
regression, dimension reduction, and factor analysis algorithms. The proposed
framework is validated with both synthetic and real data. We provide
experimental evidence that our method outperforms several existing regression
techniques
Electronic Relaxation Dynamics of UV-Photoexcited 2-Aminopurine–Thymine Base Pairs in Watson-Crick and Hoogsteen Conformations
The fluorescent analogue 2-aminopurine (2AP) of the canonical nucleobase adenine (6-aminopurine) base-pairs with thymine (T) without disrupting the helical structure of DNA. It therefore finds frequent use in molecular biology for probing DNA and RNA structures and conformational dynamics. However, detailed understanding of the processes responsible for fluorescence quenching remains largely elusive on a fundamental level. Although attempts have been made to ascribe decreased excited-state lifetimes to intrastrand charge-transfer and stacking interactions, possible influences from dynamic interstrand H-bonding have been widely ignored. Here, we investigate the electronic relaxation of UV-excited 2AP center dot T in Watson Crick (WC) and Hoogsteen (HS) conformations. Although the WC conformation features slowed-down, monomer-like electronic relaxation in tau similar to 1.6 ns toward ground-state recovery and triplet formation, the dynamics associated with 2AP center dot T in the HS motif exhibit faster deactivation in tau similar to 70 ps. As recent research has revealed abundant transient interstrand H-bonding in the Hoogsteen motif for duplex DNA, the established model for dynamic fluorescence quenching may need to be revised in the light of our results. The underlying supramolecular photophysical mechanisms are discussed in terms of a proposed excited-state double-proton transfer as an efficient deactivation channel for recovery of the HS species in the electronic ground state
Text-Independent Voice Conversion
This thesis deals with text-independent solutions for voice conversion. It first introduces the use of vocal tract length normalization (VTLN) for voice conversion. The presented variants of VTLN allow for easily changing speaker characteristics by means of a few trainable parameters. Furthermore, it is shown how VTLN can be expressed in time domain strongly reducing the computational costs while keeping a high speech quality. The second text-independent voice conversion paradigm is residual prediction. In particular, two proposed techniques, residual smoothing and the application of unit selection, result in essential improvement of both speech quality and voice similarity. In order to apply the well-studied linear transformation paradigm to text-independent voice conversion, two text-independent speech alignment techniques are introduced. One is based on automatic segmentation and mapping of artificial phonetic classes and the other is a completely data-driven approach with unit selection. The latter achieves a performance very similar to the conventional text-dependent approach in terms of speech quality and similarity. It is also successfully applied to cross-language voice conversion. The investigations of this thesis are based on several corpora of three different languages, i.e., English, Spanish, and German. Results are also presented from the multilingual voice conversion evaluation in the framework of the international speech-to-speech translation project TC-Star
Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview
We present a structured overview of adaptation algorithms for neural
network-based speech recognition, considering both hybrid hidden Markov model /
neural network systems and end-to-end neural network systems, with a focus on
speaker adaptation, domain adaptation, and accent adaptation. The overview
characterizes adaptation algorithms as based on embeddings, model parameter
adaptation, or data augmentation. We present a meta-analysis of the performance
of speech recognition adaptation algorithms, based on relative error rate
reductions as reported in the literature.Comment: Submitted to IEEE Open Journal of Signal Processing. 30 pages, 27
figure
Recommended from our members
Signal acquisition challenges in mobile systems
In recent decades, the advent of mobile computing has changed human lives by providing information that was not available in the past. The mobile computing platform opens a new door to the connected world in which various forms of hand-held and wearable systems are ubiquitous. A single mobile device plays multiple roles and shapes human lives towards a better future. In these systems, sensor-based data acquisition plays an essential role in generating and providing useful information.
The increased number of sensors is embedded in a single device in order to process various signal modalities. In practice, more than 30 data converters are required in designing a mobile system in which the data-converting blocks become among the most power-hungry components in battery-operated systems. Due to the increased variety of sensors, mobile systems are meant to face several obstacles. For example, the increased number of sensors increase system power consumption during the system operation. The increased power consumption directly affects operation time because mobile systems are powered by a limited energy source. Moreover, an increased amount of information also gives rise to bandwidth problems in communication due to the increased volume of data transmission. Also, this system design requires a larger area in a silicon die so that multiple signal paths can be placed without cross-channel interference. Therefore, the system design has presented a challenge in terms of trying to resolve the design constraints such as power consumption, bandwidth usage, storage space, and design complexity issues.
To overcome these obstacles, in this dissertation, efficient data acquisition and processing methods are investigated. Specifically, this thesis considers the problems of energy-efficient sampling and binary event detection.
This dissertation begins by presenting a new signal sampling scheme that enables higher precision signal conversion in compressed-sensing-based signal acquisition. The proposed scheme is based on the popular successive approximation register and employs a modified compressive sensing technique to increase the resolution of successive-approximation-register (SAR) analog-to-digital converter (ADC) architecture. Circuit-level architecture is discussed to implement the proposed scheme using the SAR ADC architecture. A non-uniform quantization scheme is proposed and it improves data quality after data acquisition. The proposed scheme is expected to be used for medium- or high- frequency data conversion.
Secondly, the possibility of using fewer ADCs than channels is studied by leveraging sparse-signal representation and blind-source-separation (BSS) techniques.
In particular, this dissertation examines the problem of using a single ADC or quantizer system for digitizing multi-channel inputs. Mixing and de-mixing strategies are extensively studied for sampling frequency-sparse signals and the proposed multi-channel architecture can be easily implemented using today's analog/mixed-signal circuits.
The third part of this dissertation investigates a binary hypothesis testing problem. In mobile devices such as smartphones and tablet PCs, a major portion of energy is consumed in user interfaces (LCD display and touch input processing). For accurate detection and better user interface, energy-efficient sensing and detection schemes are necessary to manage multiple sensor inputs. A highly efficient detection scheme is presented that can detect binary events reliably with a fraction of the energy consumption required in the conventional energy detection.Electrical and Computer Engineerin
Investigating deep neural structures and their interpretability in the domain of voice conversion
Generative Adversarial Networks (GANs) are machine learning networks based around creating synthetic data. Voice Conversion (VC) is a subset of voice translation that involves translating the paralinguistic features of a source speaker to a target speaker while preserving the linguistic information. The aim of non-parallel conditional GANs for VC is to translate an acoustic speech feature sequence from one domain to another without the use of paired data. In the study reported here, we investigated the interpretability of state-of-the-art implementations of non-parallel GANs in the domain of VC. We show that the learned representations in the repeating layers of a particular GAN architecture remain close to their original random initialised parameters, demonstrating that it is the number of repeating layers that is more responsible for the quality of the output. We also analysed the learned representations of a model trained on one particular dataset when used during transfer learning on another dataset. This also showed high levels of similarity in the repeating layers. Together, these results provide new insight into how the learned representations of deep generative networks change during learning and the importance of the number of layers, which would help build better GAN-based speech conversion models
- …