84 research outputs found

Rhythmic constant pitch time stretching for digital audio

Author: Trevorrow Brendan
Publication venue: Australian Acoustical Society
Publication date: 01/11/2014

Constant pitch time stretching is not uncommon in audio editing software; however, several issues arise when it is used on musical recordings, most notably the doubling and skipping of rhythmic transients. This paper examines three signal processing algorithms that are commonly used to provide constant pitch time stretching: SOLA (Synchronous Overlap and Add), TD-PSOLA (Time Domain Pitch Synchronous Overlap and Add), and the Phase Vocoder. Enhancements to the SOLA and TD-PSOLA algorithms are provided which may make them more suited to rhythmic music. It is found that each of these three algorithms introduces audible artifacts in the time-stretched waveform; the severity of these side effects and their causes are also discussed.

University of Southern Queensland ePrints

Glottal-synchronous speech processing

Author: Thomas Mark R P
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/01/2010

Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.

Spiral - Imperial College Digital Repository

OpenGrey Repository

Pitch modification techniques for sampled voice

Author: Brooks Michael
Publication date: 27/06/2018

The Australian National University

Wavelet-based voice morphing

Author: Moroz I. M.
Orphanidou C.
Roberts S. J.
Publication date: 01/01/2004

This paper presents a new multi-scale voice morphing algorithm. This algorithm enables a user to transform one person's speech pattern into another person's pattern with distinct characteristics, giving it a new identity, while preserving the original content. The voice morphing algorithm performs the morphing at different subbands by using the theory of wavelets and models the spectral conversion using the theory of Radial Basis Function Neural Networks. The results obtained on the TIMIT speech database demonstrate effective transformation of the speaker identity

Oxford University Research Archive

Simulating dysarthric speech for training data augmentation in clinical speech applications

Author: Berisha Visar
Jiao Yishan
Liss Julie
Tu Ming
Publication date: 26/04/2018

Training machine learning algorithms for speech applications requires large, labeled training data sets. This is problematic for clinical applications where obtaining such data is prohibitively expensive because of privacy concerns or lack of access. As a result, clinical speech applications are typically developed using small data sets with only tens of speakers. In this paper, we propose a method for simulating training data for clinical applications by transforming healthy speech to dysarthric speech using adversarial training. We evaluate the efficacy of our approach using both objective and subjective criteria. We present the transformed samples to five experienced speech-language pathologists (SLPs) and ask them to identify the samples as healthy or dysarthric. The results reveal that the SLPs identify the transformed speech as dysarthric 65% of the time. In a pilot classification experiment, we show that by using the simulated speech samples to balance an existing dataset, the classification accuracy improves by about 10% after data augmentation. Comment: Will appear in Proc. of ICASSP 201

arXiv.org e-Print Archive

Crossref

Text-Independent Voice Conversion

Author: Sündermann David
Publication venue: Universität der Bundeswehr München, Fakultät für Elektrotechnik und Informationstechnik
Publication date: 01/01/2008

This thesis deals with text-independent solutions for voice conversion. It first introduces the use of vocal tract length normalization (VTLN) for voice conversion. The presented variants of VTLN allow for easily changing speaker characteristics by means of a few trainable parameters. Furthermore, it is shown how VTLN can be expressed in the time domain, strongly reducing the computational costs while keeping a high speech quality. The second text-independent voice conversion paradigm is residual prediction. In particular, two proposed techniques, residual smoothing and the application of unit selection, result in substantial improvement of both speech quality and voice similarity. In order to apply the well-studied linear transformation paradigm to text-independent voice conversion, two text-independent speech alignment techniques are introduced. One is based on automatic segmentation and mapping of artificial phonetic classes, and the other is a completely data-driven approach with unit selection. The latter achieves a performance very similar to the conventional text-dependent approach in terms of speech quality and similarity. It is also successfully applied to cross-language voice conversion. The investigations of this thesis are based on several corpora of three different languages, i.e., English, Spanish, and German. Results are also presented from the multilingual voice conversion evaluation in the framework of the international speech-to-speech translation project TC-Star.

Universität der Bundeswehr München: AtheneForschung

The auditory-brainstem response to continuous, non repetitive speech is modulated by the speech envelope and reflects speech processing

Author: Braiman C
Hudspeth AJ
Reichenbach CS
Reichenbach JDT
Schiff ND
Publication venue: Frontiers Media SA
Publication date: 29/04/2016

Spiral - Imperial College Digital Repository

A transient-preserving audio time-stretching algorithm and a real-time realization for a commercial music product


The core of this work is a sub-band transient detection and preservation scheme based on complex-domain transient detection and inspired by Robel’s work. The proposed technique can be integrated into a real-time phase vocoder analysis/synthesis scheme without introducing latency and at relatively low computational cost.

Padua Thesis and Dissertation Archive