755 research outputs found
Improving the Speech Intelligibility By Cochlear Implant Users
In this thesis, we focus on improving speech intelligibility for cochlear implant (CI) users. As an auditory prosthetic device, a CI can restore hearing sensations in quiet backgrounds for most patients with profound hearing loss in both ears. However, CI users still have serious difficulty understanding speech in noisy and reverberant environments. Bandwidth limitation, missing temporal fine structure, and reduced spectral resolution due to the limited number of electrodes further raise the difficulty of hearing in noise for CI users, regardless of the type of noise. To mitigate these difficulties for CI listeners, we investigate several contributing factors: the effect of low harmonics on tone identification in natural and vocoded speech, the contribution of matched envelope dynamic range to binaural benefits, and the contribution of low-frequency harmonics to tone identification in quiet and in a six-talker babble background. The results revealed several promising methods for improving speech intelligibility for CI patients. In addition, we investigate the benefits of voice conversion for improving speech intelligibility for CI users, motivated by an earlier study showing that familiarity with a talker’s voice can improve understanding of a conversation. Research has shown that when adults are familiar with someone’s voice, they can process and understand what the person is saying more accurately and even more quickly. This effect, known as the “familiar talker advantage”, motivated us to examine its impact on CI patients using voice conversion. In the present research, we propose a new method based on multi-channel voice conversion to improve the intelligibility of transformed speech for CI patients.
Time and Frequency Independent Manipulation of Audio in Real Time
Analog audio implies time-frequency dependence. With digitally sampled audio, this time-frequency dependence can be broken and either variable can be manipulated independently of the other, in real time. This paper will mostly focus on the frequency domain algorithm called the Phase Vocoder, which breaks this time-frequency dependence. We will start by looking at Fourier Theory and the effect of discrete sampling. Then we will look at the Phase Vocoder's theory of operation, as well as improvements made by Puckette, Laroche, and Dolson, to name a few. Through all of this, simple examples will be presented in order to gain intuition into the principles at hand. Towards the end, a time domain approach for time-frequency independence called Granular Synthesis will be explored. We will compare it to the Phase Vocoder, and see how our understanding of one changes how we think and make decisions for the other. Finally, we will propose some ideas for further improvement to real-time time-frequency independent manipulation of audio.
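The time-frequency dependence mentioned in this abstract can be illustrated with a small sketch. It is not taken from the paper: the function names (`sine`, `resample_linear`, `estimate_freq`) and the 8 kHz sample rate are assumptions chosen for illustration. Reading a sampled tone back at twice the speed halves its duration and, at the same time, roughly doubles its frequency, which is the coupling that methods such as the Phase Vocoder are designed to break.

```python
import math


def sine(freq_hz, duration_s, sample_rate):
    """Generate a sine tone as a list of float samples."""
    n = int(round(duration_s * sample_rate))
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def resample_linear(samples, speed):
    """Read through `samples` at `speed` times the original rate.

    Linear interpolation is used between neighbouring samples. This is the
    naive "tape speed" change: duration and pitch change together.
    """
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append((1 - frac) * samples[i] + frac * samples[i + 1])
        pos += speed
    return out


def estimate_freq(samples, sample_rate):
    """Rough frequency estimate from the number of upward zero crossings."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)


SAMPLE_RATE = 8000  # assumed value for this illustration
original = sine(220.0, 1.0, SAMPLE_RATE)
faster = resample_linear(original, 2.0)

print("original:", len(original), "samples,", estimate_freq(original, SAMPLE_RATE), "Hz (approx.)")
print("2x speed:", len(faster), "samples,", estimate_freq(faster, SAMPLE_RATE), "Hz (approx.)")
```

The zero-crossing estimate is deliberately crude; it is only good enough to show that the shorter signal also sits about an octave higher.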
Efficient Approaches for Voice Change and Voice Conversion Systems
In this thesis, the study and design of Voice Change and Voice Conversion systems are
presented. A voice change system manipulates a speaker’s voice so that it is perceived
as not being spoken by this speaker; a voice conversion system modifies a speaker’s voice
so that it is perceived as being spoken by a target speaker.
This thesis has two parts. The first part develops a low-latency, low-complexity
voice change system (comprising frequency/pitch scale modification and formant
scale modification algorithms) that can run on the smartphones of 2012, which had very
limited computational capability. Although some low-complexity voice change algorithms
have been proposed and studied, real-time implementations are very rare. According to the
experimental results, the proposed voice change system achieves the same quality as the
baseline approach but requires far less computation and meets the
real-time requirement. Moreover, the proposed system has been implemented in C
and was released as a commercial software application. The second part of this
thesis investigates a novel low-complexity voice conversion system (i.e. from a source
speaker A to a target speaker B) that improves perceptual quality and identity without
introducing large processing latencies. The proposed scheme directly manipulates the
spectrum using an effective and physically motivated method, Continuous Frequency
Warping and Magnitude Scaling (CFWMS), to guarantee high perceptual naturalness and
quality. In addition, a trajectory limitation strategy is proposed to prevent frame-by-frame
discontinuities and further enhance the speech quality. The experimental results show that the
proposed method outperforms the conventional baseline solutions in both objective
and subjective tests.
VARTOOLS: A Program for Analyzing Astronomical Time-Series Data
This paper describes the VARTOOLS program, which is an open-source
command-line utility, written in C, for analyzing astronomical time-series
data, especially light curves. The program provides a general-purpose set of
tools for processing light curves including signal identification, filtering,
light curve manipulation, time conversions, and modeling and simulating light
curves. Some of the routines implemented include the Generalized Lomb-Scargle
periodogram, the Box-Least Squares transit search routine, the Analysis of
Variance periodogram, the Discrete Fourier Transform including the CLEAN
algorithm, the Weighted Wavelet Z-Transform, light curve arithmetic, linear and
non-linear optimization of analytic functions including support for Markov
Chain Monte Carlo analyses with non-trivial covariances, characterizing and/or
simulating time-correlated noise, and the TFA and SYSREM filtering algorithms,
among others. A mechanism is also provided for incorporating a user's own
compiled processing routines into the program. VARTOOLS is designed especially
for batch processing of light curves, including built-in support for parallel
processing, making it useful for large time-domain surveys such as searches for
transiting planets. Several examples are provided to illustrate the use of the
program.
Comment: 83 pages, 5 figures, accepted for publication in Astronomy and
Computing, code available at
http://www.astro.princeton.edu/~jhartman/vartools.htm
Prosody Modifications for Voice Conversion
Generally defined, speech modification is the process of changing certain perceptual properties of speech while
leaving other properties unchanged. Among the many types of speech information that may be altered are rate
of articulation, pitch and formant characteristics. Modifying speech parameters such as pitch, duration and strength
of excitation by a desired factor is termed prosody modification. In this thesis, prosody modifications for a voice
conversion framework are presented. Among the speech modifications for prosody, two are most important:
first, modification of duration and pauses (time scale modification) in a speech utterance, and second,
modification of the pitch (pitch scale modification). Prosody modification involves changing the pitch and duration
of speech without affecting the message or naturalness. In this work, time scale and pitch scale modifications
of speech are discussed using two methods: Time Domain Pitch Synchronous Overlap-Add (TD-PSOLA) and an
epoch based approach. Although there are many variations of TD-PSOLA, the version discussed in this thesis works
directly on speech in the time domain to apply the desired modifications. The epoch based approach
involves modifications of the LP residual.
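As a rough illustration of the pitch-synchronous overlap-add idea behind time scale modification, the following minimal sketch stretches a signal whose pitch period is assumed to be known and constant. It is not the thesis's implementation: the function names (`hann`, `psola_time_stretch`) and all parameter values are hypothetical, and a real TD-PSOLA system would first have to locate pitch marks (epochs) in natural speech, where the period varies over time.

```python
import math


def hann(n):
    """Periodic Hann window; two copies offset by n/2 sum to one."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def psola_time_stretch(samples, period, stretch):
    """Time-scale a signal with a known, constant pitch period (in samples).

    Two-period windowed segments centred on analysis pitch marks are
    overlap-added at synthesis marks that keep the original spacing, so the
    pitch stays the same while segments are repeated (stretch > 1) or
    skipped (stretch < 1) to change the duration.
    """
    window = hann(2 * period)
    # Assumption: one analysis mark per period, leaving room for a full window.
    marks = list(range(period, len(samples) - period, period))
    n_out_marks = int(len(marks) * stretch)
    out = [0.0] * ((n_out_marks + 1) * period)
    for m in range(n_out_marks):
        # Pick the analysis mark that corresponds to this synthesis mark in time.
        src = marks[min(int(m / stretch), len(marks) - 1)]
        centre = (m + 1) * period
        for j in range(-period, period):
            out[centre + j] += window[j + period] * samples[src + j]
    return out


# Illustrative input: a 200 Hz tone at an assumed 8 kHz sample rate,
# i.e. a pitch period of exactly 40 samples.
PERIOD = 40
tone = [math.sin(2 * math.pi * i / PERIOD) for i in range(4000)]
stretched = psola_time_stretch(tone, PERIOD, 1.5)

print("input length:", len(tone), "output length:", len(stretched))
```

Because the synthesis marks keep the analysis spacing, the output is longer but keeps the same period; changing that spacing instead would give pitch scale modification.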
Musical timbre: bridging perception with semantics
Musical timbre is a complex and multidimensional entity which provides information regarding
the properties of a sound source (size, material, etc.). When it comes to music, however, timbre
does not merely carry environmental information, but it also conveys aesthetic meaning. In this
sense, semantic description of musical tones is used to express perceptual concepts related to
artistic intention. Recent advances in sound processing and synthesis technology have enabled
the production of unique timbral qualities which cannot be easily associated with a familiar
musical instrument. Therefore, verbal description of these qualities facilitates communication
between musicians, composers, producers, audio engineers etc. The development of a common
semantic framework for musical timbre description could be exploited by intuitive sound synthesis
and processing systems and could even influence the way in which music is being consumed.
This work investigates the relationship between musical timbre perception and its semantics.
A set of listening experiments in which participants from two different language groups (Greek
and English) rated isolated musical tones on semantic scales has tested semantic universality of
musical timbre. The results suggested that the salient semantic dimensions of timbre, namely:
luminance, texture and mass, are indeed largely common between these two languages. The relationship
between semantics and perception was further examined by comparing the previously
identified semantic space with a perceptual timbre space (resulting from pairwise dissimilarity
rating of the same stimuli). The two spaces featured a substantial amount of common variance
suggesting that semantic description can largely capture timbre perception. Additionally, the
acoustic correlates of the semantic and perceptual dimensions were investigated. This work concludes
by introducing the concept of partial timbre through a listening experiment that demonstrates
the influence of background white noise on the perception of musical tones. The results
show that timbre is a relative percept which is influenced by the auditory environment.
Independent formant and pitch control applied to singing voice
Thesis (MScIng)--University of Stellenbosch, 2004.
ENGLISH ABSTRACT: A singing voice can be manipulated artificially by means of a digital computer to
create new melodies or to correct existing ones. When the fundamental frequency
of an audio signal that represents a human voice is changed by simple algorithms,
the formants of the voice tend to move to new frequency locations, making it sound unnatural.
The main purpose is to design a technique by which the pitch and formants of a
singing voice can be controlled independently.
AFRIKAANS SUMMARY: Independent formant and pitch control applied to a singing voice: A singing voice can be
manipulated by a digital computer to create new melodies or to improve existing
ones. When the fundamental frequency of a sound signal (representing a human voice)
is changed by a simple algorithm, the original formants shift
to new frequency regions. This causes the result to sound unnatural. The main
aim is to design a technique that can control the pitch and the formants of a singing voice
separately.
Models and analysis of vocal emissions for biomedical applications
This book of Proceedings collects the papers presented at the 3rd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, MAVEBA 2003, held 10-12 December 2003, Firenze, Italy. The workshop is organised every two years, and aims to stimulate contacts between specialists active in research and industrial developments, in the area of voice analysis for biomedical applications. The scope of the Workshop includes all aspects of voice modelling and analysis, ranging from fundamental research to all kinds of biomedical applications and related established and advanced technologies.