91 research outputs found
Improved compactly computable objective measures for predicting the acceptability of speech communications systems
Issued as Monthly status reports [1-7], and Final report, Project no. E-21-61
Fixed-analysis adaptive-synthesis filter banks
Subband/wavelet analysis-synthesis filters are a major component in many compression algorithms. Such algorithms have been applied to images, voice, and video, and have achieved high performance. Typically, the configuration involves a bank of analysis filters whose coefficients have been designed in advance to enable high-quality reconstruction. The analysis system is followed by subband quantization, with decoding on the synthesis side. Decoding is performed using a corresponding set of synthesis filters, and the subbands are merged together. For many years, there has been interest in improving the analysis-synthesis filters in order to achieve better coding quality. Adaptive filter banks have been explored by a number of authors, whereby the analysis and synthesis filter coefficients are changed dynamically in response to the input. A degree of performance improvement has been reported, but this approach requires that the analysis system dynamically maintain synchronization with the synthesis system in order to perform reconstruction.
In this thesis, we explore a variant of the adaptive filter bank idea. We will refer to this approach as fixed-analysis adaptive-synthesis filter banks. Unlike the adaptive filter banks proposed previously, there is no analysis-synthesis synchronization issue involved. This implies less coder complexity and more coder flexibility. Such an approach can be compatible with existing subband wavelet encoders. The design methodology and a performance analysis are presented.
Ph.D. Committee Chair: Smith, Mark J. T.; Committee Co-Chair: Mersereau, Russell M.; Committee Member: Anderson, David; Committee Member: Lanterman, Aaron; Committee Member: Rosen, Gail; Committee Member: Wardi, Yora
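The analysis, subband quantization, and synthesis chain described in this abstract can be illustrated with a minimal sketch. It assumes a fixed two-channel Haar filter pair and a uniform quantizer; both are stand-ins chosen for brevity, not the filters or the adaptive-synthesis design from the thesis, and all function names are hypothetical.

```python
import math

def haar_analysis(x):
    """Split an even-length signal into low- and high-pass subbands (each downsampled by 2)."""
    s = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return low, high

def haar_synthesis(low, high):
    """Merge the two subbands back into a full-rate signal."""
    s = math.sqrt(2.0)
    x = []
    for l, h in zip(low, high):
        x.append((l + h) / s)
        x.append((l - h) / s)
    return x

def quantize(band, step):
    """Uniform scalar quantizer, standing in for the subband coder (an assumption)."""
    return [step * round(v / step) for v in band]

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
low, high = haar_analysis(signal)

# Without quantization the fixed filter pair reconstructs the input exactly.
reconstructed = haar_synthesis(low, high)

# With quantized subbands the synthesis side only sees coded coefficients.
coded = haar_synthesis(quantize(low, 0.5), quantize(high, 0.5))
```

In this sketch the synthesis function depends only on the decoded subbands, which is the side of the chain a fixed-analysis adaptive-synthesis scheme would be free to change without touching the encoder.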
Size discrimination of transient signals
The importance of spectral cues in size discrimination of transient signals was investigated, and a model for this ability, tAIM, was created based on the biological principles of human hearing. A psychophysics experiment involving 40 participants found that the most important cue for size discrimination of transient signals, created by striking different sizes of polystyrene spheres, was similar to that of speakers listening to vowels: the relative positions of the resonances between comparison signals. It was found possible to scale the sphere signals in order to confuse listeners into believing the signal source was a different size, but two methods of scaling signals in order to sound the same size as another proved inconclusive, suggesting the possibility that transient signals cannot be scaled in a linear fashion as has been shown possible for vowels. Filtering the signals in a number of different ways found that the most important cue in size discrimination of transient signals is the difference between the most prominent resonances available in the spectra of the comparison signals. A model of the auditory system using the dynamic compressive Gammachirp filterbank, and based on the well-known AIM, was created to produce auditory images of transient signals that could be normalised for size. Transient-AIM, or tAIM, used the Mellin transform to produce images that showed size normalisation was possible due to the spectral envelope similarities across the sizes of the spheres. tAIM was extended to carry out size discrimination of the spheres using the information contained within the Mellin images. There was a systematic association between Mellin phase and size of objects of various shapes, which suggests that tAIM is able to infer object size from sound recordings of objects being struck.
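The association between scale and Mellin phase reported in this abstract is consistent with a standard property of the Mellin transform. As a sketch, for a scale factor $a > 0$ (the symbols $f$, $a$, $c$ and $\omega$ are notation introduced here, not taken from the thesis):

```latex
\begin{aligned}
\mathcal{M}\{f\}(s) &= \int_0^\infty f(t)\, t^{s-1}\, dt, \\
\mathcal{M}\{f(at)\}(s) &= a^{-s}\, \mathcal{M}\{f\}(s), \\
\bigl|\mathcal{M}\{f(at)\}(c + i\omega)\bigr| &= a^{-c}\, \bigl|\mathcal{M}\{f\}(c + i\omega)\bigr|, \\
\arg \mathcal{M}\{f(at)\}(c + i\omega) &= \arg \mathcal{M}\{f\}(c + i\omega) - \omega \ln a \pmod{2\pi}.
\end{aligned}
```

A change of scale therefore leaves the shape of the Mellin magnitude unchanged (up to the constant $a^{-c}$) and appears as a phase term linear in $\omega$. How tAIM itself computes and uses the transform is not specified in the abstract.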
Glottal-synchronous speech processing
Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs).
The SIGMA algorithm was developed for the detection of GCIs and GOIs from the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms.
The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification and artificial bandwidth extension of telephone speech.
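The difference between fixed-length framing and glottal-synchronous framing can be shown with a small sketch. It assumes the GCI positions are already known (for example from a detector such as SIGMA or YAGA, neither of which is implemented here) and uses the common convention of two-period frames centred on each GCI. The function name and the sample GCI values are hypothetical.

```python
def glottal_synchronous_frames(signal, gcis):
    """Return one frame per interior GCI, spanning from the previous GCI to the next one.

    Each frame covers roughly two pitch periods, so its length follows the
    local pitch instead of being fixed in advance.
    """
    frames = []
    for i in range(1, len(gcis) - 1):
        start, end = gcis[i - 1], gcis[i + 1]
        frames.append(signal[start:end])
    return frames

# Placeholder data: a ramp instead of real speech, and made-up GCI sample indices.
signal = list(range(100))
gcis = [10, 30, 52, 71, 90]

frames = glottal_synchronous_frames(signal, gcis)
frame_lengths = [len(f) for f in frames]
```

Here the frame lengths vary with the spacing of the GCIs, whereas a conventional front end would cut every frame to the same predefined length regardless of pitch.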
Audio watermarking techniques using singular value decomposition
In an increasingly digital world, proving ownership of files is more and more difficult. For audio files, many schemes have been put into place to attempt to protect the rights of the digital content owners. In general, these techniques fall under the classification of Digital Rights Management (DRM). Audio watermarking is one of the less invasive schemes: it embeds security into the data itself instead of in an outside layer meant to encapsulate and protect the data. There are many domains in which an audio watermark can be applied. The simplest is the time domain; often, however, other domains may be more desirable due to greater imperceptibility and robustness to attack. Common domains include the frequency domain, or domains similar to frequency obtained through functions such as the Wavelet Transform. One domain of particular interest is that of the Singular Value Decomposition (SVD). The goal of this thesis is to propose and test many different watermarking schemes, as well as to test an existing watermarking scheme operating in the SVD domain, in order to assess the viability of the SVD as a watermarking carrier domain. Different carrier matrices as well as bit embedding methods are explored. A standard set of audio files was used to help test the systems; a standard set of watermarking tests was unavailable, so a comparable test bed was implemented and utilized.
Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)
Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990 in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression.
Designing sound : procedural audio research based on the book by Andy Farnell
In procedural media, data normally acquired by measuring something, commonly described as sampling, is replaced by a set of computational rules (procedure) that defines the typical structure and/or behaviour of that thing. Here, a general approach to sound as a definable process, rather than a recording, is developed. By analysis of their physical and perceptual qualities, natural objects or processes that produce sound are modelled by digital Sounding Objects for use in arts and entertainments. This thesis discusses different aspects of Procedural Audio, introducing several new approaches and solutions to this emerging field of Sound Design.
Offline and real time noise reduction in speech signals using the discrete wavelet packet decomposition
This thesis describes the development of an offline and real-time wavelet-based speech enhancement system to process speech corrupted with various amounts of white Gaussian noise and other noise types.
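The general idea behind wavelet-based noise reduction can be sketched as transform, shrink the detail coefficients, and invert. The example below is a deliberately reduced illustration: it assumes a single-level Haar transform and soft thresholding with a hand-picked threshold, not the discrete wavelet packet decomposition or the threshold rules used in the thesis. All names and values are hypothetical.

```python
import math

def soft_threshold(coeffs, t):
    """Shrink each coefficient towards zero by t; values within [-t, t] become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def haar_denoise(x, t):
    """Single-level Haar transform, soft-threshold the detail band, then invert."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = soft_threshold(detail, t)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

# Made-up samples: small fluctuations around 1.0 plus one large transient.
noisy = [1.0, 1.2, 0.9, 1.1, 5.0, 1.0]
cleaned = haar_denoise(noisy, 0.5)
```

Small sample-to-sample fluctuations fall below the threshold and are smoothed out, while the large transient survives with reduced amplitude. A practical system would use a deeper decomposition and a noise-dependent threshold.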