High-resolution sinusoidal analysis for resolving harmonic collisions in music audio signal processing
Many music signals can largely be considered an additive combination of
multiple sources, such as musical instruments or voice. If the musical sources
are pitched instruments, the spectra they produce are predominantly harmonic,
and are thus well suited to an additive sinusoidal model. However,
due to resolution limits inherent in time-frequency analyses, when the harmonics
of multiple sources occupy equivalent time-frequency regions, their
individual properties are additively combined in the time-frequency representation
of the mixed signal. Any such time-frequency point in a mixture
where multiple harmonics overlap produces a single observation from which
the contributions of each individual harmonic cannot be trivially
deduced. These overlaps are referred to as overlapping partials or harmonic
collisions. If one wishes to infer some information about individual sources in
music mixtures, the information carried in regions where collided harmonics
exist becomes unreliable due to interference from other sources. This interference
has ramifications in a variety of music signal processing applications
such as multiple fundamental frequency estimation, source separation, and
instrument identification.
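The collision problem can be made concrete with a small sketch that lists harmonic pairs from two sources lying closer together than the frequency resolution of one analysis frame. The sample rate, frame length, pitches, the helper name `colliding_harmonics`, and the rule that two partials "collide" when they fall within one DFT bin spacing are all illustrative assumptions, not details from the thesis. A windowed DFT's main lobe actually spans several bins, so real collisions are wider than this rule suggests.

```python
def colliding_harmonics(f0_a, f0_b, resolution_hz, max_freq_hz):
    """List harmonic pairs (k, m, k*f0_a, m*f0_b) closer than resolution_hz.

    Hypothetical helper: treats two partials as collided when their
    frequencies differ by less than the analysis resolution.
    """
    collisions = []
    k = 1
    while k * f0_a <= max_freq_hz:
        m = 1
        while m * f0_b <= max_freq_hz:
            if abs(k * f0_a - m * f0_b) < resolution_hz:
                collisions.append((k, m, k * f0_a, m * f0_b))
            m += 1
        k += 1
    return collisions


fs = 44100.0
frame_len = 2048
bin_spacing = fs / frame_len  # about 21.5 Hz between DFT bins

# A3 (220 Hz) and an equal-tempered E4 (about 329.63 Hz), a perfect fifth apart
pairs = colliding_harmonics(220.0, 329.63, bin_spacing, 2000.0)
for k, m, f_a, f_b in pairs:
    print(f"harmonic {k} of A3 ({f_a:.2f} Hz) overlaps harmonic {m} of E4 ({f_b:.2f} Hz)")
# pairs (k, m): (3, 2), (6, 4), (9, 6)
```

Every third harmonic of the lower note lands within a fraction of a hertz of every second harmonic of the upper note, so a single time-frequency point carries energy from both sources.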
This thesis addresses harmonic collisions in music signal processing applications.
As a solution to the harmonic collision problem, a class of signal
subspace-based high-resolution sinusoidal parameter estimators is explored.
Specifically, the direct matrix pencil method, or equivalently, the Estimation
of Signal Parameters via Rotational Invariance Techniques (ESPRIT)
method, is used with the goal of producing estimates of the salient parameters
of individual harmonics that occupy equivalent time-frequency regions. This
estimation method is adapted here to be applicable to time-varying signals
such as musical audio. While high-resolution methods have previously been
explored in the context of music signal processing, earlier work has not
addressed whether such methods actually produce high-resolution sinusoidal
parameter estimates for real-world music audio. This thesis therefore
investigates whether high-resolution sinusoidal parameter estimators
achieve high resolution on real music signals.
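The shift-invariance idea behind the matrix pencil and ESPRIT methods can be illustrated on the simplest collision case: two complex exponentials whose frequencies lie within one DFT bin. The sketch below is a minimal least-squares matrix pencil, not the adapted time-varying estimator developed in the thesis. It assumes a noiseless complex (analytic) signal with exactly two components and sets the pencil parameter equal to that model order, so every matrix is 2x2 and plain Python suffices. The function name `matrix_pencil_two_tones`, the test frequencies, and the frame length are hypothetical choices for illustration.

```python
import cmath
import math


def matrix_pencil_two_tones(y, fs):
    """Estimate the two frequencies (Hz) in y[n] = a1*z1**n + a2*z2**n.

    Assumes a noiseless complex signal with exactly two components;
    the pencil parameter equals the model order (2).
    """
    rows = len(y) - 2
    # Y1 has rows [y[i], y[i+1]]; Y2 is Y1 shifted forward by one sample.
    # Accumulate G = Y1^H Y1 and C = Y1^H Y2 (both 2x2).
    G = [[0j, 0j], [0j, 0j]]
    C = [[0j, 0j], [0j, 0j]]
    for i in range(rows):
        r1 = (y[i], y[i + 1])
        r2 = (y[i + 1], y[i + 2])
        for p in range(2):
            for q in range(2):
                G[p][q] += r1[p].conjugate() * r1[q]
                C[p][q] += r1[p].conjugate() * r2[q]

    # A = pinv(Y1) Y2 = G^{-1} C; its eigenvalues are the poles z1, z2.
    det_g = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    G_inv = [[G[1][1] / det_g, -G[0][1] / det_g],
             [-G[1][0] / det_g, G[0][0] / det_g]]
    A = [[sum(G_inv[p][k] * C[k][q] for k in range(2)) for q in range(2)]
         for p in range(2)]

    # Eigenvalues of a 2x2 matrix: roots of lam^2 - trace*lam + det = 0.
    trace = A[0][0] + A[1][1]
    det_a = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(trace * trace - 4 * det_a)
    poles = [(trace + disc) / 2, (trace - disc) / 2]

    # Pole angle (radians per sample) converted to frequency in Hz.
    return sorted(cmath.phase(z) * fs / (2 * math.pi) for z in poles)


fs = 8000.0
n_samples = 256  # DFT bin spacing fs / n_samples = 31.25 Hz
y = [cmath.exp(2j * math.pi * 660.0 * n / fs)
     + 0.8 * cmath.exp(1j * (2 * math.pi * 680.0 * n / fs + 0.3))
     for n in range(n_samples)]
estimates = matrix_pencil_two_tones(y, fs)
print([round(f, 3) for f in estimates])
```

The 660 Hz and 680 Hz components are closer than one 31.25 Hz bin, yet the pole angles recover both frequencies to within numerical precision. Practical ESPRIT and matrix pencil implementations instead use a larger pencil parameter with an SVD-based rank truncation to cope with noise, and a real-valued audio sinusoid contributes two conjugate poles rather than one.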
This work directly explores the capabilities of this form of sinusoidal parameter
estimation to resolve collided harmonics. The capabilities of this
analysis method are also explored in the context of music signal processing
applications. Potential benefits of high-resolution sinusoidal analysis are
examined in experiments involving multiple fundamental frequency estimation
and audio source separation. This work shows that there are indeed
benefits to high-resolution sinusoidal analysis in music signal processing applications,
especially when compared to methods that produce sinusoidal
parameter estimates based on more traditional time-frequency representations.
The benefits of this form of sinusoidal analysis are made most evident
in multiple fundamental frequency estimation applications, where substantial
performance gains are seen. In source separation based on computational
auditory scene analysis, high-resolution analysis performs comparably to
existing methods.
Proceedings of the 7th Sound and Music Computing Conference
Proceedings of SMC2010, the 7th Sound and Music Computing Conference, July 21-24, 2010
Proceedings of the Mobile Satellite Conference
A satellite-based mobile communications system provides voice and data communications to mobile users over a vast geographic area. The technical and service characteristics of mobile satellite systems (MSSs) are presented, giving an in-depth view of the current status of MSSs at the system and subsystem levels. Major emphasis is placed on current and future developments in the following critical MSS technology areas: vehicle antennas, networking, modulation and coding, speech compression, channel characterization, space segment technology, and MSS experiments. The mobile satellite communications needs of government agencies are also addressed, as is the potential of MSSs to fulfill them.
User state estimation for advanced communication in multimodal spoken dialogue systems
Tohoku University, Akinori Ito (伊藤彰則)
Proceedings of the Second International Mobile Satellite Conference (IMSC 1990)
Presented here are the proceedings of the Second International Mobile Satellite Conference (IMSC), held June 17-20, 1990, in Ottawa, Canada. Topics covered include future mobile satellite communications concepts, aeronautical applications, modulation and coding, propagation and experimental systems, mobile terminal equipment, network architecture and control, regulatory and policy considerations, vehicle antennas, and speech compression.
Statistical parametric speech synthesis using conversational data and phenomena
Statistical parametric text-to-speech synthesis currently relies on predefined and highly
controlled prompts read in a “neutral” voice. This thesis presents work on utilising
recordings of free conversation for the purpose of filled pause synthesis and as an
inspiration for improved general modelling of speech for text-to-speech synthesis purposes.
A corpus of both standard prompts and free conversation is presented and the
potential usefulness of conversational speech as the basis for text-to-speech voices
is validated. Additionally, through psycholinguistic experimentation it is shown that
filled pauses can have subconscious benefits for the listener but that current
text-to-speech voices cannot replicate these effects. A method for pronunciation variant
forced alignment is presented in order to obtain more accurate automatic speech
segmentation, which is particularly poor for spontaneously produced speech.
This pronunciation variant alignment is utilised not only to create a more accurate underlying
acoustic model, but also as the driving force behind creating more natural
pronunciation prediction at synthesis time. While this improves both the standard and
spontaneous voices, the naturalness of spontaneous speech based voices still lags behind
the quality of voices based on standard read prompts. Thus, the synthesis of filled
pauses is investigated in relation to specific phonetic modelling of filled pauses and
through techniques for the mixing of standard prompts with spontaneous utterances in
order to retain the higher quality of standard speech based voices while still utilising
the spontaneous speech for filled pause modelling. A method for predicting where to
insert filled pauses in the speech stream is also developed and presented, relying on
an analysis of human filled pause usage and a mix of language modelling methods.
The method achieves an insertion accuracy in close agreement with human usage. The
various approaches are evaluated and their improvements documented throughout the
thesis; finally, the resulting filled pause quality is assessed through a repetition
of the psycholinguistic experiments and an evaluation combining all
developed methods.
- …