Development of a Two-Level Warping Algorithm and Its Application to Speech Signal Processing
In many fields, signals need to be aligned or “warped” in order to measure the similarity between them. When two time signals are compared, or when a pattern is sought in a larger stream of data, it may be necessary to warp one of the signals nonlinearly by compressing or stretching it to fit the other. Simple point-to-point comparison may give inadequate results, because one part of a signal may be compared against a different relative part of the other signal or pattern. Such cases need some form of alignment to do the comparison. Dynamic Time Warping (DTW) is a powerful and widely used time series analysis technique that performs such nonlinear warping in the temporal domain. The work in this dissertation develops in two directions. The first direction is to extend dynamic time warping to produce a two-level dynamic warping algorithm, with warping in both the temporal and spectral domains. While hundreds of research efforts in the last two decades have applied the one-dimensional warping idea to time series, extending the DTW method to two or more dimensions poses a more involved problem. The two-dimensional dynamic warping algorithm developed here is ideally suited to a variety of speech signal processing tasks.
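The classic one-dimensional DTW that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration only, not the dissertation's two-level algorithm: the function name `dtw`, the absolute-difference local cost, and the unconstrained three-way step pattern are all assumptions made for the example.

```python
def dtw(a, b):
    """Align two numeric sequences with classic dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    matching a[i] to b[j]. The local cost |a[i] - b[j]| is an illustrative
    choice, not something specified by the source abstract.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed moves: diagonal match, or stretch either sequence
            acc[i][j] = local + min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])

    # Backtrack from the end to recover the warping path
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return acc[n][m], path


cost, path = dtw([1, 2, 3], [1, 2, 2, 3])
```

Here the second sequence repeats the value 2, so the path maps index 1 of the first sequence to two consecutive indices of the second. That is the "stretching" described above, and a point-to-point comparison of the two sequences would miss it.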
The second direction is focused on two speech signal applications. The first application is the evaluation of dysarthric speech. Dysarthria is a neurological motor speech disorder, which is characterized by spectral and temporal degradation in speech production. Dysarthria management has focused primarily on teaching patients to improve their ability to produce speech or on strategies to compensate for their deficits. However, many individuals with dysarthria are not well-suited for traditional speaker-oriented intervention. Recent studies have shown that speech intelligibility can be improved by training the listener to better understand the degraded speech signal. A computer-based training tool was developed using a two-level dynamic warping algorithm, to eventually be incorporated into a program that trains listeners to imitate dysarthric speech by giving them feedback about the accuracy of their imitation attempts during training.
The second application is voice transformation. Voice transformation techniques aim to modify a subject’s voice characteristics to make them sound like someone else, for example from a male speaker to a female speaker. The approach taken here avoids the need to find acoustic parameters, as many voice transformation methods do, and instead deals directly with spectral information. Based on the two-level DW, it is straightforward to map the source speech to the target speech when both are available. The spectrally warped signal produced in this way contains significant processing artifacts. Phase reconstruction was applied to the transformed signal to improve the quality of the final sound. Neural networks are trained to perform the voice transformation.
DancingLines: An Analytical Scheme to Depict Cross-Platform Event Popularity
Nowadays, events usually burst and are propagated online through multiple
modern media like social networks and search engines. Various studies
discuss event dissemination trends on an individual medium, while
few studies focus on event popularity analysis from a cross-platform
perspective. Challenges come from the vast diversity of events and media,
limited access to aligned datasets across different media and a great deal of
noise in the datasets. In this paper, we design DancingLines, an innovative
scheme that captures and quantitatively analyzes event popularity between
pairwise text media. It contains two models: TF-SW, a semantic-aware popularity
quantification model, based on an integrated weight coefficient leveraging
Word2Vec and TextRank; and wDTW-CD, a pairwise event popularity time series
alignment model matching different event phases adapted from Dynamic Time
Warping. We also propose three metrics to interpret event popularity trends
between pairwise social platforms. Experimental results on eighteen real-world
event datasets from an influential social network and a popular search engine
validate the effectiveness and applicability of our scheme. DancingLines is
demonstrated to possess broad application potential for discovering
knowledge of various aspects related to events and different media.
A Novel Windowing Technique for Efficient Computation of MFCC for Speaker Recognition
In this paper, we propose a novel family of windowing techniques to compute
Mel Frequency Cepstral Coefficients (MFCC) for automatic speaker recognition
from speech. The proposed method is based on a fundamental property of the
discrete time Fourier transform (DTFT) related to differentiation in the
frequency domain. A classical windowing scheme such as the Hamming window is
modified to obtain derivatives of discrete time Fourier transform coefficients. It has been
mathematically shown that the slope and phase of power spectrum are inherently
incorporated in newly computed cepstrum. Speaker recognition systems based on
our proposed family of window functions are shown to attain substantial and
consistent performance improvement over the baseline single-tapered Hamming window
as well as the recently proposed multitaper windowing technique.
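The DTFT property the abstract refers to is the standard frequency-differentiation identity: if x[n] has DTFT X(ω), then n·x[n] has DTFT j·dX/dω. The sketch below checks this numerically on a Hamming-windowed toy frame. It assumes, as one plausible reading of the abstract, that the window modification amounts to multiplying the window by the sample index n. The helper names, the toy sinusoid, and the finite-difference step are illustrative choices, not the paper's implementation.

```python
import cmath
import math


def dtft(x, omega):
    """Evaluate X(omega) = sum_n x[n] * exp(-j*omega*n) for a finite sequence."""
    return sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x))


def hamming(length):
    """Standard Hamming window; length must be at least 2."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def differentiation_property(x, omega, step=1e-5):
    """Return (lhs, rhs) with lhs = DTFT{n * x[n]} and rhs = j * dX/d(omega).

    The derivative is approximated by a central finite difference, so the two
    sides agree only up to a small numerical error.
    """
    lhs = dtft([n * xn for n, xn in enumerate(x)], omega)
    derivative = (dtft(x, omega + step) - dtft(x, omega - step)) / (2 * step)
    return lhs, 1j * derivative


# Toy "speech frame": a sinusoid shaped by a Hamming window (hypothetical data)
frame = [math.sin(0.3 * n) for n in range(32)]
windowed = [w * s for w, s in zip(hamming(32), frame)]
lhs, rhs = differentiation_property(windowed, omega=0.7)
```

Under that assumption, applying the index-weighted window n·w[n] to a frame and taking its transform yields the spectral derivative directly, without any explicit numerical differentiation.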
On RG-flow and the Cosmological Constant
The AdS/CFT correspondence implies that the effective action of certain
strongly coupled large-N gauge theories satisfies the Hamilton-Jacobi equation
of 5d gravity. Using an analogy with the relativistic point particle, I
construct a low energy effective action that includes the Einstein action and
obeys a Callan-Symanzik-type RG-flow equation. It follows from the flow
equation that under quite general conditions the Einstein equations admit a
flat space-time solution, but other solutions with non-zero cosmological
constant are also allowed. I discuss the geometric interpretation of this
result in the context of warped compactifications.
Comment: 11 pages, 1 figure, contribution to the proceedings of Strings '99,
misprint corrected
Scales and hierarchies in warped compactifications and brane worlds
Warped compactifications with branes provide a new approach to the hierarchy
problem and generate a diversity of four-dimensional thresholds. We investigate
the relationships between these scales, which fall into two classes.
Geometrical scales, such as thresholds for Kaluza-Klein, excited string, and
black hole production, are generically determined solely by the spacetime
geometry. Dynamical scales, notably the scale of supersymmetry breaking and
moduli masses, depend on other details of the model. We illustrate these
relationships in a class of solutions of type IIB string theory with imaginary
self-dual fluxes. After identifying the geometrical scales and the resulting
hierarchy, we determine the gravitino and moduli masses through explicit
dimensional reduction, and estimate their value to be near the four-dimensional
Planck scale. In the process we obtain expressions for the superpotential and
Kahler potential, including the effects of warping. We identify matter living
on certain branes to be effectively sequestered from the supersymmetry breaking
fluxes: specifically, such "visible sector" fields receive no tree-level masses
from the supersymmetry breaking. However, loop corrections are expected to
generate masses, at the phenomenologically viable TeV scale.
Comment: 33 pages, LaTeX. v2: reference added; v3: reference added, typos
corrected
Unifying Amplitude and Phase Analysis: A Compositional Data Approach to Functional Multivariate Mixed-Effects Modeling of Mandarin Chinese
Mandarin Chinese is characterized by being a tonal language; the pitch (or
F0) of its utterances carries considerable linguistic information. However,
speech samples from different individuals are subject to changes in amplitude
and phase which must be accounted for in any analysis which attempts to provide
a linguistically meaningful description of the language. A joint model for
amplitude, phase and duration is presented which combines elements from
Functional Data Analysis, Compositional Data Analysis and Linear Mixed Effects
Models. By decomposing functions via a functional principal component analysis,
and connecting registration functions to compositional data analysis, a joint
multivariate mixed effect model can be formulated which gives insights into the
relationship between the different modes of variation as well as their
dependence on linguistic and non-linguistic covariates. The model is applied to
the COSPRO-1 data set, a comprehensive database of spoken Taiwanese Mandarin,
containing approximately 50 thousand phonetically diverse sample contours
(syllables), and reveals that phonetic information is jointly carried by both
amplitude and phase variation.
Comment: 49 pages, 13 figures, small changes to discussion