6 research outputs found
Development of a classification method for the acoustic analysis of continuous speech of varying voice quality using neural networks, and its application
In the description of voice disorders, the acoustic analysis of continuous speech is an essential extension of the analysis of sustained phonation. To select voiced phonemes from continuous speech, a classification method (vup) was developed that uses neural networks (multi-layer perceptrons) to segment the speech signal into contiguous regions of voiced phonation, unvoiced phonation, and pause. Based on this classification, the Göttingen Hoarseness Diagram for continuous speech (GHDT) was developed, modelled on the Göttingen Hoarseness Diagram for sustained phonation (GHD).
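The segmentation approach can be sketched as a framewise classifier. The features below (per-frame energy and zero-crossing rate), all numeric values, and the tiny NumPy multi-layer perceptron are illustrative assumptions — the abstract does not specify the feature set or network architecture used in vup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic framewise features (energy, zero-crossing rate) for the three
# classes: 0 = voiced, 1 = unvoiced, 2 = pause.  Purely illustrative values.
voiced   = np.column_stack([rng.normal(0.80, 0.05, 100), rng.normal(0.10, 0.03, 100)])
unvoiced = np.column_stack([rng.normal(0.40, 0.05, 100), rng.normal(0.70, 0.05, 100)])
pause    = np.column_stack([rng.normal(0.05, 0.02, 100), rng.normal(0.30, 0.10, 100)])
X = np.vstack([voiced, unvoiced, pause])
labels = np.repeat([0, 1, 2], 100)
Y = np.eye(3)[labels]                      # one-hot targets

# Tiny multi-layer perceptron: 2 inputs -> 8 tanh units -> 3 softmax outputs.
W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 3)); b2 = np.zeros(3)

def forward(X):
    H = np.tanh(X @ W1 + b1)
    logits = H @ W2 + b2
    E = np.exp(logits - logits.max(axis=1, keepdims=True))
    return H, E / E.sum(axis=1, keepdims=True)   # hidden activations, class probs

lr = 0.5
for _ in range(500):                       # full-batch gradient descent
    H, P = forward(X)
    d_logits = (P - Y) / len(X)            # softmax + cross-entropy gradient
    dH = d_logits @ W2.T * (1 - H ** 2)    # backprop through tanh layer
    W2 -= lr * H.T @ d_logits; b2 -= lr * d_logits.sum(axis=0)
    W1 -= lr * X.T @ dH;       b1 -= lr * dH.sum(axis=0)

accuracy = np.mean(forward(X)[1].argmax(axis=1) == labels)
print(f"training accuracy: {accuracy:.2f}")
```

Decoding the framewise class labels into contiguous voiced/unvoiced/pause regions would then be a simple run-length grouping over consecutive frames.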
An Investigation of nonlinear speech synthesis and pitch modification techniques
Speech synthesis technology plays an important role in many aspects of man-machine interaction, particularly in telephony applications. In order to be widely accepted, the synthesised speech quality should be as human-like as possible. This thesis investigates novel techniques for the speech signal generation stage in a speech synthesiser, based on concepts from nonlinear dynamical theory. It focuses on natural-sounding synthesis for voiced speech, coupled with the ability to generate the sound at the required pitch.
The one-dimensional voiced speech time-domain signals are embedded into an appropriate higher-dimensional space, using Takens' method of delays. These reconstructed state space representations have approximately the same dynamical properties as the original speech generating system and are thus effective models.
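The delay reconstruction can be sketched in a few lines; the sampling rate, embedding dimension, and delay below are illustrative choices, not values from the thesis:

```python
import numpy as np

def delay_embed(signal, dim, tau):
    """Reconstruct a state space trajectory from a 1-D signal using Takens'
    method of delays.  Each row is one delay vector
    [x(t), x(t + tau), ..., x(t + (dim-1)*tau)]."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("signal too short for this (dim, tau)")
    return np.column_stack([signal[i * tau : i * tau + n] for i in range(dim)])

# Example: embed a synthetic voiced-like waveform in three dimensions.
t = np.arange(2000) / 16000.0            # hypothetical 16 kHz sampling
x = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t)
traj = delay_embed(x, dim=3, tau=20)
print(traj.shape)                         # (1960, 3)
```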
A new technique for marking epoch points in voiced speech that operates in the state space domain is proposed. Using the fact that one revolution of the state space representation is equal to one pitch period, pitch synchronous points can be found using a Poincaré map. Evidently the epoch pulses are pitch synchronous and can therefore be marked.
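The Poincaré-map idea can be sketched by fixing a section in the reconstructed space and collecting the upward crossings through it; the synthetic signal, the section coordinate, and the level below are illustrative assumptions:

```python
import numpy as np

def poincare_crossings(traj, coord=1, level=0.0):
    """Indices where the trajectory pierces the section
    traj[:, coord] == level in the upward direction.  One revolution of the
    reconstruction is one pitch period, so successive crossings are
    pitch-synchronous candidate epoch marks."""
    y = traj[:, coord] - level
    return np.where((y[:-1] < 0) & (y[1:] >= 0))[0] + 1

# Illustration: a synthetic periodic signal with a 100-sample pitch period,
# embedded in two dimensions with a quarter-period delay.
x = np.sin(2 * np.pi * np.arange(1000) / 100)
traj = np.column_stack([x[:-25], x[25:]])
marks = poincare_crossings(traj, level=0.5)
print(np.diff(marks))        # all spacings equal the 100-sample pitch period
```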
The same state space representation is also used in a locally-linear speech synthesiser. This models the nonlinear dynamics of the speech signal by a series of local approximations, using the original signal as a template. The synthesised speech is natural-sounding because, rather than simply copying the original data, the technique makes use of the local dynamics to create a new, unique signal trajectory. Pitch modification within this synthesis structure is also investigated, with an attempt made to exploit the Šil'nikov-type orbit of voiced speech state space reconstructions. However, this technique is found to be incompatible with the locally-linear modelling technique, leaving the pitch modification issue unresolved.
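A minimal sketch of locally-linear prediction in the reconstructed space, assuming a nearest-neighbour affine fit at each step (the neighbour count and embedding parameters are illustrative, not the thesis settings):

```python
import numpy as np

def local_linear_step(traj, state, k=8):
    """Predict the next state: fit an affine map from the k nearest template
    states to their successors, then apply it to the current state."""
    d2 = np.sum((traj[:-1] - state) ** 2, axis=1)
    nbr = np.argsort(d2)[:k]
    X = np.column_stack([traj[nbr], np.ones(k)])   # affine regressors
    A, *_ = np.linalg.lstsq(X, traj[nbr + 1], rcond=None)
    return np.append(state, 1.0) @ A

def synthesize(traj, n_steps, k=8):
    """Free-run the local model from the template's first state; the first
    coordinate of each new state is the synthesised waveform sample."""
    state = traj[0].copy()
    out = []
    for _ in range(n_steps):
        state = local_linear_step(traj, state, k)
        out.append(state[0])
    return np.array(out)

# Template: a vowel-like periodic waveform, embedded with a quarter-period delay.
x = np.sin(2 * np.pi * np.arange(400) / 40)
traj = np.column_stack([x[:-10], x[10:]])
out = synthesize(traj, 100)
```

Because the new trajectory is generated from the fitted local dynamics rather than copied, small perturbations of the initial state yield a new, unique signal rather than a replay of the template.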
A different modelling strategy, using a radial basis function neural network to model the state space dynamics, is then considered. This produces a parametric model of the speech sound. Synthesised speech is obtained by connecting a delayed version of the network output back to the input via a global feedback loop. The network then synthesises speech in a free-running manner. Stability of the output is ensured by using regularisation theory when learning the weights. Complexity is also kept to a minimum because the network centres are fixed on a data-independent hyper-lattice, so only the linear-in-the-parameters weights need to be learnt for each vowel realisation. Pitch modification is again investigated, based around the idea of interpolating the weight vector between different realisations of the same vowel, but at differing pitch values. However, modelling the inter-pitch weight vector variations is very difficult, indicating that further study of pitch modification techniques is required before a complete nonlinear synthesiser can be implemented.
Development and testing of an acoustic method for the objective assessment of the voice quality of pathological voices
The Hoarseness Diagram is a graphical representation of voice quality in two dimensions: irregularity is plotted along one axis and the noise component of the voice along the other. Particular care is taken that every healthy and pathological voice, including voices with severe disorders, can be represented in the diagram. The measurement of the noise component is based on the new acoustic measure Glottal to Noise Excitation Ratio (GNE), which is developed in this work. Compared with other measures of the noise component, GNE has the great advantage of being insensitive to the typical irregularities of the voice signal. Irregularity is measured by three acoustic measures: two statistical measures describing the fluctuation of period lengths (jitter) and of cycle energies (shimmer), and the mean correlation of each pair of consecutive periods. The four acoustic measures of the Hoarseness Diagram were selected from 22 measures according to statistical criteria. The influence of the vocal tract on jitter and shimmer is investigated, and the period-length measurement procedure is tested for its suitability for highly irregular voices. A theory of jitter-induced shimmer is derived that agrees very well with the measurements. Vowels form a characteristic pattern in the Hoarseness Diagram. Six groups with different phonation mechanisms, among them normal and whispered voices, are significantly distinguished from one another in the Hoarseness Diagram. The appendix compiles the voice-quality development of 48 patients.
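The jitter and shimmer statistics can be illustrated with common textbook formulations (local jitter in percent, shimmer in dB, mean cycle-to-cycle correlation); these are standard definitions and not necessarily the exact measures selected in this work:

```python
import numpy as np

def jitter_percent(periods):
    """Mean absolute difference of consecutive period lengths,
    relative to the mean period (local jitter, in percent)."""
    p = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(p))) / np.mean(p)

def shimmer_db(amplitudes):
    """Mean absolute dB difference between consecutive cycle peak amplitudes."""
    a = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(20.0 * np.log10(a[1:] / a[:-1])))

def mean_waveform_correlation(cycles):
    """Mean Pearson correlation of each pair of consecutive cycles
    (cycles assumed resampled to equal length beforehand)."""
    return float(np.mean([np.corrcoef(a, b)[0, 1]
                          for a, b in zip(cycles[:-1], cycles[1:])]))
```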
An algorithm for the measurement of jitter
Jitter is the small fluctuation from one glottis cycle to the next in the duration of the fundamental period of the voice source. Analyzing jitter requires measuring glottal cycle durations accurately. Generally speaking, this is carried out by sampling at a medium rate and interpolating the discretized signal to obtain the required time resolution. In this article we describe an algorithm which solves the following two signal processing problems. Firstly, signal samples obtained by interpolation are only estimates of the original samples, which are unknown. The quality of the reconstruction of the signal therefore has to be evaluated. Secondly, small variations in cycle durations are easily corrupted by noise and measurement errors. The magnitude of measurement errors therefore has to be gauged. In our algorithm, the quality of reconstruction by signal interpolation is evaluated by a statistical test which takes into account the distribution of the corrections (which are brought about by interpolation) to the positions of the signal events which mark the beginnings of the glottal cycles. Three different interpolation methods have been implemented. Measurement errors are controlled by estimating independently the cycle durations of the speech and the electroglottographic signals. When the series obtained from both signals agree, we may then conclude that they reflect vocal fold activity and that they have not been unduly corrupted by errors or noise. The algorithm has been tested on 77 signals produced by healthy and dysphonic subjects. Its performance was satisfactory on all counts. © 1991.
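Two steps of such an algorithm can be sketched: sub-sample refinement of the cycle-marking events by interpolation (here parabolic, one simple stand-in for the three methods the article evaluates) and the acceptance test comparing the speech-derived and electroglottographic cycle-duration series. Function names and the tolerance are illustrative assumptions:

```python
import numpy as np

def refine_event(x, i):
    """Sub-sample position of a local maximum near sample i, from a parabola
    fitted through the three surrounding samples."""
    num = x[i - 1] - x[i + 1]
    den = x[i - 1] - 2 * x[i] + x[i + 1]
    return i + 0.5 * num / den

def series_agree(speech_periods, egg_periods, tol=0.02):
    """Accept the cycle-duration measurement only when the acoustic and the
    electroglottographic estimates agree cycle by cycle within `tol`."""
    a = np.asarray(speech_periods, dtype=float)
    b = np.asarray(egg_periods, dtype=float)
    return len(a) == len(b) and bool(np.all(np.abs(a - b) / b <= tol))

# A sampled parabola peaking at t = 5.25 is recovered exactly.
x = -(np.arange(11) - 5.25) ** 2
print(refine_event(x, 5))        # 5.25
```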