6 research outputs found
Development of a classification method for the acoustic analysis of continuous speech of varying voice quality using neural networks, and its application
In the description of voice disorders, the acoustic analysis of continuous speech is an essential extension of the analysis of sustained phonation. To select voiced phonemes from continuous speech, a classification method (vup) was developed that uses neural networks (multi-layer perceptrons) to segment the speech signal into contiguous regions of voiced phonation, unvoiced phonation, and pause. Based on this classification, the Göttingen Hoarseness Diagram for continuous speech (GHDT) was developed, modelled on the Göttingen Hoarseness Diagram for sustained phonation (GHD).
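The segmentation approach can be sketched as a framewise classifier. The features below (per-frame energy and zero-crossing rate), all numeric values, and the tiny NumPy multi-layer perceptron are illustrative assumptions — the abstract does not specify the feature set or network architecture used in vup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic framewise features (energy, zero-crossing rate) for the three
# classes: 0 = voiced, 1 = unvoiced, 2 = pause.  Purely illustrative values.
voiced   = np.column_stack([rng.normal(0.80, 0.05, 100), rng.normal(0.10, 0.03, 100)])
unvoiced = np.column_stack([rng.normal(0.40, 0.05, 100), rng.normal(0.70, 0.05, 100)])
pause    = np.column_stack([rng.normal(0.05, 0.02, 100), rng.normal(0.30, 0.10, 100)])
X = np.vstack([voiced, unvoiced, pause])
labels = np.repeat([0, 1, 2], 100)
Y = np.eye(3)[labels]                      # one-hot targets

# Tiny multi-layer perceptron: 2 inputs -> 8 tanh units -> 3 softmax outputs.
W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 3)); b2 = np.zeros(3)

def forward(X):
    H = np.tanh(X @ W1 + b1)
    logits = H @ W2 + b2
    E = np.exp(logits - logits.max(axis=1, keepdims=True))
    return H, E / E.sum(axis=1, keepdims=True)   # hidden activations, class probs

lr = 0.5
for _ in range(500):                       # full-batch gradient descent
    H, P = forward(X)
    d_logits = (P - Y) / len(X)            # softmax + cross-entropy gradient
    dH = d_logits @ W2.T * (1 - H ** 2)    # backprop through tanh layer
    W2 -= lr * H.T @ d_logits; b2 -= lr * d_logits.sum(axis=0)
    W1 -= lr * X.T @ dH;       b1 -= lr * dH.sum(axis=0)

accuracy = np.mean(forward(X)[1].argmax(axis=1) == labels)
print(f"training accuracy: {accuracy:.2f}")
```

Decoding the framewise class labels into contiguous voiced/unvoiced/pause regions would then be a simple run-length grouping over consecutive frames.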
An Investigation of nonlinear speech synthesis and pitch modification techniques
Speech synthesis technology plays an important role in many aspects of man-machine interaction, particularly in telephony applications. In order to be widely accepted, the synthesised speech quality should be as human-like as possible. This thesis investigates novel techniques for the speech signal generation stage in a speech synthesiser, based on concepts from nonlinear dynamical theory. It focuses on natural-sounding synthesis for voiced speech, coupled with the ability to generate the sound at the required pitch.
The one-dimensional voiced speech time-domain signals are embedded into an appropriate higher-dimensional space, using Takens' method of delays. These reconstructed state space representations have approximately the same dynamical properties as the original speech generating system and are thus effective models.
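The delay reconstruction can be sketched in a few lines; the sampling rate, embedding dimension, and delay below are illustrative choices, not values from the thesis:

```python
import numpy as np

def delay_embed(signal, dim, tau):
    """Reconstruct a state space trajectory from a 1-D signal using Takens'
    method of delays.  Each row is one delay vector
    [x(t), x(t + tau), ..., x(t + (dim-1)*tau)]."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("signal too short for this (dim, tau)")
    return np.column_stack([signal[i * tau : i * tau + n] for i in range(dim)])

# Example: embed a synthetic voiced-like waveform in three dimensions.
t = np.arange(2000) / 16000.0            # hypothetical 16 kHz sampling
x = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t)
traj = delay_embed(x, dim=3, tau=20)
print(traj.shape)                         # (1960, 3)
```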
A new technique for marking epoch points in voiced speech that operates in the state space domain is proposed. Using the fact that one revolution of the state space representation is equal to one pitch period, pitch synchronous points can be found using a Poincaré map. Evidently the epoch pulses are pitch synchronous and can therefore be marked.
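The Poincaré-map idea can be sketched by fixing a section in the reconstructed space and collecting the upward crossings through it; the synthetic signal, the section coordinate, and the level below are illustrative assumptions:

```python
import numpy as np

def poincare_crossings(traj, coord=1, level=0.0):
    """Indices where the trajectory pierces the section
    traj[:, coord] == level in the upward direction.  One revolution of the
    reconstruction is one pitch period, so successive crossings are
    pitch-synchronous candidate epoch marks."""
    y = traj[:, coord] - level
    return np.where((y[:-1] < 0) & (y[1:] >= 0))[0] + 1

# Illustration: a synthetic periodic signal with a 100-sample pitch period,
# embedded in two dimensions with a quarter-period delay.
x = np.sin(2 * np.pi * np.arange(1000) / 100)
traj = np.column_stack([x[:-25], x[25:]])
marks = poincare_crossings(traj, level=0.5)
print(np.diff(marks))        # all spacings equal the 100-sample pitch period
```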
The same state space representation is also used in a locally-linear speech synthesiser. This models the nonlinear dynamics of the speech signal by a series of local approximations, using the original signal as a template. The synthesised speech is natural-sounding because, rather than simply copying the original data, the technique makes use of the local dynamics to create a new, unique signal trajectory. Pitch modification within this synthesis structure is also investigated, with an attempt made to exploit the Šil'nikov-type orbit of voiced speech state space reconstructions. However, this technique is found to be incompatible with the locally-linear modelling technique, leaving the pitch modification issue unresolved.
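A minimal sketch of locally-linear prediction in the reconstructed space, assuming a nearest-neighbour affine fit at each step (the neighbour count and embedding parameters are illustrative, not the thesis settings):

```python
import numpy as np

def local_linear_step(traj, state, k=8):
    """Predict the next state: fit an affine map from the k nearest template
    states to their successors, then apply it to the current state."""
    d2 = np.sum((traj[:-1] - state) ** 2, axis=1)
    nbr = np.argsort(d2)[:k]
    X = np.column_stack([traj[nbr], np.ones(k)])   # affine regressors
    A, *_ = np.linalg.lstsq(X, traj[nbr + 1], rcond=None)
    return np.append(state, 1.0) @ A

def synthesize(traj, n_steps, k=8):
    """Free-run the local model from the template's first state; the first
    coordinate of each new state is the synthesised waveform sample."""
    state = traj[0].copy()
    out = []
    for _ in range(n_steps):
        state = local_linear_step(traj, state, k)
        out.append(state[0])
    return np.array(out)

# Template: a vowel-like periodic waveform, embedded with a quarter-period delay.
x = np.sin(2 * np.pi * np.arange(400) / 40)
traj = np.column_stack([x[:-10], x[10:]])
out = synthesize(traj, 100)
```

Because the new trajectory is generated from the fitted local dynamics rather than copied, small perturbations of the initial state yield a new, unique signal rather than a replay of the template.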
A different modelling strategy, using a radial basis function neural network to model the state space dynamics, is then considered. This produces a parametric model of the speech sound. Synthesised speech is obtained by connecting a delayed version of the network output back to the input via a global feedback loop. The network then synthesises speech in a free-running manner. Stability of the output is ensured by using regularisation theory when learning the weights. Complexity is also kept to a minimum because the network centres are fixed on a data-independent hyper-lattice, so only the linear-in-the-parameters weights need to be learnt for each vowel realisation. Pitch modification is again investigated, based around the idea of interpolating the weight vector between different realisations of the same vowel, but at differing pitch values. However, modelling the inter-pitch weight vector variations is very difficult, indicating that further study of pitch modification techniques is required before a complete nonlinear synthesiser can be implemented.
Development and testing of an acoustic method for the objective assessment of the voice quality of pathological voices
The Hoarseness Diagram is a graphical representation of voice quality in two dimensions: irregularity is plotted along one axis and the noise component of the voice along the other. Particular care is taken that every healthy and pathological voice, including voices with severe disorders, can be represented in the diagram. The measurement of the noise component is based on the new acoustic measure Glottal to Noise Excitation Ratio (GNE), which is developed in this work. Compared with other measures of the noise component, GNE has the great advantage of being insensitive to the typical irregularities of the voice signal. Irregularity is measured by three acoustic measures: two statistical measures describing the fluctuation of period lengths (jitter) and of cycle energies (shimmer), and the mean correlation of each pair of consecutive periods. The four acoustic measures of the Hoarseness Diagram were selected from 22 measures according to statistical criteria. The influence of the vocal tract on jitter and shimmer is investigated, and the period-length measurement procedure is tested for its suitability for highly irregular voices. A theory of jitter-induced shimmer is derived that agrees very well with the measurements. Vowels form a characteristic pattern in the Hoarseness Diagram. Six groups with different phonation mechanisms, among them normal and whispered voices, are significantly distinguished from one another in the Hoarseness Diagram. The appendix compiles the voice-quality development of 48 patients.
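The jitter and shimmer statistics can be illustrated with common textbook formulations (local jitter in percent, shimmer in dB, mean cycle-to-cycle correlation); these are standard definitions and not necessarily the exact measures selected in this work:

```python
import numpy as np

def jitter_percent(periods):
    """Mean absolute difference of consecutive period lengths,
    relative to the mean period (local jitter, in percent)."""
    p = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(p))) / np.mean(p)

def shimmer_db(amplitudes):
    """Mean absolute dB difference between consecutive cycle peak amplitudes."""
    a = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(20.0 * np.log10(a[1:] / a[:-1])))

def mean_waveform_correlation(cycles):
    """Mean Pearson correlation of each pair of consecutive cycles
    (cycles assumed resampled to equal length beforehand)."""
    return float(np.mean([np.corrcoef(a, b)[0, 1]
                          for a, b in zip(cycles[:-1], cycles[1:])]))
```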
An algorithm for the measurement of jitter
Jitter is the small fluctuation from one glottis cycle to the next in the duration of the fundamental period of the voice source. Analyzing jitter requires measuring glottal cycle durations accurately. Generally speaking, this is carried out by sampling at a medium rate and interpolating the discretized signal to obtain the required time resolution. In this article we describe an algorithm which solves the following two signal processing problems. Firstly, signal samples obtained by interpolation are only estimates of the original samples, which are unknown. The quality of the reconstruction of the signal therefore has to be evaluated. Secondly, small variations in cycle durations are easily corrupted by noise and measurement errors. The magnitude of measurement errors therefore has to be gauged. In our algorithm, the quality of reconstruction by signal interpolation is evaluated by a statistical test which takes into account the distribution of the corrections (which are brought about by interpolation) to the positions of the signal events which mark the beginnings of the glottal cycles. Three different interpolation methods have been implemented. Measurement errors are controlled by estimating independently the cycle durations of the speech and the electroglottographic signals. When the series obtained from both signals agree, we may then conclude that they reflect vocal fold activity and that they have not been unduly corrupted by errors or noise. The algorithm has been tested on 77 signals produced by healthy and dysphonic subjects. Its performance was satisfactory on all counts. © 1991.
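Two steps of such an algorithm can be sketched: sub-sample refinement of the cycle-marking events by interpolation (here parabolic, one simple stand-in for the three methods the article evaluates) and the acceptance test comparing the speech-derived and electroglottographic cycle-duration series. Function names and the tolerance are illustrative assumptions:

```python
import numpy as np

def refine_event(x, i):
    """Sub-sample position of a local maximum near sample i, from a parabola
    fitted through the three surrounding samples."""
    num = x[i - 1] - x[i + 1]
    den = x[i - 1] - 2 * x[i] + x[i + 1]
    return i + 0.5 * num / den

def series_agree(speech_periods, egg_periods, tol=0.02):
    """Accept the cycle-duration measurement only when the acoustic and the
    electroglottographic estimates agree cycle by cycle within `tol`."""
    a = np.asarray(speech_periods, dtype=float)
    b = np.asarray(egg_periods, dtype=float)
    return len(a) == len(b) and bool(np.all(np.abs(a - b) / b <= tol))

# A sampled parabola peaking at t = 5.25 is recovered exactly.
x = -(np.arange(11) - 5.25) ** 2
print(refine_event(x, 5))        # 5.25
```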