6 research outputs found

    Development of a classification method for the acoustic analysis of running speech of varying voice quality using neural networks, and its application

    The acoustic analysis of running speech represents an essential extension to the analysis of sustained phonation in the description of voice disorders. To select voiced phonemes from running speech, a classification method (vup) has been developed that segments the speech signal into contiguous regions of voiced and unvoiced phonation as well as pause, using neural networks (multi-layer perceptrons). On the basis of this classification, the Göttinger Heiserkeits-Diagramm for running speech (GHDT) has been developed, modelled on the Göttinger Heiserkeits-Diagramm for sustained phonation (GHD).
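The voiced/unvoiced/pause segmentation described in this abstract can be sketched in outline. The following is a minimal, illustrative toy, not the thesis's actual vup method: per-frame log-energy and zero-crossing-rate features are passed through a small multi-layer perceptron (here with untrained, randomly initialised weights) that outputs per-frame class probabilities over {voiced, unvoiced, pause}. Feature choice, network size, and initialisation are all assumptions.

```python
import numpy as np

def frame_features(signal, sr, frame_len=0.02):
    """Per non-overlapping frame: log-energy and zero-crossing rate."""
    n = int(sr * frame_len)
    feats = []
    for i in range(0, len(signal) - n + 1, n):
        f = signal[i:i + n]
        log_energy = np.log(np.sum(f ** 2) + 1e-10)
        zcr = np.mean(np.abs(np.diff(np.signbit(f).astype(int))))
        feats.append([log_energy, zcr])
    return np.array(feats)

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden tanh layer, softmax over {voiced, unvoiced, pause}."""
    h = np.tanh(x @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
sr = 8000
t = np.arange(sr) / sr                      # 1 s toy signal
signal = np.sin(2 * np.pi * 120 * t)        # stands in for a voiced segment
X = frame_features(signal, sr)              # (frames, 2) feature matrix
W1, b1 = 0.1 * rng.standard_normal((2, 8)), np.zeros(8)
W2, b2 = 0.1 * rng.standard_normal((8, 3)), np.zeros(3)
probs = mlp_forward(X, W1, b1, W2, b2)      # (frames, 3) class probabilities
```

In practice the weights would of course be trained on labelled frames; the sketch only shows the feature-to-probability pipeline that such a segmenter runs per frame.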

    An Investigation of nonlinear speech synthesis and pitch modification techniques

    Speech synthesis technology plays an important role in many aspects of man-machine interaction, particularly in telephony applications. In order to be widely accepted, the synthesised speech quality should be as human-like as possible. This thesis investigates novel techniques for the speech signal generation stage in a speech synthesiser, based on concepts from nonlinear dynamical theory. It focuses on natural-sounding synthesis for voiced speech, coupled with the ability to generate the sound at the required pitch. The one-dimensional voiced speech time-domain signals are embedded into an appropriate higher-dimensional space, using Takens' method of delays. These reconstructed state space representations have approximately the same dynamical properties as the original speech generating system and are thus effective models. A new technique for marking epoch points in voiced speech that operates in the state space domain is proposed. Using the fact that one revolution of the state space representation equals one pitch period, pitch-synchronous points can be found using a Poincaré map. Evidently the epoch pulses are pitch synchronous and can therefore be marked. The same state space representation is also used in a locally-linear speech synthesiser. This models the nonlinear dynamics of the speech signal by a series of local approximations, using the original signal as a template. The synthesised speech is natural-sounding because, rather than simply copying the original data, the technique makes use of the local dynamics to create a new, unique signal trajectory. Pitch modification within this synthesis structure is also investigated, with an attempt made to exploit the Šil'nikov-type orbit of voiced speech state space reconstructions. However, this technique is found to be incompatible with the locally-linear modelling technique, leaving the pitch modification issue unresolved.
A different modelling strategy, using a radial basis function neural network to model the state space dynamics, is then considered. This produces a parametric model of the speech sound. Synthesised speech is obtained by connecting a delayed version of the network output back to the input via a global feedback loop. The network then synthesises speech in a free-running manner. Stability of the output is ensured by using regularisation theory when learning the weights. Complexity is also kept to a minimum because the network centres are fixed on a data-independent hyper-lattice, so only the linear-in-the-parameters weights need to be learnt for each vowel realisation. Pitch modification is again investigated, based around the idea of interpolating the weight vector between different realisations of the same vowel, but at differing pitch values. However, modelling the inter-pitch weight vector variations is very difficult, indicating that further study of pitch modification techniques is required before a complete nonlinear synthesiser can be implemented.
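A compact sketch of that modelling strategy follows, with assumed details standing in for the thesis's: Gaussian basis functions, a fixed grid of centres (the data-independent hyper-lattice), and a ridge penalty standing in for the regularisation theory used to learn the weights. The network learns a one-step predictor on a delay-embedded toy tone and then free-runs with its delayed output fed back to its input.

```python
import numpy as np

def rbf_design(X, centers, width):
    """Gaussian RBF design matrix: one column per fixed centre."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

# Training data: delay-embedded toy "vowel", one-step-ahead targets.
sr, f0, tau = 8000, 100, 20                      # tau = quarter period
x = np.sin(2 * np.pi * f0 * np.arange(2000) / sr)
X = np.column_stack([x[:-tau - 1], x[tau:-1]])   # state (x[t-tau], x[t])
y = x[tau + 1:]                                  # next sample x[t+1]

# Data-independent lattice of centres; only linear weights are learnt.
g = np.linspace(-1.5, 1.5, 7)
centers = np.array([[a, b] for a in g for b in g])
width = 0.5
Phi = rbf_design(X, centers, width)
lam = 1e-6                                       # regularisation term
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(centers)), Phi.T @ y)

# Free-running synthesis: delayed output fed back via a global loop.
buf = list(x[:tau + 1])                          # seed with real samples
for _ in range(400):
    s = np.array([[buf[-tau - 1], buf[-1]]])
    buf.append(float(rbf_design(s, centers, width) @ w))
synth = np.array(buf[tau + 1:])                  # 400 synthesised samples
```

Only the weight vector `w` is vowel-specific here, which mirrors the abstract's point: with centres fixed on a lattice, each new vowel realisation costs one linear-in-the-parameters solve.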

    Development and testing of an acoustic method for the objective assessment of the voice quality of pathological voices

    The Heiserkeits-Diagramm (hoarseness diagram) is a graphical representation of voice quality in two dimensions: irregularity is plotted along one axis and the noise component of the voice along the other. Particular care is taken that every voice, healthy or pathological, including those with severe voice disorders, can be represented in the diagram. The measurement of the noise component is based on a new acoustic measure, the Glottal to Noise Excitation Ratio (GNE), which is developed in this thesis. Compared with other measures of the noise component, GNE has the great advantage of being insensitive to the typical irregularities of the voice signal. Irregularity is measured by three acoustic measures: two statistical measures describing the fluctuation of period length (jitter) and of energy (shimmer), and the mean correlation of pairs of consecutive periods. The four acoustic measures of the Heiserkeits-Diagramm were selected from 22 measures according to statistical criteria. The influence of the vocal tract on jitter and shimmer is examined, and the period-length measurement procedure is tested for its suitability for highly irregular voices. A theory of jitter-induced shimmer is derived, which agrees very well with the measurements. Vowels form a characteristic pattern in the Heiserkeits-Diagramm. Six groups with different phonation mechanisms, including normal voices and whispered voices, are significantly distinguished from one another in the diagram. The appendix compiles the voice-quality development of 48 patients.
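The three irregularity quantities named above admit a simple sketch. The formulas below are generic relative cycle-to-cycle perturbation measures and a plain normalised correlation of consecutive periods; they illustrate the kind of quantity involved, not the two statistical measures actually selected for the Heiserkeits-Diagramm.

```python
import numpy as np

def jitter_shimmer(periods, peak_amps):
    """Relative mean absolute cycle-to-cycle change of period length
    (jitter) and of peak amplitude (shimmer), both in percent."""
    periods = np.asarray(periods, float)
    peak_amps = np.asarray(peak_amps, float)
    jitter = 100 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)
    shimmer = 100 * np.mean(np.abs(np.diff(peak_amps))) / np.mean(peak_amps)
    return jitter, shimmer

def period_correlation(signal, marks):
    """Mean normalised correlation of consecutive periods (each pair
    truncated to the shorter period), cf. the third irregularity measure."""
    rs = []
    for i in range(len(marks) - 2):
        p1 = signal[marks[i]:marks[i + 1]]
        p2 = signal[marks[i + 1]:marks[i + 2]]
        n = min(len(p1), len(p2))
        p1, p2 = p1[:n], p2[:n]
        rs.append(np.dot(p1, p2) / (np.linalg.norm(p1) * np.linalg.norm(p2)))
    return float(np.mean(rs))

j, s = jitter_shimmer([10.0, 10.1, 9.9, 10.0], [1.0, 1.0, 1.0, 1.0])
sine = np.sin(2 * np.pi * np.arange(400) / 80)       # perfectly regular voice
r = period_correlation(sine, [0, 80, 160, 240, 320])  # close to 1.0
```

For a perfectly regular signal all three quantities reach their ideal values (zero jitter and shimmer, correlation near 1); hoarse voices move away from them, which is what the diagram's irregularity axis summarises.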


    An algorithm for the measurement of jitter

    No full text
    Jitter is the small fluctuation from one glottis cycle to the next in the duration of the fundamental period of the voice source. Analyzing jitter requires measuring glottal cycle durations accurately. Generally speaking, this is carried out by sampling at a medium rate and interpolating the discretized signal to obtain the required time resolution. In this article we describe an algorithm which solves the following two signal processing problems. Firstly, signal samples obtained by interpolation are only estimates of the original samples, which are unknown. The quality of the reconstruction of the signal therefore has to be evaluated. Secondly, small variations in cycle durations are easily corrupted by noise and measurement errors. The magnitude of measurement errors therefore has to be gauged. In our algorithm, the quality of reconstruction by signal interpolation is evaluated by a statistical test which takes into account the distribution of the corrections (which are brought about by interpolation) to the positions of the signal events which mark the beginnings of the glottal cycles. Three different interpolation methods have been implemented. Measurement errors are controlled by estimating independently the cycle durations of the speech and the electroglottographic signals. When the series obtained from both signals agree, we may then conclude that they reflect vocal fold activity and that they have not been unduly corrupted by errors or noise. The algorithm has been tested on 77 signals produced by healthy and dysphonic subjects. Its performance was satisfactory on all counts. © 1991.
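The two problems the algorithm addresses, sub-sample time resolution via interpolation and error control by cross-checking two independent signals, can be illustrated with stand-ins. Parabolic interpolation below is just one of many possible interpolation methods (the article's statistical test on the interpolation corrections is not reproduced), and the tolerance check is a hypothetical, simplified version of the speech/EGG agreement criterion.

```python
import numpy as np

def refine_event(x, i):
    """Sub-sample position of a local maximum near integer sample i,
    obtained by fitting a parabola through samples i-1, i, i+1."""
    a, b, c = x[i - 1], x[i], x[i + 1]
    denom = a - 2 * b + c
    if denom == 0:
        return float(i)
    return i + 0.5 * (a - c) / denom

def series_agree(p_speech, p_egg, tol=0.01):
    """Accept the cycle-duration series only if speech- and EGG-derived
    estimates agree to within a relative tolerance."""
    p_speech, p_egg = np.asarray(p_speech), np.asarray(p_egg)
    return bool(np.all(np.abs(p_speech - p_egg) / p_egg <= tol))

# A sampled parabola is recovered exactly: true peak placed at t = 5.3.
t = np.arange(10, dtype=float)
x = -(t - 5.3) ** 2
pos = refine_event(x, int(np.argmax(x)))
```

The point of the cross-check is that two independent measurement chains rarely share the same errors: if both series agree, the measured cycle-to-cycle variation can be attributed to vocal fold activity rather than to noise.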