
    Speech and crosstalk detection in multichannel audio

    The analysis of scenarios in which a number of microphones record the activity of speakers, such as in a round-table meeting, presents a number of computational challenges. For example, if each participant wears a microphone, speech from both the microphone's wearer (local speech) and from other participants (crosstalk) is received. The recorded audio can be broadly classified in four ways: local speech, crosstalk plus local speech, crosstalk alone, and silence. We describe two experiments related to the automatic classification of audio into these four classes. The first experiment attempted to optimize a set of acoustic features for use with a Gaussian mixture model (GMM) classifier. A large set of potential acoustic features was considered, some of which have been employed in previous studies. The best-performing features were found to be kurtosis, "fundamentalness," and cross-correlation metrics. The second experiment used these features to train an ergodic hidden Markov model classifier. Tests performed on a large corpus of recorded meetings show classification accuracies of up to 96%, and automatic speech recognition performance close to that obtained using ground-truth segmentation.
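    As a rough illustration of the GMM stage, the sketch below frames a waveform, computes per-frame kurtosis (one of the best-performing features above), and fits one GMM per audio class. The class labels, frame sizes, and model settings are illustrative assumptions, not the paper's configuration.

        # Hedged sketch of the GMM classification stage: per-frame kurtosis
        # features and one GMM per class. Frame sizes, class labels, and model
        # settings are assumptions for illustration, not the paper's setup.
        import numpy as np
        from scipy.stats import kurtosis
        from sklearn.mixture import GaussianMixture

        CLASSES = ["local", "local+crosstalk", "crosstalk", "silence"]

        def frame_kurtosis(signal, frame_len=400, hop=160):
            """Kurtosis of each frame; speech is typically super-Gaussian."""
            frames = [signal[i:i + frame_len]
                      for i in range(0, len(signal) - frame_len, hop)]
            return np.array([[kurtosis(f)] for f in frames])

        def train(features_per_class, n_components=8):
            """Fit one diagonal-covariance GMM per audio class."""
            return {c: GaussianMixture(n_components, covariance_type="diag").fit(x)
                    for c, x in features_per_class.items()}

        def classify(models, features):
            """Choose the class whose GMM assigns the highest log-likelihood."""
            return max(models, key=lambda c: models[c].score(features))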

    Exploiting correlogram structure for robust speech recognition with multiple speech sources

    This paper addresses the problem of separating and recognising speech in a monaural acoustic mixture in the presence of competing speech sources. The proposed system treats sound source separation and speech recognition as tightly coupled processes. In the first stage, sound source separation is performed in the correlogram domain. For periodic sounds, the correlogram exhibits symmetric tree-like structures whose stems are located at the delays that correspond to multiples of the pitch period. These pitch-related structures are exploited in the study to group spectral components at each time frame. Local pitch estimates are then computed for each spectral group and are used to form simultaneous pitch tracks for temporal integration. These processes segregate a spectral representation of the acoustic mixture into several time-frequency regions such that the energy in each region is likely to have originated from a single periodic sound source. The identified time-frequency regions, together with the spectral representation, are passed to a `speech fragment decoder', which employs `missing data' techniques with clean speech models to simultaneously search for the acoustic evidence that best matches model sequences. The paper presents evaluations based on artificially mixed simultaneous speech utterances. A coherence-measuring experiment is first reported which quantifies the consistency of the identified fragments with a single source. The system is then evaluated in a speech recognition task and compared to a conventional fragment generation approach. Results show that the proposed system produces more coherent fragments over different conditions, which results in significantly better recognition accuracy.
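    For readers unfamiliar with the representation, the sketch below computes a basic correlogram (a per-channel, per-frame running autocorrelation over a filterbank output) and a summary pitch estimate. It is a minimal illustration under assumed frame, hop, and lag settings; it does not reproduce the paper's tree-structure grouping or fragment decoding.

        # Minimal correlogram sketch: a running autocorrelation per filterbank
        # channel per frame, plus a summary-correlogram pitch estimate.
        import numpy as np

        def correlogram(channels, frame_len=512, hop=256, max_lag=400):
            """channels: (n_channels, n_samples) filterbank output.
            Returns an (n_frames, n_channels, max_lag) autocorrelation volume."""
            n_ch, n = channels.shape
            starts = range(0, n - frame_len - max_lag, hop)
            acg = np.zeros((len(starts), n_ch, max_lag))
            for t, s in enumerate(starts):
                for c in range(n_ch):
                    x = channels[c, s:s + frame_len]
                    for lag in range(max_lag):
                        acg[t, c, lag] = np.dot(
                            x, channels[c, s + lag:s + lag + frame_len])
            return acg

        def summary_pitch_lag(acg_frame, min_lag=32):
            """Sum across channels; the strongest lag estimates the pitch period."""
            summary = acg_frame.sum(axis=0)
            return min_lag + int(np.argmax(summary[min_lag:]))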

    Improving the Speech Intelligibility By Cochlear Implant Users

    In this thesis, we focus on improving the intelligibility of speech for cochlear implant (CI) users. As an auditory prosthetic device, a CI can restore hearing sensations for most patients with profound hearing loss in both ears in a quiet background. However, CI users still have serious problems understanding speech in noisy and reverberant environments. Bandwidth limitation, missing temporal fine structure, and reduced spectral resolution due to a limited number of electrodes are further factors that make hearing in noisy conditions difficult for CI users, regardless of the type of noise. To mitigate these difficulties for CI listeners, we investigate several contributing factors, such as the effects of low harmonics on tone identification in natural and vocoded speech, the contribution of matched envelope dynamic range to binaural benefits, and the contribution of low-frequency harmonics to tone identification in quiet and in a six-talker babble background. These results revealed several promising methods for improving speech intelligibility for CI patients. In addition, we investigate the benefits of voice conversion in improving speech intelligibility for CI users, motivated by an earlier study showing that familiarity with a talker’s voice can improve understanding of a conversation. Research has shown that when adults are familiar with someone’s voice, they can process and understand what the person is saying more accurately, and even more quickly. This effect, known as the “familiar talker advantage,” was our motivation to examine its impact on CI patients using a voice conversion technique. In the present research, we propose a new method based on multi-channel voice conversion to improve the intelligibility of transformed speech for CI patients.
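    Since vocoded speech is mentioned above, the sketch below shows a noise-excited channel vocoder of the kind commonly used to simulate CI hearing for normal-hearing listeners. The channel count and band edges are illustrative assumptions, not the settings used in the thesis.

        # Hedged sketch of a noise-excited channel vocoder, a standard way of
        # simulating CI hearing: filter into bands, extract each band's
        # envelope, and use it to modulate band-limited noise.
        import numpy as np
        from scipy.signal import butter, sosfilt, hilbert

        def vocode(x, fs, n_channels=8, lo=100.0, hi=7000.0):
            edges = np.geomspace(lo, hi, n_channels + 1)    # log-spaced bands
            out = np.zeros(len(x))
            for k in range(n_channels):
                sos = butter(4, [edges[k], edges[k + 1]], "bandpass",
                             fs=fs, output="sos")
                band = sosfilt(sos, x)
                env = np.abs(hilbert(band))                 # temporal envelope
                noise = np.random.randn(len(x))
                out += env * sosfilt(sos, noise)            # modulated noise band
            return out / np.max(np.abs(out))                # peak-normalize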

    ‘Did the speaker change?’: Temporal tracking for overlapping speaker segmentation in multi-speaker scenarios

    Diarization systems are an essential part of many speech processing applications, such as speaker indexing, improving automatic speech recognition (ASR) performance, and making single-speaker algorithms available for use in multi-speaker domains. This thesis focuses on the first task of the diarization process: speaker segmentation, which can be thought of as trying to answer the question ‘Did the speaker change?’ in an audio recording. This thesis starts by showing that time-varying pitch properties can be used advantageously within the segmentation step of a multi-talker diarization system. It is then highlighted that an individual’s pitch varies smoothly and can therefore be predicted by means of a Kalman filter. Subsequently, it is shown that if the pitch is not predictable, then this is most likely due to a change of speaker. Finally, a novel system is proposed that uses this approach of pitch prediction for speaker change detection. This thesis then goes on to demonstrate how voiced harmonics can be useful in detecting when more than one speaker is talking, such as during overlapping speaker activity. A novel system is proposed to track multiple harmonics simultaneously, allowing for the determination of onsets and end-points of a speaker’s utterance in the presence of an additional active speaker. This thesis then extends this work to explore the use of a new multimodal approach for overlapping speaker segmentation that tracks both the fundamental frequency (F0) and direction of arrival (DoA) of each speaker simultaneously. The proposed multiple hypothesis tracking system, which simultaneously tracks both features, shows an improvement in segmentation performance when compared to tracking these features separately. Lastly, this thesis focuses on the DoA estimation part of the newly proposed multimodal approach. It does this by exploring a polynomial extension to the multiple signal classification (MUSIC) algorithm, spatio-spectral polynomial (SSP)-MUSIC, and evaluating its performance when using speech sound sources.
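    A minimal sketch of the pitch-prediction idea described above: a scalar constant-velocity Kalman filter tracks F0, and a frame whose normalized innovation exceeds a gate is flagged as a possible speaker change. The noise values and gate threshold are assumptions for illustration, not the thesis's tuned parameters.

        # Illustrative speaker-change detector: predict F0 with a Kalman
        # filter; when the observation is not predictable (large innovation),
        # flag a likely speaker change and re-initialize the filter.
        import numpy as np

        def f0_change_points(f0_track, q=1.0, r=4.0, gate=3.0):
            x = np.array([f0_track[0], 0.0])        # state: [F0, F0 velocity]
            P = np.eye(2) * 10.0
            F = np.array([[1.0, 1.0], [0.0, 1.0]])  # constant-velocity model
            H = np.array([[1.0, 0.0]])              # we observe F0 only
            Q = np.eye(2) * q
            changes = []
            for t, z in enumerate(f0_track[1:], start=1):
                x = F @ x                           # predict
                P = F @ P @ F.T + Q
                innovation = z - (H @ x)[0]
                S = (H @ P @ H.T)[0, 0] + r         # innovation variance
                if abs(innovation) / np.sqrt(S) > gate:
                    changes.append(t)               # pitch not predictable:
                    x = np.array([z, 0.0])          # likely speaker change,
                    P = np.eye(2) * 10.0            # so re-initialize
                    continue
                K = (P @ H.T)[:, 0] / S             # Kalman gain
                x = x + K * innovation
                P = P - np.outer(K, H @ P)
            return changes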

    Fundamental frequency height as a resource for the management of overlap in talk-in-interaction.

    Overlapping talk is common in talk-in-interaction. Much of the previous research on this topic agrees that speaker overlaps can be either turn-competitive or noncompetitive. An investigation of the differences in prosodic design between these two classes of overlaps can offer insight into how speakers use and orient to prosody as a resource for turn competition. In this paper, we investigate the role of fundamental frequency (F0) as a resource for turn competition in overlapping speech. Our methodological approach combines detailed conversation analysis of overlap instances with acoustic measurements of F0 in the overlapping sequence and in its local context. The analyses are based on a collection of overlap instances drawn from the ICSI Meeting corpus. We found that overlappers mark an overlapping incoming as competitive by raising F0 above their norm for turn beginnings and retaining this higher F0 until the point of overlap resolution. Overlappees may respond to these competitive incomings by returning competition, in which case they raise their F0 too. Our results thus provide instrumental support for earlier claims made on impressionistic evidence, namely that participants in talk-in-interaction systematically manipulate F0 height when competing for the turn.
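    Because "raised F0" here is judged against each speaker's own norm, any replication needs a speaker-relative scale. The sketch below expresses F0 in semitones relative to a speaker's median, a common normalization; the function name and values are illustrative, not taken from the paper.

        # Illustrative normalization: F0 in semitones relative to a speaker's
        # own median, so "raised" is measured against that speaker's norm.
        import numpy as np

        def semitones_re_median(f0_hz, speaker_median_hz):
            return 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / speaker_median_hz)

        # Example: a turn beginning at 230 Hz by a speaker whose median F0 is
        # 190 Hz sits about 12 * log2(230/190) ≈ 3.3 semitones above the norm.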

    Developing Models for Multi-Talker Listening Tasks using the EPIC Architecture: Wrong Turns and Lessons Learned

    This report describes the development of a series of computational cognitive architecture models for the multi-channel listening task studied in the fields of audition and human performance. The models can account for the phenomena in which humans can respond to a designated spoken message in the context of multiple simultaneous speech messages from multiple speakers, the so-called "cocktail party effect." They are the first models of a new class that combines psychoacoustic perceptual mechanisms with production-system cognitive processing to account for end-to-end performance in an important empirical literature. Supported by the Office of Naval Research, Cognitive Science Program, under grant numbers N00014-10-1-0152 and N00014-13-1-0358, and the U.S. Air Force 711 HW Chief Scientist Seedling program. Full text: http://deepblue.lib.umich.edu/bitstream/2027.42/108165/1/Kieras_Wakefield_TR_EPIC_17_July_2014.pdf

    Computer classification of stop consonants in a speaker independent continuous speech environment

    In the English language there are six stop consonants, /b,d,g,p,t,k/. They account for over 17% of all phonemic occurrences. In continuous speech, phonetic recognition of stop consonants requires the ability to explicitly characterize the acoustic signal. Prior work has shown that high classification accuracy on discrete syllables and words can be achieved by characterizing the shape of the spectrally transformed acoustic signal. This thesis extends that concept to a multispeaker continuous speech database, using the statistical moments of a distribution to characterize shape. A multivariate maximum likelihood classifier was used to discriminate between classes. To reduce the number of features used by the discriminant model, a dynamic programming scheme was employed to optimize subset combinations. The top six moments were the mean, variance, and skewness in both frequency and energy. Results showed 85% classification accuracy on the full database of 952 utterances. Performance improved to 97% when the discriminant model was trained separately for male and female talkers.
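    As a rough illustration of moment-based shape features, the sketch below treats a frame's power spectrum as a distribution over frequency and computes its mean, variance, and skewness; the windowing and normalization choices are assumptions, not the thesis's exact procedure.

        # Hedged sketch: spectral shape moments. The power spectrum of a frame
        # is normalized to a distribution over frequency; its mean (centroid),
        # variance, and skewness then summarize spectral shape.
        import numpy as np

        def spectral_moments(frame, fs):
            spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
            freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
            p = spec / spec.sum()                  # treat spectrum as a pmf
            mean = np.sum(freqs * p)               # spectral centroid
            var = np.sum((freqs - mean) ** 2 * p)
            skew = np.sum((freqs - mean) ** 3 * p) / var ** 1.5
            return mean, var, skew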