53 research outputs found

A method for simulating the effect of reduced frequency resolution of hearing in patients with sensorineural hearing loss

Author: M. I. Porhun
M. I. Vashkevich
М. И. Вашкевич
М. И. Порхун
Publication venue: 'United Institute of Informatics Problems of the National Academy of Sciences of Belarus'
Publication date: 30/09/2021

A method is proposed for simulating the reduced frequency resolution of the ear in patients with sensorineural hearing loss. Its distinguishing feature is that it can be adjusted to the audiogram of a specific person. The method is based on frame-by-frame signal processing in the frequency domain. The effect of reduced frequency resolution is simulated by applying a "smearing" function to the components of the amplitude spectrum of the original sound signal. The "smearing" function is formed from the magnitude responses of auditory filters whose bandwidths are determined from the audiogram of the hearing-impaired person. The proposed method is implemented in MATLAB. An experimental study of the effect of reduced frequency resolution was conducted using a speech intelligibility test. The experiment involved 15 people who listened to recordings processed by the proposed method with various settings, both with and without added white noise. The experimental data showed that reduced frequency resolution of the ear degrades speech intelligibility, especially in the presence of background noise. Based on the participants' answers, sound confusion tables were compiled; they reflect the indistinguishability of sounds that are similar in frequency, which confirms that the proposed method works correctly.

Informatics (E-Journal) / Информатика
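
The frame-level smearing step described in the abstract above can be sketched in Python (the paper's own implementation is in MATLAB and is not reproduced here). This is a minimal sketch under assumptions that do not come from the source: the auditory filters are modelled with the roex(p) shape, the normal filter bandwidth follows the Glasberg and Moore ERB approximation, and a single hypothetical `broadening` factor stands in for the audiogram-dependent bandwidth. All function names are illustrative.

```python
import numpy as np

def erb_hz(f_hz):
    """ERB of a normal auditory filter (Glasberg and Moore approximation, an assumption here)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def smearing_matrix(freqs_hz, broadening):
    """Row k holds a roex(p) filter centred on bin k, widened by `broadening`, normalised to sum to 1."""
    fc = np.maximum(freqs_hz[:, None], 1.0)          # centre frequencies (guard the DC bin)
    g = np.abs(freqs_hz[None, :] - fc) / fc          # normalised distance from the centre
    p = 4.0 * fc / (broadening * erb_hz(fc))         # smaller p means a wider filter
    w = (1.0 + p * g) * np.exp(-p * g)
    return w / w.sum(axis=1, keepdims=True)

def smear_frame(frame, fs, broadening=3.0):
    """Smear the amplitude spectrum of one frame and resynthesise it with the original phase."""
    spec = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    smeared = smearing_matrix(freqs, broadening) @ np.abs(spec)
    return np.fft.irfft(smeared * np.exp(1j * np.angle(spec)), n=len(frame))
```

In a full pipeline each windowed frame would be processed this way and the frames overlap-added; larger `broadening` values flatten spectral peaks more strongly.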

Hearing correction method based on psychoacoustically motivated frequency transposition in a speech signal

Author: Porhun M. I.
Vashkevich M. I.
Вашкевич М. И.
Порхун М. И.
Publication venue: 'Belarusian State University of Informatics and Radioelectronics'
Publication date: 01/01/2020

The purpose of the work was to develop a speech signal processing method for correcting hearing pathologies, based on psychoacoustically motivated transposition of high-frequency components of the signal spectrum to the low-frequency region followed by frequency-dependent amplification. To achieve this goal, several tasks related to developing the principles of frequency transposition in a speech signal were solved. The developed method is adaptive: it is adjusted according to the audiogram of the hearing-impaired person. For frequency transposition, two frequency bands are selected: a source band (from which components are transposed) and a target band (to which they are transposed). The width of the source band is fixed, while the width of the target band is chosen adaptively. Spectrum transposition is performed only for consonants, which people with hearing loss find difficult to perceive. Classification of sounds into vowel, consonant, and pause classes is implemented using a one-layer neural network. The feature vector consists of the zero-crossing rate, short-term energy, short-term magnitude, normalized autocorrelation function, and first spectral moment. To keep the transposed sounds as natural as possible, the concept of equal loudness is used. To compensate for the attenuated perception of sound by a hearing-impaired person, frequency-dependent amplification based on the audiogram is applied. The effectiveness of the proposed method was verified experimentally using a simulation of the hearing loss effect. The experiment involved 10 people who listened to recordings passed through the hearing loss model, as well as recordings passed through the hearing loss model with subsequent correction by the proposed method. The results showed that the proposed hearing correction method improves speech intelligibility by 6 % on average.

Belarusian State University of Informatics and Radioelectronics Repository
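
The five frame-level features listed in the abstract above could be computed roughly as follows. This is a sketch, not the authors' code: the abstract does not state the autocorrelation lag or the exact normalisations, so lag 1 and per-sample averages are assumptions, and `frame_features` is a hypothetical name.

```python
import numpy as np

def frame_features(frame, fs):
    """Return [zero-crossing rate, energy, magnitude, lag-1 autocorrelation, spectral centroid]."""
    x = np.asarray(frame, dtype=float)
    signs = np.signbit(x)
    zcr = np.count_nonzero(signs[1:] != signs[:-1]) / (len(x) - 1)  # sign changes per sample
    energy = np.mean(x ** 2)                                        # short-term energy
    magnitude = np.mean(np.abs(x))                                  # short-term magnitude
    denom = np.sqrt(np.sum(x[:-1] ** 2) * np.sum(x[1:] ** 2))
    autocorr1 = np.sum(x[:-1] * x[1:]) / denom if denom > 0 else 0.0  # assumed lag of 1 sample
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    centroid = np.sum(freqs * spec) / np.sum(spec) if spec.sum() > 0 else 0.0  # first spectral moment, Hz
    return np.array([zcr, energy, magnitude, autocorr1, centroid])
```

Voiced, vowel-like frames tend to give a low zero-crossing rate and a high lag-1 autocorrelation, while noise-like consonants give the opposite pattern, which is what makes such a vector usable as classifier input.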

Hearing correction method based on psychoacoustically motivated frequency transposition in a speech signal

Author: M. I. Porhun
M. I. Vashkevich
М. И. Вашкевич
М. И. Порхун
Publication venue: 'Belarusian State University of Informatics and Radioelectronics'
Publication date: 06/03/2020

The purpose of the work was to develop a speech signal processing method for correcting hearing pathologies, based on psychoacoustically motivated transposition of high-frequency components of the signal spectrum to the low-frequency region followed by frequency-dependent amplification. To achieve this goal, several tasks related to developing the principles of frequency transposition in a speech signal were solved. The developed method is adaptive: it is adjusted according to the audiogram of the hearing-impaired person. For frequency transposition, two frequency bands are selected: a source band (from which components are transposed) and a target band (to which they are transposed). The width of the source band is fixed, while the width of the target band is chosen adaptively. Spectrum transposition is performed only for consonants, which people with hearing loss find difficult to perceive. Classification of sounds into vowel, consonant, and pause classes is implemented using a one-layer neural network. The feature vector consists of the zero-crossing rate, short-term energy, short-term magnitude, normalized autocorrelation function, and first spectral moment. To keep the transposed sounds as natural as possible, the concept of equal loudness is used. To compensate for the attenuated perception of sound by a hearing-impaired person, frequency-dependent amplification based on the audiogram is applied. The effectiveness of the proposed method was verified experimentally using a simulation of the hearing loss effect. The experiment involved 10 people who listened to recordings passed through the hearing loss model, as well as recordings passed through the hearing loss model with subsequent correction by the proposed method. The results showed that the proposed hearing correction method improves speech intelligibility by 6 % on average.

Доклады БГУИР
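
The audiogram-based, frequency-dependent amplification mentioned in the abstract above might look like the sketch below. The abstract does not give the gain rule, so prescribing a fixed `fraction` of the hearing threshold as gain (0.5 here, in the spirit of a half-gain rule) is purely an assumption, as are the function names and the interpolation over log-frequency.

```python
import numpy as np

def gain_db_from_audiogram(freqs_hz, audiogram_hz, thresholds_db_hl, fraction=0.5):
    """Interpolate thresholds over log-frequency and prescribe `fraction` of them as gain in dB."""
    safe = np.maximum(freqs_hz, audiogram_hz[0])      # avoid log2(0); low bins reuse the first threshold
    thr = np.interp(np.log2(safe), np.log2(audiogram_hz), thresholds_db_hl)
    return fraction * thr

def amplify_frame(frame, fs, audiogram_hz, thresholds_db_hl):
    """Apply the frequency-dependent gain to one frame in the DFT domain."""
    spec = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    gain = 10.0 ** (gain_db_from_audiogram(freqs, audiogram_hz, thresholds_db_hl) / 20.0)
    return np.fft.irfft(spec * gain, n=len(frame))
```

Frequencies outside the measured audiogram range are clamped to the nearest measured threshold, which is the default behaviour of `np.interp`.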

A method for simulating the effect of reduced frequency resolution of the ear in patients with sensorineural hearing loss

Author: Porhun M. I.
Vashkevich M. I.
Вашкевич М. И.
Порхун М. И.
Publication venue: 'United Institute of Informatics Problems of the National Academy of Sciences of Belarus'
Publication date: 01/01/2021

A method is proposed for simulating the reduced frequency resolution of the ear in patients with sensorineural hearing loss. Its distinguishing feature is that it can be adjusted to the audiogram of a specific person. The method is based on frame-by-frame signal processing in the frequency domain. The effect of reduced frequency resolution is simulated by applying a "smearing" function to the components of the amplitude spectrum of the original sound signal. The "smearing" function is formed from the magnitude responses of auditory filters whose bandwidths are determined from the audiogram of the hearing-impaired person. The proposed method is implemented in MATLAB. An experimental study of the effect of reduced frequency resolution was conducted using a speech intelligibility test. The experiment involved 15 people who listened to recordings processed by the proposed method with various settings, both with and without added white noise. The experimental data showed that reduced frequency resolution of the ear degrades speech intelligibility, especially in the presence of background noise. Based on the participants' answers, sound confusion tables were compiled; they reflect the indistinguishability of sounds that are similar in frequency, which confirms that the proposed method works correctly.

Belarusian State University of Informatics and Radioelectronics Repository
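
A sound confusion table of the kind mentioned in the abstract above can be tallied from pairs of presented and reported sounds. The sketch below only illustrates the bookkeeping; the function name is hypothetical and no data from the study are reproduced.

```python
from collections import Counter

def confusion_table(presented, answered):
    """Return (labels, table), where table[i][j] counts label i presented and label j answered."""
    labels = sorted(set(presented) | set(answered))
    counts = Counter(zip(presented, answered))
    return labels, [[counts[(p, a)] for a in labels] for p in labels]
```

Large off-diagonal counts between two labels indicate that listeners tend to confuse those two sounds.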

Instantaneous pitch estimation algorithm based on multirate sampling

Author: Azarov E. S.
Petrovsky A. A.
Vashkevich M. I.
Publication date: 01/01/2016

The paper presents an algorithm for accurate pitch estimation that takes advantage of the sinusoidal model with instantaneous parameters. The algorithm decomposes the signal into subband components, extracts their instantaneous parameters, and evaluates a period candidate generating function (PCGF). To achieve high accuracy for both low- and high-pitched sounds, the possible pitch variation range is assumed to be proportional to the current pitch value. The bandwidths of the decomposition filters and the length of the analysis frame are scaled for each period candidate by multirate sampling. The algorithm is compared with other widely used pitch extractors on artificial quasiperiodic signals and natural speech. The proposed algorithm shows remarkable frequency and time resolution for pitch-modulated sounds and performs well in both clean and noisy conditions.

Belarusian State University of Informatics and Radioelectronics Repository
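
One idea from the abstract above, scaling the analysis length with each period candidate, can be illustrated with a much simpler technique. The sketch below scores candidates with a plain normalised cross-correlation; it is not the paper's PCGF and uses neither subband decomposition nor multirate sampling. The search range, the `cycles` parameter, and the 1e-3 tolerance are assumptions.

```python
import numpy as np

def candidate_score(x, period, cycles=2):
    """Correlate two segments one candidate period apart; segment length grows with the candidate."""
    seg = cycles * period
    a, b = x[:seg], x[period:period + seg]
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def estimate_pitch(x, fs, f_min=60.0, f_max=500.0, cycles=2):
    """Return the pitch (Hz) of the shortest candidate period whose score is near the maximum."""
    periods = np.arange(int(fs / f_max), int(fs / f_min) + 1)
    periods = periods[(cycles + 1) * periods <= len(x)]   # keep candidates that fit in the frame
    scores = np.array([candidate_score(x, int(p), cycles) for p in periods])
    # Multiples of the true period score equally well, so prefer the shortest near-maximal one.
    best = periods[np.flatnonzero(scores >= scores.max() - 1e-3)[0]]
    return float(fs / best)
```

Because short candidates are evaluated on short segments and long candidates on long ones, the time resolution adapts to the pitch being tested, which is the property the abstract emphasises.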

Deep multi-scale face detector based on deep neural network

Author: Susha A. V.
Vashkevich M. I.
Вашкевич М. И.
Суша А. В.
Publication venue: Беспринт, РБ
Publication date: 01/01/2020

The main objective of this work was to design a deep artificial neural network for face detection. The design focused on achieving high detector performance and lowering its computational cost through: 1) factorization of the convolution operation; 2) pointwise convolutions; 3) a combination of depthwise and pointwise convolutions. The detector was compared with similar face detectors based on the widely used neural network architectures MobileNet and NasNet. The proposed face detector has a computational complexity of 5.1 MFLOPs, about half that of MobileNet (11.7 MFLOPs) and about a quarter of that of NasNet (22 MFLOPs). The detection time for a 416 × 416 image was 5.12 ms (195 FPS) on a GeForce 1080 Ti GPU and 65.4 ms (15 FPS) on a single core of an Intel Core i7-8700K processor. The precision of the proposed architecture is 85 %, which is only 4 % lower than that of MobileNet and 9.5 % lower than that of NasNet.

Belarusian State University of Informatics and Radioelectronics Repository
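
The saving from combining depthwise and pointwise convolutions, mentioned in the abstract above, follows from a standard operation count. The sketch below compares multiply-accumulate counts for one layer and converts the reported latencies to frame rates; the layer dimensions in the usage note are hypothetical and are not taken from the paper's architecture.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a k x k convolution producing an h x w x c_out output."""
    return h * w * c_in * c_out * k * k

def separable_conv_macs(h, w, c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise

def fps(latency_ms):
    """Frames per second for a given per-image latency in milliseconds."""
    return 1000.0 / latency_ms
```

The ratio of the two counts is 1/c_out + 1/k², so for a hypothetical layer with 128 output channels and 3 × 3 kernels the separable form needs about 12 % of the operations. The frame rates quoted in the abstract agree with the latencies: `fps(5.12)` is 195.3125 and `fps(65.4)` is about 15.3.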

Investigation of the applicability of a DFT-modulated filter bank in systems with significant amplification of spectral components

Author: M. I. Vashkevich
N. S. Sanko
М. И. Вашкевич
Н. С. Санько
Publication venue: 'Belarusian State University of Informatics and Radioelectronics'
Publication date: 29/09/2021

The purpose of this work is to investigate the applicability of a DFT-modulated filter bank in systems that require significant amplification of spectral components, such as hearing aids. The analysis/synthesis method based on the short-time Fourier transform (STFT), which is used in most speech processing systems, is described. It is shown that the DFT-modulated filter bank is a generalization of the STFT-based method. In an analysis/synthesis system based on a DFT-modulated filter bank, the input signal is split into subbands by the analysis filter bank, each subband is then amplified, and finally the signal is reconstructed by the synthesis filter bank. However, in digital systems with significant amplification of spectral components, the reconstructed signal is distorted because the gain differs from subband to subband. The article provides expressions for the distortion function and the aliasing function, which make it possible to estimate the distortion that arises in the analysis/synthesis system of the DFT-modulated filter bank. Efficient algorithms for calculating the distortion and aliasing functions are also proposed. In the future, it is planned to develop a procedure for optimizing the DFT-modulated filter bank based on these efficient algorithms.

Доклады БГУИР
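
The analysis, per-subband gain, and synthesis chain described in the abstract above can be sketched for the STFT special case. This is not the paper's DFT-modulated filter bank with a designed prototype filter: the square-root Hann window, the 50 % overlap, and the FFT size are assumptions chosen so that unity gains give perfect reconstruction away from the signal edges.

```python
import numpy as np

def stft_process(x, gains, n_fft=256):
    """Weighted overlap-add: window, DFT, scale each bin by `gains`, inverse DFT, window, add."""
    hop = n_fft // 2
    # Periodic sqrt-Hann window: its square sums to 1 across frames at 50 % overlap.
    win = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft))
    y = np.zeros(len(x))
    for start in range(0, len(x) - n_fft + 1, hop):
        spec = np.fft.rfft(win * x[start:start + n_fft])   # analysis
        frame = np.fft.irfft(spec * gains, n=n_fft)        # subband gains, then synthesis
        y[start:start + n_fft] += win * frame
    return y
```

`gains` needs `n_fft // 2 + 1` entries. With all gains equal to one the interior of the output matches the input; once the gains differ strongly between neighbouring bins the output no longer does, which is the kind of distortion the article sets out to quantify.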

Digital filter banks for contemporary audio signal processing tasks

Author: A. A. Petrovsky
I. S. Azarov
M. I. Vashkevich
А. А. Петровский
И. С. Азаров
М. И. Вашкевич
Publication venue: БГУИР
Publication date: 03/06/2019

The paper reviews techniques for synthesizing digital filter banks that can be applied to contemporary audio signal processing problems. It describes practical experience of using digital filter banks in original sound processing systems: a music player with noise-aware audio enhancement for listening in noisy acoustic environments, and a hearing aid application for a smartphone.

Доклады БГУИР

Clinical and electrocardiographic syndromes associated with the risk of sudden cardiac death: pathogenesis, clinical manifestations, diagnostic criteria, indications for genetic testing, treatment

Author: I. M. Kuzmina
M. A. Vashkevich
V. V. Rezvan
В. В. Резван
И. М. Кузьмина
М. А. Вашкевич
Publication venue: “N.V. Sklifosovsky Research Institute for Emergency Medicine”
Publication date: 28/03/2016

ABSTRACT. The problem of sudden cardiac death (SCD) is considered the most pressing in modern cardiology. While treatment strategies and SCD prevention have been developed for patients with organic heart disease, the problem remains unsolved in patients without organic changes. A group of diseases and clinical-electrocardiographic syndromes closely associated with the development of life-threatening arrhythmias has now been identified. These pathological conditions are particularly dangerous because of the high risk of SCD, especially in young people. These diseases are not accompanied by structural changes in the myocardium and manifest mainly as electrophysiological abnormalities in cardiomyocytes. They are caused by mutations in genes encoding ion channel proteins expressed in the myocardium, as well as their modulators, which is why these diseases are grouped together as "channelopathies". The article presents current diagnostic criteria and treatments for these diseases. In 2011, the European Society of Cardiology issued guidelines on genetic testing in channelopathies and cardiomyopathies, which defined the indications for genetic testing in this pathology.

Sklifosovsky Journal "Emergency Medical Care" / Журнал им. Н.В. Склифосовского «Неотложная медицинская помощь»

A voice signal analysis and classification system based on perturbation parameters and cepstral representation in psychoacoustic scales

Author: D. S. Likhachov
E. S. Azarov
M. I. Vashkevich
Д. С. Лихачёв
И. С. Азаров
М. И. Вашкевич
Publication venue: 'Belarusian State University of Informatics and Radioelectronics'
Publication date: 01/03/2022

The paper describes an approach to designing a system for the analysis and classification of voice signals based on perturbation parameters and cepstral representation. Two variants of the cepstral representation are considered: mel-frequency cepstral coefficients (MFCC) and bark-frequency cepstral coefficients (BFCC). The MFCC were calculated with the generally accepted approach based on time-frequency analysis by the discrete Fourier transform (DFT) with summation of energy in subbands. This method approximates the frequency resolution of human hearing but has a fixed temporal resolution. As an alternative, a cepstral representation based on BFCC is proposed. The BFCC were calculated using a warped DFT-modulated filter bank, which approximates both the frequency and the temporal resolution of hearing. The aim of the work was to compare the effectiveness of MFCC-based and BFCC-based features for building voice signal analysis and classification systems. The experimental results showed that acoustic features based on MFCC yield a voice classification system with an average recall of 80.6 %, whereas features based on BFCC yield 83.7 %. When the MFCC feature set was supplemented with voice perturbation parameters, the average recall increased to 94.1 %; with the same addition to the BFCC feature set, it increased to 96.7 %.

Доклады БГУИР
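
The "average recall" figures in the abstract above are reported without a formula. One common reading, assumed here, is the unweighted mean of the per-class recalls; the sketch below computes that quantity, and the function name is hypothetical.

```python
def average_recall(y_true, y_pred):
    """Unweighted mean over classes of (correctly predicted samples / samples of that class)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)
```

Unlike plain accuracy, this measure gives every class the same weight regardless of how many recordings it contains.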