134 research outputs found
AN EFFICIENT SPEECH GENERATIVE MODEL BASED ON DETERMINISTIC/STOCHASTIC SEPARATION OF SPECTRAL ENVELOPES
The paper presents a speech generative model that provides an efficient way of generating a speech waveform from its amplitude spectral envelopes. The model is based on a hybrid speech representation that includes deterministic (harmonic) and stochastic (noise) components. The main idea behind the approach is that the speech signal has a determined spectral structure that is statistically bound to the deterministic/stochastic energy distribution in the spectrum. The performance of the model is evaluated using an experimental low-bitrate wide-band speech coder. The quality of the reconstructed speech is evaluated using objective and subjective methods. Two objective quality measures were calculated: Modified Bark Spectral Distortion (MBSD) and Perceptual Evaluation of Speech Quality (PESQ). Narrow-band and wide-band versions of the proposed solution were compared with the MELP (Mixed Excitation Linear Prediction) speech coder and the AMR (Adaptive Multi-Rate) speech coder, respectively. A speech database of two female and two male speakers was used for testing. The tests show that the overall performance of the proposed approach is speaker-dependent and is better for male voices. This difference presumably reflects the influence of pitch height on separation accuracy. Used in an experimental speech compression system, the proposed approach provides decent MBSD values and PESQ values comparable with those of the AMR speech coder at 6.6 kbit/s. Additional subjective listening tests demonstrate that the implemented coding system retains the phonetic content and the speaker's identity, demonstrating the consistency of the proposed approach.
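As a rough illustration of the hybrid representation described above, the following Python sketch synthesizes one frame as a sum of harmonics of the fundamental frequency (deterministic part) plus white noise (stochastic part). It is not the paper's model: the `envelope` callable, the fixed `cutoff_hz` below which harmonics are generated, and the unshaped full-band noise are all assumptions made for brevity.

```python
import math
import random


def synthesize_frame(f0, envelope, fs=16000, n=320, cutoff_hz=4000.0,
                     noise_gain=0.1, seed=0):
    """Hypothetical harmonic-plus-noise synthesis of a single frame.

    f0         -- fundamental frequency in Hz
    envelope   -- assumed callable mapping frequency (Hz) to harmonic amplitude
    cutoff_hz  -- assumed boundary: only harmonics below it are synthesized
    noise_gain -- standard deviation of the added white (unshaped) noise
    """
    rng = random.Random(seed)  # fixed seed so the noise is reproducible
    out = [0.0] * n
    upper = min(cutoff_hz, fs / 2.0)  # never synthesize above Nyquist
    k = 1
    while k * f0 < upper:
        amp = envelope(k * f0)           # amplitude sampled from the envelope
        w = 2.0 * math.pi * k * f0 / fs  # normalized angular frequency
        for i in range(n):
            out[i] += amp * math.cos(w * i)
        k += 1
    # Stochastic part: Gaussian white noise. A real model would shape it
    # spectrally according to a stochastic envelope.
    for i in range(n):
        out[i] += noise_gain * rng.gauss(0.0, 1.0)
    return out
```

For example, `synthesize_frame(200.0, lambda f: 1.0 / (1.0 + f / 1000.0))` yields a 20 ms frame at 16 kHz with a gently falling harmonic envelope; a per-frame deterministic/stochastic split driven by the spectral envelope, as in the paper, would replace the fixed cutoff.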
Prolonged repolarization in the early phase of ischemia is associated with ventricular fibrillation development in a porcine model
Background: Repolarization prolongation can be the earliest electrophysiological change in ischemia, but its role in arrhythmogenesis is unclear. The aim of the present study was to evaluate early ischemic action potential duration (APD) prolongation with respect to its causes, its expression in the ECG, and its association with early ischemic ventricular fibrillation (phase 1A VF). Methods: Coronary occlusion was induced in 18 anesthetized pigs, and standard 12-lead ECGs along with epicardial electrograms were recorded. Local activation time (AT), end of repolarization time (RT), and activation-repolarization interval (ARIc) were determined as the dV/dt minimum during the QRS complex, the dV/dt maximum during the T-wave, and the rate-corrected RT–AT difference, respectively. Patch-clamp studies were performed in enzymatically isolated porcine cardiomyocytes. IK(ATP) activation and Ito1 inhibition were tested as possible causes of the APD change. Results: During the initial period of ischemia, 11 pigs demonstrated maximal ARIc prolongation >10 ms at 1 and/or 2.5 min of occlusion (8 and 6 cases at 1 and 2.5 min, respectively), followed by typical ischemic ARIc shortening. The maximal ARIc across all leads was associated with VF development (OR 1.024, 95% CI 1.003–1.046, p = 0.025) and with the maximal rate-corrected QT interval (QTc) (B 0.562, 95% CI 0.346–0.775, p < 0.001) in logistic and linear regression analyses, respectively. Phase 1A VF incidence was associated with maximal QTc at 2.5 min of occlusion in ROC curve analysis (AUC 0.867, p = 0.028), with an optimal cut-off of 456 ms (sensitivity 1.00, specificity 0.778). Pigs with maximal QTc at 2.5 min above and below 450 ms differed significantly in phase 1A VF incidence in Kaplan-Meier analysis (log-rank p = 0.007).
In the patch-clamp experiments, 4-aminopyridine had no effect on the APD; however, pinacidil activated IK(ATP) and caused a biphasic change in the APD, with initial prolongation and subsequent shortening. Conclusion: The transiently prolonged repolarization during the initial period of acute ischemia was expressed as prolongation of the maximal QTc interval in the body surface ECG and was associated with phase 1A VF. IK(ATP) activation in isolated cardiomyocytes reproduced the biphasic repolarization dynamics observed in vivo, suggesting a probable role of IK(ATP) in early ischemic arrhythmogenesis.
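The AT and RT definitions given in the Methods can be sketched for a single digitized electrogram as follows. This is a minimal illustration, assuming the QRS and T-wave search windows are already known (`qrs_window` and `t_window` are hypothetical inputs given as sample ranges) and using a simple forward difference for dV/dt. The heart-rate correction that turns ARI into ARIc is not specified in the abstract, so it is omitted here.

```python
def activation_recovery_interval(v, fs, qrs_window, t_window):
    """Return (AT, RT, ARI) in ms for one electrogram.

    v          -- list of voltage samples
    fs         -- sampling rate in Hz
    qrs_window -- (start, end) sample indices bounding the QRS complex
    t_window   -- (start, end) sample indices bounding the T-wave
    """
    # Forward-difference estimate of dV/dt (voltage units per second).
    dvdt = [(v[i + 1] - v[i]) * fs for i in range(len(v) - 1)]
    q_start, q_end = qrs_window
    t_start, t_end = t_window
    # Activation time: steepest downstroke (dV/dt minimum) within the QRS.
    at_idx = min(range(q_start, q_end), key=lambda i: dvdt[i])
    # Repolarization time: steepest upstroke (dV/dt maximum) within the T-wave.
    rt_idx = max(range(t_start, t_end), key=lambda i: dvdt[i])
    at_ms = 1000.0 * at_idx / fs
    rt_ms = 1000.0 * rt_idx / fs
    return at_ms, rt_ms, rt_ms - at_ms  # ARI = RT - AT (not rate-corrected)
```

On real recordings the derivative is usually computed after low-pass filtering, since a raw forward difference amplifies noise.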
Instantaneous pitch estimation algorithm based on multirate sampling
The paper presents an algorithm for accurate pitch estimation that takes advantage of the sinusoidal model with instantaneous parameters. The algorithm decomposes the signal into subband components, extracts their instantaneous parameters, and evaluates a period candidate generating function (PCGF). To achieve high accuracy for both low- and high-pitched sounds, the possible pitch variation range is assumed to be proportional to the current pitch value. The bandwidths of the decomposition filters and the length of the analysis frame are scaled for each period candidate by multirate sampling. The algorithm is compared with other widely used pitch extractors on artificial quasiperiodic signals and natural speech. The proposed algorithm shows high frequency and time resolution for pitch-modulated sounds and performs well in both clean and noisy conditions.
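The idea of scaling the analysis frame with each period candidate can be illustrated with a much simpler time-domain stand-in. The sketch below is not the paper's PCGF: it uses no subband decomposition or instantaneous parameters. Instead, it scores each integer period candidate by normalized correlation over a window spanning a fixed number of candidate periods (`n_periods`, a hypothetical parameter), so longer periods automatically get longer frames.

```python
import math


def candidate_score(x, start, period, n_periods=2):
    """Normalized correlation between x[n] and x[n + period] over a window
    of n_periods * period samples beginning at `start`."""
    length = n_periods * period  # frame length scales with the candidate
    a = x[start:start + length]
    b = x[start + period:start + period + length]
    if len(b) < length:
        return 0.0  # not enough signal for this candidate
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0


def estimate_pitch(x, fs, start, f_min=60.0, f_max=400.0, n_periods=2,
                   tol=1e-6):
    """Return a pitch estimate in Hz from integer period candidates."""
    best_period, best_score = None, -math.inf
    for period in range(int(fs / f_max), int(fs / f_min) + 1):
        score = candidate_score(x, start, period, n_periods)
        # Require a clear improvement so that period multiples (octave
        # errors) with an equally high score do not replace the shortest one.
        if score > best_score + tol:
            best_period, best_score = period, score
    return fs / best_period
```

Integer-sample periods limit the frequency resolution of this sketch; the paper's instantaneous-parameter approach is aimed precisely at avoiding such quantization.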
Numerical study of the normal current density behaviour in a narrow-gap glow discharge
A numerical study of normal glow discharge properties was performed for small electrode separations (0.05–0.4 cm) and moderate gas pressures (10–46 Torr). A recently observed experimental effect, a considerable reduction in the normal current density at smaller discharge lengths, was analyzed using both a 2D fluid model and a minimal 1D drift model of the gas discharge. Good agreement between theoretical and experimental behaviour was demonstrated. The influence of the electrode separation and of gas heating on the value of the normal current density is discussed.
Comment: 20 pages, 4 figures
Voice Activity Detection in Noise Using a Compact Convolutional Neural Network
The paper investigates the problem of voice activity detection in a noisy sound signal. An extremely compact convolutional neural network with only 385 trainable parameters is proposed. The model requires few computational resources, which allows it to be used within the "Internet of Things" concept for compact low-power devices. At the same time, the model provides state-of-the-art voice activity detection accuracy. These properties are achieved by a special convolutional layer that takes into account the harmonic structure of voiced speech. This layer also eliminates redundancy in the model because it is invariant to changes in the fundamental frequency. The model performance is evaluated in various noise conditions with different signal-to-noise ratios. The results show that the proposed model provides higher accuracy than the voice activity detection model from Google's WebRTC framework.
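Evaluating a detector "with different signal-to-noise ratios" usually means mixing clean speech with noise scaled to a target SNR. A minimal sketch of that mixing step, assuming equal-length lists of samples and the usual power-ratio definition SNR = 10 log10(P_speech / P_noise); the abstract does not describe how the authors actually built their noisy test sets.

```python
import math


def mean_power(x):
    """Average power of a list of samples."""
    return sum(s * s for s in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals snr_db, then add."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    # Required gain g satisfies p_speech / (g**2 * p_noise) == 10**(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

Frame-level VAD accuracy can then be measured on each mixture against labels derived from the clean speech.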
Overview of convolutional neural networks for image recognition
The purpose of the work presented in the article was to study modern convolutional neural network architectures for image recognition. The article discusses the AlexNet, ZFNet, VGGNet, GoogLeNet, and ResNet architectures. The quality of image recognition by a neural network is characterized by the top-5 error. Based on the results obtained, the most accurate network at the time of writing is the ResNet convolutional network, with a top-5 error of 3.57%. The advantage of this study is that the article provides a brief description of convolutional neural networks and gives an overview of modern convolutional network architectures, their structure, and their quality indicators.
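For reference, the top-5 error counts a sample as misclassified only when its true label is absent from the network's five highest-scoring classes. A minimal Python sketch, assuming scores are given as one list of class scores per image:

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k top-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores, highest first.
        top = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if label not in top:
            misses += 1
    return misses / len(labels)
```

With `k=1` the same function gives the ordinary (top-1) classification error.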
Voice Analysis and Classification System Based on Perturbation Parameters and Cepstral Presentation in Psychoacoustic Scales
The paper describes an approach to designing a system for the analysis and classification of a voice signal based on perturbation parameters and cepstral representation. Two variants of the cepstral representation of the voice signal are considered: mel-frequency cepstral coefficients (MFCC) and bark-frequency cepstral coefficients (BFCC). The MFCC were calculated using the generally accepted approach: time-frequency analysis by the discrete Fourier transform (DFT) with summation of energy in subbands. This method approximates the frequency resolution of human hearing but has a fixed temporal resolution. As an alternative, a cepstral representation based on BFCC is proposed. To calculate the BFCC, a warped DFT-modulated filter bank was used, which approximates both the frequency and the temporal resolution of hearing. The aim of the work was to compare the effectiveness of MFCC-based and BFCC-based features for designing voice signal analysis and classification systems. The experimental results showed that MFCC-based acoustic features yield a voice classification system with an average recall of 80.6 %, while BFCC-based features yield 83.7 %. Supplementing the MFCC feature set with voice perturbation parameters increased the average recall to 94.1 %; the same addition to the BFCC feature set increased it to 96.7 %.
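To make the two psychoacoustic scales concrete, the sketch below converts frequency in hertz to mels and to barks. The formulas are common textbook choices (O'Shaughnessy's mel formula and Traunmüller's bark approximation) and are assumptions here: the abstract does not state which mappings the authors used, and a warped filter bank realizes the bark scale differently from a simple frequency conversion.

```python
import math


def hz_to_mel(f_hz):
    """Mel scale, common form: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def hz_to_bark(f_hz):
    """Bark scale, Traunmüller (1990) approximation."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


# Print the two scales side by side for a few frequencies.
for f in (100.0, 500.0, 1000.0, 4000.0, 8000.0):
    print(f"{f:7.0f} Hz  {hz_to_mel(f):8.1f} mel  {hz_to_bark(f):5.2f} Bark")
```

Both mappings are roughly linear below about 1 kHz and logarithmic above it, which is why filter banks placed uniformly on either scale resemble each other in frequency resolution.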
System for Analysis and Classification of a Voice Signal Based on Perturbation Parameters and Cepstral Representation in Psychoacoustic Scales
The paper describes an approach to designing a system for the analysis and classification of a voice signal based on perturbation parameters and cepstral representation. Two variants of the cepstral representation of the voice signal are considered: mel-frequency cepstral coefficients (MFCC) and bark-frequency cepstral coefficients (BFCC). The MFCC were calculated using the generally accepted approach: time-frequency analysis by the discrete Fourier transform (DFT) with summation of energy in subbands. This method approximates the frequency resolution of human hearing but has a fixed temporal resolution. As an alternative, a cepstral representation based on BFCC is proposed. To calculate the BFCC, a warped DFT-modulated filter bank was used, which approximates both the frequency and the temporal resolution of hearing. The aim of the work was to compare the effectiveness of MFCC-based and BFCC-based features for designing voice signal analysis and classification systems. The experimental results showed that MFCC-based acoustic features yield a voice classification system with an average recall of 80.6 %, while BFCC-based features yield 83.7 %. Supplementing the MFCC feature set with voice perturbation parameters increased the average recall to 94.1 %; the same addition to the BFCC feature set increased it to 96.7 %.
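The "average recall" reported above is assumed here to be the unweighted mean of per-class recalls (macro-averaged recall); the abstract does not define it explicitly. A minimal sketch of that computation:

```python
def average_recall(y_true, y_pred):
    """Unweighted mean over classes of (correct predictions / true instances)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)
```

Unlike plain accuracy, this average gives each voice class equal weight regardless of how many recordings it contains.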
The Scientific School of Professor A. A. Petrovsky
Two periods of the scientific activity of Professor Alexander Alexandrovich Petrovsky, who was a member of the editorial board of the journal "Informatics" for 15 years (2004–2019), are presented. His main scientific results and his contribution to the development of the theory and of the hardware and software for problem-oriented real-time systems and for the processing of audio, speech, and graphic information are described, and a list of the scientist's most significant works is given.
NOISE AND ACOUSTIC FEEDBACK SUPPRESSION ALGORITHM BASED ON SPECTRAL SUBTRACTION IN A SMARTPHONE-BASED HEARING AID
The paper presents a combined noise and acoustic feedback reduction algorithm. The algorithm is based on spectral subtraction and is robust to rapid changes in the acoustic feedback path, which makes it suitable for use in a smartphone-based hearing aid.
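Spectral subtraction in its basic magnitude form can be sketched for one frame as follows. This is the generic textbook variant, not the paper's combined algorithm: the noise magnitude estimate is assumed to be given, the over-subtraction factor `alpha` and spectral floor `beta` are hypothetical parameters, and no acoustic feedback handling is shown. The noisy phase is reused for the cleaned bins.

```python
import cmath


def spectral_subtraction(noisy_bins, noise_mag, alpha=1.0, beta=0.01):
    """Clean one frame of complex spectral bins.

    noisy_bins -- complex DFT bins of the noisy frame
    noise_mag  -- estimated noise magnitude for each bin (assumed given)
    alpha      -- over-subtraction factor
    beta       -- spectral floor as a fraction of the noisy magnitude
    """
    cleaned = []
    for y, n in zip(noisy_bins, noise_mag):
        mag = abs(y)
        # Subtract the noise estimate, then floor the result to avoid
        # negative magnitudes and limit "musical noise".
        s = max(mag - alpha * n, beta * mag)
        cleaned.append(cmath.rect(s, cmath.phase(y)))  # keep the noisy phase
    return cleaned
```

In a complete system, each frame would come from a short-time Fourier transform, and the cleaned bins would be resynthesized by inverse transform and overlap-add.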
- …