
    Long-frame-shift Neural Speech Phase Prediction with Spectral Continuity Enhancement and Interpolation Error Compensation

    Speech phase prediction, a significant research focus in signal processing, aims to recover speech phase spectra from amplitude-related features. However, existing speech phase prediction methods are constrained to recovering phase spectra with short frame shifts, which are considerably smaller than the theoretical upper bound required for exact waveform reconstruction via the short-time Fourier transform (STFT). To tackle this issue, we present a novel long-frame-shift neural speech phase prediction (LFS-NSPP) method which enables precise prediction of long-frame-shift phase spectra from long-frame-shift log amplitude spectra. The proposed method consists of three stages: interpolation, prediction and decimation. The short-frame-shift log amplitude spectra are first constructed from long-frame-shift ones through frequency-by-frequency interpolation to enhance spectral continuity, and are then employed to predict short-frame-shift phase spectra using an NSPP model, thereby compensating for interpolation errors. Ultimately, the long-frame-shift phase spectra are obtained from short-frame-shift ones through frame-by-frame decimation. Experimental results show that the proposed LFS-NSPP method yields higher quality in predicting long-frame-shift phase spectra than the original NSPP model and other signal-processing-based phase estimation algorithms.
    Comment: Published in IEEE Signal Processing Letters.
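    As a concrete illustration of the three-stage pipeline described in the abstract, here is a minimal NumPy sketch. The function names, the `factor` parameter and the `predict_phase` stand-in for a trained NSPP model are illustrative assumptions, not the paper's code.

```python
import numpy as np

def lfs_nspp_pipeline(log_amp_long, factor, predict_phase):
    """Sketch of LFS-NSPP: interpolation -> prediction -> decimation.

    log_amp_long:  (frames, bins) long-frame-shift log amplitude spectra
    factor:        ratio of long to short frame shift (e.g. 4), assumed here
    predict_phase: stand-in for a trained NSPP model
    """
    n_long, n_bins = log_amp_long.shape
    # Stage 1: frequency-by-frequency interpolation along the time axis
    # to construct short-frame-shift log amplitude spectra.
    t_long = np.arange(n_long) * factor
    t_short = np.arange((n_long - 1) * factor + 1)
    log_amp_short = np.stack(
        [np.interp(t_short, t_long, log_amp_long[:, k]) for k in range(n_bins)],
        axis=1)
    # Stage 2: predict short-frame-shift phase spectra with the NSPP model,
    # which is also where interpolation errors get compensated.
    phase_short = predict_phase(log_amp_short)
    # Stage 3: frame-by-frame decimation back to the long frame shift.
    return phase_short[::factor]
```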

    Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement

    Phase information has a significant impact on speech perceptual quality and intelligibility. However, existing speech enhancement methods encounter limitations in explicit phase estimation due to the non-structural nature and wrapping characteristics of the phase, leading to a bottleneck in enhanced speech quality. To overcome this issue, in this paper we propose MP-SENet, a novel Speech Enhancement Network which explicitly enhances Magnitude and Phase spectra in parallel. The proposed MP-SENet adopts a codec architecture in which the encoder and decoder are bridged by time-frequency Transformers along both the time and frequency dimensions. The encoder encodes time-frequency representations derived from the input distorted magnitude and phase spectra. The decoder comprises dual-stream magnitude and phase decoders, directly enhancing the magnitude and wrapped phase spectra by incorporating a magnitude estimation architecture and a parallel phase estimation architecture, respectively. To train the MP-SENet model effectively, we define multi-level loss functions, including mean square error and perceptual metric losses on the magnitude spectra, an anti-wrapping loss on the phase spectra, and mean square error and consistency losses on the short-time complex spectra. Experimental results demonstrate that our proposed MP-SENet excels in high-quality speech enhancement across multiple tasks, including speech denoising, dereverberation, and bandwidth extension. Compared to existing phase-aware speech enhancement methods, it successfully avoids the bidirectional compensation effect between the magnitude and phase, leading to better harmonic restoration. Notably, for the speech denoising task, MP-SENet yields state-of-the-art performance with a PESQ of 3.60 on the public VoiceBank+DEMAND dataset.
    Comment: Submitted to IEEE Transactions on Audio, Speech and Language Processing.
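    The anti-wrapping loss on the phase spectra can be illustrated with a short sketch. The form below is a plausible one assumed for illustration rather than taken verbatim from the paper (which also applies such losses to derived quantities like group delay): the phase error is wrapped into its principal value range before averaging, so predictions differing from the target by a multiple of 2π incur no cost.

```python
import torch

def anti_wrapping_loss(phase_pred, phase_true):
    # Wrap the raw phase difference into (-pi, pi] before averaging,
    # so a prediction of 2*pi - 0.1 against a target of 0 costs only 0.1.
    diff = phase_pred - phase_true
    wrapped = diff - 2 * torch.pi * torch.round(diff / (2 * torch.pi))
    return wrapped.abs().mean()
```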

    Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis

    This paper proposes a source-filter-based generative adversarial neural vocoder named SF-GAN, which achieves high-fidelity waveform generation from input acoustic features by introducing F0-based source excitation signals to a neural filter framework. The SF-GAN vocoder is composed of a source module and a resolution-wise conditional filter module and is trained based on generative adversarial strategies. The source module produces an excitation signal from the F0 information, then the resolution-wise convolutional filter module combines the excitation signal with processed acoustic features at various temporal resolutions and finally reconstructs the raw waveform. The experimental results show that our proposed SF-GAN vocoder outperforms the state-of-the-art HiFi-GAN and Fre-GAN in both analysis-synthesis (AS) and text-to-speech (TTS) tasks, and the synthesized speech quality of SF-GAN is comparable to the ground-truth audio.
    Comment: Accepted by NCMMSC 202
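    The role of the source module can be sketched in a few lines: a sample-level sinusoidal excitation is built by integrating frame-level F0, with noise in unvoiced regions. This is a minimal sketch in the spirit of source-filter vocoders; the parameter names and the simple unvoiced treatment are assumptions and need not match SF-GAN's exact design.

```python
import numpy as np

def sine_excitation(f0, fs, hop):
    """Build a sample-level excitation signal from frame-level F0.

    f0:  frame-level fundamental frequency in Hz (0 = unvoiced)
    fs:  waveform sampling rate in Hz
    hop: frame shift in samples
    """
    f0_up = np.repeat(f0, hop)                 # frame-level -> sample-level F0
    phase = 2 * np.pi * np.cumsum(f0_up) / fs  # integrate frequency to phase
    voiced = f0_up > 0
    # Sine in voiced regions, low-level Gaussian noise in unvoiced regions.
    noise = 0.01 * np.random.randn(len(f0_up))
    return np.where(voiced, np.sin(phase), noise)
```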

    APNet2: High-quality and High-efficiency Neural Vocoder with Direct Prediction of Amplitude and Phase Spectra

    In our previous work, we proposed a neural vocoder called APNet, which directly predicts speech amplitude and phase spectra with a 5 ms frame shift in parallel from the input acoustic features, and then reconstructs the 16 kHz speech waveform using the inverse short-time Fourier transform (ISTFT). APNet demonstrates the capability to generate synthesized speech of comparable quality to the HiFi-GAN vocoder but with a considerably improved inference speed. However, the performance of the APNet vocoder is constrained by the waveform sampling rate and spectral frame shift, limiting its practicality for high-quality speech synthesis. Therefore, this paper proposes an improved iteration of APNet, named APNet2. The proposed APNet2 vocoder adopts ConvNeXt v2 as the backbone network for amplitude and phase predictions, expecting to enhance the modeling capability. Additionally, we introduce a multi-resolution discriminator (MRD) into the GAN-based losses and optimize the form of certain losses. At a common configuration with a waveform sampling rate of 22.05 kHz and a spectral frame shift of 256 points (i.e., approximately 11.6 ms), our proposed APNet2 vocoder outperforms the original APNet and Vocos vocoders in terms of synthesized speech quality. The synthesized speech quality of APNet2 is also comparable to that of HiFi-GAN and iSTFTNet, while offering a significantly faster inference speed.
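    The final reconstruction step shared by APNet and APNet2, going from predicted log amplitude and phase spectra back to a waveform via ISTFT, can be sketched as follows. The tensor shapes and the n_fft/hop defaults are assumptions for illustration (256-point shift at 22.05 kHz is the ~11.6 ms configuration mentioned above).

```python
import torch

def spectra_to_waveform(log_amp, phase, n_fft=1024, hop=256):
    """Combine predicted log amplitude and phase spectra into a complex
    spectrum and invert it with ISTFT.

    log_amp, phase: (batch, n_fft // 2 + 1, frames) real tensors.
    """
    # Polar form -> complex short-time spectrum.
    spec = torch.exp(log_amp) * torch.exp(1j * phase)
    window = torch.hann_window(n_fft)
    return torch.istft(spec, n_fft=n_fft, hop_length=hop, window=window)
```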

    Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction

    Speech bandwidth extension (BWE) refers to widening the frequency bandwidth of speech signals, enhancing speech quality towards a brighter and fuller sound. This paper proposes a generative adversarial network (GAN) based BWE model with parallel prediction of Amplitude and Phase spectra, named AP-BWE, which achieves both high-quality and efficient wideband speech waveform generation. The proposed AP-BWE generator is entirely based on convolutional neural networks (CNNs). It features a dual-stream architecture with mutual interaction, where the amplitude stream and the phase stream communicate with each other and respectively extend the high-frequency components from the input narrowband amplitude and phase spectra. To improve the naturalness of the extended speech signals, we employ a multi-period discriminator at the waveform level and design a pair of multi-resolution amplitude and phase discriminators at the spectral level. Experimental results demonstrate that our proposed AP-BWE achieves state-of-the-art performance in terms of speech quality for BWE tasks targeting sampling rates of both 16 kHz and 48 kHz. In terms of generation efficiency, due to the all-convolutional architecture and all-frame-level operations, the proposed AP-BWE can generate 48 kHz waveform samples 292.3 times faster than real-time on a single RTX 4090 GPU and 18.1 times faster than real-time on a single CPU. Notably, to our knowledge, AP-BWE is the first model to achieve direct extension of the high-frequency phase spectrum, which is beneficial for improving the effectiveness of existing BWE methods.
    Comment: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing.
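    On the input side, one plausible way to obtain the two streams is to upsample the narrowband waveform to the target rate and take the amplitude and phase of its STFT. This is a sketch under assumptions (the interpolation mode and the n_fft/hop values are illustrative), not AP-BWE's published pipeline.

```python
import torch

def bwe_input_streams(wav_nb, up_factor=3, n_fft=1024, hop=240):
    """Turn a narrowband waveform (e.g. 16 kHz) into the amplitude and
    phase spectra of its upsampled version (e.g. 48 kHz for up_factor=3)."""
    # Naive linear upsampling to the target sampling rate.
    wav_up = torch.nn.functional.interpolate(
        wav_nb[None, None, :], scale_factor=up_factor, mode="linear")[0, 0]
    spec = torch.stft(wav_up, n_fft=n_fft, hop_length=hop,
                      window=torch.hann_window(n_fft), return_complex=True)
    return spec.abs(), spec.angle()   # amplitude stream, phase stream
```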

    APCodec: A Neural Audio Codec with Parallel Amplitude and Phase Spectrum Encoding and Decoding

    This paper introduces APCodec, a novel neural audio codec targeting high waveform sampling rates and low bitrates, which seamlessly integrates the strengths of parametric codecs and waveform codecs. APCodec revolutionizes audio encoding and decoding by concurrently handling the amplitude and phase spectra as audio parametric characteristics, like parametric codecs. It is composed of an encoder and a decoder with the modified ConvNeXt v2 network as the backbone, connected by a quantizer based on the residual vector quantization (RVQ) mechanism. The encoder compresses the audio amplitude and phase spectra in parallel, amalgamating them into a continuous latent code at a reduced temporal resolution; this code is subsequently quantized by the quantizer. Ultimately, the decoder reconstructs the audio amplitude and phase spectra in parallel, and the decoded waveform is obtained by the inverse short-time Fourier transform. To ensure the fidelity of the decoded audio, as in waveform codecs, spectral-level loss, quantization loss, and generative adversarial network (GAN) based loss are jointly employed to train APCodec. To support low-latency streamable inference, we employ feed-forward layers and causal convolutional layers in APCodec, incorporating a knowledge distillation training strategy to enhance the quality of the decoded audio. Experimental results confirm that our proposed APCodec can encode 48 kHz audio at a bitrate of just 6 kbps with no significant degradation in the quality of the decoded audio. At the same bitrate, our proposed APCodec also demonstrates superior decoded audio quality and faster generation speed compared to well-known codecs such as SoundStream, Encodec, HiFi-Codec and AudioDec.
    Comment: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing.
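    The residual vector quantization (RVQ) mechanism named in the abstract is a standard technique and easy to sketch: each stage quantizes the residual left by the previous stage against its own codebook, and only the code indices are transmitted. The codebook sizes and shapes below are illustrative, not APCodec's configuration.

```python
import numpy as np

def rvq_encode(z, codebooks):
    """Encode latent vectors z with residual vector quantization.

    z:         (frames, dim) continuous latent codes from the encoder
    codebooks: list of (codebook_size, dim) arrays, one per RVQ stage
    """
    residual, indices = z.copy(), []
    for cb in codebooks:
        # Nearest codebook entry for each frame (squared-error search).
        d = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = d.argmin(1)
        indices.append(idx)
        residual = residual - cb[idx]  # pass the residual to the next stage
    return indices  # the transmitted code indices

# Illustrative bitrate arithmetic: 4 stages of 1024-entry codebooks at
# 100 latent frames/s would cost 4 * log2(1024) * 100 = 4000 bps = 4 kbps.
```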

    A comparison of the inhibitory effects of curcumin and Avastin on rat corneal neovascularization

    AIM: To compare the inhibitory effects of curcumin and Avastin on rat corneal neovascularization (CNV), and to explore the mechanism of curcumin's inhibition. METHODS: CNV was established in thirty SD rats by alkali burn. The rats were randomly divided equally into groups A and B. In group A, right eyes formed experimental group A1, treated with 40 μmol/L curcumin solution, and left eyes formed control group A2, treated with 0.09% sodium chloride. In group B, right eyes formed experimental group B1, treated with 5 g/L Avastin, and left eyes formed control group B2, treated with 0.09% sodium chloride. Corneas and aqueous humor were collected at each time point. The capillary vessels were studied, and VEGF expression was detected by enzyme-linked immunosorbent assay (ELISA). RESULTS: No toxic effects of the drugs were found. The experimental groups had fewer capillary vessels than the control groups (P<0.01). No statistically significant difference in capillary vessels was found between the two drugs. VEGF expression in the experimental groups was lower than in the control groups (P<0.01). VEGF expression in group B1 was lower than in group A1. CONCLUSION: The inhibitory effects of curcumin and Avastin on CNV showed no statistically significant difference in this experiment, but curcumin suppressed VEGF expression less than Avastin did. Curcumin may therefore inhibit CNV through additional mechanisms.

    Genetic variations of the porcine PRKAG3 gene in Chinese indigenous pig breeds

    Four missense substitutions (T30N, G52S, V199I and R200Q) in the porcine PRKAG3 gene were considered likely candidate loci affecting meat quality. In this study, the R200Q substitution was investigated in a sample of 62 individuals from Hampshire, Chinese Min and Erhualian pigs, and the genetic variations of the T30N, G52S and V199I substitutions were detected in 1505 individuals from 21 Chinese indigenous breeds, 5 Western commercial pig breeds, and wild pigs. Allele 200R was fixed in Chinese Min and Erhualian pigs. Haplotypes II-QQ and IV-QQ were not observed in the Hampshire population, supporting the hypothesis that allele 200Q is tightly linked with allele 199V. Significant differences in the allele frequencies of the three substitutions (T30N, G52S and V199I) between Chinese indigenous pigs and Western commercial pigs were observed. Notably high frequencies of the "favorable" alleles 30T and 52G in terms of meat quality were detected in Chinese indigenous pigs, which are well known for high meat quality. However, the frequency of the "favorable" allele 199I, which was reported to have a greater effect on meat quality than 30T and 52G, was very low in all of the Chinese indigenous pigs except for the Min pig. The reasons for this discrepancy remain to be addressed. The presence of the three substitutions in purebred Chinese Tibetan pigs indicates that they are ancestral mutations. A novel A/G substitution at position 51 in exon 1 was identified. These results suggest that further studies are required to investigate the associations of these PRKAG3 substitutions with meat quality in Chinese indigenous pigs, and to uncover other polymorphisms in the PRKAG3 gene with potential effects on meat quality.