31 research outputs found

    A Sparse Representation Speech Denoising Method Based on Adapted Stopping Residue Error

    A sparse representation speech denoising method based on an adapted stopping residue error was presented in this paper. First, the cross-correlation between the clean speech spectrum and the noise spectrum was analyzed, and a method for estimating it was proposed. In the denoising method, an over-complete dictionary of the clean speech power spectrum was learned with the K-singular value decomposition (K-SVD) algorithm. In the sparse representation stage, the stopping residue error was set adaptively according to the estimated cross-correlation and the adjusted noise spectrum, and the orthogonal matching pursuit (OMP) approach was applied to reconstruct the clean speech spectrum from the noisy speech. Finally, the clean speech was re-synthesised via the inverse Fourier transform from the reconstructed speech spectrum and the noisy speech phase. Experimental results show that the proposed method outperforms conventional methods in terms of both subjective and objective measures.
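The OMP stage described above can be sketched as follows. This is a minimal NumPy version assuming a generic dictionary `D` and a scalar residual-norm threshold `stop_residue` as a stand-in for the paper's adaptively estimated stopping residue error; it is an illustration of the pursuit loop, not the authors' implementation.

```python
import numpy as np

def omp(D, y, stop_residue):
    """Orthogonal matching pursuit over dictionary D, stopping once the
    residual norm drops below stop_residue (stand-in for the adaptively
    chosen stopping residue error)."""
    residual = y.astype(float).copy()
    support, x = [], np.zeros(D.shape[1])
    coef = np.zeros(0)
    while np.linalg.norm(residual) > stop_residue and len(support) < D.shape[1]:
        k = int(np.argmax(np.abs(D.T @ residual)))  # most correlated atom
        if k in support:
            break
        support.append(k)
        # least-squares refit on the selected atoms
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    if support:
        x[support] = coef
    return x

# toy demo: identity dictionary, 1-sparse target
D = np.eye(4)
y = np.array([0.0, 2.0, 0.0, 0.0])
x_hat = omp(D, y, 1e-6)
```

In the paper's setting, `D` would be the K-SVD-learned dictionary of clean speech power spectra and `y` a noisy spectral frame.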

    An Audio Retrieval Algorithm Based on Audio Shot and Inverted Index


    Acoustic Scene Clustering Using Joint Optimization of Deep Embedding Learning and Clustering Iteration

    Recent efforts have been made on acoustic scene classification in the audio signal processing community. In contrast, few studies have been conducted on acoustic scene clustering, which is a newly emerging problem. Acoustic scene clustering aims at merging the audio recordings of the same class of acoustic scene into a single cluster without using prior information and training classifiers. In this study, we propose a method for acoustic scene clustering that jointly optimizes the procedures of feature learning and clustering iteration. In the proposed method, the learned feature is a deep embedding that is extracted from a deep convolutional neural network (CNN), while the clustering algorithm is agglomerative hierarchical clustering (AHC). We formulate a unified loss function for integrating and optimizing these two procedures. Various features and methods are compared. The experimental results demonstrate that the proposed method outperforms other unsupervised methods in terms of normalized mutual information and clustering accuracy. In addition, the deep embedding outperforms many state-of-the-art features.
    Comment: 9 pages, 6 figures, 11 tables. Accepted for publication in IEEE TM
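The AHC half of the pipeline can be sketched with SciPy, assuming the deep embeddings have already been extracted by the CNN. The synthetic two-scene data below is a stand-in for those embeddings; the paper's joint loss that couples embedding learning with the clustering iteration is not reproduced here.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# stand-in for CNN deep embeddings of two acoustic scenes
# (synthetic clusters; the paper extracts these from a trained network)
emb = np.vstack([rng.normal(0.0, 0.1, (10, 8)),
                 rng.normal(5.0, 0.1, (10, 8))])

Z = linkage(emb, method="average")               # agglomerative (AHC) tree
labels = fcluster(Z, t=2, criterion="maxclust")  # cut into 2 clusters
```

With well-separated embeddings, the two scenes fall into two distinct clusters; in the joint method, the cluster assignments would in turn feed back into the embedding loss.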

    Few-shot Class-incremental Audio Classification Using Adaptively-refined Prototypes

    New classes of sounds constantly emerge with a few samples, making it challenging for models to adapt to dynamic acoustic environments. This challenge motivates us to address the new problem of few-shot class-incremental audio classification. This study aims to enable a model to continuously recognize new classes of sounds with a few training samples of new classes while remembering the learned ones. To this end, we propose a method to generate discriminative prototypes and use them to expand the model's classifier for recognizing sounds of new and learned classes. The model is first trained with a random episodic training strategy, and then its backbone is used to generate the prototypes. A dynamic relation projection module refines the prototypes to enhance their discriminability. Results on two datasets (derived from the corpora of NSynth and FSD-MIX-CLIPS) show that the proposed method exceeds three state-of-the-art methods in average accuracy and performance dropping rate.
    Comment: 5 pages, 2 figures. Accepted by Interspeech 202
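Prototype generation from a trained backbone usually starts from class-mean embeddings, which can be sketched as below. This is the standard baseline step only; the paper's dynamic relation projection module that refines the prototypes is not reproduced, and the toy 2-D embeddings are illustrative.

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """One prototype per class: the mean embedding of its support samples.
    (The paper further refines these with a dynamic relation projection
    module, which is omitted here.)"""
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def nearest_prototype(query, classes, protos):
    # nearest-prototype classification rule (Euclidean distance)
    return classes[int(np.argmin(np.linalg.norm(protos - query, axis=1)))]

# toy support set: two classes in a 2-D embedding space
emb = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
lab = np.array([0, 0, 1, 1])
classes, protos = class_prototypes(emb, lab)
```

Expanding the classifier for a new session then amounts to appending the new classes' (refined) prototypes to `protos`.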

    Few-shot Class-incremental Audio Classification Using Stochastic Classifier

    It is generally assumed in current audio classification methods that the number of classes is fixed, so the model can recognize only the pre-given classes. When new classes emerge, the model needs to be retrained with adequate samples of all classes. If new classes continually emerge, these methods will not work well and may even become infeasible. In this study, we propose a method for few-shot class-incremental audio classification, which continually recognizes new classes while remembering old ones. The proposed model consists of an embedding extractor and a stochastic classifier. The former is trained in the base session and frozen in incremental sessions, while the latter is incrementally expanded in all sessions. Two datasets (NS-100 and LS-100) are built by choosing samples from the audio corpora of NSynth and LibriSpeech, respectively. Results show that our method exceeds four baseline methods in average accuracy and performance dropping rate. Code is at https://github.com/vinceasvp/meta-sc.
    Comment: 5 pages, 3 figures, 4 tables. Accepted for publication in INTERSPEECH 202
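The session-wise expansion of the classifier can be sketched as below: each class keeps a weight mean and standard deviation, weights can be sampled stochastically during training, and the mean is used at inference. The shapes and initialization are illustrative assumptions, not the paper's exact training recipe.

```python
import numpy as np

class StochasticClassifier:
    """Sketch of an incrementally expandable stochastic classifier:
    per-class weight means/stds, grown as new sessions bring new classes."""

    def __init__(self, dim):
        self.mu = np.empty((0, dim))     # per-class weight means
        self.sigma = np.empty((0, dim))  # per-class weight stds

    def add_classes(self, mu_new, sigma_new):
        # expand the classifier when a new session introduces new classes
        self.mu = np.vstack([self.mu, mu_new])
        self.sigma = np.vstack([self.sigma, sigma_new])

    def logits(self, x, rng=None):
        w = self.mu
        if rng is not None:  # stochastic weights (training-time behaviour)
            w = w + self.sigma * rng.standard_normal(self.mu.shape)
        return w @ x

clf = StochasticClassifier(dim=3)
clf.add_classes(np.eye(2, 3), np.zeros((2, 3)))                 # base session
clf.add_classes(np.array([[0.0, 0.0, 1.0]]), np.zeros((1, 3)))  # incremental
scores = clf.logits(np.array([0.0, 0.0, 1.0]))
```

In the paper's setup, the frozen embedding extractor would supply `x`, and only the classifier rows added in each session are learned.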

    Low-Complexity Acoustic Scene Classification Using Data Augmentation and Lightweight ResNet

    We present a work on low-complexity acoustic scene classification (ASC) with multiple devices, namely subtask A of Task 1 of the DCASE2021 challenge. This subtask focuses on classifying audio samples from multiple devices with a low-complexity model, where two main difficulties need to be overcome. First, the audio samples are recorded by different devices, so there is a mismatch of recording devices across samples. We reduce the negative impact of this mismatch with several effective strategies, including data augmentation (e.g., mix-up, spectrum correction, pitch shift) and the use of a multi-patch network structure and channel attention. Second, the model size must be smaller than a threshold (e.g., the 128 KB required by the DCASE2021 challenge). To meet this condition, we adopt a ResNet with both depthwise separable convolution and channel attention as the backbone network, and perform model compression. In summary, we propose a low-complexity ASC method using data augmentation and a lightweight ResNet. Evaluated on the official development and evaluation datasets, our method obtains classification accuracy scores of 71.6% and 66.7%, respectively, and log-loss scores of 1.038 and 1.136, respectively. Our final model size is 110.3 KB, which is smaller than the 128 KB maximum.
    Comment: 5 pages, 5 figures, 4 tables. Accepted for publication in the 16th IEEE International Conference on Signal Processing (IEEE ICSP
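The model-size saving from depthwise separable convolution can be illustrated with a quick parameter count. This is a back-of-envelope sketch for a generic 64-channel 3x3 layer, not the paper's exact architecture or its 110.3 KB budget breakdown.

```python
def conv_params(c_in, c_out, k):
    # standard k x k convolution (bias terms ignored)
    return c_in * c_out * k * k

def dw_separable_params(c_in, c_out, k):
    # depthwise k x k per input channel, then 1 x 1 pointwise convolution
    return c_in * k * k + c_in * c_out

standard = conv_params(64, 64, 3)           # 36,864 weights
separable = dw_separable_params(64, 64, 3)  # 4,672 weights
```

The separable layer needs roughly 8x fewer weights here, which is the kind of reduction that makes a sub-128 KB ResNet feasible before any further model compression.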

    The Use of Adaptive Frame for Speech Recognition

    We propose an adaptive frame speech analysis scheme that divides the speech signal into stationary and dynamic regions. Long-frame analysis is used for stationary speech, and short-frame analysis for dynamic speech. For computational convenience, the feature vector of a short frame is designed to be identical to that of a long frame. Two expressions are derived to represent the feature vector of short frames. Word recognition experiments on TIMIT and NON-TIMIT with a discrete hidden Markov model (DHMM) and a continuous density HMM (CHMM) showed that a steady performance improvement could be achieved for open-set testing. On the TIMIT database, the adaptive frame length (AFL) approach improves the error reduction rate from 4.47% to 11.21% for DHMM and from 4.54% to 9.58% for CHMM. On the NON-TIMIT database, AFL likewise improves the error reduction rate from 1.91% to 11.55% for DHMM and from 2.63% to 9.5% for CHMM. These results demonstrate the effectiveness of the proposed adaptive frame length feature extraction scheme, especially for open-set testing, which is a practical measure of a speech recognition system's performance.
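A toy version of the stationary/dynamic segmentation can be sketched with a short-time energy heuristic: take a short frame where the local energy jumps, and a long frame otherwise. The frame lengths, energy criterion, and threshold are illustrative assumptions; the paper's actual decision rule and feature-vector expressions are not reproduced.

```python
import numpy as np

def adaptive_frames(x, long_len=400, short_len=100, delta=0.5):
    """Toy stationary/dynamic segmentation: a short frame where local
    energy jumps (dynamic speech), otherwise a long frame (stationary)."""
    frames, i, prev_e = [], 0, None
    while i < len(x):
        e = float(np.mean(x[i:i + long_len] ** 2))  # local energy
        if prev_e is not None and abs(e - prev_e) > delta * max(prev_e, 1e-12):
            n = short_len   # dynamic region
        else:
            n = long_len    # stationary region
        frames.append(x[i:i + n])
        prev_e, i = e, i + n
    return frames

# an amplitude step at sample 400 triggers one short frame
sig = np.concatenate([np.ones(400), 3.0 * np.ones(800)])
lens = [len(f) for f in adaptive_frames(sig)]
```

Each frame would then be mapped to the common feature-vector format before HMM decoding.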

    CBC-Based Synthetic Speech Detection


    PTBP1 promotes hepatocellular carcinoma progression by regulating the skipping of exon 9 in NUMB pre-mRNA

    Aberrant alternative splicing is one of the important causes of cancer. Polypyrimidine tract binding protein 1 (PTBP1) has been found to be involved in splicing regulation in a variety of tumors. Here, we observed significant up-regulation of PTBP1 in primary hepatocellular carcinoma (HCC) tissues. High levels of PTBP1 expression were associated with poor prognosis and increased metastatic potential in HCC. In vitro studies demonstrated that elevated PTBP1 promoted both migration and invasion of HCC cells, whereas knockdown of PTBP1 significantly inhibited them. Furthermore, up-regulation of PTBP1 markedly increased the expression of the oncogenic NUMB isoform, NUMB-PRRL. We observed that the two NUMB isoforms, NUMB-PRRL and NUMB-PRRS, exhibit opposite functions in HCC cells, which partially explains why PTBP1 plays a tumor-promoting role in a NUMB-splicing-dependent manner. In summary, our study indicates that PTBP1 may serve as an oncogene in HCC by regulating the alternative splicing of NUMB exon 9 and could potentially serve as a prognostic indicator.