37 research outputs found

    Dynamic Acoustic Compensation and Adaptive Focal Training for Personalized Speech Enhancement

    Recently, a growing number of personalized speech enhancement (PSE) systems with excellent performance have been proposed. However, two critical issues still limit model performance and generalization: 1) acoustic environment mismatch between the noisy test speech and the target speaker's enrollment speech; 2) hard sample mining and learning. In this paper, dynamic acoustic compensation (DAC) is proposed to alleviate the environment mismatch by extracting noise or environmental acoustic segments from the noisy speech and mixing them with the clean enrollment speech. To better exploit the hard samples in the training data, we propose an adaptive focal training (AFT) strategy that assigns adaptive loss weights to hard and non-hard samples during training. A time-frequency multi-loss training scheme is further introduced to improve and generalize our previous work, sDPCCN, for PSE. The effectiveness of the proposed methods is examined on the DNS4 Challenge dataset. Results show that DAC brings large improvements across multiple evaluation metrics, and that AFT significantly reduces the hard sample rate and yields a clear MOS improvement.
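The DAC idea described above (take a noise-like segment from the noisy input and mix it into the clean enrollment speech) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the lowest-energy-window heuristic for locating noise, the tiling of the segment, and the names `window_energy`, `pick_noise_segment`, and `compensate_enrollment` are all assumptions made for this example.

```python
def window_energy(signal, start, length):
    """Mean squared amplitude of signal[start:start + length]."""
    window = signal[start:start + length]
    return sum(v * v for v in window) / length


def pick_noise_segment(noisy, seg_len):
    """Return the lowest-energy window, assumed here to be noise-dominated."""
    if seg_len <= 0 or seg_len > len(noisy):
        raise ValueError("seg_len must be between 1 and len(noisy)")
    starts = range(len(noisy) - seg_len + 1)
    best = min(starts, key=lambda s: window_energy(noisy, s, seg_len))
    return noisy[best:best + seg_len]


def compensate_enrollment(enroll, noise_seg):
    """Tile the noise segment to the enrollment length and add it sample-wise."""
    return [e + noise_seg[i % len(noise_seg)] for i, e in enumerate(enroll)]


# Toy signals (hypothetical values): a quiet stretch followed by louder speech.
noisy = [0.01, -0.01, 0.02, 1.0, -1.0, 0.9]
enroll = [0.5, 0.5, 0.5]  # clean enrollment samples
noise = pick_noise_segment(noisy, seg_len=2)
compensated = compensate_enrollment(enroll, noise)
```

A real system would operate on frames of actual audio and would likely control the mixing level; the sketch only shows the data flow from noisy input to compensated enrollment.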

    RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos

    Obtaining ground-truth labels from a video is challenging, since manual annotation of pixel-wise flow labels is prohibitively expensive and laborious. Moreover, existing approaches try to adapt models trained on synthetic datasets to authentic videos, which inevitably suffers from domain discrepancy and hinders performance in real-world applications. To solve these problems, we propose RealFlow, an Expectation-Maximization based framework that can create large-scale optical flow datasets directly from any unlabeled realistic videos. Specifically, we first estimate the optical flow between a pair of video frames, and then synthesize a new image from this pair based on the predicted flow. The new image pairs and their corresponding flows can thus be regarded as a new training set. In addition, we design a Realistic Image Pair Rendering (RIPR) module that adopts softmax splatting and bi-directional hole filling techniques to alleviate artifacts in the image synthesis. In the E-step, RIPR renders new images to create a large quantity of training data. In the M-step, we use the generated training data to train an optical flow network, which can then be used to estimate optical flows in the next E-step. Over the iterative learning steps, the capability of the flow network gradually improves, and with it the accuracy of the flow and the quality of the synthesized dataset. Experimental results show that RealFlow outperforms previous dataset generation methods by a large margin. Moreover, based on the generated dataset, our approach achieves state-of-the-art performance on two standard benchmarks compared with both supervised and unsupervised optical flow methods. Our code and dataset are available at https://github.com/megvii-research/RealFlow
    Comment: ECCV 2022 Ora
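The alternation between the E-step and the M-step described above can be outlined as a simple loop. This is only a structural sketch under stated assumptions: `estimate_flow`, `render_new_image`, and `train` are hypothetical callables supplied by the caller, and the stand-ins used below are toy placeholders, not the RIPR module or a real flow network.

```python
def em_dataset_loop(frame_pairs, model, estimate_flow, render_new_image, train, rounds=2):
    """Alternate dataset generation (E-step) and network training (M-step)."""
    dataset = []
    for _ in range(rounds):
        # E-step: use the current model to predict flow and render new samples.
        dataset = []
        for first, second in frame_pairs:
            flow = estimate_flow(model, first, second)
            new_image = render_new_image(first, second, flow)
            dataset.append((first, new_image, flow))
        # M-step: train the flow network on the freshly generated samples.
        model = train(model, dataset)
    return model, dataset


# Toy stand-ins (hypothetical): the "model" is just a counter of training rounds.
final_model, final_dataset = em_dataset_loop(
    frame_pairs=[("frame_a", "frame_b")],
    model=0,
    estimate_flow=lambda model, first, second: model,
    render_new_image=lambda first, second, flow: (first, flow),
    train=lambda model, dataset: model + 1,
    rounds=2,
)
```

The point of the sketch is that the dataset generated in each round depends on the model trained in the previous round, which is what lets both improve together.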

    Mesoporous Polydopamine Loaded Pirfenidone Target to Fibroblast Activation Protein for Pulmonary Fibrosis Therapy

    Recently, fibroblast activation protein (FAP), a transmembrane protein overexpressed on activated fibroblasts in pulmonary fibrosis, has been considered a new target for diagnosing and treating pulmonary fibrosis. In this work, mesoporous polydopamine (MPDA), which is easy to prepare and modify, is developed as a carrier that is loaded with the antifibrosis drug pirfenidone (PFD) and linked to an FAP inhibitor (FAPI) to achieve lesion-targeted drug delivery for pulmonary fibrosis therapy. We found that PFD@MPDA-FAPI is biocompatible and has good antifibrosis properties. When MPDA-FAPI is labeled with ICG, the accumulation of the nanodrug in the fibrotic lung in vivo can be observed by NIR imaging. The in vivo antifibrosis properties of PFD@MPDA-FAPI were also better than those of pure PFD and PFD@MPDA. The easily produced and biocompatible nanodrug PFD@MPDA-FAPI developed in this study is therefore promising for further clinical translation in pulmonary fibrosis therapy.

    Heterogeneous separation consistency training for adaptation of unsupervised speech separation

    Recently, supervised speech separation has made great progress. However, limited by the nature of supervised training, most existing separation methods require ground-truth sources and are trained on synthetic datasets. This reliance on ground truth is problematic, because ground-truth signals are usually unavailable in real conditions. Moreover, in many industry scenarios, the real acoustic characteristics deviate far from those in simulated datasets. Performance therefore usually degrades significantly when supervised speech separation models are applied in real applications. To address these problems, we propose a novel separation consistency training, termed SCT, which exploits real-world unlabeled mixtures to improve cross-domain unsupervised speech separation in an iterative manner by leveraging the complementary information obtained from heterogeneous (structurally distinct but behaviorally complementary) models. SCT follows a framework that uses two heterogeneous neural networks (HNNs) to produce high-confidence pseudo labels for unlabeled real speech mixtures. These labels are then updated and used to refine the HNNs so that they produce more reliable and consistent separation results for real-mixture pseudo-labeling. To make maximal use of the large amount of complementary information between the different separation networks, a cross-knowledge adaptation is further proposed. Together with the simulated dataset, the real mixtures with high-confidence pseudo labels are then used to update the HNN separation models iteratively. In addition, we find that combining the heterogeneous separation outputs by a simple linear fusion can slightly improve the final system performance further. In this paper, we use a cross-dataset setup to simulate the cross-domain situation in real life. The terms "source domain" and "target domain" refer to the simulation set used for model pre-training and the real unlabeled mixtures used for model adaptation, respectively. The proposed SCT is evaluated on both public reverberant English and anechoic Mandarin cross-domain separation tasks. Results show that, without any available ground truth for the target-domain mixtures, SCT can still significantly outperform our two strong baselines, with scale-invariant signal-to-noise ratio (SI-SNR) improvements of up to 1.61 dB and 3.44 dB on the English and Mandarin cross-domain conditions, respectively.
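For readers unfamiliar with the SI-SNR metric quoted above, the following is a minimal sketch of its commonly used definition: both signals are mean-centered, the estimate is projected onto the reference, and the energy ratio between the projection and the residual is reported in dB. The function name `si_snr` and the `eps` stabilizer are choices made for this example, and the abstract does not state which exact variant the authors used.

```python
import math


def si_snr(estimate, reference, eps=1e-12):
    """Scale-invariant signal-to-noise ratio in dB for two equal-length signals."""
    if len(estimate) != len(reference) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    n = len(reference)
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [v - est_mean for v in estimate]
    ref = [v - ref_mean for v in reference]
    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    target = [dot / ref_energy * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    return 10 * math.log10((target_energy + eps) / (residual_energy + eps))


# Toy example (hypothetical values): reference plus a small orthogonal error.
reference = [1.0, -1.0, 1.0, -1.0]
estimate = [1.1, -0.9, 0.9, -1.1]
score = si_snr(estimate, reference)
scaled_score = si_snr([3.0 * v for v in estimate], reference)
```

An SI-SNR improvement is then the difference between the score of the separated output and the score of the unprocessed mixture, both measured against the same reference.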

    Mixed Ownership Reform and Corporate Governance in China's State-Owned Enterprises

    This Article provides an early assessment of the impact on corporate governance of the most recent wave of reform of China's state-owned enterprises (SOEs), announced by the Chinese Communist Party (the Party) in 2013 and officially known as the mixed-ownership reform (MOR). It offers a comprehensive and detailed account of the background, policy and regulatory frameworks, and rationale of the MOR in light of the history of ownership reform in China. It also conducts empirical studies of the changes in ownership and board composition in over 30 SOEs that have recently completed their MOR experiments, as well as several case studies. It observes that the MOR's impact on SOE corporate governance has been embodied in the retreat of the state, the advance of the Party, and a limited yet emerging separation of power between the Party and the board in SOEs. To explain this observation, the Article argues that the MOR program is driven by three current beliefs of the Chinese Party-state about the future of SOEs in China. First, ownership and ownership reform matter. Second, sharing control, rather than dominance by a single state shareholder, improves both the efficiency and governance of SOEs. Third, the MOR was designed to develop partnerships or alliances between the state shareholders and strategic investors in order to help the post-MOR state enterprises improve their efficiency and enhance market opportunities.

    DiaCorrect: End-to-end error correction for speaker diarization

    In recent years, speaker diarization has attracted widespread attention. To achieve better performance, some studies propose diarizing speech in multiple stages. Although these methods might bring additional benefits, most of them are quite complex. Motivated by spelling correction in automatic speech recognition (ASR), in this paper we propose an end-to-end error correction framework, termed DiaCorrect, to refine the initial diarization results in a simple but efficient way. By exploiting the acoustic interactions between the input mixture and its corresponding speaker activity, DiaCorrect can automatically adapt the initial speaker activity to minimize the diarization errors. Without bells and whistles, experiments on LibriSpeech-based 2-speaker meeting-like data show that the self-attentive end-to-end neural diarization (SA-EEND) baseline with DiaCorrect reduces its diarization error rate (DER) from 12.31% to 4.63%, a relative reduction of about 62.4%. Our source code is available online at https://github.com/jyhan03/diacorrect
    Comment: This paper has been superseded by arXiv:2309.08377 (merged from arXiv:2210.17189)
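The relative DER reduction quoted in the DiaCorrect abstract follows from the two reported error rates. A small sketch of that arithmetic is given here; the helper name `relative_reduction` is introduced only for this example.

```python
def relative_reduction(before, after):
    """Relative reduction in percent when a metric drops from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before


# DER values reported in the abstract: 12.31% baseline, 4.63% with DiaCorrect.
der_reduction = relative_reduction(12.31, 4.63)
```

With these inputs the absolute drop is 7.68 percentage points, which is about 62.4% of the baseline DER.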