
    On disjoint range operators in a Hilbert space

    Abstract: For a bounded linear operator M in a Hilbert space H, various relations among the ranges R(M), R(M∗), R(M+M∗) and the null spaces N(M), N(M∗) are considered from the point of view of their relations to the known classes of operators, such as EP, co-EP, weak-EP, GP, DR, or SR. Particular attention is paid to the range projectors of the operators M and M∗, and some further characteristics of these projectors are derived as well.

    Post-combustion carbon capture

    CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control and Contrastive Learning with Negative Samples Augmentation

    Better disentanglement of speech representations is essential to improving the quality of voice conversion. Recently, contrastive learning based on speaker labels has been applied successfully to voice conversion. However, model performance degrades when converting between similar speakers. Hence, we propose an augmented negative sample selection to address the issue. Specifically, we create hard negative samples with the proposed speaker fusion module to improve the learning ability of the speaker encoder. Furthermore, to model speaker style at a fine-grained level, we employ a reference encoder to extract fine-grained style and conduct the augmented contrastive learning on global style. The experimental results show that the proposed method outperforms previous work in voice conversion tasks. Comment: Accepted by the 21st IEEE International Symposium on Parallel and Distributed Processing with Applications (IEEE ISPA 2023).
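The abstract above does not say how the speaker fusion module builds its hard negatives, so the following is only a minimal sketch under stated assumptions: `fuse` is a hypothetical stand-in that linearly interpolates two speaker embeddings, and `info_nce` is a generic InfoNCE-style contrastive loss rather than the paper's actual objective. It illustrates why a negative that lies close to the anchor speaker makes the contrastive task harder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def fuse(anchor, other, alpha=0.5):
    """Hypothetical fusion: interpolate two speaker embeddings.

    The result sits between the two speakers, so it is more similar to
    the anchor than `other` is, which makes it a harder negative.
    """
    return [alpha * a + (1.0 - alpha) * o for a, o in zip(anchor, other)]


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: -log softmax score of the positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


# Toy 2-D embeddings (illustrative values, not from the paper).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
other_speaker = [0.0, 1.0]

hard_negative = fuse(anchor, other_speaker)
easy_loss = info_nce(anchor, positive, [other_speaker])
hard_loss = info_nce(anchor, positive, [other_speaker, hard_negative])
```

With these toy vectors the fused negative is closer to the anchor than the original other-speaker embedding, so adding it raises the loss and gives the encoder a stronger training signal for telling similar speakers apart.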

    CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding

    This paper proposes a talking face generation method named "CP-EB" that takes an audio signal as input and a person image as reference, and synthesizes a photo-realistic video of the person talking, with head poses controlled by a short video clip and with a proper eye-blinking embedding. Both head pose and eye blinking are important cues for deepfake detection. Implicit control of poses by video has already been achieved by state-of-the-art work. According to recent research, eye blinking is weakly correlated with the input audio, which means that extracting eye blinks from audio and generating them is possible. Hence, we propose a GAN-based architecture that extracts eye-blink features from the input audio and from the reference video, applies contrastive training between them, and then embeds the result into the concatenated identity and pose features to generate talking face images. Experimental results show that the proposed method can generate photo-realistic talking faces with synchronized lip motion, natural head poses, and blinking eyes. Comment: Accepted by the 21st IEEE International Symposium on Parallel and Distributed Processing with Applications (IEEE ISPA 2023).

    PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion

    Voice conversion (VC), the style transfer task applied to speech, refers to converting one person's speech into new speech that sounds like another person's. A lot of research has been devoted to better implementation of VC tasks. However, a good voice conversion model should match not only the timbre of the target speaker but also expressive information such as prosody, pace, and pauses. In this context, prosody modeling is crucial for achieving expressive voice conversion that sounds natural and convincing. Prosody modeling is challenging, however, especially without text transcriptions. In this paper, we propose a novel voice conversion framework named 'PMVC', which effectively separates and models the content, timbre, and prosodic information of speech without text transcriptions. Specifically, we introduce a new speech augmentation algorithm for robust prosody extraction. Building on this, a mask-and-predict mechanism is applied to disentangle prosody and content information. The experimental results on the AIShell-3 corpus support the improvement in naturalness and similarity of the converted speech. Comment: Accepted by the 31st ACM International Conference on Multimedia (MM2023).

    Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval

    Voice conversion refers to transferring speaker identity while preserving content. Better disentanglement of speech representations leads to better voice conversion. Recent studies have found that phonetic information from the input audio can represent content well. In addition, speaker-style modeling with pre-trained models makes the process more complex. To tackle these issues, we introduce a new method named "CTVC", which utilizes disentangled speech representations with contrastive learning and time-invariant retrieval. Specifically, a similarity-based compression module is used to establish a closer connection between the frame-level hidden features and phoneme-level linguistic information. Additionally, a time-invariant retrieval is proposed for timbre extraction based on multiple segmentations and mutual information. Experimental results demonstrate that "CTVC" outperforms previous studies and improves the sound quality and similarity of the converted results. Comment: Accepted by 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2024).

    Flue Gas Desulphurization in Circulating Fluidized Beds

    Sulphur dioxide (SO2) is mostly emitted from coal‐fueled power plants, from waste incineration, from sulphuric acid manufacturing, from clay brick plants and from treating nonferrous metals. The emission of SO2 needs to be abated. Both wet scrubbing (absorption) and dry or semi‐dry (reaction) systems are used. In the dry process, both bubbling and circulating fluidized beds (BFB, CFB) can be used as the contactor. Experimental results demonstrate an SO2‐removal efficiency in excess of 94% in a CFB application. A general model of the heterogeneous reaction is proposed, combining the external diffusion of SO2 across the gas film, the internal diffusion of SO2 in the porous particles and the reaction as such (irreversible, 1st order). For the reaction of SO2 with a fine particulate reactant, the reaction rate constant and the relevant contact time are the dominant parameters. Application of the model equations reveals that the circulating fluidized bed is the most appropriate technique, where the high solid to gas ratio guarantees a high conversion in a short reaction time. For CFB operation, the gas contact time achieved at given superficial gas velocities and solids circulation rates determines the SO2 removal rate.
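The dependence of SO2 removal on the rate constant and the contact time can be illustrated with a minimal sketch. It assumes plug flow of the gas and lumps film diffusion, pore diffusion, and the first-order reaction into one apparent rate constant `k_apparent`. Both simplifications and the numerical value of `k` are assumptions made for illustration; they are not taken from the paper's model.

```python
import math


def removal_efficiency(k_apparent, contact_time):
    """Fractional SO2 removal for an irreversible first-order reaction.

    Assumes plug flow of the gas, so that C/C0 = exp(-k * t).
    k_apparent is in 1/s, contact_time in s.
    """
    return 1.0 - math.exp(-k_apparent * contact_time)


def required_contact_time(k_apparent, target_efficiency):
    """Gas contact time (s) needed to reach a target fractional removal."""
    return -math.log(1.0 - target_efficiency) / k_apparent


k = 2.0  # 1/s, hypothetical apparent rate constant
t_94 = required_contact_time(k, 0.94)  # time needed for 94% removal
check = removal_efficiency(k, t_94)    # recovers the 0.94 target
```

Under these assumptions, doubling the apparent rate constant halves the contact time needed for a given removal, which is consistent with the abstract naming these two parameters as dominant.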

    Health risk appraisal of urban thermal environment and characteristic analysis on vulnerable populations

    Continuous global warming and frequent extreme high temperatures keep increasing urban climate health risks, seriously threatening residents’ emotional health. Therefore, the spatial distribution of the health risk that the urban heat island (UHI) effect imposes on emotional health needs to be analyzed, along with basic research on the characteristics of vulnerable populations. Taking Tianjin city as the case, this study analyzed data from Landsat remote-sensing images, meteorological stations, and digital maps, explored the influence of the summer UHI effect on distress (a typical negative emotion factor) and its spatiotemporal evolution, and analyzed differences across the age groups, genders, family status, and distress levels of vulnerable populations. The results show: (1) During the period 1992–2020, the level and area of UHI influence on residents’ distress increased drastically: the influence level rose from level 2–4 to level 4–7, and high-level influence areas were concentrated in six districts of central Tianjin. (2) The influence of the UHI effect on distress varied across age groups: it generally declined, with fluctuations, as residents got older, especially among residents aged 50–59. (3) Men showed a W-shaped pattern in distress and were more irritable and emotionally unsteady, while women were more sensitive to distress at first but became more placid as the temperature rose. (4) Studies on family status show that couples living together showed sound heat resistance in the face of heat stress, while middle-aged and elderly people living alone or with children were relatively weak in adjusting to high ambient temperature.