197 research outputs found
On disjoint range operators in a Hilbert space
For a bounded linear operator M in a Hilbert space H, various relations among the ranges R(M), R(M*), R(M+M*) and the null spaces N(M), N(M*) are considered from the point of view of their relations to the known classes of operators, such as EP, co-EP, weak-EP, GP, DR, or SR. Particular attention is paid to the range projectors of the operators M, M* and some further characteristics of these projectors are derived as well.
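As a rough finite-dimensional illustration of the range relations mentioned above, the sketch below uses matrices in place of Hilbert space operators. It assumes the common definition of an EP operator as one whose range equals the range of its adjoint, and it builds the range projector from the Moore-Penrose inverse. The function names and the two sample matrices are hypothetical and are not taken from the paper.

```python
import numpy as np

def range_projector(M):
    """Orthogonal projector onto the range (column space) of M, built as M M^+."""
    return M @ np.linalg.pinv(M)

def is_ep(M, tol=1e-10):
    """Assumed EP test: R(M) == R(M*) exactly when the two range projectors coincide."""
    P = range_projector(M)            # projector onto R(M)
    Q = range_projector(M.conj().T)   # projector onto R(M*)
    return np.allclose(P, Q, atol=tol)

# Illustrative matrices (assumptions, not from the paper).
normal = np.array([[2.0, 0.0], [0.0, 0.0]])   # Hermitian, so R(M) = R(M*)
skewed = np.array([[1.0, 1.0], [0.0, 0.0]])   # R(M) = span(e1), R(M*) = span((1, 1))

print(is_ep(normal), is_ep(skewed))
```

In infinite dimensions the closedness of the range also matters, which this matrix sketch does not capture.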
CLN-VC: Text-Free Voice Conversion Based on Fine-Grained Style Control and Contrastive Learning with Negative Samples Augmentation
Better disentanglement of speech representation is essential to improve the
quality of voice conversion. Recently, contrastive learning based on speaker
labels has been applied successfully to voice conversion. However, model
performance degrades when converting between similar speakers. Hence, we propose an
augmented negative sample selection to address the issue. Specifically, we
create hard negative samples based on the proposed speaker fusion module to
improve the learning ability of the speaker encoder. Furthermore, considering the
fine-grained modeling of speaker style, we employ a reference encoder to extract
fine-grained style and conduct the augmented contrastive learning on global
style. The experimental results show that the proposed method outperforms
previous work in voice conversion tasks.
Comment: Accepted by the 21st IEEE International Symposium on Parallel and
Distributed Processing with Applications (IEEE ISPA 2023)
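The idea of contrastive learning with augmented hard negatives can be sketched roughly as follows. This is not the paper's implementation: the abstract does not describe the speaker fusion module, so `fuse_speakers` simply interpolates two speaker embeddings as a hypothetical stand-in, and the loss is a plain InfoNCE objective. All names, the temperature, and the mixing weight are assumptions.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def fuse_speakers(anchor, other, alpha=0.5):
    """Hypothetical hard negative: blend the anchor speaker with another speaker."""
    return l2_normalize(alpha * anchor + (1.0 - alpha) * other)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor, one positive, and a (K, D) array of negatives."""
    anchor, positive, negatives = map(l2_normalize, (anchor, positive, negatives))
    # Cosine similarities; index 0 is the positive pair.
    logits = np.concatenate(([anchor @ positive], negatives @ anchor)) / temperature
    logits -= logits.max()  # numerical stability
    return -(logits[0] - np.log(np.exp(logits).sum()))

rng = np.random.default_rng(0)
anchor = rng.normal(size=16)
positive = anchor + 0.1 * rng.normal(size=16)   # same speaker, another utterance
others = rng.normal(size=(4, 16))               # embeddings of other speakers
hard = fuse_speakers(l2_normalize(anchor), l2_normalize(others[0]))

loss_plain = info_nce(anchor, positive, others)
loss_hard = info_nce(anchor, positive, np.vstack([others, hard]))
print(loss_plain, loss_hard)
```

Because the fused embedding lies closer to the anchor than a random speaker does, it contributes a larger term to the denominator and gives the encoder a stronger training signal for separating similar speakers.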
CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding
This paper proposes a talking face generation method named "CP-EB" that takes
an audio signal as input and a person image as reference, to synthesize a
photo-realistic video of a person talking, with head poses controlled by a short
video clip and with a proper eye blinking embedding. Both head pose and eye
blinking are important cues for deepfake detection.
The implicit control of poses by video has already been achieved by
state-of-the-art work. According to recent research, eye blinking is weakly
correlated with the input audio, which means that extracting eye blinks from
audio and generating them is possible. Hence, we propose a GAN-based architecture
to extract eye blink features from the input audio and the reference video
respectively, employ contrastive training between them, and then embed them into
the concatenated features of identity and pose to generate talking face images.
Experimental results show that the proposed method can generate photo-realistic talking faces with
synchronized lip motion, natural head poses, and blinking eyes.
Comment: Accepted by the 21st IEEE International Symposium on Parallel and
Distributed Processing with Applications (IEEE ISPA 2023)
PMVC: Data Augmentation-Based Prosody Modeling for Expressive Voice Conversion
Voice conversion, as a style transfer task applied to speech, refers to
converting one person's speech into a new speech that sounds like another
person's. Up to now, there has been a lot of research devoted to better
implementation of VC tasks. However, a good voice conversion model should not
only match the timbre information of the target speaker, but also expressive
information such as prosody, pace, pause, etc. In this context, prosody
modeling is crucial for achieving expressive voice conversion that sounds
natural and convincing. Prosody modeling is important but
challenging, especially without text transcriptions. In this paper, we
propose a novel voice conversion framework named 'PMVC', which effectively
separates and models the content, timbre, and prosodic information from the
speech without text transcriptions. Specifically, we introduce a new speech
augmentation algorithm for robust prosody extraction. Building upon this,
a mask-and-predict mechanism is applied in the disentanglement of prosody and
content information. The experimental results on the AIShell-3 corpus support
our improvement of naturalness and similarity of converted speech.
Comment: Accepted by the 31st ACM International Conference on Multimedia
(MM2023)
Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval
Voice conversion refers to transferring speaker identity with well-preserved
content. Better disentanglement of speech representations leads to better voice
conversion. Recent studies have found that phonetic information from input
audio can represent content well. Besides,
speaker-style modeling with pre-trained models makes the process more complex.
To tackle these issues, we introduce a new method named "CTVC" which utilizes
disentangled speech representations with contrastive learning and
time-invariant retrieval. Specifically, a similarity-based compression module
is used to facilitate a closer connection between the frame-level hidden
features and linguistic information at phoneme-level. Additionally, a
time-invariant retrieval is proposed for timbre extraction based on multiple
segmentations and mutual information. Experimental results demonstrate that
"CTVC" outperforms previous studies and improves the sound quality and
similarity of converted results.
Comment: Accepted by 2024 IEEE International Conference on Acoustics, Speech,
and Signal Processing (ICASSP2024)
Flue Gas Desulphurization in Circulating Fluidized Beds
Sulphur dioxide (SO2) is mostly emitted from coal-fueled power plants, from waste incineration, from sulphuric acid manufacturing, from clay brick plants and from treating nonferrous metals. The emission of SO2 needs to be abated. Both wet scrubbing (absorption) and dry or semi-dry (reaction) systems are used. In the dry process, both bubbling and circulating fluidized beds (BFB, CFB) can be used as contactor. Experimental results demonstrate a SO2-removal efficiency in excess of 94% in a CFB application. A general model of the heterogeneous reaction is proposed, combining the external diffusion of SO2 across the gas film, the internal diffusion of SO2 in the porous particles and the reaction as such (irreversible, 1st order). For the reaction of SO2 with a fine particulate reactant, the reaction rate constant and the relevant contact time are the dominant parameters. Application of the model equations reveals that the circulating fluidized bed is the most appropriate technique, where the high solid to gas ratio guarantees a high conversion in a short reaction time. For the CFB operation, the required gas contact time in a CFB at given superficial gas velocities and solids circulation rates will determine the SO2 removal rate.
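The three steps named in the abstract (gas-film diffusion, pore diffusion, first-order irreversible reaction) are often combined as resistances in series. The sketch below is a textbook-style illustration under stated assumptions, not the paper's model equations: spherical particles with the standard Thiele-modulus effectiveness factor, both rate constants already expressed per unit reactor volume in 1/s, and plug flow of the gas. All function names and numerical values are hypothetical.

```python
import math

def effectiveness_factor(thiele):
    """First-order reaction in a sphere: eta = 3/phi^2 * (phi*coth(phi) - 1)."""
    if thiele < 1e-4:
        return 1.0  # negligible pore-diffusion resistance
    return 3.0 / thiele**2 * (thiele / math.tanh(thiele) - 1.0)

def overall_rate_constant(k_film, k_reaction, thiele):
    """Resistances in series: 1/k_overall = 1/k_film + 1/(eta * k_reaction)."""
    eta = effectiveness_factor(thiele)
    return 1.0 / (1.0 / k_film + 1.0 / (eta * k_reaction))

def so2_removal(k_overall, contact_time):
    """Plug-flow gas with first-order kinetics: X = 1 - exp(-k_overall * t)."""
    return 1.0 - math.exp(-k_overall * contact_time)

# Hypothetical values for a fine reactant: fast film transfer, small Thiele modulus.
k = overall_rate_constant(k_film=50.0, k_reaction=2.0, thiele=0.1)
for t in (0.5, 1.0, 2.0):  # gas contact time in seconds
    print(t, so2_removal(k, t))
```

With fine particles both diffusion resistances are small, so the overall constant approaches the intrinsic reaction rate constant, which is consistent with the abstract's statement that the reaction rate constant and the contact time dominate.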
Health risk appraisal of urban thermal environment and characteristic analysis on vulnerable populations
Continuous global warming and frequent extreme high temperatures keep the urban climate health risk increasing, seriously threatening residents' emotional health. Therefore, analysis on spatial distribution of the health risk that the urban heat island (UHI) effect imposes on emotional health as well as basic research on the characteristics of vulnerable populations need to be conducted. This study, with Tianjin city as the case, analyzed data from Landsat remote-sensing images, meteorological stations, and digital maps, explored the influence of summer UHI effect on distress (a typical negative emotion factor) and its spatiotemporal evolution, and conducted difference analysis on the age groups, genders, family state, and distress levels of vulnerable populations. The results show: (1) During the period of 1992-2020, the level and area of UHI influence on residents' distress drastically increased: influence level elevated from level 2-4 to level 4-7, and high-level influence areas were concentrated in six districts of central Tianjin. (2) Influence of the UHI effect on distress varied in different age groups, generally dropping with fluctuations as residents got older, especially residents aged 50-59. (3) Men experienced a W-shaped pattern in distress and were more irritable and unsteady emotionally; while women were more sensitive to distress in the beginning, but they became more placid as temperature got higher. (4) Studies on family status show that couples living together showed sound heat resistance in the face of heat stress, while middle-aged and elderly people living alone or with children were relatively weak in adjusting to high ambient temperature.