119 research outputs found
Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion
Emotional voice conversion (EVC) traditionally targets the transformation of
spoken utterances from one emotional state to another, with previous research
mainly focusing on discrete emotion categories. This paper departs from the
norm by introducing a novel perspective: nuanced rendering of mixed emotions
and enhanced control over emotional expression. To achieve this, we propose a
novel EVC framework, Mixed-EVC, which only leverages discrete emotion training
labels. We construct an attribute vector that encodes the relationships among
these discrete emotions, which is predicted using a ranking-based support
vector machine and then integrated into a sequence-to-sequence (seq2seq) EVC
framework. Mixed-EVC not only learns to characterize the input emotional style
but also quantifies its relevance to other emotions during training. As a
result, users have the ability to assign these attributes to achieve their
desired rendering of mixed emotions. Objective and subjective evaluations
confirm the effectiveness of our approach in terms of mixed emotion synthesis
and control while surpassing traditional baselines in the conversion of
discrete emotions from one to another.
Progress in the seasonal variations of blood lipids: a mini-review.
The seasonal variations of blood lipids have recently gained increasing interest in the field of lipid metabolism. Elucidating the seasonal patterns of blood lipids is particularly helpful for the prevention and treatment of cardiovascular and cerebrovascular diseases. However, previous results remain controversial and the underlying mechanisms are still unclear. This mini-review summarizes the literature on the seasonal variability of blood lipid parameters and discusses its significance for clinical diagnosis and management decisions.
Long Short-term Memory with Two-Compartment Spiking Neuron
The identification of sensory cues associated with potential opportunities
and dangers is frequently complicated by unrelated events that separate useful
cues by long delays. As a result, it remains a challenging task for
state-of-the-art spiking neural networks (SNNs) to identify long-term temporal
dependencies since bridging the temporal gap necessitates an extended memory
capacity. To address this challenge, we propose a novel biologically inspired
Long Short-Term Memory Leaky Integrate-and-Fire spiking neuron model, dubbed
LSTM-LIF. Our model incorporates carefully designed somatic and dendritic
compartments that are tailored to retain short- and long-term memories. The
theoretical analysis further confirms its effectiveness in addressing the
notorious vanishing gradient problem. Our experimental results, on a diverse
range of temporal classification tasks, demonstrate superior temporal
classification capability, rapid training convergence, strong network
generalizability, and high energy efficiency of the proposed LSTM-LIF model.
This work, therefore, opens up a myriad of opportunities for resolving
challenging temporal processing tasks on emerging neuromorphic computing
machines.
NIST 2007 Language Recognition Evaluation: From the Perspective of IIR
PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200
Independent language modeling architecture for end-to-end ASR
The attention-based end-to-end (E2E) automatic speech recognition (ASR)
architecture allows for joint optimization of acoustic and language models
within a single network. However, in a vanilla E2E ASR architecture, the
decoder sub-network (subnet), which incorporates the role of the language model
(LM), is conditioned on the encoder output. This means that the acoustic
encoder and the language model are entangled, which prevents the language model
from being trained separately on external text data. To address this problem, in
this work, we propose a new architecture that separates the decoder subnet from
the encoder output. In this way, the decoupled subnet becomes an independently
trainable LM subnet, which can easily be updated using the external text data.
We study two strategies for updating the new architecture. Experimental results
show that 1) the independent LM architecture benefits from external text data,
achieving 9.3% and 22.8% relative character and word error rate reductions on
the Mandarin HKUST and English NSC datasets, respectively; and 2) the proposed
architecture works well with an external LM and generalizes to different
amounts of labelled data.
Robust Speaker Verification Using Short-Time Frequency with Long-Time Window and Fusion of Multi-Resolutions
This study presents a novel feature-analysis approach to speaker verification. The paper makes two main contributions. First, short-time frequency with long-time window (SFLW) analysis yields a compact feature that makes speaker verification more efficient. The purpose of SFLW is to capture short-time frequency characteristics and long-time resolution at the same time. Second, the fusion of multi-resolution features is used to make speaker verification more robust, further improving the speaker verification system. The experimental results indicate that the proposed approaches not only reduce processing time but also improve speaker verification performance.
Diagenesis of the first member of Canglangpu Formation of the Cambrian Terreneuvian in northern part of the central Sichuan Basin and its influence on porosity
In this paper, taking the first Member of the Canglangpu Formation of the Cambrian Terreneuvian in the northern central Sichuan Basin as an example, the diagenesis and its influence on porosity are systematically studied based on the observation and identification of cores, casts and cathodoluminescence thin sections. The results show that the rock types of the first member of the Canglangpu Formation are varied, including mixed rocks, carbonate rocks and clastic rocks. The lithology is dominated by sand-bearing oolitic dolomite, sandy oolitic dolomite, sparry oolitic dolomite and fine-grained detrital sandstone. The Cang 1 Member has experienced five types of diagenetic environments: seawater, meteoric water, evaporative seawater, shallow burial, and medium-deep burial. The main diagenetic processes in these environments include cementation, dissolution, compaction, chemical compaction, dolomitization and structural fracturing. According to the analysis, fabric-selective dissolution in the meteoric water diagenetic environment, dolomitization in the evaporative seawater environment, and non-fabric-selective dissolution, dolomitization and structural fractures in the burial diagenetic environment are beneficial to the development of pores. However, cementation, compaction and chemical compaction in medium and deep burial environments are unfavorable for the development of pores.
- …