119 research outputs found

    Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion

    Emotional voice conversion (EVC) traditionally transforms spoken utterances from one emotional state to another, and previous research has focused mainly on discrete emotion categories. This paper departs from that norm by introducing a new perspective: rendering mixed emotions with nuance and enhancing control over emotional expression. To achieve this, we propose a novel EVC framework, Mixed-EVC, which leverages only discrete emotion training labels. We construct an attribute vector that encodes the relationships among these discrete emotions; the vector is predicted by a ranking-based support vector machine and then integrated into a sequence-to-sequence (seq2seq) EVC framework. During training, Mixed-EVC not only learns to characterize the input emotional style but also quantifies its relevance to other emotions. As a result, users can assign these attributes to achieve their desired rendering of mixed emotions. Objective and subjective evaluations confirm the effectiveness of our approach for mixed emotion synthesis and control, and show that it surpasses traditional baselines in converting discrete emotions from one to another.
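
    The abstract states that the emotion attribute vector is predicted by a ranking-based support vector machine. As a rough illustration only, the Python sketch below trains a generic primal pairwise ranking SVM (in the style of relative attributes) by stochastic subgradient descent. The toy 2-D features, the hyperparameters, and the function names are assumptions made for this example; this is not the Mixed-EVC implementation.

```python
import random


def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_rank_svm(pairs, dim, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Generic primal pairwise ranking SVM (illustrative sketch).

    pairs: list of (x_hi, x_lo) tuples where x_hi should score above x_lo,
    e.g. an angry utterance ranked above a neutral one for an 'anger' ranker.
    Roughly minimises lam/2 * ||w||^2 + the average of
    max(0, 1 - w.(x_hi - x_lo)) over all pairs.
    """
    rng = random.Random(seed)
    order = list(pairs)  # copy so the caller's list is not reordered
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(order)
        for x_hi, x_lo in order:
            diff = [a - b for a, b in zip(x_hi, x_lo)]
            grad = [lam * wi for wi in w]  # gradient of the L2 term
            if dot(w, diff) < 1.0:  # hinge active: push the pair apart
                grad = [g - d for g, d in zip(grad, diff)]
            w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Hypothetical 2-D utterance features; dimension 0 loosely tracks arousal.
neutral = [[0.0, 0.3], [0.1, 0.8], [0.2, 0.5]]
angry = [[0.9, 0.4], [1.0, 0.7], [0.8, 0.2]]
pairs = [(a, n) for a in angry for n in neutral]

w_anger = train_rank_svm(pairs, dim=2)
anger_scores = [dot(w_anger, x) for x in neutral + angry]
```

    The learned weight vector scores how strongly an utterance expresses one emotion relative to the others; training one such ranker per emotion would yield one entry of an attribute vector per emotion. How Mixed-EVC normalises or combines these scores is not described in the abstract.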

    Progress in the seasonal variations of blood lipids: a mini-review.

    The seasonal variations of blood lipids have recently attracted increasing interest in the field of lipid metabolism. Elucidating the seasonal patterns of blood lipids is particularly helpful for the prevention and treatment of cardiovascular and cerebrovascular diseases. However, previous results remain controversial, and the underlying mechanisms are still unclear. This mini-review summarizes the literature on the seasonal variability of blood lipid parameters and discusses its significance for clinical diagnosis and management decisions.

    Long Short-term Memory with Two-Compartment Spiking Neuron

    The identification of sensory cues associated with potential opportunities and dangers is frequently complicated by unrelated events that separate useful cues by long delays. As a result, identifying long-term temporal dependencies remains a challenging task for state-of-the-art spiking neural networks (SNNs), since bridging the temporal gap requires an extended memory capacity. To address this challenge, we propose a novel, biologically inspired Long Short-Term Memory Leaky Integrate-and-Fire spiking neuron model, dubbed LSTM-LIF. Our model incorporates carefully designed somatic and dendritic compartments tailored to retain short- and long-term memories. Theoretical analysis further confirms its effectiveness in addressing the notorious vanishing gradient problem. Experimental results on a diverse range of temporal classification tasks demonstrate the superior temporal classification capability, rapid training convergence, strong network generalizability, and high energy efficiency of the proposed LSTM-LIF model. This work therefore opens up many opportunities for solving challenging temporal processing tasks on emerging neuromorphic computing machines.
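
    To illustrate how a slowly leaking dendritic compartment can hold information that a fast-leaking soma alone would lose, the following Python sketch simulates a generic discrete-time two-compartment LIF neuron. The update rules, parameter names, and values are illustrative assumptions; they are not the LSTM-LIF equations from the paper.

```python
def simulate_two_compartment(inputs, alpha_d=0.95, alpha_s=0.5,
                             coupling=0.5, threshold=1.0):
    """Generic discrete-time two-compartment LIF neuron (illustration only).

    The dendritic potential u_d leaks slowly (alpha_d close to 1) and acts
    as a long-lived memory; the somatic potential u_s leaks quickly, is
    driven by the dendrite, and emits a spike when it crosses the threshold.
    Returns the spike train and both membrane-potential traces.
    """
    u_d, u_s = 0.0, 0.0
    spikes, trace_d, trace_s = [], [], []
    for x in inputs:
        u_d = alpha_d * u_d + x               # dendrite integrates the input
        u_s = alpha_s * u_s + coupling * u_d  # soma integrates dendritic drive
        spike = 1 if u_s >= threshold else 0
        if spike:
            u_s = 0.0                         # hard reset of the soma only
        spikes.append(spike)
        trace_d.append(u_d)
        trace_s.append(u_s)
    return spikes, trace_d, trace_s


# A single strong input pulse followed by nine silent steps.
spikes, trace_d, _ = simulate_two_compartment([3.0] + [0.0] * 9)
```

    With these assumed parameters the soma keeps firing for several steps after the pulse has ended, because the dendrite keeps driving it. A single fast-leaking compartment fed the same pulse directly would, given the hard reset, fire only at the pulse itself.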

    Sparse Classifier Fusion for Speaker Verification

    NIST 2007 Language Recognition Evaluation: From the Perspective of IIR

    PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200

    Independent language modeling architecture for end-to-end ASR

    The attention-based end-to-end (E2E) automatic speech recognition (ASR) architecture allows joint optimization of acoustic and language models within a single network. However, in a vanilla E2E ASR architecture, the decoder sub-network (subnet), which fulfils the role of the language model (LM), is conditioned on the encoder output. As a result, the acoustic encoder and the language model are entangled, which prevents the language model from being trained separately on external text data. To address this problem, we propose a new architecture that separates the decoder subnet from the encoder output. In this way, the decoupled subnet becomes an independently trainable LM subnet that can easily be updated using external text data. We study two strategies for updating the new architecture. Experimental results show that 1) the independent LM architecture benefits from external text data, achieving 9.3% and 22.8% relative character and word error rate reductions on the Mandarin HKUST and English NSC datasets, respectively; and 2) the proposed architecture works well with an external LM and generalizes to different amounts of labelled data.
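
    The key property described above is that the decoupled LM subnet no longer sees the encoder output, so it can be updated from text alone. The Python sketch below illustrates that interface with a count-based bigram model standing in for the neural LM subnet. The class and method names are hypothetical, and the sketch does not reproduce the paper's network or its two update strategies.

```python
import math
from collections import Counter, defaultdict


class BigramLM:
    """Count-based bigram LM with add-one smoothing (hypothetical stand-in).

    Like the decoupled LM subnet, it is conditioned only on the token
    history, never on acoustic encoder output, so text-only data suffices
    to update it.
    """

    def __init__(self):
        self.bigrams = defaultdict(Counter)  # prev token -> next-token counts
        self.vocab = {"<s>", "</s>"}

    def update(self, sentences):
        """Accumulate bigram counts from whitespace-tokenised text."""
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[prev][cur] += 1

    def log_prob(self, prev, token):
        """Add-one smoothed log P(token | prev)."""
        counts = self.bigrams.get(prev, Counter())
        total = sum(counts.values())
        return math.log((counts[token] + 1) / (total + len(self.vocab)))


# Hypothetical external text corpus: no audio is needed to update the LM.
lm = BigramLM()
lm.update(["the cat sat", "the dog sat"])
```

    After the update, a continuation seen in the text ("the cat") receives a higher probability than an unseen one ("the sat"), without any change to the acoustic side.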

    Robust Speaker Verification Using Short-Time Frequency with Long-Time Window and Fusion of Multi-Resolutions

    This study presents a novel feature analysis approach for speaker verification, with two main contributions. First, short-time frequency with long-time window (SFLW) analysis yields a compact feature that improves the efficiency of speaker verification. SFLW is designed to capture short-time frequency characteristics and long-time resolution at the same time. Second, the fusion of multi-resolution features is used to make speaker verification robust, and the speaker verification system can be further improved with these multi-resolution features. The experimental results indicate that the proposed approaches not only reduce processing time but also improve speaker verification performance.
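
    The abstract does not specify how the multi-resolution systems are fused. A common option, assumed here purely for illustration, is score-level fusion: each subsystem's score is z-normalised against its impostor scores and the results are combined with fixed weights. The function names, weights, and threshold below are hypothetical.

```python
from statistics import mean, pstdev


def z_norm(score, impostor_scores):
    """Normalise a raw score using the impostor-score mean and std (z-norm)."""
    return (score - mean(impostor_scores)) / pstdev(impostor_scores)


def fuse(trial_scores, impostor_sets, weights):
    """Weighted linear fusion of z-normalised subsystem scores.

    trial_scores[i] is the score of subsystem i (e.g. one analysis
    resolution) for the current trial; impostor_sets[i] holds that
    subsystem's impostor scores used for normalisation.
    """
    if not (len(trial_scores) == len(impostor_sets) == len(weights)):
        raise ValueError("need one score, impostor set and weight per subsystem")
    normed = [z_norm(s, imp) for s, imp in zip(trial_scores, impostor_sets)]
    return sum(w * z for w, z in zip(weights, normed)) / sum(weights)


def accept(trial_scores, impostor_sets, weights, threshold=0.0):
    """Accept the claimed identity if the fused score reaches the threshold."""
    return fuse(trial_scores, impostor_sets, weights) >= threshold


# Two hypothetical resolutions whose raw scores live on different scales.
fused = fuse([3.0, 25.0], [[0.0, 2.0], [10.0, 20.0]], [1.0, 1.0])
```

    Normalising before fusing keeps a subsystem with a large raw score range from dominating the combined decision; here both subsystems map to the same normalised score of 2.0.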

    Diagenesis of the first member of Canglangpu Formation of the Cambrian Terreneuvian in northern part of the central Sichuan Basin and its influence on porosity

    Taking the first member of the Canglangpu Formation of the Cambrian Terreneuvian in the northern central Sichuan Basin as an example, this paper systematically studies its diagenesis and the influence of diagenesis on porosity, based on observations and identifications of cores, casts and cathodoluminescence thin sections. The results show that the rock types of the first member of the Canglangpu Formation are diverse, including mixed rocks, carbonate rocks and clastic rocks. The lithology is dominated by sand-bearing oolitic dolomite, sandy oolitic dolomite, sparry oolitic dolomite and fine-grained detrital sandstone. The Cang 1 Member has experienced five types of diagenetic environment: seawater, meteoric water, evaporative seawater, shallow burial, and medium-deep burial. The main diagenetic processes in these environments include cementation, dissolution, compaction, chemical compaction, dolomitization and structural fracturing. The analysis indicates that fabric-selective dissolution in the meteoric water diagenetic environment, dolomitization in the evaporative seawater environment, and non-fabric-selective dissolution, dolomitization and structural fractures in the burial diagenetic environment are beneficial to pore development. In contrast, cementation, compaction and chemical compaction in medium and deep burial environments are unfavorable for pore development.