2,069 research outputs found

    Mandarin Singing Voice Synthesis Based on Harmonic Plus Noise Model and Singing Expression Analysis

    Full text link
    The purpose of this study is to investigate how humans interpret musical scores expressively, and then to design machines that sing like humans. We consider six factors that have a strong influence on the expression of human singing. The factors are related to the acoustic, phonetic, and musical features of a real singing signal. Given real singing voices recorded following the MIDI scores and lyrics, our analysis module can extract the expression parameters from the real singing signals semi-automatically. The expression parameters are used to control the singing voice synthesis (SVS) system for Mandarin Chinese, which is based on the harmonic plus noise model (HNM). The results of perceptual experiments show that integrating the expression factors into the SVS system yields a notable improvement in perceptual naturalness, clearness, and expressiveness. By one-to-one mapping of the real singing signal and expression controls to the synthesizer, our SVS system can simulate the interpretation of a real singer with the timbre of a speaker. Comment: 8 pages, technical report
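
    A minimal sketch of harmonic plus noise synthesis in Python, assuming a constant F0, 1/k harmonic amplitudes, and unshaped white noise for the stochastic part; the function name hnm_synthesize and every parameter value are illustrative assumptions, not the paper's implementation, which would instead drive the synthesizer with the extracted time-varying expression parameters (F0 contour, vibrato, dynamics, and so on).

    import numpy as np

    def hnm_synthesize(f0_hz=220.0, duration_s=1.0, sr=22050,
                       n_harmonics=20, max_voiced_hz=5000.0, noise_gain=0.05):
        """Toy harmonic-plus-noise synthesis: harmonics below a maximum
        voiced frequency plus a small white-noise component."""
        t = np.arange(int(duration_s * sr)) / sr
        harmonic = np.zeros_like(t)
        for k in range(1, n_harmonics + 1):
            fk = k * f0_hz
            if fk > max_voiced_hz:
                break
            harmonic += (1.0 / k) * np.cos(2 * np.pi * fk * t)
        # A real HNM would shape the noise with a time-varying spectral envelope.
        noise = noise_gain * np.random.randn(len(t))
        out = harmonic + noise
        return out / np.max(np.abs(out))

    waveform = hnm_synthesize()  # one second of a synthetic sung tone at 220 Hz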

    Improving Automatic Jazz Melody Generation by Transfer Learning Techniques

    Full text link
    In this paper, we tackle the problem of transfer learning for automatic Jazz generation. Jazz is a representative genre of music, but the lack of Jazz data in the MIDI format hinders the construction of a generative model for Jazz. Transfer learning is an approach that aims to solve the problem of data insufficiency by transferring common features from one domain to another. In view of its success in other machine learning problems, we investigate whether, and how much, it can help improve automatic music generation for under-resourced musical genres. Specifically, we use a recurrent variational autoencoder as the generative model, a genre-unspecified dataset as the source dataset, and a Jazz-only dataset as the target dataset. Two transfer learning methods are evaluated using six levels of source-to-target data ratios. The first method is to train the model on the source dataset and then fine-tune the resulting model parameters on the target dataset. The second method is to train the model on both the source and target datasets at the same time, but to add genre labels to the latent vectors and use a genre classifier to improve Jazz generation. The evaluation results show that the second method seems to perform better overall, but it cannot take full advantage of the genre-unspecified dataset. Comment: 8 pages, Accepted to APSIPA ASC (Asia-Pacific Signal and Information Processing Association Annual Summit and Conference) 201
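
    A hedged sketch of the second transfer-learning strategy in Python/PyTorch: a small recurrent variational autoencoder is trained on a mixed source-plus-target batch while a genre classifier on the latent vector contributes an extra loss term. The model sizes, the piano-roll shape, the loss weights, and all names here are illustrative assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    N_STEPS, N_PITCH, LATENT = 16, 88, 32  # assumed toy piano-roll dimensions

    class TinyMusicVAE(nn.Module):
        def __init__(self):
            super().__init__()
            self.enc = nn.GRU(N_PITCH, 64, batch_first=True)
            self.to_mu = nn.Linear(64, LATENT)
            self.to_logvar = nn.Linear(64, LATENT)
            self.dec = nn.GRU(LATENT, 64, batch_first=True)
            self.out = nn.Linear(64, N_PITCH)
            self.genre_clf = nn.Linear(LATENT, 2)  # 0 = genre-unspecified, 1 = Jazz

        def forward(self, x):
            _, h = self.enc(x)                           # h: (1, batch, 64)
            mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
            z_seq = z.unsqueeze(1).repeat(1, N_STEPS, 1)
            recon = torch.sigmoid(self.out(self.dec(z_seq)[0]))
            return recon, mu, logvar, self.genre_clf(z)

    model = TinyMusicVAE()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    x = torch.rand(8, N_STEPS, N_PITCH)   # placeholder mixed source/target batch
    genre = torch.randint(0, 2, (8,))     # genre label of each example
    recon, mu, logvar, logits = model(x)
    loss = (F.binary_cross_entropy(recon, x)                            # reconstruction
            - 0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())   # KL term
            + F.cross_entropy(logits, genre))                           # genre guidance
    loss.backward()
    opt.step()

    The first strategy would reuse the same model without the genre terms: pre-train on the genre-unspecified source set, then continue the identical loop on the Jazz-only target set, typically with a smaller learning rate.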

    Estimating systemic fibrosis by combining galectin-3 and ST2 provides powerful risk stratification value for patients after acute decompensated heart failure

    Get PDF
    Background: Two fibrosis biomarkers, galectin-3 (Gal-3) and suppression of tumorigenicity 2 (ST2), provide prognostic value additive to natriuretic peptides and traditional risk factors in patients with heart failure (HF). However, it remains to be investigated whether their combined measurement before discharge provides incremental risk stratification for patients after acute HF. Methods: A total of 344 patients with acute HF were analyzed, with Gal-3 and ST2 measured. Patients were prospectively followed for 3.7 ± 1.3 years for death and composite events (death or HF-related re-hospitalization). Results: The levels of Gal-3 and ST2 were only weakly correlated (r = 0.20, p < 0.001). The medians of Gal-3 and ST2 were 18 ng/mL and 32.4 ng/mL, respectively. The two biomarkers complemented each other and characterized patients with different risk factors. Using the median values as cut-offs, patients were separated into four subgroups based on high or low Gal-3 (HG and LG, respectively) and ST2 levels (HS and LS, respectively). Kaplan-Meier survival curves showed that HGHS powerfully identified patients at risk of mortality (Log rank = 21.27, p < 0.001). In multivariable analysis, the combination of log(Gal-3) and log(ST2) was an independent predictor. For composite events, Kaplan-Meier survival curves showed a lower event-free survival rate in the HGHS subgroup compared to the others (Log rank = 34.62, p < 0.001; HGHS vs. HGLS, Log rank = 4.00, p = 0.045). In multivariable analysis, the combination of log(Gal-3) and log(ST2) was also an independent predictor. Conclusions: Combining biomarkers that reflect heterogeneous fibrosis pathways may identify patients with high systemic fibrosis, providing powerful risk stratification value.
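
    A hedged sketch of the stratification scheme described above, assuming the Python lifelines package and synthetic placeholder data (the column names, random values, and model specification are illustrative, not the study's actual data or software): dichotomize Gal-3 and ST2 at their medians, form the four HG/LG x HS/LS subgroups, fit Kaplan-Meier curves per subgroup, and enter the log-transformed markers into a Cox model.

    import numpy as np
    import pandas as pd
    from lifelines import KaplanMeierFitter, CoxPHFitter

    rng = np.random.default_rng(0)
    n = 344
    df = pd.DataFrame({
        "gal3": rng.lognormal(np.log(18.0), 0.4, n),   # ng/mL, synthetic values
        "st2": rng.lognormal(np.log(32.4), 0.5, n),    # ng/mL, synthetic values
        "years": rng.exponential(3.7, n),              # follow-up time
        "death": rng.integers(0, 2, n),                # event indicator
    })

    # Four subgroups from the median cut-offs (HG/LG for Gal-3, HS/LS for ST2).
    hg = np.where(df["gal3"] >= df["gal3"].median(), "HG", "LG")
    hs = np.where(df["st2"] >= df["st2"].median(), "HS", "LS")
    df["grp"] = np.char.add(hg, hs)

    # Kaplan-Meier curve for each of the four subgroups.
    for name, sub in df.groupby("grp"):
        KaplanMeierFitter().fit(sub["years"], event_observed=sub["death"], label=name)

    # Cox model with the log-transformed biomarkers as covariates.
    df["log_gal3"], df["log_st2"] = np.log(df["gal3"]), np.log(df["st2"])
    cph = CoxPHFitter().fit(df[["years", "death", "log_gal3", "log_st2"]],
                            duration_col="years", event_col="death")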

    CasNet: Investigating Channel Robustness for Speech Separation

    Full text link
    Recording channel mismatch between training and testing conditions has been shown to be a serious problem for speech separation. This mismatch greatly reduces separation performance and fails to meet the requirements of daily use. In this study, building on our previously constructed TAT-2mix corpus, we address the channel mismatch problem by proposing the channel-aware audio separation network (CasNet), a deep learning framework for end-to-end time-domain speech separation. CasNet is implemented on top of TasNet. A channel embedding (characterizing the channel information in a mixture of multiple utterances), generated by a Channel Encoder, is introduced into the separation module through the FiLM technique. Through two training strategies, we explore two roles that the channel embedding may play: 1) a real-life noise disturbance, making the model more robust, or 2) a guide, instructing the separation model to retain the desired channel information. Experimental results on TAT-2mix show that CasNet trained with both training strategies outperforms the TasNet baseline, which does not use channel embeddings. Comment: Submitted to ICASSP 202
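
    A hedged sketch of FiLM-style conditioning as described above, in Python/PyTorch: a channel embedding produces per-feature scale (gamma) and shift (beta) terms that modulate the separator's intermediate features. The module names, the toy channel encoder, and all dimensions are assumptions for illustration, not CasNet's actual implementation.

    import torch
    import torch.nn as nn

    class FiLM(nn.Module):
        """Feature-wise linear modulation conditioned on a channel embedding."""
        def __init__(self, feat_dim, emb_dim):
            super().__init__()
            self.to_gamma = nn.Linear(emb_dim, feat_dim)
            self.to_beta = nn.Linear(emb_dim, feat_dim)

        def forward(self, feats, channel_emb):
            # feats: (batch, feat_dim, time); channel_emb: (batch, emb_dim)
            gamma = self.to_gamma(channel_emb).unsqueeze(-1)
            beta = self.to_beta(channel_emb).unsqueeze(-1)
            return gamma * feats + beta

    # Toy usage: a stand-in channel encoder summarizes the mixture waveform into
    # an embedding, which then conditions the separation features.
    channel_encoder = nn.Sequential(nn.Conv1d(1, 16, kernel_size=3, padding=1),
                                    nn.AdaptiveAvgPool1d(1), nn.Flatten())
    mixture = torch.randn(4, 1, 16000)     # (batch, 1, samples)
    sep_feats = torch.randn(4, 128, 200)   # separator features (batch, channels, frames)
    emb = channel_encoder(mixture)         # (batch, 16)
    modulated = FiLM(feat_dim=128, emb_dim=16)(sep_feats, emb)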