Mandarin Singing Voice Synthesis Based on Harmonic Plus Noise Model and Singing Expression Analysis
The purpose of this study is to investigate how humans interpret musical
scores expressively, and then design machines that sing like humans. We
consider six factors that have a strong influence on the expression of human
singing. The factors are related to the acoustic, phonetic, and musical
features of a real singing signal. Given real singing voices recorded following
the MIDI scores and lyrics, our analysis module can extract the expression
parameters from the real singing signals semi-automatically. The expression
parameters are used to control the singing voice synthesis (SVS) system for
Mandarin Chinese, which is based on the harmonic plus noise model (HNM). The
results of perceptual experiments show that integrating the expression factors
into the SVS system yields a notable improvement in perceptual naturalness,
clearness, and expressiveness. By one-to-one mapping of the real singing signal
and expression controls to the synthesizer, our SVS system can simulate the
interpretation of a real singer with the timbre of a speaker.
Comment: 8 pages, technical report
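The harmonic plus noise model underlying the synthesizer can be illustrated with a minimal sketch: each frame is a sum of sinusoids at integer multiples of the fundamental frequency plus a stochastic noise component. The function name, frame length, and amplitude values below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def hnm_frame(f0, harmonic_amps, noise_std, sr=16000, n=320, rng=None):
    """Synthesize one frame with a harmonic-plus-noise model:
    a deterministic sum of sinusoids at integer multiples of f0
    (the harmonic part) plus Gaussian noise (the stochastic part)."""
    rng = np.random.default_rng(0) if rng is None else rng
    t = np.arange(n) / sr
    harmonic = sum(a * np.sin(2 * np.pi * (k + 1) * f0 * t)
                   for k, a in enumerate(harmonic_amps))
    noise = noise_std * rng.standard_normal(n)
    return harmonic + noise

# One 20 ms frame at A3 (220 Hz) with three decaying harmonics.
frame = hnm_frame(220.0, [1.0, 0.5, 0.25], noise_std=0.05)
```

In a full SVS system, the expression parameters would modulate f0, the harmonic amplitudes, and the noise level frame by frame.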
Improving Automatic Jazz Melody Generation by Transfer Learning Techniques
In this paper, we tackle the problem of transfer learning for automatic Jazz
generation. Jazz is a representative genre of music, but the scarcity of Jazz
data in the MIDI format hinders the construction of a generative model for
Jazz. Transfer learning is an approach to the problem of data insufficiency
that transfers features common to one domain over to another.
In view of its success in other machine learning problems, we investigate
whether, and how much, it can help improve automatic music generation for
under-resourced musical genres. Specifically, we use a recurrent variational
autoencoder as the generative model, and use a genre-unspecified dataset as the
source dataset and a Jazz-only dataset as the target dataset. Two transfer
learning methods are evaluated using six levels of source-to-target data
ratios. The first method is to train the model on the source dataset, and then
fine-tune the resulting model parameters on the target dataset. The second
method is to train the model on both the source and target datasets at the same
time, but add genre labels to the latent vectors and use a genre classifier to
improve Jazz generation. The evaluation results show that the second method
seems to perform better overall, but it cannot take full advantage of the
genre-unspecified dataset.Comment: 8 pages, Accepted to APSIPA ASC(Asia-Pacific Signal and Information
Processing Association Annual Summit and Conference ) 201
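The first transfer-learning method (train on the source dataset, then fine-tune on the target dataset) can be sketched with a toy linear model standing in for the paper's recurrent variational autoencoder; the data, model, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

def sgd_fit(w, X, y, lr, epochs):
    """Full-batch gradient descent on mean squared error."""
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])

# Large "genre-unspecified" source set, small "Jazz-only" target set
# whose distribution is shifted relative to the source.
X_src = rng.standard_normal((200, 2))
y_src = X_src @ w_true + 0.1 * rng.standard_normal(200)
X_tgt = rng.standard_normal((20, 2))
y_tgt = X_tgt @ (w_true + 0.3)

w = np.zeros(2)
w = sgd_fit(w, X_src, y_src, lr=0.1, epochs=200)   # pretrain on source
w = sgd_fit(w, X_tgt, y_tgt, lr=0.05, epochs=150)  # fine-tune on target
```

The pretraining stage gives the fine-tuning stage a much better starting point than random initialization, which is the practical benefit when target data are scarce.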
Estimating systemic fibrosis by combining galectin-3 and ST2 provides powerful risk stratification value for patients after acute decompensated heart failure
Background: Two fibrosis biomarkers, galectin-3 (Gal-3) and suppression of tumorigenicity 2 (ST2), provide prognostic value additive to natriuretic peptides and traditional risk factors in patients with heart failure (HF). However, it remains to be investigated whether their combined measurement before discharge provides incremental risk stratification for patients after acute HF.
Methods: A total of 344 patients with acute HF were analyzed, with Gal-3 and ST2 measured. Patients were prospectively followed for 3.7 ± 1.3 years for deaths and composite events (death/HF-related re-hospitalizations).
Results: The levels of Gal-3 and ST2 were only weakly correlated (r = 0.20, p < 0.001). The medians of Gal-3 and ST2 were 18 ng/mL and 32.4 ng/mL, respectively. These biomarkers complemented each other and characterized patients with different risk factors. Using the median values as cutoffs, patients were separated into four subgroups based on high and low Gal-3 (HG and LG, respectively) and ST2 levels (HS and LS, respectively). Kaplan-Meier survival curves showed that HGHS powerfully identified patients at risk of mortality (Log rank = 21.27, p < 0.001). In multivariable analysis, combined log(Gal-3) and log(ST2) was an independent predictor. For composite events, Kaplan-Meier survival curves showed a lower event-free survival rate in the HGHS subgroup compared to the others (Log rank = 34.62, p < 0.001; HGHS vs. HGLS, Log rank = 4.00, p = 0.045). In multivariable analysis, combined log(Gal-3) and log(ST2) was also an independent predictor.
Conclusions: Combining biomarkers that reflect heterogeneous fibrosis pathways may identify patients with high systemic fibrosis, providing powerful risk stratification value.
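The four-subgroup stratification described in the results can be sketched as a simple cutoff rule; the function name is hypothetical, and only the median cutoff values (18 ng/mL for Gal-3, 32.4 ng/mL for ST2) come from the abstract.

```python
def stratify(gal3, st2, gal3_cut=18.0, st2_cut=32.4):
    """Assign a patient to one of four subgroups (HGHS, HGLS, LGHS, LGLS)
    by comparing each biomarker against its median cutoff."""
    g = "HG" if gal3 > gal3_cut else "LG"
    s = "HS" if st2 > st2_cut else "LS"
    return g + s

# A patient high on both fibrosis markers falls in the highest-risk group.
label = stratify(25.0, 40.0)  # "HGHS"
```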
CasNet: Investigating Channel Robustness for Speech Separation
Recording channel mismatch between training and testing conditions has been
shown to be a serious problem for speech separation: it greatly reduces
separation performance and makes systems impractical for daily use. In this
study, building on our previously constructed TAT-2mix
corpus, we address the channel mismatch problem by proposing a channel-aware
audio separation network (CasNet), a deep learning framework for end-to-end
time-domain speech separation. CasNet is implemented on top of TasNet. Channel
embedding (characterizing channel information in a mixture of multiple
utterances) generated by the Channel Encoder is introduced into the separation
module by the FiLM technique. Through two training strategies, we explore two
roles that channel embedding may play: 1) a real-life noise disturbance, making
the model more robust, or 2) a guide, instructing the separation model to
retain the desired channel information. Experimental results on TAT-2mix show
that CasNet trained with both training strategies outperforms the TasNet
baseline, which does not use channel embeddings.
Comment: Submitted to ICASSP 202
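The FiLM technique mentioned above conditions a network by applying a per-channel affine transform to intermediate features, with the scale and shift derived from conditioning information (here, the channel embedding). The sketch below is a generic FiLM layer, not CasNet's actual implementation; shapes and names are assumptions.

```python
import numpy as np

def film(features, gamma, beta):
    """Feature-wise linear modulation (FiLM): scale and shift each
    feature channel with conditioning parameters. `features` is a
    (time, channels) array; `gamma` and `beta` would be predicted
    from the channel embedding by small learned projections."""
    return gamma[None, :] * features + beta[None, :]

x = np.ones((4, 3))              # toy (time, channels) feature map
gamma = np.array([2.0, 1.0, 0.5])
beta = np.array([0.0, 1.0, -0.5])
y = film(x, gamma, beta)
```

Because the modulation is per channel rather than per element, the conditioning signal steers the separation module without overwhelming it with parameters.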