The Effect of Explicit Structure Encoding of Deep Neural Networks for Symbolic Music Generation

Author: Chen Ke
Dubnov Shlomo
Li Wei
Xia Gus
Zhang Weilin
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/01/2019

With recent breakthroughs in artificial neural networks, deep generative models have become one of the leading techniques for computational creativity. Despite very promising progress on image and short sequence generation, symbolic music generation remains a challenging problem since the structure of compositions is usually complicated. In this study, we attempt to solve the melody generation problem constrained by a given chord progression. This music meta-creation problem can also be incorporated into a plan recognition system with user inputs and predictive structural outputs. In particular, we explore the effect of explicit architectural encoding of musical structure by comparing two sequential generative models: LSTM (a type of RNN) and WaveNet (a dilated temporal CNN). As far as we know, this is the first study to apply WaveNet to symbolic music generation, as well as the first systematic comparison between temporal CNNs and RNNs for music generation. We conducted a survey to evaluate our generations and implemented the Variable Markov Oracle for music pattern discovery. Experimental results show that encoding structure more explicitly using a stack of dilated convolution layers improved performance significantly, and that a global encoding of the underlying chord progression in the generation procedure yields further gains.

Comment: 8 pages, 13 figures
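The "stack of dilated convolution layers" mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the kernel size of 2, the doubling dilations, the fixed averaging weights, and the function names are all assumptions chosen for illustration, and the gated activations and residual connections of a full WaveNet are omitted. It only shows how stacking dilated causal convolutions widens the span of past steps that a single output can see.

```python
def receptive_field(kernel_size, dilations):
    """Number of input steps visible to one output of a dilated causal stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv(x, weights, dilation):
    """1-D causal convolution over a list of floats.

    weights[j] multiplies x[t - j * dilation]; steps before the start of the
    sequence are treated as zero (implicit left padding).
    """
    out = []
    for t in range(len(x)):
        total = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                total += w * x[idx]
        out.append(total)
    return out


def dilated_stack(x, dilations, weights=(0.5, 0.5)):
    """Apply one causal layer per dilation (hypothetical fixed weights)."""
    h = list(x)
    for d in dilations:
        h = causal_dilated_conv(h, weights, d)
    return h


if __name__ == "__main__":
    dilations = [1, 2, 4, 8]  # assumed doubling schedule
    impulse = [1.0] + [0.0] * 19
    response = dilated_stack(impulse, dilations)
    # The impulse spreads over exactly as many steps as the receptive field.
    print(receptive_field(2, dilations), sum(1 for v in response if v != 0.0))
```

With four layers the receptive field already covers 16 steps, whereas four undilated layers of the same kernel size would cover only 5.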

arXiv.org e-Print Archive

Crossref

Exploring efficient neural architectures for linguistic-acoustic mapping in text-to-speech

Author: Bonafonte Cávez Antonio
Pascual de la Puente Santiago
Serra Joan
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

Conversion from text to speech relies on the accurate mapping from linguistic to acoustic symbol sequences, for which current practice employs recurrent statistical models such as recurrent neural networks. Despite the good performance of such models (in terms of low distortion in the generated speech), their recursive structure with intermediate affine transformations tends to make them slow to train and to sample from. In this work, we explore two different mechanisms that enhance the operational efficiency of recurrent neural networks, and study their performance–speed trade-off. The first mechanism is based on the quasi-recurrent neural network, where expensive affine transformations are removed from temporal connections and placed only on feed-forward computational directions. The second mechanism includes a module based on the transformer decoder network, designed without recurrent connections but emulating them with attention and positioning codes. Our results show that the proposed decoder networks are competitive in terms of distortion when compared to a recurrent baseline, whilst being significantly faster in terms of CPU and GPU inference time. The best performing model is the one based on the quasi-recurrent mechanism, reaching the same level of naturalness as the recurrent neural network based model with a speedup of 11.2 on CPU and 3.3 on GPU.

Peer Reviewed. Postprint (published version)
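The quasi-recurrent idea described in the abstract (no affine transformation on the temporal connection) can be sketched as follows. The sketch assumes the "fo-pooling" recurrence from the original quasi-recurrent neural network formulation; whether this paper uses that exact variant is an assumption. The candidate values `z` and the gates `f` and `o` are taken as already computed by feed-forward layers, and the function name is hypothetical.

```python
def fo_pool(z, f, o):
    """Element-wise recurrence over precomputed candidates and gates.

    z, f, o: lists of time steps, each a list of floats of equal width.
    f and o are expected in [0, 1] (for example, sigmoid outputs).
    The only temporal dependency is the element-wise update of c, so no
    matrix multiplication sits on the recurrent path.
    """
    width = len(z[0])
    c = [0.0] * width  # initial cell state, assumed to be zero
    hidden = []
    for z_t, f_t, o_t in zip(z, f, o):
        # c_t = f_t * c_{t-1} + (1 - f_t) * z_t, element by element
        c = [f_i * c_i + (1.0 - f_i) * z_i for z_i, f_i, c_i in zip(z_t, f_t, c)]
        # h_t = o_t * c_t
        hidden.append([o_i * c_i for o_i, c_i in zip(o_t, c)])
    return hidden


if __name__ == "__main__":
    z = [[1.0], [2.0], [3.0]]
    f = [[0.5], [0.5], [0.5]]
    o = [[1.0], [1.0], [1.0]]
    print(fo_pool(z, f, o))
```

Because `z`, `f`, and `o` do not depend on the previous hidden state, they can be computed for all time steps in parallel, leaving only this cheap loop sequential.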

Multidisciplinary Digital Publishing Institute

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Unidirectional-bidirectional recurrent networks for cardiac disorders classification

Author: Darmawahyuni Annisa
Firdaus Firdaus
Nurmaini Siti
Rachmatullah Muhammad Naufal
Tutuko Bambang
Publication venue: 'Universitas Ahmad Dahlan'
Publication date: 01/06/2021

This study uses supervised deep learning classifiers based on recurrent networks: recurrent neural networks (RNNs), long short-term memory (LSTM), and gated recurrent units (GRUs). The unidirectional and bidirectional variants are also compared for each cardiac disorder (CD) class. Comparing the two variants is needed to identify the better direction and the best-performing model for ECG classification, using the PhysioNet dataset to classify five classes of CDs from 15-lead ECG signals. The results show that the bidirectional RNN produces better results than the unidirectional RNN. In contrast to RNNs, the unidirectional LSTM and GRU outperformed their bidirectional counterparts. The best recurrent network classifier is the unidirectional GRU, with average accuracy, sensitivity, specificity, precision, and F1-score of 98.50%, 95.54%, 98.42%, 89.93%, and 92.31%, respectively. Overall, deep learning is a promising and improved method for ECG classification.
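The five metrics reported in the abstract can be computed from a multi-class confusion matrix as sketched below. The abstract does not say how the per-class values were averaged, so the one-vs-rest counting and the unweighted (macro) average here are assumptions, as are the function names and the toy matrix.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from one-vs-rest counts (all denominators must be > 0)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # also called recall
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def per_class_counts(confusion, k):
    """One-vs-rest counts for class k; confusion[i][j] = true i, predicted j."""
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(row[k] for row in confusion) - tp
    tn = total - tp - fn - fp
    return tp, fp, tn, fn


def macro_average(confusion):
    """Unweighted mean of each metric over all classes (assumed averaging)."""
    n = len(confusion)
    per_class = [binary_metrics(*per_class_counts(confusion, k)) for k in range(n)]
    return {name: sum(m[name] for m in per_class) / n for name in per_class[0]}


if __name__ == "__main__":
    # Toy 3-class matrix, invented purely for illustration.
    toy = [[8, 1, 1],
           [0, 9, 1],
           [2, 0, 8]]
    for name, value in macro_average(toy).items():
        print(f"{name}: {value:.4f}")
```

One-vs-rest accuracy and specificity count every other class as a true negative, which is one reason they can sit well above precision and F1 on imbalanced data.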

Journal of Education and Learning (EduLearn)

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System