Search CORE

397 research outputs found

Deep Learning Techniques for Music Generation -- A Survey

Authors: Briot Jean-Pierre, Hadjeres Gaëtan, Pachet François-David
Publication date: 23/03/2019

This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis:

Objective
- What musical content is to be generated? Examples are: melody, polyphony, accompaniment or counterpoint.
- For what destination and for what use? To be performed by a human(s) (in the case of a musical score), or by a machine (in the case of an audio file).

Representation
- What are the concepts to be manipulated? Examples are: waveform, spectrogram, note, chord, meter and beat.
- What format is to be used? Examples are: MIDI, piano roll or text.
- How will the representation be encoded? Examples are: scalar, one-hot or many-hot.

Architecture
- What type(s) of deep neural network is (are) to be used? Examples are: feedforward network, recurrent network, autoencoder or generative adversarial networks.

Challenge
- What are the limitations and open challenges? Examples are: variability, interactivity and creativity.

Strategy
- How do we model and control the process of generation? Examples are: single-step feedforward, iterative feedforward, sampling or input manipulation.

For each dimension, we conduct a comparative analysis of various models and techniques and we propose some tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning-based systems for music generation selected from the relevant literature. These systems are described and are used to exemplify the various choices of objective, representation, architecture, challenge and strategy. The last section includes some discussion and some prospects.

Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201
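The abstract names one-hot and many-hot encodings and the piano-roll format without showing them. The following is a minimal sketch of what those terms usually mean, not code taken from the survey. The one-octave pitch vocabulary and the function names are assumptions made for illustration only.

```python
# Hypothetical pitch vocabulary (an assumption for this sketch):
# one octave of MIDI note numbers, from middle C (60) up to 71.
PITCHES = list(range(60, 72))


def one_hot(pitch, vocabulary=PITCHES):
    """Encode a single note as a vector containing exactly one 1."""
    if pitch not in vocabulary:
        raise ValueError(f"pitch {pitch} is outside the vocabulary")
    return [1 if p == pitch else 0 for p in vocabulary]


def many_hot(notes, vocabulary=PITCHES):
    """Encode simultaneous notes (e.g. a chord) with one 1 per sounding note."""
    active = set(notes)
    return [1 if p in active else 0 for p in vocabulary]


def piano_roll(time_steps, vocabulary=PITCHES):
    """Stack one many-hot vector per time step into a piano-roll matrix."""
    return [many_hot(notes, vocabulary) for notes in time_steps]


if __name__ == "__main__":
    c_major_triad = [60, 64, 67]  # C, E, G
    print(one_hot(60))
    print(many_hot(c_major_triad))
    # Three time steps: a single note, a two-note chord, then silence.
    for row in piano_roll([[60], [60, 64], []]):
        print(row)
```

A one-hot vector suits monophonic melodies, where one note sounds at a time. A many-hot vector extends the same idea to polyphony, and a piano roll is a sequence of such vectors over time.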

arXiv.org e-Print Archive

Automatically generated summaries of sports videos based on semantic content

Author: Miguel André Almeida Tomás Ferreira de Barros
Publication date: 18/07/2019

Sport has been a part of our lives since the beginning of time, whether we are spectators or participants. The spread and growth of multimedia platforms have made this content available to everyone. Sports videos appeal to a large population all around the world and have become an important form of multimedia content that is streamed over the Internet and television networks. Moreover, sports content creators want to provide users with relevant information, such as live commentary and text or video summaries of games, using automatic tools.

As a result, MOG-Technologies wants to create a tool capable of summarizing football matches based on semantic content, and this problem was explored in the scope of this dissertation. The main objective is to convert the television football commentator's speech into text using Google's Speech-to-Text tool. Several machine learning models were then tested to classify sentences into important events. For model training, a dataset was created by combining transcriptions of 43 games from different television channels with timeline commentary for 72 games provided by Google Search; the combined dataset contains 3260 sentences. To validate the proposed solution, the accuracy and F1 score were computed for each machine learning model.

The results show that the developed tool is capable of predicting events in live matches with a low error rate. Also, combining multiple sources, not only the sports commentator's speech, will help to increase the performance of the tool. It is important to note that the dataset created during this dissertation will allow MOG-Technologies to expand and perfect the concept discussed in this project.
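The abstract reports accuracy and F1 score as its validation metrics but does not define them. The sketch below shows the standard definitions for a single positive class. The event labels ("goal", "other") and the predictions are invented for illustration and are not taken from the dissertation's dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of sentences whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels for five commentary sentences (illustration only).
    y_true = ["goal", "other", "goal", "other", "goal"]
    y_pred = ["goal", "other", "other", "other", "goal"]
    print(accuracy(y_true, y_pred))
    print(f1_score(y_true, y_pred, positive="goal"))
```

Reporting F1 alongside accuracy is useful when important events are rare among commentary sentences, because accuracy alone can look high for a model that mostly predicts the majority class.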

Multilingual opinion mining

Author: García Pablos Aitor
Publication date: 01/01/2017

170 p. Every day a large amount of text is generated across different online media. Much of that text contains opinions about a multitude of entities, products, services, etc. Given the growing need for automated means to analyze, process and exploit this information, sentiment analysis techniques have received a great deal of attention from industry and the scientific community over the last decade and a half. However, many of the techniques employed usually require supervised training using manually annotated examples, or other linguistic resources tied to a specific language or application domain. This limits the application of such techniques, since these resources and annotated examples are not easy to obtain. This thesis explores a series of methods for performing various automatic text analyses within the framework of sentiment analysis, including the automatic extraction of domain terms, opinion-expressing words, the sentiment polarity of those words (positive or negative), etc. Finally, a method is proposed and evaluated that combines continuous word embeddings with topic modelling inspired by Latent Dirichlet Allocation (LDA) to obtain an aspect-based sentiment analysis (ABSA) system that needs only a few seed words to process texts of a given language or domain. In this way, adapting to another language or domain reduces to translating the corresponding seed words.

Archivo Digital para la Docencia y la Investigación

Identification Of Streptococcus Pyogenes Using Raman Spectroscopy

Author: Majidi Ehsan
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2018

Despite the attention that Raman spectroscopy has gained recently in the area of pathogen identification, spectral analysis techniques are not well developed. In most scenarios, they rely on expert intervention to detect the peaks of the spectra and assign them to specific molecular vibrations. Although some investigators have used machine-learning techniques to classify pathogens, these studies are usually limited to a specific application, and how well the techniques generalize is not clear. Also, although a wide range of algorithms has been developed for classification problems, there is less insight into applying such methods to Raman spectra. Furthermore, analyzing Raman spectra requires pre-processing of the raw spectra, in particular background removal. Various techniques have been developed to remove the background of the raw spectra accurately with little or no expert intervention. Nevertheless, as the background of the spectra varies across media, these methods still require expert effort, adding complexity and inefficiency to the identification task. This dissertation describes the development of state-of-the-art classification techniques to distinguish S. pyogenes from other species, including water and other confounding background pathogens. We compared these techniques in terms of their classification accuracy, sensitivity, and specificity, and provided a bias-variance insight into selecting the number of principal components in a principal component analysis (PCA). It was observed that Random Forest provided a better result, with an accuracy of 94.11%. Next, a novel deep learning technique was developed to remove the background of the Raman spectra and then identify the pathogen. The architecture of the network is discussed, and it was found that this method yields an accuracy of 100% on our test samples. This outperforms the traditional machine learning techniques discussed.
In clinical applications of Raman spectroscopy, samples have confounding backgrounds, which makes removing the spectral background and subsequently identifying the pathogen in real time a challenging task. We tested our methodology on datasets containing confounding backgrounds, such as throat swabs from patients, and discussed the robustness and generalization of the developed method. The misclassification error on the test dataset was around 3.7%. Also, the realization of the trained model is discussed in detail to provide a better understanding of and insight into the efficacy of the deep learning architecture. This technique provides a platform for general analysis of other pathogens in confounding environments as well.

Digital Commons@Wayne State University