The multiple voices of musical emotions: source separation for improving music emotion recognition models and their interpretability
Despite the manifold developments in music emotion recognition and related areas, estimating the emotional impact of music still poses many challenges. These are often associated with the complexity of the acoustic codes to emotion and the lack of large amounts of data with robust gold standards. In this paper, we propose a new computational model (EmoMucs) that considers the role of different musical voices in the prediction of the emotions induced by music. We combine source separation algorithms for breaking up music signals into independent song elements (vocals, bass, drums, other) and end-to-end state-of-the-art machine learning techniques for feature extraction and emotion modelling (valence and arousal regression). Through a series of computational experiments on a benchmark dataset using source-specialised models trained independently and different fusion strategies, we demonstrate that EmoMucs outperforms state-of-the-art approaches with the advantage of providing insights into the relative contribution of different musical elements to the emotions perceived by listeners.
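A minimal sketch of the kind of source-aware pipeline this abstract describes, assuming a late-fusion design: one encoder per separated stem (vocals, bass, drums, other), with the embeddings concatenated and regressed to valence and arousal. The layer sizes, spectrogram front-end, and fusion strategy below are illustrative assumptions, not the published EmoMucs architecture.

```python
# Sketch only: per-stem encoders fused for valence/arousal regression.
import torch
import torch.nn as nn

SOURCES = ["vocals", "bass", "drums", "other"]  # stems from source separation

class SourceEncoder(nn.Module):
    """Toy CNN over a (batch, 1, mel, time) spectrogram of one separated stem."""
    def __init__(self, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, embed_dim), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class LateFusionVA(nn.Module):
    """One encoder per stem; concatenated embeddings feed a valence/arousal head."""
    def __init__(self, embed_dim=64):
        super().__init__()
        self.encoders = nn.ModuleDict({s: SourceEncoder(embed_dim) for s in SOURCES})
        self.head = nn.Linear(embed_dim * len(SOURCES), 2)  # -> (valence, arousal)
    def forward(self, stems):
        feats = [self.encoders[s](stems[s]) for s in SOURCES]
        return self.head(torch.cat(feats, dim=-1))

if __name__ == "__main__":
    model = LateFusionVA()
    stems = {s: torch.randn(4, 1, 96, 128) for s in SOURCES}  # dummy mel spectrograms
    print(model(stems).shape)  # torch.Size([4, 2])
```

Because each stem has its own encoder, the relative contribution of a musical voice can be probed (for example, by masking one stem at a time), which is the kind of interpretability the abstract refers to.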
Modelling long- and short-term structure in symbolic music with attention and recurrence
The automatic composition of music with long-term structure is a central problem in music generation. Neural network-based models have been shown to perform relatively well in melody generation, but generating music with long-term structure is still a major challenge. This paper introduces a new approach for music modelling that combines recent advances in transformer models with recurrent networks, the long-short term universal transformer (LSTUT), and compares its ability to predict music against current state-of-the-art music models. Our experiments are designed to push the boundaries of music models on considerably long music sequences, a crucial requirement for learning long-term structure effectively. Results show that the LSTUT outperforms all the other models and can potentially learn features related to music structure at different time scales. Overall, we show the importance of integrating both recurrence and attention in the architecture of music models, and their potential use in future automatic composition systems.
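A minimal sketch of the general idea of combining recurrence and attention for next-token prediction over symbolic music, as the abstract advocates. The vocabulary size, dimensions, and the way the published LSTUT stacks or ties its layers are assumptions for illustration, not the authors' model.

```python
# Sketch only: an LSTM (recurrence) followed by causal self-attention.
import torch
import torch.nn as nn

class RecurrentAttentionLM(nn.Module):
    def __init__(self, vocab_size=512, d_model=128, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)      # short-term structure
        self.attn = nn.TransformerEncoderLayer(d_model, n_heads,
                                               batch_first=True)     # long-range structure
        self.out = nn.Linear(d_model, vocab_size)                    # next-token logits

    def forward(self, tokens):
        seq_len = tokens.size(1)
        # Causal mask so each position only attends to earlier tokens.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        x = self.embed(tokens)
        x, _ = self.lstm(x)
        x = self.attn(x, src_mask=mask)
        return self.out(x)

if __name__ == "__main__":
    model = RecurrentAttentionLM()
    tokens = torch.randint(0, 512, (2, 256))  # batch of symbolic music token sequences
    print(model(tokens).shape)                # torch.Size([2, 256, 512])
```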
SOLIS: Autonomous Solubility Screening using Deep Neural Networks
Accelerating material discovery has tremendous societal and industrial impact, particularly for pharmaceuticals and clean energy production. Many experimental instruments have some degree of automation, facilitating continuous running and higher throughput. However, sample preparation is still commonly carried out manually. This can result in researchers spending a significant amount of their time on repetitive tasks, which introduces errors and can prevent the production of statistically relevant data. Crystallisation experiments are common in many chemical fields, both for purification and in polymorph screening experiments. The initial step often involves a solubility screen of the molecule; that is, determining whether molecular compounds have dissolved in a particular solvent. This is usually time-consuming and labour-intensive. Moreover, accurate knowledge of the precise solubility limit of the molecule is often not required, and simply measuring a threshold of solubility in each solvent would be sufficient. To address this, we propose a novel cascaded deep model that is inspired by how a human chemist would visually assess a sample to determine whether the solid has completely dissolved in the solution. In this paper, we design, develop, and evaluate the first fully autonomous solubility screening framework, which leverages state-of-the-art methods for image segmentation and convolutional neural networks for image classification. To realise this, we first create a dataset comprising different molecules and solvents, collected in a real-world chemistry laboratory. We then evaluate our method on data recorded through an eye-in-hand camera mounted on a seven degree-of-freedom robotic manipulator, and show that our model achieves 99.13% test accuracy across various setups.
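A minimal sketch of the cascaded idea described above, assuming a two-stage design: a segmentation network isolates the sample region in the camera image, and a classifier then decides whether the solid has fully dissolved. The toy backbones and the binary label convention are placeholders, not the authors' SOLIS pipeline.

```python
# Sketch only: segmentation stage followed by a dissolved/not-dissolved classifier.
import torch
import torch.nn as nn

class ToySegmenter(nn.Module):
    """Predicts a per-pixel foreground mask for the sample region."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, 1), nn.Sigmoid(),
        )
    def forward(self, img):
        return self.net(img)  # (B, 1, H, W) soft mask

class ToyClassifier(nn.Module):
    """Binary head: positive logit -> treated as fully dissolved."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
        )
    def forward(self, img):
        return self.net(img)

def cascaded_solubility_check(img, segmenter, classifier, threshold=0.0):
    masked = img * segmenter(img)          # stage 1: keep only the sample region
    logit = classifier(masked)             # stage 2: classify the masked image
    return (logit > threshold).squeeze(1)  # True -> solid considered dissolved

if __name__ == "__main__":
    frames = torch.randn(2, 3, 224, 224)   # dummy eye-in-hand camera frames
    print(cascaded_solubility_check(frames, ToySegmenter(), ToyClassifier()))
```

In this cascade the classifier only ever sees the masked region, which mirrors the abstract's motivation of imitating how a chemist visually inspects the vial contents rather than the whole scene.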
Semantic Integration of MIR Datasets with the Polifonia Ontology Network
Integration between different data formats, and between data belonging to different collections, is an ongoing challenge in the MIR field. Semantic Web tools have proved to be promising resources for making different types of music information interoperable. However, the use of these technologies has so far been limited and scattered in the field. To address this, the Polifonia project is developing an ontological ecosystem that can cover a wide variety of musical aspects (musical features, instruments, emotions, performances). In this paper, we present the Polifonia Ontology Network, an ecosystem that enables and fosters the transition towards semantic MIR.
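A minimal sketch of how MIR annotations become queryable once they are expressed against a shared ontology, using rdflib and SPARQL. The file name and the class/property IRIs below are placeholders for illustration, not actual Polifonia Ontology Network terms.

```python
# Sketch only: load RDF annotations and run a SPARQL query over them.
from rdflib import Graph

g = Graph()
g.parse("polifonia_annotations.ttl", format="turtle")  # hypothetical local RDF dump

# Hypothetical query: list musical works together with their annotated emotion.
query = """
PREFIX ex: <http://example.org/music#>
SELECT ?work ?emotion WHERE {
    ?work a ex:MusicalWork ;
          ex:annotatedEmotion ?emotion .
}
"""
for work, emotion in g.query(query):
    print(work, emotion)
```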