
    Revisiting Inter-Genre Similarity


    Synthesis of variable dancing styles based on a compact spatiotemporal representation of dance

    Dance, as a complex expressive form of motion, can convey emotion, meaning, and social idiosyncrasies, opening channels for non-verbal communication and promoting rich cross-modal interactions with music and the environment. As such, realistic dancing characters may incorporate cross-modal information and the variability of dance forms through compact representations that describe the movement structure in terms of its spatial and temporal organization. In this paper, we propose a novel method for synthesizing beat-synchronous dancing motions based on a compact topological model of dance styles, previously captured with a motion capture system. The model is based on Topological Gesture Analysis (TGA), which conveys a discrete three-dimensional point-cloud representation of the dance, describing the spatiotemporal variability of its gestural trajectories as uniform spherical distributions organized by classes of the musical meter. The synthesis methodology traces the topological representations, constrained by definable metrical and spatial parameters, back into complete dance instances whose variability is controlled by stochastic processes that consider both the TGA distributions and the kinematic constraints of the body morphology. In order to assess how well each parameter helps reproduce the style of the captured dance, we correlated captured and synthesized trajectories of samba dancing sequences in relation to the level of compression of the model, and report on a subjective evaluation over a set of six tests. The results validate our approach, suggesting that a periodic dancing style, and its musical synchrony, can be feasibly reproduced from a suitably parametrized discrete spatiotemporal representation of the gestural motion trajectories, with a notable degree of compression.
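    To make the synthesis idea concrete, here is a minimal, hypothetical sketch (not the paper's implementation): each joint/metrical-class pair is reduced to a spherical Gaussian, one key pose is sampled per beat, and poses are interpolated into a beat-synchronous trajectory. All model values and parameter names are illustrative, and a real system would add the kinematic constraints described above.

```python
import numpy as np

# Toy TGA-like model: for ONE joint, 4 metrical classes (beats of a 4/4 bar),
# each reduced to a spherical distribution (center + radius). Values are made up.
rng = np.random.default_rng(0)
tga_model = {
    0: {"center": np.array([0.0, 1.0, 0.2]), "radius": 0.05},
    1: {"center": np.array([0.1, 0.9, 0.0]), "radius": 0.08},
    2: {"center": np.array([0.0, 1.1, -0.1]), "radius": 0.05},
    3: {"center": np.array([-0.1, 0.9, 0.0]), "radius": 0.08},
}

def sample_keypose(meter_class, variability=1.0):
    """Draw one target position from the class's spherical distribution."""
    dist = tga_model[meter_class]
    return dist["center"] + rng.normal(scale=dist["radius"] * variability, size=3)

def synthesize(n_bars=2, frames_per_beat=24, variability=1.0):
    """Sample one key pose per beat and interpolate into a dense trajectory."""
    keyposes = [sample_keypose(b % 4, variability) for b in range(n_bars * 4 + 1)]
    segments = []
    for p0, p1 in zip(keyposes[:-1], keyposes[1:]):
        t = np.linspace(0.0, 1.0, frames_per_beat, endpoint=False)[:, None]
        segments.append((1 - t) * p0 + t * p1)  # straight-line interpolation
    return np.vstack(segments)  # (frames, 3) beat-synchronous trajectory

print(synthesize().shape)  # (192, 3)
```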

    Studio report: DIGITÓPIA at Casa da Música

    Today there is an increasing availability of free music software and musical content. Digitópia, a platform for collaborative music creation recently started at Casa da Música, Porto's main concert venue, addresses how these trends can affect generalized music creation and music software design, promote social inclusion, and lead to the emergence of multicultural communities of music makers and lovers. In this paper, we report on the musical activities conducted in the first months of Digitópia's existence and highlight some developments for the future.

    Visualizing networks of music artists with RAMA

    In this paper we present RAMA (Relational Artist MAps), a simple yet efficient interface for navigating networks of music artists. RAMA is built upon a dataset of artist similarity and user-defined tags covering 583,000 artists gathered from Last.fm. This third-party, publicly available data about artist similarity and artist tags is used to produce a visualization of artist relations. RAMA provides two simultaneous layers of information: (i) a graph built from artist similarity data, and (ii) overlaid labels containing user-defined tags. Differing from existing artist network visualization tools, the proposed prototype emphasizes commonalities as well as main differences between artist categorizations derived from user-defined tags, hence providing an enhanced browsing experience to users.
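    As an illustration of the two information layers RAMA overlays, the sketch below (not RAMA's actual code; artists and tags are toy data) builds a similarity graph and compares the tag sets of connected artists to surface commonalities and differences.

```python
import networkx as nx

# Layer (i): a graph built from artist similarity pairs (toy data).
similar = [
    ("Miles Davis", "John Coltrane", 0.92),
    ("Miles Davis", "Herbie Hancock", 0.85),
    ("John Coltrane", "Thelonious Monk", 0.80),
]
# Layer (ii): user-defined tags overlaid on each node (toy data).
tags = {
    "Miles Davis": {"jazz", "trumpet", "fusion"},
    "John Coltrane": {"jazz", "saxophone"},
    "Herbie Hancock": {"jazz", "funk", "piano"},
    "Thelonious Monk": {"jazz", "piano"},
}

G = nx.Graph()
for a, b, w in similar:
    G.add_edge(a, b, weight=w)
for artist, artist_tags in tags.items():
    G.nodes[artist]["tags"] = artist_tags

def compare_tags(a, b):
    """Commonalities and differences between two artists' tag sets."""
    ta, tb = G.nodes[a]["tags"], G.nodes[b]["tags"]
    return {"common": ta & tb, "only_" + a: ta - tb, "only_" + b: tb - ta}

print(compare_tags("Miles Davis", "Herbie Hancock"))
```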

    Multidimensional microtiming in Samba music

    The connection of “groove” with low-level features in the audio signal has mostly been associated with temporal characteristics of fast metrical structures. However, the production and perception of rhythm in Afro-Brazilian contexts is often described as the result of multiple experience flows, which expands the description of rhythmical events to multiple features such as loudness, spectral regions, metrical layers, movement, and others. In this study, we analyzed how the microtiming of samba music interacts with an expanded set of musical descriptors. More specifically, we analyzed the interaction of fast timing structures with meter, intensity, and spectral distribution within the auditory domain. Feature detection was supported by a psychoacoustically based auditory model, which provided the low-level descriptors for a database of 106 samba music excerpts. A cluster analysis technique was used to provide an overview of the emergent microtiming models present in the features. The results confirm findings of previous studies in the field but introduce new systematic devices that may characterize microtiming in samba music. Systematic models of interaction between microtiming, amplitude, metrical structure, and spectral distribution appear to be encoded in the low-level auditory descriptors used in the methodology.
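    As a hedged illustration of the clustering step only (the actual descriptors come from the psychoacoustic auditory model, not from random data), the sketch below groups per-excerpt feature vectors into candidate microtiming models with k-means.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for the descriptor matrix: 106 excerpts x 4 features
# (e.g., onset deviations and intensities at fast metrical levels).
rng = np.random.default_rng(42)
features = rng.normal(size=(106, 4))

# Cluster excerpts; each cluster is a candidate "microtiming model".
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(features)
for k in range(3):
    size = int((kmeans.labels_ == k).sum())
    print(f"cluster {k}: {size} excerpts, centroid {kmeans.cluster_centers_[k].round(2)}")
```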

    Short-term Feature Space and Music Genre Classification

    In music genre classification, most approaches rely on statistical characteristics of low-level features computed on short audio frames. These methods implicitly assume that all frames carry equally relevant information and that either individual frames, or distributions thereof, somehow capture the specificities of each genre. In this paper we study the representation space defined by short-term audio features with respect to class boundaries, and compare different processing techniques for partitioning this space. The partitions are evaluated in terms of accuracy on two genre classification tasks, with several types of classifiers. Experiments show that a randomized, unsupervised partition of the space, used in conjunction with a Markov model classifier, leads to accuracies comparable to the state of the art. We also show that unsupervised partitions of the space tend to create fewer hubs.
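    The pipeline lends itself to a compact sketch. The following toy example (synthetic features; not the paper's exact configuration) implements a randomized, unsupervised partition of the feature space via random centroids, quantizes each frame sequence into symbols, and classifies with one first-order Markov model per genre.

```python
import numpy as np

rng = np.random.default_rng(0)
n_symbols, dim = 16, 8

# (1) Randomized, unsupervised partition: nearest-centroid cells
# around randomly drawn centroids.
centroids = rng.normal(size=(n_symbols, dim))

def quantize(frames):
    """(2) Map each short-term feature frame to its nearest centroid index."""
    d = ((frames[:, None, :] - centroids[None]) ** 2).sum(-1)
    return d.argmin(axis=1)

def fit_markov(symbol_seqs):
    """(3) Estimate a smoothed transition matrix from training sequences."""
    counts = np.ones((n_symbols, n_symbols))  # add-one smoothing
    for seq in symbol_seqs:
        np.add.at(counts, (seq[:-1], seq[1:]), 1)
    return counts / counts.sum(axis=1, keepdims=True)

def log_likelihood(seq, trans):
    return np.log(trans[seq[:-1], seq[1:]]).sum()

# Toy training data: two "genres" with slightly shifted feature statistics.
train = {g: [quantize(rng.normal(loc=0.3 * g, size=(200, dim)))
             for _ in range(20)] for g in (0, 1)}
models = {g: fit_markov(seqs) for g, seqs in train.items()}

test = quantize(rng.normal(loc=0.3, size=(200, dim)))  # drawn like genre 1
pred = max(models, key=lambda g: log_likelihood(test, models[g]))
print("predicted genre:", pred)
```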

    Contrastive Learning for Cross-modal Artist Retrieval

    Music retrieval and recommendation applications often rely on content features encoded as embeddings, which provide vector representations of items in a music dataset. Numerous complementary embeddings can be derived from processing items originally represented in several modalities, e.g., audio signals, user interaction data, or editorial data. However, data of any given modality might not be available for all items in a music dataset. In this work, we propose a method based on contrastive learning to combine embeddings from multiple modalities, and we explore the impact of the presence or absence of embeddings from diverse modalities in an artist similarity task. Experiments on two datasets suggest that our contrastive method outperforms single-modality embeddings and baseline algorithms for combining modalities, in terms of both artist retrieval accuracy and coverage. Improvements over other methods are particularly significant for less popular query artists. We demonstrate that our method successfully combines complementary information from diverse modalities and is more robust to missing modality data (i.e., it better handles the retrieval of artists whose available modality embeddings differ from the query artist's).
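    A minimal sketch of the kind of contrastive objective at the heart of this approach, assuming an InfoNCE-style loss between paired modality embeddings of the same artist (embeddings here are synthetic; the paper's architecture and training details are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)
n_artists, dim = 8, 32

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Synthetic stand-ins for two modality embeddings of the SAME artists,
# e.g., one from audio, one from user interaction data.
audio_emb = l2norm(rng.normal(size=(n_artists, dim)))
interact_emb = l2norm(audio_emb + 0.1 * rng.normal(size=(n_artists, dim)))

def info_nce(za, zb, temperature=0.1):
    """Cross-entropy over pairwise similarities; positives on the diagonal."""
    logits = za @ zb.T / temperature              # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_probs).mean()

print("contrastive loss:", round(float(info_nce(audio_emb, interact_emb)), 4))
```

    Minimizing this loss pulls the two modality views of each artist together while pushing apart mismatched artist pairs, which is what allows the combined space to remain useful when one modality is missing at query time.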

    Bootstrapping a Music Voice Assistant with Weak Supervision

    One of the first building blocks of a voice assistant is the task of tagging entities or attributes in user queries. This can be particularly challenging when entities number in the tens of millions, as is the case of, e.g., music catalogs. Training slot tagging models at an industrial scale requires large quantities of accurately labeled user queries, which are often hard and costly to gather. On the other hand, voice assistants typically collect plenty of unlabeled queries that often remain unexploited. This paper presents a weakly-supervised methodology for labeling large amounts of voice query logs, enhanced with a manual filtering step. Our experimental evaluations show that slot tagging models trained on weakly-supervised data outperform models trained on hand-annotated or synthetic data, at a lower cost. Further, manual filtering of weakly-supervised data leads to a very significant reduction in sentence error rate, while drastically reducing human curation efforts from weeks to hours compared with hand-annotation of queries. The method was applied to successfully bootstrap a slot tagging system for a major music streaming service that currently serves several tens of thousands of daily voice queries.
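    The weak-labeling idea can be sketched as projecting known catalog entities onto raw query text to produce BIO slot tags. The function below is a hypothetical simplification with a toy catalog, not the paper's production heuristics.

```python
# Toy stand-in for a music catalog with tens of millions of entities.
catalog = {
    "artist": {"daft punk", "madonna"},
    "track": {"around the world"},
}

def weak_label(query):
    """Greedily match catalog entities and emit (token, BIO-tag) pairs."""
    tokens = query.lower().split()
    tags = ["O"] * len(tokens)
    for slot, entities in catalog.items():
        for entity in entities:
            ent_toks = entity.split()
            for i in range(len(tokens) - len(ent_toks) + 1):
                if tokens[i:i + len(ent_toks)] == ent_toks and tags[i] == "O":
                    tags[i] = f"B-{slot}"
                    for j in range(1, len(ent_toks)):
                        tags[i + j] = f"I-{slot}"
    return list(zip(tokens, tags))

print(weak_label("play around the world by daft punk"))
# [('play', 'O'), ('around', 'B-track'), ('the', 'I-track'),
#  ('world', 'I-track'), ('by', 'O'), ('daft', 'B-artist'), ('punk', 'I-artist')]
```

    Labels produced this way are noisy, which is why the paper adds the manual filtering step before training the tagger.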

    Semi-Automatic Ambiance Generation

    Ambiances are background recordings of places, used in audiovisual productions to make listeners feel they are in a place such as a pub or a farm. Accessing commercially available ambiance libraries is a convenient alternative to sending teams out to record ambiances, yet such libraries limit creation in several ways. First, their recordings are already mixed, which reduces the flexibility to add or remove sounds or change the panning. Second, the number of available ambiances is limited. We propose a semi-automatic system for ambiance generation. Given a textual query, the system creates ambiances on demand by fetching relevant sounds from a large sound-effects database and delivering them into a multitrack sequencer project. Ambiances of diverse nature can be created easily, and controls are offered so users can further specify their needs.
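    A minimal sketch of the generation loop, with a made-up sound database and hypothetical field names: fetch sounds matching the textual query and place each on its own sequencer track, so individual elements remain editable.

```python
import random

SOUND_DB = [  # hypothetical stand-in for a large sound-effects database
    {"name": "rooster_crow.wav", "tags": {"farm", "animal"}, "dur": 3.0},
    {"name": "cow_moo.wav", "tags": {"farm", "animal"}, "dur": 2.5},
    {"name": "wind_trees.wav", "tags": {"farm", "outdoor"}, "dur": 30.0},
    {"name": "pub_chatter.wav", "tags": {"pub", "crowd"}, "dur": 60.0},
]

def generate_ambiance(query, length=60.0, seed=0):
    """Return (track, onset, sound) placements for a multitrack project."""
    rng = random.Random(seed)
    hits = [s for s in SOUND_DB if query in s["tags"]]
    project = []
    for track, sound in enumerate(hits):
        # One sound per track; scatter short sounds randomly over the timeline.
        onset = 0.0 if sound["dur"] >= length else rng.uniform(0, length - sound["dur"])
        project.append({"track": track, "onset": round(onset, 2), "sound": sound["name"]})
    return project

for clip in generate_ambiance("farm"):
    print(clip)
```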

    Supervised and Unsupervised Learning of Audio Representations for Music Understanding

    In this work, we provide a broad comparative analysis of strategies for pre-training audio understanding models for several tasks in the music domain, including labelling of genre, era, origin, mood, instrumentation, key, pitch, vocal characteristics, tempo, and sonority. Specifically, we explore how the domain of the pre-training datasets (music or generic audio) and the pre-training methodology (supervised or unsupervised) affect the adequacy of the resulting audio embeddings for downstream tasks. We show that models trained via supervised learning on large-scale expert-annotated music datasets achieve state-of-the-art performance in a wide range of music labelling tasks, each with novel content and vocabularies. This can be done efficiently with models containing fewer than 100 million parameters that require no fine-tuning or reparameterization for downstream tasks, making this approach practical for industry-scale audio catalogs. Within the class of unsupervised learning strategies, we show that the domain of the training dataset can significantly impact the performance of the representations learned by the model. We find that restricting the domain of the pre-training dataset to music allows for training with smaller batch sizes while achieving state-of-the-art performance in unsupervised learning, and in some cases supervised learning, for music understanding. We also corroborate that, while achieving state-of-the-art performance on many tasks, supervised learning can cause models to specialize to the supervised information provided, somewhat compromising a model's generality.
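    The "no fine-tuning or reparameterization" evaluation protocol can be illustrated with a shallow probe over frozen embeddings. In this sketch the embeddings are synthetic stand-ins for a pre-trained encoder's output; only the lightweight classifier is trained per downstream task.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_tracks, dim, n_genres = 500, 128, 4

# Synthetic embeddings with mild class structure, standing in for the
# fixed output of a frozen, pre-trained audio encoder.
labels = rng.integers(0, n_genres, size=n_tracks)
class_means = rng.normal(size=(n_genres, dim))
embeddings = class_means[labels] + rng.normal(scale=2.0, size=(n_tracks, dim))

# Only the probe is trained; the "encoder" (embeddings) stays frozen.
X_tr, X_te, y_tr, y_te = train_test_split(embeddings, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")
```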