
    Efficient Supervised Training of Audio Transformers for Music Representation Learning

    In this work, we address music representation learning using convolution-free transformers. We build on top of existing spectrogram-based audio transformers such as AST and train our models on a supervised task using patchout training similar to PaSST. In contrast to previous works, we study how specific design decisions affect downstream music tagging tasks instead of focusing on the training task. We assess the impact of initializing the models with different pre-trained weights, using various input audio segment lengths, using learned representations from different blocks and tokens of the transformer for downstream tasks, and applying patchout at inference to speed up feature extraction. We find that 1) initializing the model from ImageNet or AudioSet weights and using longer input segments are beneficial both for the training and downstream tasks, 2) the best representations for the considered downstream tasks are located in the middle blocks of the transformer, and 3) using patchout at inference allows faster processing than our convolutional baselines while maintaining superior performance. The resulting models, MAEST, are publicly available and obtain the best performance among open models in music tagging tasks. Comment: Accepted at the 2023 International Society for Music Information Retrieval Conference (ISMIR'23
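    The abstract above mentions patchout but does not say how it is applied. As a rough illustration only, the sketch below shows generic unstructured patchout: a random subset of patch tokens is kept, so the transformer sees a shorter sequence. The function name `patchout`, the `keep_ratio` parameter, and the token shapes are assumptions made for this example; this is not the MAEST or PaSST implementation.

```python
import numpy as np


def patchout(tokens, keep_ratio, rng):
    """Keep a random subset of patch tokens, preserving their original order.

    tokens:     array of shape (num_patches, dim)
    keep_ratio: fraction of patches to keep (hypothetical parameter name)
    rng:        a numpy random Generator
    """
    num_patches = tokens.shape[0]
    num_keep = max(1, int(round(num_patches * keep_ratio)))
    # Sample patch indices without replacement, then sort them so the
    # surviving tokens stay in their original sequence order.
    keep_idx = np.sort(rng.choice(num_patches, size=num_keep, replace=False))
    return tokens[keep_idx], keep_idx


rng = np.random.default_rng(0)
tokens = rng.normal(size=(96, 8))  # toy input: 96 patch embeddings of size 8
kept, idx = patchout(tokens, keep_ratio=0.5, rng=rng)
print(kept.shape)  # half of the patches remain
```

    Because self-attention cost grows quadratically with sequence length, dropping half of the tokens reduces the attention computation by considerably more than half, which is the intuition behind using patchout to speed up feature extraction.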

    mir_ref: A Representation Evaluation Framework for Music Information Retrieval Tasks

    Music Information Retrieval (MIR) research is increasingly leveraging representation learning to obtain more compact, powerful music audio representations for various downstream MIR tasks. However, current representation evaluation methods are fragmented due to discrepancies in audio and label preprocessing, downstream model and metric implementations, data availability, and computational resources, often leading to inconsistent and limited results. In this work, we introduce mir_ref, an MIR Representation Evaluation Framework focused on seamless, transparent, local-first experiment orchestration to support representation development. It features implementations of a variety of components such as MIR datasets, tasks, embedding models, and tools for result analysis and visualization, while facilitating the implementation of custom components. To demonstrate its utility, we use it to conduct an extensive evaluation of several embedding models across various tasks and datasets, including evaluating their robustness to various audio perturbations and the ease of extracting relevant information from them. Comment: Machine Learning for Audio Workshop, Neural Information Processing Systems (NeurIPS) 2023, New Orleans, L

    INDUSTRIAL CLUSTERS IN THE DEVELOPING COUNTRIES: A CASE STUDY OF THE CHILE FISH CLUSTER

    The existence of an industrial cluster should be considered in relation to the life-cycle stage of the underlying industry. The technological cycle of some industries lasts decades, whereas the life cycle of an industrial cluster can be shorter. When planning successive stages of a cluster's life cycle, it is necessary to consider the availability of resources and the service life of the infrastructure created for the sector. Practically all industrial clusters took a long time to develop, yet the development of Russian clusters is planned over 5–10 years. In world practice there are no cases of efficient industrial companies being created in developing countries over such periods. Existing infrastructure often has to be used, but even then it may need renovation. Therefore, at the formation stage it is desirable to keep investment to a minimum, directing it mainly to the necessary infrastructure and to a complementary set of institutional conditions that maximize the use of existing infrastructure.

    VALUE: EMPIRICS AND THEORY

    Explaining the essence of the category "value" is not a prerequisite for using it: the concept can be introduced correctly and simply a priori. At the same time, such an explanation is obviously desirable. Objectives: 1) to show the connection between the concept of "value" and the empirical level; 2) to justify the interpretation of value from the perspective of information theory. Results. The "value" category manifests itself in economic practice in the notions of "fair price" and "reasonable profit", statistics of input-output balances, the world practice of planning long-term energy production projects, etc. An adequate interpretation of value is the information concept: information underlies value, both as the result of labor and as the rarity of a thing. Value is an information measure of the object's worth. The difficulty of operationalizing it does not make the concept of "value" as such unscientific (metaphysical). Recognizing value as an objective basis for the observable phenomenon of price has very specific consequences: 1) feedback through the market must have an objective basis as a starting measure, that is, money must have a standard; 2) in some cases direct pricing (and/or setting prices by directive) is justified and appropriate.

    How Low Can You Go? Reducing Frequency and Time Resolution in Current CNN Architectures for Music Auto-tagging

    Automatic tagging of music is an important research topic in Music Information Retrieval, and audio analysis algorithms proposed for this task have achieved improvements with advances in deep learning. In particular, many state-of-the-art systems use Convolutional Neural Networks and operate on mel-spectrogram representations of the audio. In this paper, we compare commonly used mel-spectrogram representations and evaluate model performances that can be achieved by reducing the input size in terms of both fewer frequency bands and larger frame rates. We use the MagnaTagaTune dataset for comprehensive performance comparisons and then compare selected configurations on the larger Million Song Dataset. The results of this study can serve researchers and practitioners in their trade-off decision between accuracy of the models, data storage size, and training and inference times. Comment: The 28th European Signal Processing Conference (EUSIPCO

    Bad Directions in Cryptographic Hash Functions

    A 25-gigabyte "point obfuscation" challenge "using security parameter 60" was announced at the Crypto 2015 rump session; "point obfuscation" is another name for password hashing. This paper shows that the particular matrix-multiplication hash function used in the challenge is much less secure than previous password-hashing functions are believed to be. This paper's attack algorithm broke the challenge in just 19 minutes using a cluster of 21 PCs. Keywords: symmetric cryptography, hash functions, password hashing, point obfuscation, matrix multiplication, meet-in-the-middle attacks, meet-in-many-middles attack

    Pre-Training Strategies Using Contrastive Learning and Playlist Information for Music Classification and Similarity

    In this work, we investigate an approach that relies on contrastive learning and music metadata as a weak source of supervision to train music representation models. Recent studies show that contrastive learning can be used with editorial metadata (e.g., artist or album name) to learn audio representations that are useful for different classification tasks. In this paper, we extend this idea to using playlist data as a source of music similarity information and investigate three approaches to generate anchor and positive track pairs. We evaluate these approaches by fine-tuning the pre-trained models for music multi-label classification tasks (genre, mood, and instrument tagging) and music similarity. We find that creating anchor and positive track pairs by relying on co-occurrences in playlists provides better music similarity and competitive classification results compared to choosing tracks from the same artist as in previous works. Additionally, our best pre-training approach based on playlists provides superior classification performance for most datasets. Comment: Accepted at the 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP'23
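    The abstract above describes building anchor and positive track pairs from playlist co-occurrences without giving the procedure. One plausible reading, sketched below under stated assumptions, is to count how often two tracks appear in the same playlist and keep pairs that co-occur often enough. The function name `cooccurrence_pairs`, the `min_count` threshold, and the toy track IDs are all hypothetical; the paper's three pairing strategies may differ.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_pairs(playlists, min_count=2):
    """Return track pairs that share at least `min_count` playlists.

    playlists: iterable of lists of track IDs
    min_count: hypothetical threshold on the number of shared playlists
    """
    counts = Counter()
    for playlist in playlists:
        # Deduplicate and sort so each unordered pair is counted once
        # per playlist under a canonical (a, b) key with a < b.
        for a, b in combinations(sorted(set(playlist)), 2):
            counts[(a, b)] += 1
    return sorted(pair for pair, n in counts.items() if n >= min_count)


playlists = [
    ["t1", "t2", "t3"],
    ["t2", "t3", "t4"],
    ["t1", "t3"],
]
pairs = cooccurrence_pairs(playlists, min_count=2)
print(pairs)  # [('t1', 't3'), ('t2', 't3')]
```

    Each returned pair could then serve as an (anchor, positive) example for a contrastive loss, with other tracks in the batch acting as negatives.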