Efficient Supervised Training of Audio Transformers for Music Representation Learning
In this work, we address music representation learning using convolution-free
transformers. We build on top of existing spectrogram-based audio transformers
such as AST and train our models on a supervised task using patchout training
similar to PaSST. In contrast to previous works, we study how specific design
decisions affect downstream music tagging tasks instead of focusing on the
training task. We assess the impact of initializing the models with different
pre-trained weights, using various input audio segment lengths, using learned
representations from different blocks and tokens of the transformer for
downstream tasks, and applying patchout at inference to speed up feature
extraction. We find that 1) initializing the model from ImageNet or AudioSet
weights and using longer input segments are beneficial both for the training
and downstream tasks, 2) the best representations for the considered downstream
tasks are located in the middle blocks of the transformer, and 3) using
patchout at inference allows faster processing than our convolutional baselines
while maintaining superior performance. The resulting models, MAEST, are
publicly available and obtain the best performance among open models in music
tagging tasks. Comment: Accepted at the 2023 International Society for Music Information
Retrieval Conference (ISMIR'23)
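The abstract reports that applying patchout at inference speeds up feature extraction. As a rough illustration of why, here is a minimal sketch of structured patchout on a frequency × time grid of patch tokens, in the spirit of PaSST: whole frequency rows and time columns are dropped, shrinking the token sequence. The grid size, drop counts, and the helpers `structured_patchout` and `attention_cost_ratio` are hypothetical and are not MAEST's code; the cost estimate covers only the quadratic self-attention term.

```python
import random


def structured_patchout(n_freq, n_time, drop_freq, drop_time, rng):
    """Return the (freq, time) positions of patch tokens that survive after
    dropping `drop_freq` whole frequency rows and `drop_time` whole time
    columns, chosen uniformly at random (hypothetical helper)."""
    if not (0 <= drop_freq < n_freq and 0 <= drop_time < n_time):
        raise ValueError("at least one row and one column must be kept")
    dropped_f = set(rng.sample(range(n_freq), drop_freq))
    dropped_t = set(rng.sample(range(n_time), drop_time))
    return [
        (f, t)
        for f in range(n_freq) if f not in dropped_f
        for t in range(n_time) if t not in dropped_t
    ]


def attention_cost_ratio(kept_tokens, total_tokens):
    """Self-attention cost grows with the square of the sequence length,
    so this is the relative attention cost after patchout."""
    return (kept_tokens / total_tokens) ** 2


# Hypothetical 12 x 50 patch grid; drop half of the time columns.
tokens = structured_patchout(12, 50, 0, 25, random.Random(0))
print(len(tokens), attention_cost_ratio(len(tokens), 12 * 50))  # 300 0.25
```

Dropping whole columns keeps the surviving tokens on a regular grid, so each one still has a well-defined (frequency, time) position for its positional embedding.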
mir_ref: A Representation Evaluation Framework for Music Information Retrieval Tasks
Music Information Retrieval (MIR) research is increasingly leveraging
representation learning to obtain more compact, powerful music audio
representations for various downstream MIR tasks. However, current
representation evaluation methods are fragmented due to discrepancies in audio
and label preprocessing, downstream model and metric implementations, data
availability, and computational resources, often leading to inconsistent and
limited results. In this work, we introduce mir_ref, an MIR Representation
Evaluation Framework focused on seamless, transparent, local-first experiment
orchestration to support representation development. It features
implementations of a variety of components such as MIR datasets, tasks,
embedding models, and tools for result analysis and visualization, while
facilitating the implementation of custom components. To demonstrate its
utility, we use it to conduct an extensive evaluation of several embedding
models across various tasks and datasets, including evaluating their robustness
to various audio perturbations and the ease of extracting relevant information
from them. Comment: Machine Learning for Audio Workshop, Neural Information Processing
Systems (NeurIPS) 2023, New Orleans, LA
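mir_ref evaluates the robustness of embeddings to audio perturbations. One common perturbation is additive noise at a fixed signal-to-noise ratio (SNR); below is a minimal sketch, assuming white Gaussian noise and plain Python lists as audio. The helpers `add_noise_at_snr` and `snr_db` are hypothetical illustrations, not part of mir_ref's API, and the abstract does not specify which perturbations or parameters the framework uses.

```python
import math
import random


def add_noise_at_snr(signal, target_snr_db, rng):
    """Add white Gaussian noise scaled so that
    10 * log10(P_signal / P_noise) equals target_snr_db."""
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Required noise power is p_signal / 10^(SNR/10); rescale the draw to match.
    scale = math.sqrt(p_signal / (p_noise * 10 ** (target_snr_db / 10)))
    return [x + scale * n for x, n in zip(signal, noise)]


def snr_db(clean, noisy):
    """Measured SNR of `noisy` relative to `clean`, in dB."""
    p_clean = sum(x * x for x in clean)
    p_residual = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(p_clean / p_residual)


# One second of a 440 Hz tone at a hypothetical 8 kHz sample rate.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
noisy = add_noise_at_snr(tone, 10.0, random.Random(0))
print(snr_db(tone, noisy))
```

In a robustness study, the same downstream model would then be evaluated on embeddings of the clean and the perturbed audio, and the drop in the task metric compared across embedding models.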
INDUSTRIAL CLUSTERS IN THE DEVELOPING COUNTRIES: A CASE STUDY OF THE CHILE FISH CLUSTER
The existence of an industrial cluster should be considered in relation to the stage of the life cycle of the underlying industry's production. The technological cycle of some industries lasts decades, whereas the life cycle of an industrial cluster can be shorter. When developing the successive stages of a cluster's life cycle, it is necessary to consider the availability of resources or the service life of the infrastructure created for the functioning of this sector.
The development of practically all industrial clusters has taken a long time; the development of Russian clusters is planned over 5–10 years. In world practice, there are no cases of efficient industrial companies being created in developing countries over such periods. It is often necessary to use ready-made infrastructure facilities, but even then they may need renovation. Therefore, at the formation stage, it is desirable to keep investment to a minimum, directing it mainly toward creating the necessary infrastructure and a complementary set of institutional conditions that maximize the use of existing infrastructure.
VALUE: EMPIRICS AND THEORY
Disclosing the essence of the category of "value" is not a prerequisite for its use: the concept can be introduced a priori, correctly and simply. At the same time, such disclosure is obviously desirable. Objectives: 1) to show the connection between the concept of "value" and the empirical level; 2) to justify the interpretation of value from the perspective of information theory. Results. The "value" category manifests itself in economic practice through the notions of "fair price" and "reasonable profit", input-output balance statistics, world practice in planning long-term energy production projects, etc. An adequate interpretation of value is provided by the information concept: information underlies both value as the result of labor and the rarity of a thing. Value is an informational measure of an object’s worth. The difficulty of operationalizing the concept does not make the concept of "value" as such unscientific (metaphysical). Recognizing value as the objective basis of the observable phenomenon of price leads to very specific consequences: 1) market feedback must have an objective starting measure, that is, money must have a standard; 2) in some cases, direct pricing (and/or directive price setting) is justified and appropriate.
How Low Can You Go? Reducing Frequency and Time Resolution in Current CNN Architectures for Music Auto-tagging
Automatic tagging of music is an important research topic in Music
Information Retrieval, and audio analysis algorithms proposed for this task have
improved with advances in deep learning. In particular, many
state-of-the-art systems use Convolutional Neural Networks and operate on
mel-spectrogram representations of the audio. In this paper, we compare
commonly used mel-spectrogram representations and evaluate the model performance
that can be achieved when the input size is reduced through both fewer
frequency bands and coarser time resolution. We use the MagnaTagaTune dataset for
comprehensive performance comparisons and then compare selected configurations
on the larger Million Song Dataset. The results of this study can help
researchers and practitioners weigh the trade-off between model accuracy,
data storage size, and training and inference times. Comment: The 28th European Signal Processing Conference (EUSIPCO)
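The trade-off studied here comes down to simple arithmetic on the input shape. The sketch below computes mel-spectrogram input sizes, assuming centred framing (one frame per hop, as with librosa's default `center=True`), so a clip of N samples yields 1 + ⌊N / hop⌋ frames. The sample rate, hop sizes, band counts, 4-byte values, and the helper names are illustrative assumptions, not the configurations evaluated in the paper.

```python
def mel_input_shape(duration_s, sample_rate, hop_length, n_mels):
    """(bands, frames) of a mel-spectrogram, assuming centred framing."""
    num_samples = int(duration_s * sample_rate)
    frames = 1 + num_samples // hop_length
    return n_mels, frames


def describe(duration_s, sample_rate, hop_length, n_mels, bytes_per_value=4):
    """Shape, frame rate, and storage size (4 bytes per value = float32)."""
    bands, frames = mel_input_shape(duration_s, sample_rate, hop_length, n_mels)
    return {
        "shape": (bands, frames),
        "frame_rate_hz": sample_rate / hop_length,  # frames per second
        "bytes": bands * frames * bytes_per_value,
    }


# Hypothetical 3 s excerpt at 16 kHz.
baseline = describe(3.0, 16000, hop_length=256, n_mels=96)
reduced = describe(3.0, 16000, hop_length=512, n_mels=48)
print(baseline)  # {'shape': (96, 188), 'frame_rate_hz': 62.5, 'bytes': 72192}
print(reduced)   # {'shape': (48, 94), 'frame_rate_hz': 31.25, 'bytes': 18048}
```

Halving both the number of bands and the frame rate cuts the stored input by a factor of four; the question such a study answers is how much tagging accuracy that saving costs.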
Bad Directions in Cryptographic Hash Functions
A 25-gigabyte "point obfuscation" challenge "using security parameter 60" was announced at the Crypto 2015 rump session; "point obfuscation" is another name for password hashing. This paper shows that the particular matrix-multiplication hash function used in the challenge is much less secure than previous password-hashing functions are believed to be. This paper's attack algorithm broke the challenge in just 19 minutes using a cluster of 21 PCs. Keywords: symmetric cryptography, hash functions, password hashing, point obfuscation, matrix multiplication, meet-in-the-middle attacks, meet-in-many-middles attack
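The keywords name meet-in-the-middle attacks. The sketch below shows the basic idea on a toy hash that multiplies one 2×2 matrix per input bit modulo a small prime. Because the hash of a concatenation L || R equals H(L)·H(R), a preimage of a target T can be found by tabulating H(L) for every left half and looking up T·H(R)⁻¹ for every right half, costing roughly 2·2^(n/2) matrix products instead of 2^n. The modulus, matrices, and function names are hypothetical; this is neither the challenge's construction nor the paper's meet-in-many-middles algorithm.

```python
from itertools import product

P = 10007  # toy prime modulus (hypothetical)
# One invertible 2x2 matrix over Z_P per input bit (hypothetical choice, det = 1).
M = {0: ((1, 2), (0, 1)), 1: ((1, 0), (3, 1))}
IDENTITY = ((1, 0), (0, 1))


def matmul(a, b):
    """2x2 matrix product modulo P."""
    return (
        ((a[0][0] * b[0][0] + a[0][1] * b[1][0]) % P,
         (a[0][0] * b[0][1] + a[0][1] * b[1][1]) % P),
        ((a[1][0] * b[0][0] + a[1][1] * b[1][0]) % P,
         (a[1][0] * b[0][1] + a[1][1] * b[1][1]) % P),
    )


def inverse(a):
    """2x2 matrix inverse modulo P (pow(x, -1, P) needs Python 3.8+)."""
    det = (a[0][0] * a[1][1] - a[0][1] * a[1][0]) % P
    inv_det = pow(det, -1, P)
    return (
        ((a[1][1] * inv_det) % P, (-a[0][1] * inv_det) % P),
        ((-a[1][0] * inv_det) % P, (a[0][0] * inv_det) % P),
    )


def toy_hash(bits):
    """Product of the per-bit matrices, left to right."""
    h = IDENTITY
    for b in bits:
        h = matmul(h, M[b])
    return h


def meet_in_the_middle(target, n):
    """Find some n-bit input whose toy_hash equals target, or None."""
    half = n // 2
    # Forward pass: store one left half for every reachable H(L).
    table = {}
    for left in product((0, 1), repeat=half):
        table.setdefault(toy_hash(left), left)
    # Backward pass: H(L) * H(R) == T  <=>  H(L) == T * H(R)^-1.
    for right in product((0, 1), repeat=n - half):
        needed = matmul(target, inverse(toy_hash(right)))
        if needed in table:
            return table[needed] + right
    return None


secret = (1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0)
target = toy_hash(secret)
found = meet_in_the_middle(target, len(secret))
print(found is not None and toy_hash(found) == target)  # True
```

The returned input may differ from `secret` if the toy hash has collisions; either way it is a valid preimage of the target.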
Pre-Training Strategies Using Contrastive Learning and Playlist Information for Music Classification and Similarity
In this work, we investigate an approach that relies on contrastive learning
and music metadata as a weak source of supervision to train music
representation models. Recent studies show that contrastive learning can be
used with editorial metadata (e.g., artist or album name) to learn audio
representations that are useful for different classification tasks. In this
paper, we extend this idea to using playlist data as a source of music
similarity information and investigate three approaches to generate anchor and
positive track pairs. We evaluate these approaches by fine-tuning the
pre-trained models for music multi-label classification tasks (genre, mood, and
instrument tagging) and music similarity. We find that creating anchor and
positive track pairs by relying on co-occurrences in playlists provides better
music similarity and competitive classification results compared to choosing
tracks from the same artist as in previous works. Additionally, our best
pre-training approach based on playlists provides superior classification
performance for most datasets. Comment: Accepted at the 2023 International Conference on Acoustics, Speech,
and Signal Processing (ICASSP'23)
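The playlist-based pairing can be sketched in a few lines: index which tracks co-occur in at least one playlist, then draw a positive for each anchor from that set. The abstract does not state the training loss, so the InfoNCE-style objective included here (in-batch negatives, cosine similarity, temperature 0.1) is an assumption chosen as a common contrastive loss. All names and toy data are hypothetical; this is not the paper's implementation and does not cover its other two pairing strategies.

```python
import math
import random
from collections import defaultdict


def cooccurrence_index(playlists):
    """Map each track to the set of other tracks sharing a playlist with it."""
    neighbours = defaultdict(set)
    for playlist in playlists:
        for track in playlist:
            neighbours[track].update(t for t in playlist if t != track)
    return neighbours


def sample_pairs(playlists, rng):
    """One (anchor, positive) pair per track that co-occurs with another."""
    neighbours = cooccurrence_index(playlists)
    pairs = []
    for anchor in sorted(neighbours):
        if neighbours[anchor]:
            pairs.append((anchor, rng.choice(sorted(neighbours[anchor]))))
    return pairs


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: positives[i] is the positive for anchors[i] and the
    other positives in the batch serve as negatives (assumed objective)."""
    def normalise(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    a = [normalise(v) for v in anchors]
    p = [normalise(v) for v in positives]
    loss = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_sum - logits[i]
    return loss / len(a)


playlists = [["a", "b", "c"], ["c", "d"], ["e"]]
print(sample_pairs(playlists, random.Random(0)))
```

Track "e" never shares a playlist with another track, so it yields no pair; an artist-based scheme would instead pick the positive from tracks by the same artist.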