Audio Compression Exploiting Repetition (ACER): Challenges and Solutions
This paper presents issues relating to information and musical content dealt with as part of the development of an innovative audio compression system, designed to exploit repetition sequences in audio and, particularly, music. The paper briefly introduces and describes how musical content and structure within audio can be exploited to achieve compression. A new system to take advantage of these hypotheses is described. The paper introduces a new file format to deal with high-level data chunks and repetitive structuring. The practicalities of searching for large blocks of audio data are discussed, and evidence is provided of the high computational complexity involved in performing brute-force searches for self-similar matching. This complexity is the subject of ongoing work by the authors; research to date reveals that search time can be reduced by applying content analysis, in this case to reduce complexity rather than data volume.
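The quadratic cost of brute-force self-similarity search can be illustrated with a toy sketch in pure Python (illustrative parameters and signal only; this is not the authors' implementation): every pair of block-aligned windows is compared, so the number of comparisons grows with the square of the signal length.

```python
import random

def brute_force_self_similarity(samples, block, threshold):
    """Naively search for repeats of a fixed-size block of audio samples.

    Compares every pair of windows, so the number of distance
    computations grows quadratically with the signal length -- the
    complexity the abstract identifies as the main obstacle.
    """
    n = len(samples)
    matches = []
    for i in range(0, n - block + 1):
        for j in range(i + block, n - block + 1):
            # Mean absolute difference as a simple dissimilarity measure.
            dist = sum(abs(samples[i + k] - samples[j + k])
                       for k in range(block)) / block
            if dist < threshold:
                matches.append((i, j))
    return matches

random.seed(0)
# A short synthetic signal with a planted repetition:
# samples 0..63 repeat at 128..191.
signal = [random.uniform(-1, 1) for _ in range(256)]
signal[128:192] = signal[0:64]
matches = brute_force_self_similarity(signal, block=64, threshold=1e-9)
print(matches)  # -> [(0, 128)]
```

Even this 256-sample toy performs on the order of ten thousand window comparisons, which is why the abstract points to content analysis as a way of pruning the search.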
Generation of folk song melodies using Bayes transforms
The paper introduces the `Bayes transform', a mathematical procedure for putting data into a hierarchical representation. Applicable to any type of data, the procedure yields interesting results when applied to sequences. In this case, the representation obtained implicitly models the repetition hierarchy of the source. There are then natural applications to music. Deriving Bayes transforms can be a means of determining the repetition hierarchy of note sequences (melodies) in an empirical and domain-general way. The paper investigates the application of this approach to folk song, examining the results that can be obtained by treating such transforms as generative models.
Subjective Evaluation of Music Compressed with the ACER Codec Compared to AAC, MP3, and Uncompressed PCM
Audio data compression has revolutionised the way in which the music industry and musicians sell and distribute their products. Our previous research presented a novel codec named ACER (Audio Compression Exploiting Repetition), which achieves data reduction by exploiting irrelevancy and redundancy in musical structure whilst generally maintaining acceptable levels of noise and distortion in objective evaluations. However, previous work did not evaluate ACER using subjective listening tests, leaving its applicability under human audio perception undemonstrated. In this paper, we present a double-blind listening test conducted with a range of listeners (N=100). The aim was to determine the efficacy of the ACER codec, in terms of perceptible noise and spatial distortion artefacts, against de facto standards for audio data compression and an uncompressed reference. Results show that participants reported no perceived differences between the uncompressed, MP3, AAC, ACER high quality, and ACER medium quality compressed audio in terms of noise and distortions, but that the ACER low quality format was perceived as being of lower quality. However, in terms of participants’ perceptions of the stereo field, all formats under test performed equally well, with no statistically significant differences. A qualitative, thematic analysis of listeners’ feedback revealed that the noise artefacts produced by the ACER technique are different from those of comparator codecs, reflecting its novel approach. Results show that the quality of contemporary audio compression systems has reached a stage where their performance is perceived to be as good as uncompressed audio. The ACER format is able to compete as an alternative, with results showing a preference for the ACER medium quality versions over WAV, MP3, and AAC. The ACER process is viable on its own or in conjunction with techniques such as MP3 and AAC.
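The kind of significance claim such a listening test makes can be sketched with a simple permutation test on mean ratings. The ratings below are invented for illustration and are not the study's data, and the paper does not state that this exact test was used:

```python
import random

def permutation_test(a, b, n_perm=10000, seed=1):
    """Two-sample permutation test on the absolute difference of means.

    Returns the fraction of random relabellings whose mean difference
    is at least as large as the observed one (an empirical p-value).
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            count += 1
    return count / n_perm

# Hypothetical 5-point quality ratings (not the study's data).
acer_medium = [4, 5, 4, 4, 5, 4, 5, 4, 4, 5]
acer_low    = [3, 2, 3, 3, 2, 3, 3, 2, 3, 3]
p = permutation_test(acer_medium, acer_low)
print(f"p = {p:.4f}")  # a small p suggests the ratings differ beyond chance
```

A nonparametric test of this kind is a common choice for ordinal listening-test ratings, since it makes no normality assumption.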
Multimodal Content Analysis for Effective Advertisements on YouTube
The rapid advances in e-commerce and Web 2.0 technologies have greatly
increased the impact of commercial advertisements on the general public. As a
key enabling technology, a multitude of recommender systems exists which
analyzes user features and browsing patterns to recommend appealing
advertisements to users. In this work, we seek to study the characteristics or
attributes that characterize an effective advertisement and recommend a useful
set of features to aid the designing and production processes of commercial
advertisements. We analyze the temporal patterns from multimedia content of
advertisement videos including auditory, visual and textual components, and
study their individual roles and synergies in the success of an advertisement.
The objective of this work is then to measure the effectiveness of an
advertisement, and to recommend a useful set of features to advertisement
designers to make it more successful and approachable to users. Our proposed
framework employs the signal processing technique of cross modality feature
learning where data streams from different components are employed to train
separate neural network models and are then fused together to learn a shared
representation. Subsequently, a neural network model trained on this joint
feature embedding representation is utilized as a classifier to predict
advertisement effectiveness. We validate our approach using subjective ratings
from a dedicated user study, the sentiment strength of online viewer comments,
and a viewer opinion metric of the ratio of the Likes and Views received by
each advertisement from an online platform.
Comment: 11 pages, 5 figures, ICDM 201
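The fusion idea described above, per-modality projections combined into a shared representation that feeds a classifier, can be sketched minimally in pure Python. The weights here are untrained and random and the feature dimensions are invented; the sketch shows the structure only, not the paper's trained network:

```python
import math
import random

random.seed(0)

def project(vec, weights):
    """Linear projection of one modality's features into the shared space."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def fuse_and_score(audio, visual, text, params):
    """Project each modality, average into a joint embedding, then apply
    a logistic unit to score advertisement effectiveness."""
    shared = [project(audio, params["W_audio"]),
              project(visual, params["W_visual"]),
              project(text, params["W_text"])]
    joint = [sum(dims) / 3 for dims in zip(*shared)]  # shared representation
    z = sum(w * h for w, h in zip(params["w_out"], joint)) + params["b"]
    return 1 / (1 + math.exp(-z))

dim = 4  # shared-embedding size (illustrative)
params = {
    # Random, untrained weights; 3/5/2 input features are invented sizes.
    "W_audio":  [[random.gauss(0, 0.5) for _ in range(3)] for _ in range(dim)],
    "W_visual": [[random.gauss(0, 0.5) for _ in range(5)] for _ in range(dim)],
    "W_text":   [[random.gauss(0, 0.5) for _ in range(2)] for _ in range(dim)],
    "w_out":    [random.gauss(0, 0.5) for _ in range(dim)],
    "b": 0.0,
}
score = fuse_and_score([0.2, 0.7, 0.1],
                       [0.9, 0.3, 0.5, 0.1, 0.4],
                       [0.6, 0.8], params)
print(f"effectiveness score: {score:.3f}")
```

In the paper's framework the per-modality models are neural networks trained on their own streams before the joint embedding is learned; this sketch collapses each to a single linear layer to expose the data flow.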
Reduction in Computer Music:Bodies, Temporalities, and Generative Computation
In the age of pervasive computing, the way our body interacts with reality needs to be reconceptualized. The reduction of embodiment is a problem for computer music, since this music relies heavily on different layers of (digital) technology and mediation in order to be produced and performed. The article shows that such mediation should not be conceived of as an obstacle but rather as a constitutive element of a permanent, complex negotiation between the artist, the machinery, and the audience, aimed at shaping a different temporality for musical language (as developed by the Italian artist Caterina Barbieri).
Federica Buongiorno, ‘Reduction in Computer Music: Bodies, Temporalities, and Generative Computation’, in The Case for Reduction, ed. by Christoph F. E. Holzhey and Jakob Schillinger, Cultural Inquiry, 25 (Berlin: ICI Berlin Press, 2022), pp. 175-90 <https://doi.org/10.37050/ci-25_09>
Automatic chord transcription from audio using computational models of musical context
This thesis is concerned with the automatic transcription of chords from audio, with an emphasis on modern popular music. Musical context, such as the key and the structural segmentation, aids the interpretation of chords by human listeners. In this thesis we propose computational models that integrate such musical context into the automatic chord estimation process.
We present a novel dynamic Bayesian network (DBN) which integrates models of metric
position, key, chord, bass note and two beat-synchronous audio features (bass and treble
chroma) into a single high-level musical context model. We simultaneously infer the most probable
sequence of metric positions, keys, chords and bass notes via Viterbi inference. Several
experiments with real world data show that adding context parameters results in a significant
increase in chord recognition accuracy and faithfulness of chord segmentation. The most complex
of the proposed methods transcribes chords with a state-of-the-art accuracy of 73% on the song
collection used for the 2009 MIREX Chord Detection tasks. This method is used as a baseline
method for two further enhancements.
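Viterbi inference of the kind used here can be sketched with a toy first-order HMM over a two-chord vocabulary and symbolic beat-level observations. The thesis's DBN jointly models metric position, key, chord, and bass note, which this sketch does not attempt; the probabilities below are invented:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable hidden state sequence for an HMM (toy scale, no logs)."""
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor for state s at time t.
            prob, prev = max((V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                             for p in states)
            V[t][s] = prob
            back[t][s] = prev
    state = max(V[-1], key=V[-1].get)
    path = [state]
    for t in range(len(obs) - 1, 0, -1):  # backtrace
        state = back[t][state]
        path.append(state)
    return path[::-1]

states = ["C", "G"]  # toy chord vocabulary
start = {"C": 0.6, "G": 0.4}
trans = {"C": {"C": 0.8, "G": 0.2}, "G": {"C": 0.3, "G": 0.7}}
# Emission: probability of a beat-level feature symbol given the chord.
emit = {"C": {"c_chroma": 0.9, "g_chroma": 0.1},
        "G": {"c_chroma": 0.2, "g_chroma": 0.8}}
obs = ["c_chroma", "c_chroma", "g_chroma", "g_chroma", "c_chroma"]
path = viterbi(obs, states, start, trans, emit)
print(path)  # -> ['C', 'C', 'G', 'G', 'C']
```

The self-transition probabilities above favour staying on the current chord, which is what makes the decoded segmentation smooth rather than flipping label on every noisy frame.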
Firstly, we aim to improve chord confusion behaviour by modifying the audio front end
processing. We compare the effect of learning chord profiles as Gaussian mixtures to the effect
of using chromagrams generated from an approximate pitch transcription method. We show
that using chromagrams from approximate transcription results in the most substantial increase
in accuracy. The best method achieves 79% accuracy and significantly outperforms the state of
the art.
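The chroma folding underlying such chromagrams can be sketched as mapping pitches from an approximate transcription into twelve pitch-class bins. The salience values below are hypothetical; a real front end would estimate them from audio:

```python
def chroma_from_pitches(pitches):
    """Fold (midi_pitch, salience) pairs into a normalised 12-bin chroma
    vector (bin 0 = C), discarding octave information."""
    bins = [0.0] * 12
    for midi, salience in pitches:
        bins[midi % 12] += salience
    total = sum(bins) or 1.0
    return [b / total for b in bins]

# Hypothetical approximate-transcription output for one frame of a C major
# chord: C4 (60), E4 (64), G4 (67), plus a weak spurious D5 (74).
frame = [(60, 1.0), (64, 0.8), (67, 0.9), (74, 0.1)]
chroma = chroma_from_pitches(frame)
print([round(c, 2) for c in chroma])
```

Because octave information is discarded, the transcription errors that matter for chord estimation are pitch-class confusions, which is why cleaner approximate transcription feeds through directly into chord accuracy.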
Secondly, we propose a method by which chromagram information is shared between
repeated structural segments (such as verses) in a song. This can be done fully automatically
using a novel structural segmentation algorithm tailored to this task. We show that the technique
leads to a significant increase in accuracy and readability. The segmentation algorithm itself
also obtains state-of-the-art results. A method that combines both of the above enhancements
reaches an accuracy of 81%, a statistically significant improvement over the best result (74%)
in the 2009 MIREX Chord Detection tasks.
Engineering and Physical Research Council U
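The segment-sharing idea can be sketched as frame-wise averaging of chroma across structural repeats. The vectors below are toy two-dimensional "chroma" frames, and the actual sharing mechanism in the thesis may differ:

```python
def share_chroma(chromagram, starts, length):
    """Average beat-synchronous chroma frames across repeated segments.

    `starts` lists the start frames of segments assumed to be repeats of
    one another; `length` frames from each are averaged, and the averaged
    frames are written back to every occurrence.
    """
    shared = list(chromagram)  # shallow copy; frames are replaced, not mutated
    for offset in range(length):
        frames = [chromagram[s + offset] for s in starts]
        avg = [sum(vals) / len(frames) for vals in zip(*frames)]
        for s in starts:
            shared[s + offset] = avg
    return shared

# Two 'verses' of 2 frames each; noise differs between occurrences.
gram = [[0.9, 0.1], [0.2, 0.8],   # verse 1
        [0.7, 0.3], [0.4, 0.6]]   # verse 2 (a noisier repeat)
shared = share_chroma(gram, starts=[0, 2], length=2)
print(shared)
```

Averaging over repeats reduces uncorrelated noise in each occurrence, which is the intuition behind the reported accuracy gain; in the real system the repeats must first be found by the structural segmentation algorithm.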
Identification of expressive descriptors for style extraction in music analysis using linear and nonlinear models
The formalization of expressive performance is still considered relevant owing to the complexity of music. Expressive performance is an important aspect of music, taking into account different conventions, such as genres or styles, that a performance may develop over time. Modelling the relationship between musical expression and the structural aspects of acoustic information requires a minimal probabilistic and statistical basis for the robustness, validation, and reproducibility of computational applications. A cohesive relationship between, and justification of, the results is therefore necessary. This thesis is grounded in the theory and applications of discriminative and generative models within the machine learning framework, and in relating systematic procedures to concepts from musicology using signal processing and data mining techniques. The results were validated through statistical tests and non-parametric experimentation, implementing a set of metrics to measure acoustic and temporal aspects of audio files in order to train a discriminative model and improve the synthesis process of a deep neural model. Additionally, the implemented model presents an opportunity for the application of systematic procedures, the automation of transcriptions using musical notation, ear training for music students, and improving the implementation of deep neural networks using CPUs instead of GPUs, owing to the advantages of convolutional networks for processing audio files as vectors or matrices with a sequence of notes.
Maestría, Magister en Ingeniería Electrónic
A Cross-Version Approach for Harmonic Analysis of Music Recordings
The automated extraction of chord labels from audio recordings is a central task in music information retrieval. Here, the chord labeling is typically performed on a specific audio version of a piece of music, produced under certain recording conditions, played on specific instruments and characterized by individual styles of the musicians. As a consequence, the obtained chord labeling results are strongly influenced by version-dependent characteristics. In this chapter, we show that analyzing the harmonic properties of several audio versions synchronously stabilizes the chord labeling result in the sense that inconsistencies indicate version-dependent characteristics, whereas consistencies across several versions indicate harmonically stable passages in the piece of music. In particular, we show that consistently labeled passages often correspond to correctly labeled passages. Our experiments show that the cross-version labeling procedure significantly increases the precision of the result while keeping the recall at a relatively high level. Furthermore, we introduce a powerful visualization which reveals the harmonically stable passages on a musical time axis specified in bars. Finally, we demonstrate how this visualization facilitates a better understanding of classification errors and may be used by music experts as a helpful tool for exploring harmonic structures.
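A minimal form of the cross-version idea is to keep only labels on which all versions agree, trading recall for precision. The labels below are hypothetical, and the chapter's consistency criterion may be more nuanced:

```python
def consistent_labels(version_labels):
    """Keep a chord label only where all versions agree; emit None elsewhere.

    Fewer frames end up labelled (lower recall), but agreeing frames are
    more likely to be correct (higher precision).
    """
    result = []
    for frame_labels in zip(*version_labels):
        result.append(frame_labels[0] if len(set(frame_labels)) == 1 else None)
    return result

# Hypothetical per-bar labels from three recordings of the same piece.
v1 = ["C", "C", "G", "Am", "F"]
v2 = ["C", "C", "G", "C",  "F"]
v3 = ["C", "E", "G", "Am", "F"]
labels = consistent_labels([v1, v2, v3])
print(labels)  # -> ['C', None, 'G', None, 'F']
```

The `None` positions are exactly the passages the chapter's visualization would flag as version-dependent, while the agreed labels mark the harmonically stable bars.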