9 research outputs found

    Proposal for SMDL compliance in MPEG-7, ISO/IEC JTC1/SC29/WG11/P620, Lancaster, MPEG, 1999. Lancaster, Moving Picture Experts Group.

    This proposal takes on board discussions held at the Ad Hoc Evaluation Meeting in Lancaster, including pre-proposals P620 ("Structured time-based event") and P622 ("Music event"), the evaluated proposals P155 ("Sonata Forms Description Scheme") and P154/169/163/170 ("Structured and unstructured links"), and document M3649 ("Some remarks on Document Structure and Description Schemes"). Responding to the need for a generic method of specifying and describing music content, this document proposes a set of Description Schemes (DS) and Descriptors (D) that provide a general way of describing music content. It is closely based on an existing standard, SMDL, in order to maximize future compatibility. This set of Description Schemes and its associated set of Descriptors describe a structured time-based entity (a musical note, musical entity, or audio entity) in its relation to other internal or external structured time-based entities. At a very low level, a Description Scheme named "thread" can take on different forms (with different Descriptors) to describe a music event down to note level (or further). These Descriptors can take different forms, and additional Descriptors with other means of describing a music event exactly can be added. Two Descriptors are proposed in this document: one describing the music entity in its abstract form with its four "logical" characteristics, a) duration, b) pitch, c) loudness and d) other characteristics; the second describing musical events with SMDL syntax. Another Descriptor used in this scheme could be a link to a certain section of a structured or unstructured format.
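    As a purely illustrative aside, the following Python sketch models the abstract-form Descriptor and the "thread" Description Scheme described above as plain data structures. The class and field names are hypothetical, chosen only to mirror the four "logical" characteristics listed in the abstract; they are not taken from the proposal itself.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AbstractMusicEvent:
    """Hypothetical rendering of the abstract-form Descriptor: a music event
    characterised by its four "logical" characteristics."""
    duration: float                   # nominal duration, e.g. in beats or seconds
    pitch: Optional[float] = None     # e.g. a MIDI note number; None for unpitched events
    loudness: Optional[float] = None  # e.g. a dynamic level mapped to [0, 1]
    other: dict = field(default_factory=dict)  # any further characteristics

@dataclass
class Thread:
    """Hypothetical "thread" Description Scheme: an ordered sequence of events,
    plus links to related internal or external structured time-based entities."""
    events: list
    links: list = field(default_factory=list)  # e.g. references to sections of other documents

# Example: a two-note thread.
melody = Thread(events=[
    AbstractMusicEvent(duration=1.0, pitch=60, loudness=0.7),
    AbstractMusicEvent(duration=0.5, pitch=62, loudness=0.6),
])
```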

    Perceptually motivated blind source separation of convolutive audio mixtures


    An architecture for MHEG objects

    Hypermedia applications are one of the most recent and most demanding computer uses. It is accepted that one of the main impediments to their widespread use is the lack of standards, and the lack of open systems with the possibility of having documents interchangeable between different hardware and software platforms. Several standards are emerging, one of which is being developed by ISO/IEC WG12, known as the Multimedia and Hypermedia Information Coding Experts Group (MHEG). As desktop systems become more powerful, one of the main users of hypermedia applications is the home market, so it is important to have standards and applications suitable for those platforms. This work reviews existing proposals for hypermedia architectures and interchange standards. It then assesses the suitability of the MHEG standard for use in open, distributed, and extensible hypermedia systems. An architecture for the implementation of MHEG objects, taking into account the limitations imposed by current desktop computers, is also proposed. To assess the suitability of the proposed architecture, a prototype has been implemented. An analysis of the performance obtained with the prototype is presented and conclusions on the requirements for future implementations are drawn. Finally, some suggestions to improve the MHEG standard are made.

    Creating music by listening

    Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2005. Includes bibliographical references (p. 127-139).
    Machines have the power and potential to make expressive music on their own. This thesis aims to computationally model the process of creating music using experience from listening to examples. Our unbiased signal-based solution models the life cycle of listening, composing, and performing, turning the machine into an active musician instead of simply an instrument. We accomplish this through an analysis-synthesis technique based on combined perceptual and structural modeling of the musical surface, which leads to a minimal data representation. We introduce a music cognition framework that results from the interaction of psychoacoustically grounded causal listening, a time-lag embedded feature representation, and perceptual similarity clustering. Our bottom-up analysis intends to be generic and uniform by recursively revealing metrical hierarchies and structures of pitch, rhythm, and timbre. Training is suggested for top-down unbiased supervision, and is demonstrated with the prediction of downbeat. This musical intelligence enables a range of original manipulations including song alignment, music restoration, cross-synthesis or song morphing, and ultimately the synthesis of original pieces. by Tristan Jehan. Ph.D.
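    As a rough illustration of the "time-lag embedded feature representation" and "perceptual similarity clustering" mentioned above, the sketch below stacks each audio feature frame with its recent past and clusters the resulting vectors. The frame count, feature dimensionality, lag depth, and use of k-means are assumptions made for illustration, not details taken from the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans

def time_lag_embed(features: np.ndarray, lags: int = 4) -> np.ndarray:
    """Stack each frame with its `lags` predecessors (zero-padded at the start),
    so every row carries a short window of local temporal context."""
    n_frames, n_dims = features.shape
    padded = np.vstack([np.zeros((lags, n_dims)), features])
    return np.hstack([padded[i:i + n_frames] for i in range(lags + 1)])

# Toy input: 200 frames of a 13-dimensional feature (e.g. loudness + spectral shape).
frames = np.random.rand(200, 13)
embedded = time_lag_embed(frames, lags=4)                        # shape: (200, 13 * 5)
labels = KMeans(n_clusters=8, n_init=10).fit_predict(embedded)   # similarity cluster per frame
```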

    Semantic Audio Analysis Utilities and Applications.

    PhD. Extraction, representation, organisation and application of metadata about audio recordings are the concern of semantic audio analysis. Our broad interpretation, aligned with recent developments in the field, includes methodological aspects of semantic audio, such as those related to information management, knowledge representation and applications of the extracted information. In particular, we look at how Semantic Web technologies may be used to enhance information management practices in two audio-related areas: music informatics and music production. In the first area, we are concerned with music information retrieval (MIR) and related research. We examine how structured data may be used to support reproducibility and provenance of extracted information, and aim to support multi-modality and context adaptation in the analysis. In creative music production, our goals can be summarised as follows: off-the-shelf sound editors do not hold appropriately structured information about the edited material, thus human-computer interaction is inefficient. We believe that recent developments in sound analysis and music understanding are capable of bringing about significant improvements in the music production workflow. Providing visual cues related to music structure can serve as an example of intelligent, context-dependent functionality. The central contributions of this work are a Semantic Web ontology for describing recording studios, including a model of technological artefacts used in music production; methodologies for collecting data about music production workflows and describing the work of audio engineers, which facilitate capturing their contribution to music production; and finally a framework for creating Web-based applications for automated audio analysis. This has applications demonstrating how Semantic Web technologies and ontologies can facilitate interoperability between music research tools, and the creation of semantic audio software, for instance for music recommendation, temperament estimation or multi-modal music tutoring.
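    The studio ontology itself is not reproduced in this listing. Purely to illustrate the kind of structured description such an ontology enables, the sketch below uses rdflib with a made-up namespace (STUDIO) and made-up class and property names; none of these terms should be read as the actual ontology from the thesis.

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF

# Hypothetical namespace and vocabulary, for illustration only.
STUDIO = Namespace("http://example.org/studio#")

g = Graph()
g.bind("studio", STUDIO)

session = STUDIO["session_42"]
engineer = STUDIO["engineer_jane"]
compressor = STUDIO["device_compressor_1"]

g.add((session, RDF.type, STUDIO.RecordingSession))
g.add((engineer, RDF.type, STUDIO.AudioEngineer))
g.add((compressor, RDF.type, STUDIO.SignalProcessingDevice))
g.add((session, STUDIO.engineeredBy, engineer))    # capture the engineer's contribution
g.add((session, STUDIO.usesDevice, compressor))
g.add((compressor, STUDIO.ratio, Literal(4.0)))    # a device parameter setting

print(g.serialize(format="turtle"))
```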

    An Artificial Intelligence Approach to Concatenative Sound Synthesis

    Sound examples are included with this thesis. Technological advancements such as the increase in processing power, hard disk capacity and network bandwidth have opened up many exciting new techniques to synthesise sounds, one of which is Concatenative Sound Synthesis (CSS). CSS uses a data-driven method to synthesise new sounds from a large corpus of small sound snippets. This technique closely resembles the art of mosaicing, where small tiles are arranged together to create a larger image. A 'target' sound is often specified by users so that segments in the database that match those of the target sound can be identified and then concatenated together to generate the output sound. Whilst the practicality of CSS in synthesising sounds currently looks promising, there are still areas to be explored and improved, in particular the algorithm used to find the matching segments in the database. One of the main issues in CSS is the basis of similarity, as there are many perceptual attributes on which sound similarity can be based, for example timbre, loudness, rhythm and tempo. An ideal CSS system needs to be able to decipher which of these perceptual attributes are anticipated by the users and then accommodate them by synthesising sounds that are similar with respect to the particular attribute. Failure to communicate the basis of sound similarity between the user and the CSS system generally results in output that does not match the sound envisioned by the user. In order to understand how humans perceive sound similarity, several elements that affect sound similarity judgment were first investigated. Of the four elements tested (timbre, melody, loudness, tempo), it was found that the basis of similarity depends on musical training: musicians based similarity on timbral information, whilst non-musicians relied on melodic information. Thus, for the rest of the study, only features that represent timbral information were included, as musicians are the target users of this study. Another issue with the current state of CSS systems is user control flexibility, in particular during segment matching, where features can be assigned different weights depending on their importance to the search. Typically, the weights (in the existing CSS systems that support a weight-assigning mechanism) can only be assigned manually, resulting in a process that is both labour-intensive and time-consuming. Additionally, this study identified the lack of a mechanism to handle homosonic and equidistant segments. These conditions arise when too few features are compared, causing otherwise aurally different sounds to be represented by the same sonic values, or as a result of rounding off the values of the extracted features. This study addresses both of these problems through an extended use of Artificial Intelligence (AI). The Analytic Hierarchy Process (AHP) is employed to enable order-dependent feature selection, allowing a weight to be assigned to each audio feature according to its relative importance. Concatenation distance is used to overcome the issues with homosonic and equidistant sound segments. The inclusion of AI results in a more intelligent system that can better handle tedious tasks and minimise human error, allowing users (composers) to worry less about mundane tasks and focus more on the creative aspects of music making.
    In addition to the above, this study also aims to enhance user control flexibility in a CSS system and improve similarity results. The key factors that affect the synthesis results of CSS were first identified and then included as parametric options that users can control in order to communicate their intended creations to the system. Comprehensive evaluations were carried out to validate the feasibility and effectiveness of the proposed solutions (timbre-based feature set, AHP, and concatenation distance). The final part of the study investigates the relationship between perceived sound similarity and perceived sound interestingness. A new framework that integrates all these solutions, the query-based CSS framework, was then proposed. The proof-of-concept of this study, ConQuer, was developed based on this framework. This study has critically analysed the problems in existing CSS systems; novel solutions have been proposed to overcome them and their effectiveness has been tested and discussed, and these are also the main contributions of this study. Malaysian Ministry of Higher Education, Universiti Putra Malaysia.
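    As an aside, the Analytic Hierarchy Process step mentioned above amounts to deriving feature weights from a pairwise comparison matrix. The sketch below uses the standard principal-eigenvector method on an invented 3-feature comparison matrix; the matrix values and feature names are illustrative, not taken from the thesis.

```python
import numpy as np

def ahp_weights(pairwise: np.ndarray) -> np.ndarray:
    """Derive priority weights from an AHP pairwise comparison matrix
    as the normalised principal eigenvector."""
    eigenvalues, eigenvectors = np.linalg.eig(pairwise)
    principal = np.real(eigenvectors[:, np.argmax(np.real(eigenvalues))])
    return principal / principal.sum()

# Hypothetical judgements: feature 1 is 3x as important as feature 2,
# 5x as important as feature 3, etc. (a reciprocal matrix).
comparison = np.array([
    [1.0, 3.0, 5.0],
    [1/3, 1.0, 2.0],
    [1/5, 1/2, 1.0],
])
weights = ahp_weights(comparison)
print(dict(zip(["mfcc", "centroid", "flux"], np.round(weights, 3))))
```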

    Effets audionumériques adaptatifs : théorie, mise en œuvre et usage en création musicale numérique (Adaptive digital audio effects: theory, implementation and use in digital music creation).

    No full text
    Committee president: Myriam Desainte-Catherine, LaBRI, Université Bordeaux 1. Reviewers: Philippe Depalle, SPCL, McGill University, Montréal (Canada); Xavier Serra, MTG, Universitat Pompeu Fabra, Barcelona (Spain). Invited members: Emmanuel Favreau, INA-GRM, Paris; Patrick Boussard, GENESIS S.A., Aix-en-Provence.
    This PhD thesis addresses the theory, the implementation and the musical use of adaptive digital audio effects. In the first part, we situate the subject in the context of sound transformations. A great number of signal processing techniques complement each other and provide a complete set of algorithms for sound transformation. These transformations are applied along the perceptual dimensions of sound, namely dynamics, duration, pitch, spatialisation and timbre. For some effects, the control evolves in an automatic or periodic way, and this control is integrated into the algorithm. The control left to the user concerns certain parameters of the algorithm; it is exercised through real controllers, such as knobs and switches, or through virtual controllers, such as graphical interfaces on computer screens. A major topic in sound synthesis today is mapping: how to map gesture transducer data to the parameters of a synthesis algorithm. Our study sits at the intersection of digital audio effects, adaptive and gestural control, and sound features. In the second part, we present adaptive digital audio effects as we have formalised and developed them. These effects have their controls automated according to sound features. We studied and used many processing algorithms, some in real time and some out of real time, and improved them so that they accept time-varying control values. We also considered how to choose a classification that is meaningful to the musician, which led to a perceptual taxonomy. In parallel, we studied sound features and descriptors, and the ways an effect can be controlled by the sound and by gesture. We brought together numerous sound features used in psychoacoustics, in analysis-synthesis, for sound segmentation, for sound classification and retrieval, and for automatic transcription of music. We propose a generalised control for adaptive effects, structured in two levels. The first control level is the adaptation level: sound features control the effect through mapping functions. We give a set of warping functions (non-linear transfer functions) that transform the time evolution of sound feature curves, as well as feature combination functions and specific warping functions used to shape a control curve according to specific rules. The second control level is gestural control, which acts on the mapping functions between sound features and controls, either during combination or during specific warping. This study yields a generalisation of the control of digital audio effects, the design of toolboxes for composition, and their use in musical contexts. Numerous experiments and sound examples were produced, among them an adaptive spatialisation controlled by a dancer and an adaptive stereophonic equaliser. The experiments confirm the interest of such adaptive and gestural control, for example to change the expressiveness of a musical phrase, or to create entirely new sounds.
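    To give a flavour of the two-level control described above, here is a minimal sketch under assumptions of my own: RMS energy as the sound feature, a sigmoid as the warping (non-linear transfer) function, and a per-frame gain as the controlled effect parameter, with a gesture-supplied depth acting on the mapping. It illustrates adaptive control in general and is not code from the thesis.

```python
import numpy as np

def rms_per_frame(x: np.ndarray, frame: int = 512) -> np.ndarray:
    """Low-level sound feature: RMS energy of consecutive frames."""
    n = len(x) // frame
    frames = x[:n * frame].reshape(n, frame)
    return np.sqrt(np.mean(frames ** 2, axis=1))

def warp(feature: np.ndarray, slope: float = 8.0) -> np.ndarray:
    """First control level: a warping function that reshapes the
    feature curve into a normalised control curve."""
    centred = feature - feature.mean()
    return 1.0 / (1.0 + np.exp(-slope * centred))

def adaptive_gain(x: np.ndarray, gesture_depth: float = 0.8, frame: int = 512) -> np.ndarray:
    """Second control level: a gesture-supplied depth scales the mapping,
    and the resulting control curve drives a per-frame gain."""
    control = warp(rms_per_frame(x, frame))                 # feature -> control curve
    gain = 1.0 - gesture_depth * (1.0 - control)            # gesture acts on the mapping
    return x[:len(gain) * frame] * np.repeat(gain, frame)   # apply the adaptive effect

# Toy signal: one second of noise at 44.1 kHz, processed adaptively.
y = adaptive_gain(np.random.randn(44100) * 0.1)
```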

    Cross-Coding SDIF into MPEG-4 Structured Audio

    No full text
    We have created a link between the Sound Description Interchange Format ("SDIF") and MPEG-4's Structured Audio ("SA") tools. We cross-code SDIF data into SA bitstreams and write SA programs to synthesize this SDIF data. By making a link between these two powerful formats, both communities of users benefit: the SDIF community gets a fixed, standard synthesis platform that will soon be widespread, and the MPEG-4 community gets a set of powerful, robust analysis-synthesis tools. We have made the cross-coding tools available at no cost.
    1. Introduction. The International Organization for Standardization completed the MPEG-4 standard, ISO/IEC 14496-3, in October 1998, and will publish and designate it as an International Standard in mid-1999 [1]. One of the tools in MPEG-4 is a new sound-coding format called Structured Audio ("SA") [2]. SA allows audio to be transmitted from a server to a receiver as a set of instructions in a software-synthesis language. Upon receipt, a real-time synthesizer converts…
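    As a loose illustration of the decoder side of such cross-coding, the sketch below performs additive resynthesis from sinusoidal partial tracks, the kind of analysis data SDIF commonly carries and that an SA program would render. The data layout and values are invented for illustration and do not reflect the actual cross-coding tools or SAOL code described in the paper.

```python
import numpy as np

def resynthesize_partials(partials, duration: float = 1.0, sr: int = 44100) -> np.ndarray:
    """Additive resynthesis of sinusoidal partials.
    `partials` is a list of (frequency_hz, amplitude) pairs, assumed
    constant over the frame for simplicity."""
    t = np.arange(int(duration * sr)) / sr
    out = np.zeros_like(t)
    for freq, amp in partials:
        out += amp * np.sin(2 * np.pi * freq * t)
    return out

# Toy "frame" of three partials (invented values, not real SDIF content).
audio = resynthesize_partials([(440.0, 0.5), (880.0, 0.25), (1320.0, 0.125)])
```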