752 research outputs found
Sparse and structured decomposition of audio signals on hybrid dictionaries using musical priors
International audience. This paper investigates the use of musical priors for sparse expansion of audio signals of music, on an overcomplete dual-resolution dictionary taken from the union of two orthonormal bases that can describe both transient and tonal components of a music audio signal. More specifically, chord and metrical structure information is used to build a structured model that takes into account dependencies between coefficients of the decomposition, both for the tonal and for the transient layer. A denoising application is used to provide a proof of concept of the proposed musical priors. Several configurations of the model are analyzed. Evaluation on monophonic and complex polyphonic excerpts of real music signals shows that the proposed approach provides results whose quality, measured by the signal-to-noise ratio, is competitive with state-of-the-art approaches, and more coherent with the semantic content of the signal. A detailed analysis of the model in terms of sparsity and in terms of interpretability of the representation is also provided, and shows that the model is capable of giving a relevant and legible representation of Western tonal music audio signals
Towards the automated analysis of simple polyphonic music : a knowledge-based approach
PhD. Music understanding is a process closely related to the knowledge and experience
of the listener. The amount of knowledge required is relative to the
complexity of the task in hand.
This dissertation is concerned with the problem of automatically decomposing
musical signals into a score-like representation. It proposes that, as
with humans, an automatic system requires knowledge about the signal and
its expected behaviour to correctly analyse music.
The proposed system uses the blackboard architecture to combine the
use of knowledge with data provided by the bottom-up processing of the
signal's information. Methods are proposed for the estimation of pitches,
onset times and durations of notes in simple polyphonic music.
A method for onset detection is presented. It provides an alternative to
conventional energy-based algorithms by using phase information. Statistical
analysis is used to create a detection function that evaluates the expected
behaviour of the signal regarding onsets.
Two methods for multi-pitch estimation are introduced. The first concentrates
on the grouping of harmonic information in the frequency-domain.
Its performance and limitations emphasise the case for the use of high-level
knowledge.
This knowledge, in the form of the individual waveforms of a single
instrument, is used in the second proposed approach. The method is based
on a time-domain linear additive model and it presents an alternative to
common frequency-domain approaches.
Results are presented and discussed for all methods, showing that, if
reliably generated, the use of knowledge can significantly improve the quality
of the analysis.
Joint Information Systems Committee (JISC) in the UK; National Science Foundation (N.S.F.) in the United States; Fundacion Gran Mariscal Ayacucho in Venezuela
Unattended acoustic sensor systems for noise monitoring in national parks
2017 Spring. Includes bibliographical references. Detection and classification of transient acoustic signals is a difficult problem. The problem is often complicated by factors such as the variety of sources that may be encountered, the presence of strong interference and substantial variations in the acoustic environment. Furthermore, for most applications of transient detection and classification, such as speech recognition and environmental monitoring, online detection and classification of these transient events is required. This is even more crucial for applications such as environmental monitoring as it is often done at remote locations where it is infeasible to set up a large, general-purpose processing system. Instead, some type of custom-designed system is needed which is power efficient yet able to run the necessary signal processing algorithms in near real-time. In this thesis, we describe a custom-designed environmental monitoring system (EMS) which was specifically designed for monitoring air traffic and other sources of interest in national parks. More specifically, this thesis focuses on the capabilities of the EMS and how transient detection, classification and tracking are implemented on it. The Sparse Coefficient State Tracking (SCST) transient detection and classification algorithm was implemented on the EMS board in order to detect and classify transient events. This algorithm was chosen because it was designed for this particular application and was shown to have superior performance compared to other algorithms commonly used for transient detection and classification. The SCST algorithm was implemented on an Artix 7 FPGA with parts of the algorithm running as dedicated custom logic and other parts running sequentially on a soft-core processor. In this thesis, the partitioning and pipelining of this algorithm are explained. Each of the partitions was tested independently to verify its functionality with respect to the overall system.
Furthermore, the entire SCST algorithm was tested in the field on actual acoustic data and the performance of this implementation was evaluated using receiver operating characteristic (ROC) curves and confusion matrices. In this test the FPGA implementation of SCST was able to achieve acceptable source detection and classification results despite a difficult data set and limited training data. The tracking of acoustic sources is done through successive direction of arrival (DOA) angle estimation using a wideband extension of the Capon beamforming algorithm. This algorithm was also implemented on the EMS in order to provide real-time DOA estimates for the detected sources. This algorithm was partitioned into several stages with some stages implemented in custom logic while others were implemented as software running on the soft-core processor. Just as with SCST, each partition of this beamforming algorithm was verified independently and then a full system test was conducted to evaluate whether it would be able to track an airborne source. For the full system test, a model airplane was flown at various trajectories relative to the EMS and the trajectories estimated by the system were compared to the ground truth. Although in this test the accuracy of the DOA estimates could not be evaluated, it was shown that the algorithm was able to approximately form the general trajectory of a moving source, which is sufficient for our application as only a general heading of the acoustic sources is desired
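The ROC evaluation mentioned in this abstract can be illustrated with a minimal sketch. It sweeps a decision threshold over detector scores and records the false-positive and true-positive rates at each step. The function names `roc_points` and `auc`, and the scores and labels below, are hypothetical illustrations and are not taken from the thesis; the sketch also assumes both classes are present in the labels.

```python
def roc_points(scores, labels):
    """Return (false-positive rate, true-positive rate) pairs for a threshold sweep.

    scores: detector outputs, higher means "more likely an event".
    labels: 1 for a true event, 0 otherwise (both classes assumed present).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items with identical scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical detector scores and ground-truth labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

An area close to 1 indicates that events are ranked above non-events almost everywhere along the sweep, while 0.5 corresponds to chance-level ranking.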
A Parametric Sound Object Model for Sound Texture Synthesis
This thesis deals with the analysis and synthesis of sound textures based on parametric sound objects. An overview is provided about the acoustic and perceptual principles of textural acoustic scenes, and technical challenges for analysis and synthesis are considered. Four essential processing steps for sound texture analysis are identified, and existing sound texture systems are reviewed, using the four-step model as a guideline. A theoretical framework for analysis and synthesis is proposed. A parametric sound object synthesis (PSOS) model is introduced, which is able to describe individual recorded sounds through a fixed set of parameters. The model, which applies to harmonic and noisy sounds, is an extension of spectral modeling and uses spline curves to approximate spectral envelopes, as well as the evolution of parameters over time. In contrast to standard spectral modeling techniques, this representation uses the concept of objects instead of concatenated frames, and it provides a direct mapping between sounds of different length. Methods for automatic and manual conversion are shown. An evaluation is presented in which the ability of the model to encode a wide range of different sounds has been examined. Although there are aspects of sounds that the model cannot accurately capture, such as polyphony and certain types of fast modulation, the results indicate that high quality synthesis can be achieved for many different acoustic phenomena, including instruments and animal vocalizations. In contrast to many other forms of sound encoding, the parametric model facilitates various techniques of machine learning and intelligent processing, including sound clustering and principal component analysis. Strengths and weaknesses of the proposed method are reviewed, and possibilities for future development are discussed
Characterizing Audio Events for Video Soundtrack Analysis
There is an entire emerging ecosystem of amateur video recordings on the internet today, in addition to the abundance of more professionally produced content. The ability to automatically scan and evaluate the content of these recordings would be very useful for search and indexing, especially as amateur content tends to be more poorly labeled and tagged than professional content. Although the visual content is often considered to be of primary importance, the audio modality contains rich information which may be very helpful in the context of video search and understanding. Any technology that could help to interpret video soundtrack data would also be applicable in a number of other scenarios, such as mobile device audio awareness, surveillance, and robotics. In this thesis we approach the problem of extracting information from these kinds of unconstrained audio recordings. Specifically we focus on techniques for characterizing discrete audio events within the soundtrack (e.g. a dog bark or door slam), since we expect events to be particularly informative about content. Our task is made more complicated by the extremely variable recording quality and noise present in this type of audio. Initially we explore the idea of using the matching pursuit algorithm to decompose and isolate components of audio events. Using these components we develop an approach for non-exact (approximate) fingerprinting as a way to search audio data for similar recurring events. We demonstrate a proof of concept for this idea. Subsequently we extend the use of matching pursuit to build an actual audio fingerprinting system, with the goal of identifying simultaneously recorded amateur videos (i.e. videos taken in the same place at the same time by different people, which contain overlapping audio). Automatic discovery of these simultaneous recordings is one particularly interesting facet of general video indexing. We evaluate this fingerprinting system on a database of 733 internet videos. 
Next we return to searching for features to directly characterize soundtrack events. We develop a system to detect transient sounds and represent each audio clip as a histogram of the transients it contains. We use this representation for video classification over a database of 1873 internet videos. When we combine these features with a spectral feature baseline system we achieve a relative improvement of 7.5% in mean average precision over the baseline. In another attempt to devise features to better describe and compare events, we investigate decomposing audio using a convolutional form of non-negative matrix factorization, resulting in event-like spectro-temporal patches. We use the resulting representation to build an event detection system that is more robust to additive noise than a comparable baseline system. Lastly we investigate a promising feature representation that has been used by others previously to describe event-like sound effect clips. These features derive from an auditory model and are meant to capture fine time structure in sound events. We compare these features and a related but simpler feature set on the task of video classification over 9317 internet videos. We find that combinations of these features with baseline spectral features produce a significant improvement in mean average precision over the baseline
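The matching pursuit decomposition that this abstract builds on can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the dictionary is a toy set of unit-norm atoms (unit spikes plus sampled cosines, chosen here only for illustration), and the names `matching_pursuit`, `dot` and `normalize` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def matching_pursuit(signal, atoms, n_iter=10, tol=1e-9):
    """Greedy decomposition over unit-norm atoms.

    Returns (picks, residual), where picks is a list of (atom_index, coefficient).
    """
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        # Correlate the residual with every atom and keep the best match.
        corr = [dot(residual, atom) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        c = corr[k]
        if abs(c) < tol:
            break
        picks.append((k, c))
        # Subtract the selected atom's contribution from the residual.
        residual = [r - c * a for r, a in zip(residual, atoms[k])]
    return picks, residual


# Toy dictionary (an assumption for illustration): spikes and sampled cosines.
N = 8
spikes = [[1.0 if i == j else 0.0 for i in range(N)] for j in range(N)]
cosines = [normalize([math.cos(math.pi * (i + 0.5) * k / N) for i in range(N)])
           for k in range(N)]
atoms = spikes + cosines

# Test signal: one cosine atom plus a single click at sample 5.
signal = [2.0 * c for c in cosines[3]]
signal[5] += 1.5
picks, residual = matching_pursuit(signal, atoms, n_iter=20)
```

Because every atom has unit norm, each iteration removes exactly the squared coefficient from the residual energy, so the list of selected atoms and coefficients accounts for the signal energy up to what remains in `residual`.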
Computer Models for Musical Instrument Identification
PhD. A particular aspect in the perception of sound is concerned with what is commonly
termed as texture or timbre. From a perceptual perspective, timbre is what allows us
to distinguish sounds that have similar pitch and loudness. Indeed most people are
able to discern a piano tone from a violin tone or able to distinguish different voices
or singers.
This thesis deals with timbre modelling. Specifically, the formant theory of timbre
is the main theme throughout. This theory states that acoustic musical instrument
sounds can be characterised by their formant structures. Following this principle, the
central point of our approach is to propose a computer implementation for building
musical instrument identification and classification systems.
Although the main thrust of this thesis is to propose a coherent and unified
approach to the musical instrument identification problem, it is oriented towards the
development of algorithms that can be used in Music Information Retrieval (MIR)
frameworks. Drawing on research in speech processing, a complete supervised system
taking into account both physical and perceptual aspects of timbre is described.
The approach is composed of three distinct processing layers. Parametric models
that allow us to represent signals through mid-level physical and perceptual representations
are considered. Next, the use of the Line Spectrum Frequencies as spectral
envelope and formant descriptors is emphasised. Finally, the use of generative and
discriminative techniques for building instrument and database models is investigated.
Our system is evaluated under realistic recording conditions using databases of isolated
notes and melodic phrases
Doctor of Philosophy
Dissertation. The goal of the work presented in this dissertation was to find out how physical ordering of organic semiconductors affects spin-dependent electronic charge carrier transitions. Organic light emitting diodes in distinct morphological phases were created out of thin films of the π-conjugated polymer poly[9,9-dioctylfluorenyl-2,7-diyl] (polyfluorene), allowing diodes to be studied with the sole difference being the degree of polymeric order in the active layer of the device. The polyfluorene morphologies studied ranged from an amorphous (glassy) phase through mixed phases, to a highly ordered (beta) phase. The phase control was achieved through a dipping procedure where a glassy polyfluorene layer is immersed in a solvent mixture that structures the side chains of a monomer unit into an alternating planar ladder structure. Continuous-wave (cw) and pulsed (p) electrically detected magnetic resonance (EDMR) spectroscopies were used to probe charge carrier spin states within the polyfluorene layers. For cw EDMR, microwave frequencies between ~1 and 20 GHz were used, while all pEDMR measurements were conducted at X-band (~9.6 GHz). The experiments allowed for a comparison of how polymer morphology affects spin-dependent charge carrier transitions, coherent spin motion, spin relaxation times, the local nuclear (hyperfine) magnetic fields and spin-orbit effects. In addition to film morphology, temperature and device-bias dependencies of the spin-dependent charge carrier transitions were studied experimentally and a set of global fit and bootstrap error analysis techniques was adapted for EDMR spectroscopy
Non-parametric modeling in non-intrusive load monitoring
Non-intrusive Load Monitoring (NILM) is an approach to the increasingly important task of residential energy analytics. Transparency of energy resources and consumption habits presents opportunities and benefits at all ends of the energy supply-chain, including the end-user. At present, there is no feasible infrastructure available to monitor individual appliances at a large scale. The goal of NILM is to provide appliance monitoring using only the available aggregate data, side-stepping the need for expensive and intrusive monitoring equipment. The present work showcases two self-contained, fully unsupervised NILM solutions: the first featuring non-parametric mixture models, and the second featuring non-parametric factorial Hidden Markov Models with explicit duration distributions. The present implementation makes use of traditional and novel constraints during inference, showing marked improvement in disaggregation accuracy with very little effect on computational cost, relative to the motivating work. To constitute a complete unsupervised solution, labels are applied to the inferred components using a Res-Net-based deep learning architecture. Although this preliminary approach to labelling proves less than satisfactory, it is well-founded and several opportunities for improvement are discussed. Both methods, along with the labelling network, make use of block-filtered data: a steady-state representation that removes transient behaviour and signal noise. A novel filter to achieve this steady-state representation that is both fast and reliable is developed and discussed at length. Finally, an approach to monitor the aggregate for novel events during deployment is developed under the framework of Bayesian surprise. The same non-parametric modelling can be leveraged to examine how the predictive and transitional distributions change given new windows of observations. 
This framework is also shown to have potential elsewhere, such as in regularizing models against over-fitting, which is an important problem in existing supervised NILM
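The idea of Bayesian surprise referred to in this abstract can be illustrated with a small sketch: surprise is taken here as the Kullback-Leibler divergence from the prior to the posterior after a new window of observations. The thesis works with non-parametric models; this sketch deliberately swaps in a much simpler conjugate Gaussian model (unknown mean, known noise variance) purely for illustration, and the function names and numerical values are hypothetical.

```python
import math


def gaussian_posterior(mu0, var0, obs, noise_var):
    """Conjugate update for the mean of a Gaussian with known noise variance."""
    n = len(obs)
    precision = 1.0 / var0 + n / noise_var
    var_n = 1.0 / precision
    mu_n = var_n * (mu0 / var0 + sum(obs) / noise_var)
    return mu_n, var_n


def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(p || q) in nats for two univariate Gaussians."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)


def bayesian_surprise(mu0, var0, window, noise_var):
    """Surprise of a window: KL divergence of the posterior from the prior."""
    mu_n, var_n = gaussian_posterior(mu0, var0, window, noise_var)
    return kl_gaussian(mu_n, var_n, mu0, var0)


# Hypothetical prior belief about aggregate power (watts) and two windows.
prior_mean, prior_var, noise_var = 100.0, 25.0, 4.0
expected_window = [99.0, 101.0, 100.0, 102.0]
novel_window = [150.0, 152.0, 149.0, 151.0]

surprise_expected = bayesian_surprise(prior_mean, prior_var, expected_window, noise_var)
surprise_novel = bayesian_surprise(prior_mean, prior_var, novel_window, noise_var)
```

A window that agrees with the prior still yields some surprise, because the posterior becomes narrower, but a window from an unseen operating level moves the posterior mean far from the prior and yields a much larger value, which is what a novelty monitor would threshold on.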
NMR studies of silicate & aluminosilicate solutions
The work described in this thesis deals with the use of (^29)Si and (^27)Al NMR to obtain information about the chemical structure of aqueous silicate and aluminosilicate solutions. This has extended the knowledge gained in previous studies. A wide range of alkaline and tetraalkylammonium hydroxide silicate and aluminosilicate solutions (mostly also containing sodium) has been examined. Such solutions are shown to contain a large range of anions. The first highly-resolved (^27)Al NMR spectra of alkaline aluminosilicate solutions are presented and discussed. The linewidths and number of resolved lines are shown to depend critically on several factors, especially the pH and Si:Al ratio, as well as the concentration of various components. At least thirteen separate peaks or bands are observed at the optimum conditions of pH ~10.35 and Si:Al = 1. Tentative assignments of some bands are presented, based on (^27)Al and (^29)Si shift comparability, (^27)Al linewidths and (^27)Al spin-lattice relaxation measurements. Relative intensities of the various (^27)Al signals are given for the pH ~10.35 solution. A correlation of (^27)Al and (^29)Si chemical shifts has been established. Dilution effects and dynamic equilibria are considered for the aluminosilicate solutions. Trends in the spectra with changing concentrations of the various components have been identified. Under certain conditions, additional peaks are observed. Two of these are tentatively assigned to aluminium nuclei in "three-membered" rings. Moreover, a few very sharp lines (i.e. with very low electric field gradients), currently of unknown origin, are observed in several spectra. At certain concentrations chemical exchange can be shown to take place. The effects of tetraalkylammonium (TAA) and alkali metal cations on the equilibrium distribution of aluminosilicate oligomers in aqueous alkaline aluminosilicate solutions have been investigated using evolution with time of Al-27 NMR spectra.
The results indicate that there are no observable differences in the initial distribution of aluminosilicate species (i.e. immediately following solution mixing) involving a mixture of TAA and Na cations and those involving alkali metal cations alone. However, in the latter case, this distribution, in contrast to those for TAA/Na solutions, is not stable, the species quickly re-equilibrating, giving broad signals of q(^3) and perhaps q(^4) type. The effect of Al and Si concentration on the formation of gel in aluminosilicate solutions is also investigated with (^27)Al NMR spectra. It is shown that the gel time strongly depends on the Al concentration and the temperature. A graph of Al and Si concentrations with gel time has been established. Silicon-29 NMR spectra have been obtained for more than 20 aqueous alkaline silicate solutions containing methanol. A signal assigned to CH(_3)OSi(OH)(_3) or one of its deprotonated congeners is studied in detail for the first time for the solution conditions involved. Its appearance has been monitored as a function of the solution composition. The equilibrium constant for its formation is of the order of 0.65. The effects of alcohol, silicate and TAAOH concentration and of the nature of the alkylammonium base on this reaction have been investigated