
    Complex Neural Networks for Audio

    Audio is represented in two mathematically equivalent ways: the real-valued time domain (i.e., waveform) and the complex-valued frequency domain (i.e., spectrum). The frequency-domain representation has several advantages; for example, the human auditory system is known to process sound in the frequency domain. Furthermore, linear time-invariant systems are convolved with sources in the time domain, whereas they may be factorized in the frequency domain. Neural networks have become rather useful when applied to audio tasks such as machine listening and audio synthesis, which are related by their dependence on high-quality acoustic models. Such models ideally encapsulate fine-scale temporal structure, such as that encoded in the phase of frequency-domain audio, yet there are no authoritative deep learning methods for complex audio. This manuscript is dedicated to addressing that shortcoming. Chapter 2 motivates complex networks by their affinity with complex-domain audio, while Chapter 3 contributes methods for building and optimizing complex networks. We show that the naive implementation of Adam optimization is incorrect for complex random variables, and that the choice of input and output representation has a significant impact on the performance of a complex network. Experimental results with novel complex neural architectures are provided in the second half of this manuscript. Chapter 4 introduces a complex model for binaural audio source localization. We show that, like humans, the complex model can generalize to different anatomical filters, which is important in the context of machine listening. The complex model's performance is better than that of the real-valued models, as well as of real- and complex-valued baselines. Chapter 5 proposes a two-stage method for speech enhancement. In the first stage, a complex-valued stochastic autoencoder projects complex vectors to a discrete space. In the second stage, long-term temporal dependencies are modeled in the discrete space. The autoencoder raises the performance ceiling for state-of-the-art speech enhancement, but the dynamic enhancement model does not outperform other baselines. We discuss areas for improvement and note that the complex Adam optimizer improves training convergence over the naive implementation.
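
    To make the Adam claim concrete, here is a minimal NumPy sketch of the underlying idea: for a complex random variable the second moment is E[g * conj(g)], so the variance estimate should be real-valued and shared between the real and imaginary parts, rather than kept separately per part as naive Adam does. The function name complex_adam_step and the update layout are illustrative assumptions; the thesis's exact correction may differ in detail.

        import numpy as np

        def complex_adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
            """One Adam step for a complex parameter w with complex gradient g.

            Naive Adam treats Re(w) and Im(w) as independent reals and keeps a
            separate second-moment estimate for each part. For a complex random
            variable the second moment is E[g * conj(g)], so the estimate v here
            is real-valued and shared by both parts.
            """
            m = b1 * m + (1 - b1) * g                       # first moment (complex)
            v = b2 * v + (1 - b2) * (g * np.conj(g)).real   # second moment, |g|^2
            m_hat = m / (1 - b1 ** t)                       # bias corrections
            v_hat = v / (1 - b2 ** t)
            w = w - lr * m_hat / (np.sqrt(v_hat) + eps)     # real step scale
            return w, m, v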

    Musical Audio Synthesis Using Autoencoding Neural Nets

    With an optimal network topology and tuning of hyperparameters, artificial neural networks (ANNs) may be trained to learn a mapping from low-level audio features to one or more higher-level representations. Such artificial neural networks are commonly used in classification and regression settings to perform arbitrary tasks. In this work we suggest repurposing autoencoding neural networks as musical audio synthesizers. We offer an interactive musical audio synthesis system that uses feedforward artificial neural networks for musical audio synthesis, rather than for discriminative or regression tasks. In our system an ANN is trained on frames of low-level features, and a high-level representation of the musical audio is learned through an autoencoding neural net. Our real-time synthesis system allows one to interact directly with the parameters of the model and generate musical audio in real time. This work therefore proposes the exploitation of neural networks for creative musical applications.
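
    As a rough illustration of this design, the following PyTorch sketch trains a bottleneck autoencoder on spectral frames and then drives the decoder directly from user-set latent values. The class name FrameAutoencoder, the layer sizes, and the choice of magnitude-spectrum frames are assumptions for illustration, not details taken from the paper.

        import torch
        import torch.nn as nn

        class FrameAutoencoder(nn.Module):
            # Bottleneck autoencoder over spectral frames; sizes are placeholders.
            def __init__(self, n_bins=1025, n_hidden=256, n_latent=8):
                super().__init__()
                self.encoder = nn.Sequential(
                    nn.Linear(n_bins, n_hidden), nn.ReLU(),
                    nn.Linear(n_hidden, n_latent), nn.Sigmoid(),  # bounded latents
                )
                self.decoder = nn.Sequential(
                    nn.Linear(n_latent, n_hidden), nn.ReLU(),
                    nn.Linear(n_hidden, n_bins), nn.ReLU(),  # non-negative magnitudes
                )

            def forward(self, frames):
                return self.decoder(self.encoder(frames))

        # Training minimizes reconstruction error on feature frames; synthesis
        # then bypasses the encoder and drives the decoder interactively.
        model = FrameAutoencoder()
        knobs = torch.rand(1, 8)        # user-controlled "synth parameters"
        frame = model.decoder(knobs)    # one synthesized spectral frame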

    A new method for ecoacoustics? Toward the extraction and evaluation of ecologically-meaningful soundscape components using sparse coding methods

    Passive acoustic monitoring is emerging as a promising non-invasive proxy for ecological complexity, with potential as a tool for remote assessment and monitoring (Sueur and Farina, 2015). Rather than attempting to recognise species-specific calls, either manually or automatically, there is a growing interest in evaluating the global acoustic environment. Positioned within the conceptual framework of ecoacoustics, a growing number of indices have been proposed which aim to capture community-level dynamics (e.g. Pieretti et al., 2011; Farina, 2014; Sueur et al., 2008b) by providing statistical summaries of the frequency- or time-domain signal. Although promising, the ecological relevance of these indices, and their efficacy as a monitoring tool, is still unclear. In this paper we suggest that, by virtue of operating in the time or frequency domain, existing indices are limited in their ability to access key structural information in the spectro-temporal domain. Alternative methods in which time-frequency dynamics are preserved are considered. Sparse-coding and source separation algorithms (specifically, shift-invariant probabilistic latent component analysis in 2D) are proposed as a means to access and summarise time-frequency dynamics which may be more ecologically meaningful.
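
    To make the contrast concrete, the sketch below computes one index-style summary (spectral entropy) alongside a decomposition of the same spectrogram. Plain NMF from scikit-learn stands in for the paper's shift-invariant PLCA in 2D, and the signal, sample rate, and component count are all placeholder assumptions.

        import numpy as np
        from scipy.signal import stft
        from sklearn.decomposition import NMF

        fs = 22050
        x = np.random.randn(fs * 10)            # placeholder for a soundscape clip

        _, _, Z = stft(x, fs=fs, nperseg=1024)  # time-frequency representation
        S = np.abs(Z)

        # Index-style summary: spectral entropy collapses the spectrogram to a
        # single statistic, discarding spectro-temporal structure.
        p = S.mean(axis=1)
        p /= p.sum()
        spectral_entropy = -np.sum(p * np.log2(p + 1e-12)) / np.log2(len(p))

        # Decomposition-style alternative: factorize the spectrogram into a few
        # components whose bases and activations can themselves be summarised.
        # SIPLCA-2D additionally models time and frequency shifts per component.
        model = NMF(n_components=4, init="nndsvda", max_iter=300)
        W = model.fit_transform(S)              # spectral bases (freq x comp)
        H = model.components_                   # activations (comp x time)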

    Untitled  [Image no. 3177]

    For more information about this item, browse to http://hdl.handle.net/102.100.100/329

    Groove Kernels as Rhythmic-Acoustic Motif Descriptors

    The "groove" of a song correlates with enjoyment and bodily movement. Recent work has shown that humans often agree whether a song does or does not have groove, and how much groove a song has. It is therefore useful to develop algorithms that characterize the quality of groove across songs. We evaluate three unsupervised tempo-invariant models for measuring pairwise musical groove similarity: a temporal model, a timbre-temporal model, and a pitch-timbre-temporal model. The temporal model uses a rhythm similarity metric proposed by Holzapfel and Stylianou, while the timbre-inclusive models are built on shift-invariant probabilistic latent component analysis. We evaluate the models using a dataset of over 8000 real-world musical recordings spanning approximately 10 genres, several decades, multiple meters, a large range of tempos, and Western and non-Western localities. A blind perceptual study is conducted: given a random music query, humans rate the groove similarity of the top three retrievals chosen by each of the models, as well as three random retrievals.
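
    For the temporal model, the sketch below illustrates the tempo-invariance trick behind Holzapfel and Stylianou's metric: a tempo change rescales the lag axis of the onset-strength autocorrelation, which becomes a shift on a log-lag axis and is discarded by taking an FFT magnitude. The function names, the librosa feature choices, and the 128-bin resolution are assumptions; this approximates, rather than reproduces, the authors' scale-transform formulation.

        import numpy as np
        import librosa

        def rhythm_signature(path, n_bins=128):
            # Onset-strength autocorrelation resampled onto a log-lag axis; the
            # FFT magnitude then discards the tempo-induced log-lag shift.
            y, sr = librosa.load(path, sr=22050)
            env = librosa.onset.onset_strength(y=y, sr=sr)
            ac = librosa.autocorrelate(env)
            ac = ac[1:len(ac) // 2]                   # drop lag 0 and long lags
            lags = np.arange(1, len(ac) + 1)
            log_lags = np.geomspace(lags[0], lags[-1], n_bins)
            sig = np.abs(np.fft.rfft(np.interp(log_lags, lags, ac)))
            return sig / (np.linalg.norm(sig) + 1e-12)

        def groove_similarity(path_a, path_b):
            # Cosine similarity between unit-norm rhythm signatures.
            return float(np.dot(rhythm_signature(path_a), rhythm_signature(path_b)))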

    Modeling and Predicting Song Adjacencies in Commercial Albums

    (Abstract to follow.)

    Large-scale music tag recommendation with explicit multiple attributes

    MM '10: Proceedings of the ACM Multimedia 2010 International Conference, pp. 401-41. DOI: 10.1145/1873951.1874006