530 research outputs found

    Audio Source Separation Using Sparse Representations

    Get PDF
    This is the author's final version of the article, first published as A. Nesbit, M. G. Jafari, E. Vincent and M. D. Plumbley, "Audio Source Separation Using Sparse Representations", in W. Wang (Ed.), Machine Audition: Principles, Algorithms and Systems, Chapter 10, pp. 246-264, IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch010.
    The authors address the problem of audio source separation, namely the recovery of audio signals from recordings of mixtures of those signals. The sparse component analysis framework is a powerful method for achieving this. Sparse orthogonal transforms, in which only a few transform coefficients differ significantly from zero, are developed; once the signal has been transformed, energy is apportioned from each transform coefficient to each estimated source, and, finally, the signal is reconstructed using the inverse transform. The overriding aim of this chapter is to demonstrate how this framework, as exemplified here by two different decomposition methods which adapt to the signal to represent it sparsely, can be used to solve different problems in different mixing scenarios. To address the instantaneous (neither delays nor echoes) and underdetermined (more sources than mixtures) mixing model, a lapped orthogonal transform is adapted to the signal by selecting a basis from a library of predetermined bases. This method is closely related to the windowing methods used in the MPEG audio coding framework. In the anechoic (delays but no echoes) and determined (equal number of sources and mixtures) mixing case, a greedy adaptive transform is used, based on orthogonal basis functions that are learned from the observed data instead of being selected from a predetermined library of bases. This is found to encode the signal characteristics by introducing a feedback system between the bases and the observed data. Experiments on mixtures of speech and music signals demonstrate that these methods give good signal approximations and separation performance, and indicate promising directions for future research.
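    As a rough illustration of the framework summarized above (a minimal Python sketch, not the authors' code), the fragment below separates an instantaneous, underdetermined mixture in a sparse transform domain: a plain orthogonal DCT stands in for the adaptive lapped transform, each transform coefficient is apportioned to the source whose mixing direction explains it best, and the sources are reconstructed with the inverse transform. The mixing matrix A is assumed known or already estimated.

    import numpy as np
    from scipy.fft import dct, idct

    def separate_sparse(mixtures, A):
        """mixtures: (n_mix, n_samples) observations; A: (n_mix, n_src) mixing matrix."""
        n_src = A.shape[1]
        # Orthogonal transform of each mixture channel (DCT-II, orthonormal).
        X = dct(mixtures, norm='ortho', axis=1)
        # Unit-norm mixing direction for every source.
        dirs = A / np.linalg.norm(A, axis=0, keepdims=True)
        # Assign each coefficient to the source whose direction it matches best.
        winner = np.argmax(np.abs(dirs.T @ X), axis=0)
        S = np.zeros((n_src, X.shape[1]))
        for j in range(n_src):
            mask = winner == j
            # Apportion the coefficient's energy to the winning source.
            S[j, mask] = dirs[:, j] @ X[:, mask]
        # Reconstruct each estimated source with the inverse transform.
        return idct(S, norm='ortho', axis=1)

    In the chapter itself the transform is adapted to the signal (by basis selection or basis learning) rather than fixed, which is what makes the coefficients sparse enough for this coefficient-wise assignment to separate the sources well.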

    Analysis, visualization, and transformation of audio signals using dictionary-based methods

    Get PDF
    This article provides an overview of dictionary-based methods (DBMs), and reviews recent work in the application of such methods to working with audio and music signals. As Fourier analysis is to additive synthesis, DBMs can be seen as the analytical counterpart to a generalized granular synthesis, where a sound is built by combining heterogeneous atoms selected from a user-defined dictionary. As such, DBMs provide novel ways for analyzing and visualizing audio signals, creating multiresolution descriptions of their contents, and designing sound transformations unique to a description of audio in terms of atoms.
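    A typical dictionary-based method is matching pursuit: at each step, greedily pick the dictionary atom most correlated with the current residual and subtract its contribution, yielding exactly the kind of atomic decomposition described above. A minimal numpy sketch, assuming unit-norm atoms stored as dictionary columns (illustrative, not code from the article):

    import numpy as np

    def matching_pursuit(signal, dictionary, n_atoms=50):
        """dictionary: (n_samples, n_total_atoms) matrix with unit-norm columns."""
        residual = signal.astype(float).copy()
        decomposition = []                             # list of (atom index, coefficient)
        for _ in range(n_atoms):
            correlations = dictionary.T @ residual
            k = int(np.argmax(np.abs(correlations)))   # best-matching atom
            coeff = correlations[k]
            decomposition.append((k, coeff))
            residual = residual - coeff * dictionary[:, k]
        return decomposition, residual

    # Example with a random unit-norm dictionary standing in for a grain/Gabor dictionary.
    D = np.random.default_rng(0).normal(size=(256, 1024))
    D /= np.linalg.norm(D, axis=0)
    sig = 2.0 * D[:, 3] + 0.5 * D[:, 40]
    atoms, res = matching_pursuit(sig, D, n_atoms=10)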

    Interpreting Neural Networks through the Polytope Lens

    Full text link
    Mechanistic interpretability aims to explain what a neural network has learned at a nuts-and-bolts level. What are the fundamental primitives of neural network representations? Previous mechanistic descriptions have used individual neurons or their linear combinations to understand the representations a network has learned. But there are clues that neurons and their linear combinations are not the correct fundamental units of description: directions cannot describe how neural networks use nonlinearities to structure their representations. Moreover, many instances of individual neurons and their combinations are polysemantic (i.e. they have multiple unrelated meanings). Polysemanticity makes interpreting the network in terms of neurons or directions challenging, since we can no longer assign a specific feature to a neural unit. In order to find a basic unit of description that does not suffer from these problems, we zoom in beyond just directions to study the way that piecewise linear activation functions (such as ReLU) partition the activation space into numerous discrete polytopes. We call this perspective the polytope lens. The polytope lens makes concrete predictions about the behavior of neural networks, which we evaluate through experiments on both convolutional image classifiers and language models. Specifically, we show that polytopes can be used to identify monosemantic regions of activation space (while directions are not in general monosemantic) and that the density of polytope boundaries reflects semantic boundaries. We also outline a vision for what mechanistic interpretability might look like through the polytope lens.
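    The basic object of the polytope lens is easy to state in code: in a ReLU network, the on/off pattern of the units determines which linear region (polytope) of activation space an input occupies, so two inputs are processed by the same affine map exactly when their patterns agree. A toy numpy sketch with random weights (purely illustrative, not the paper's models):

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(16, 8)), rng.normal(size=16)
    W2, b2 = rng.normal(size=(16, 16)), rng.normal(size=16)

    def activation_pattern(x):
        """Boolean on/off pattern of every ReLU for input x.
        Inputs with identical patterns lie in the same polytope."""
        h1 = W1 @ x + b1
        h2 = W2 @ np.maximum(h1, 0) + b2
        return np.concatenate([h1 > 0, h2 > 0])

    x, y = rng.normal(size=8), rng.normal(size=8)
    same_polytope = np.array_equal(activation_pattern(x), activation_pattern(y))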

    High-quality hyperspectral reconstruction using a spectral prior

    Get PDF
    We present a novel hyperspectral image reconstruction algorithm, which overcomes the long-standing tradeoff between spectral accuracy and spatial resolution in existing compressive imaging approaches. Our method consists of two steps: First, we learn nonlinear spectral representations from real-world hyperspectral datasets; for this, we build a convolutional autoencoder, which allows reconstructing its own input through its encoder and decoder networks. Second, we introduce a novel optimization method, which jointly regularizes the fidelity of the learned nonlinear spectral representations and the sparsity of gradients in the spatial domain, by means of our new fidelity prior. Our technique can be applied to any existing compressive imaging architecture, and has been thoroughly tested both in simulation, and by building a prototype hyperspectral imaging system. It outperforms the state-of-the-art methods from each architecture, both in terms of spectral accuracy and spatial resolution, while its computational complexity is reduced by two orders of magnitude with respect to sparse coding techniques. Moreover, we present two additional applications of our method: hyperspectral interpolation and demosaicing. Last, we have created a new high-resolution hyperspectral dataset containing sharper images of more spectral variety than existing ones, available through our project website.
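    A hedged sketch of the first step only (learning the spectral prior), as a toy 1-D convolutional autoencoder over the spectral axis of a single pixel; the layer sizes and the 31-band spectra are illustrative assumptions, not the authors' architecture:

    import torch
    import torch.nn as nn

    class SpectralAutoencoder(nn.Module):
        """Toy convolutional autoencoder over the spectral dimension of a pixel."""
        def __init__(self, latent=8):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv1d(16, latent, kernel_size=3, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.Conv1d(latent, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv1d(16, 1, kernel_size=3, padding=1),
            )

        def forward(self, x):                      # x: (batch, 1, n_bands)
            return self.decoder(self.encoder(x))

    model = SpectralAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    spectra = torch.rand(64, 1, 31)                # stand-in for real hyperspectral pixels
    loss = nn.functional.mse_loss(model(spectra), spectra)
    opt.zero_grad(); loss.backward(); opt.step()

    The reconstruction error of such a model (how far a candidate spectrum is from something the decoder can reproduce) is what the second step regularizes, together with the sparsity of spatial gradients, when solving the compressive reconstruction.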

    Extensions of independent component analysis for natural image data

    Get PDF
    An understanding of the statistical properties of natural images is useful for any kind of processing to be performed on them. Natural image statistics are, however, in many ways as complex as the world which they depict. Fortunately, the dominant low-level statistics of images are sufficient for many different image processing goals. A lot of research has been devoted to the second-order statistics of natural images over the years. Independent component analysis is a statistical tool for analyzing higher than second-order statistics of data sets. It attempts to describe the observed data as a linear combination of independent, latent sources. Despite its simplicity, it has provided valuable insights into many types of natural data. With natural image data, it gives a sparse basis useful for efficient description of the data. Connections between this description and early mammalian visual processing have been noticed. The main focus of this work is to extend the known results of applying independent component analysis to natural images. We explore different imaging techniques, develop algorithms for overcomplete cases, and study the dependencies between the components, both by using a model that finds a topographic ordering for the components and by conditioning the statistics of a component on the activity of another. An overview is provided of the associated problem field, and it is discussed how these relatively small results may eventually be part of a more complete solution to the problem of vision.
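    A compact sketch of the baseline pipeline this work extends (plain ICA on image patches, not the thesis's own overcomplete or topographic extensions): sample patches from a grayscale image, remove each patch's mean, and fit FastICA; the columns of the estimated mixing matrix are the sparse, localized basis functions referred to above. The image array, patch size, and component count are illustrative assumptions.

    import numpy as np
    from sklearn.decomposition import FastICA

    def ica_basis(image, patch=8, n_patches=5000, n_components=64, seed=0):
        """image: 2-D grayscale array; returns (n_components, patch, patch) basis functions."""
        rng = np.random.default_rng(seed)
        rows = rng.integers(0, image.shape[0] - patch, n_patches)
        cols = rng.integers(0, image.shape[1] - patch, n_patches)
        X = np.stack([image[r:r + patch, c:c + patch].ravel()
                      for r, c in zip(rows, cols)])
        X = X - X.mean(axis=1, keepdims=True)      # remove each patch's DC component
        ica = FastICA(n_components=n_components, max_iter=500, random_state=seed)
        ica.fit(X)
        return ica.mixing_.T.reshape(n_components, patch, patch)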