

Deep learning-based music source separation

Author: Mimilakis Stylianos Ioannis
Publication date: 01/01/2021

This thesis addresses the problem of music source separation using deep learning methods. Deep learning-based music source separation is examined from three angles: signal processing, neural architecture, and signal representation. The first angle aims to understand what deep learning models based on deep neural networks (DNNs) learn for the task of music source separation, and whether an analogous signal processing operator characterizes their functionality. To this end, a novel algorithm is presented. The algorithm, referred to as the neural couplings algorithm (NCA), distills an optimized separation model consisting of non-linear operators into a single linear operator that is easy to interpret. Using the NCA, it is shown that DNNs learn data-driven filters for singing voice separation, which can be assessed using signal processing. Moreover, when DNNs learn to predict filters for source separation, they capture the structure of the target source and learn robust filters. The second angle aims to propose a neural network architecture that incorporates the aforementioned concept of filter prediction and optimization.
For this purpose, the neural network architecture referred to as the Masker-and-Denoiser (MaD) is presented. The proposed architecture realizes the filtering operation using skip-filtering connections. Additionally, several inference strategies and optimization objectives are proposed and discussed. The performance of MaD in music source separation is assessed through a series of experiments that include both objective and subjective evaluation. Experimental results suggest that the MaD architecture, with some of the studied strategies, is applicable to realistic music recordings, and it has been considered one of the state-of-the-art approaches in the 2018 Signal Separation Evaluation Campaign (SiSEC). Finally, the third angle focuses on employing DNNs to learn signal representations that are helpful for separating music sources. To that end, a new method is proposed that uses a novel re-parameterization scheme and a combination of optimization objectives. The re-parameterization is based on sinusoidal functions that promote interpretable DNN representations. Experimental results suggest that the proposed method can be efficiently employed to learn interpretable representations to which the filtering process can still be applied to separate music sources. Furthermore, using optimal transport (OT) distances as optimization objectives is useful for computing additive and distinctly structured signal representations for various types of music sources.
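The idea of a skip-filtering connection can be illustrated with a minimal sketch. It assumes a magnitude-spectrogram front end and a hypothetical one-layer "masker" (masker_weights, masker_bias) standing in for a trained network. The actual MaD masker and denoiser are not reproduced here. The sketch only shows the core mechanism: a mask is predicted from the mixture, and the connection multiplies that mask with the same mixture input, so the output is a filtered version of the input.

```python
import numpy as np


def magnitude_stft(x: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Hann-windowed magnitude STFT with shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def skip_filter(mix_mag, masker_weights, masker_bias):
    """Skip-filtering connection: a mask predicted from the mixture is applied to the mixture.

    masker_weights and masker_bias form a hypothetical one-layer masker used only
    for illustration; a real system would use a trained deep network here.
    """
    mask = sigmoid(mix_mag @ masker_weights + masker_bias)  # values in [0, 1]
    source_estimate = mask * mix_mag  # the skip connection filters the input itself
    return source_estimate, mask


rng = np.random.default_rng(0)
mixture = rng.standard_normal(4096)  # placeholder signal, not real music
mix_mag = magnitude_stft(mixture)    # shape (31, 129)
n_bins = mix_mag.shape[1]
est, mask = skip_filter(mix_mag,
                        0.01 * rng.standard_normal((n_bins, n_bins)),
                        np.zeros(n_bins))
```

Because the mask lies in [0, 1], the estimate can never exceed the mixture magnitude in any time-frequency bin. This bound is what lets the learned operation be read as a filter on the input.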


Improved methods for single-particle cryogenic electron microscopy

Author: Palovcak Eugene Joseph
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Biological macromolecules such as enzymes are nanoscale machines. This is true in a concrete sense: if the atomic structure of a biological macromolecule can be obtained, the theories of mechanics and intermolecular forces can be applied to explain how the machine works in terms that engineers would understand, including motors, ratchets, gates and transducers. Nevertheless, biological macromolecules are complex, fragile and extremely small, so obtaining their structures is a challenging experimental endeavor. Single-particle cryogenic electron microscopy (cryo-EM) is a technique for determining the 3D structure of a biological macromolecule from a large set of 2D electron micrographs of individual, structurally identical particles. To obtain such images, a solution of the macromolecules must be prepared in the frozen-hydrated state, embedded in a thin, electron-transparent, glassy film of water. This specimen must then be imaged with a very short exposure to avoid radiation damage. A powerful computer must then be used to sort, align, and average the 2D particle images to back-calculate the 3D structure. At its best, cryo-EM can determine the structures of biological macromolecules to atomic resolution; in practice, this goal is usually not achieved. Cryo-EM has become significantly more powerful in the past few years due to improvements in equipment and methodology. Several of the most significant advances originated in the labs of David Agard and Yifan Cheng at UCSF. When I began my PhD with Yifan, the spirit in the lab was that cryo-EM could keep getting better and better: with enough engineering, determining the 3D structure of an arbitrary biological macromolecule would be as routine an experiment as gel electrophoresis or DNA sequencing. Inspired, I took on projects in the lab that I thought would move the field closer to that goal.
In the first chapter of this thesis, I describe work I did supporting a project initiated by David Agard and his long-time scientific programmer Shawn Zheng. They developed and implemented an algorithm, MotionCor2, for correcting the complex, anisotropic movements that occur when a frozen-hydrated specimen interacts with the high-energy electron beam. My role was to benchmark MotionCor2 on a panel of real-world 3D reconstruction tasks. I was able to show that MotionCor2 restored the highest-resolution details in the images, ultimately yielding significantly better structures than simpler algorithms. For me, this project highlighted the importance of benchmarking an algorithm under routine real-world conditions with the right metrics. In chapter 1, I include the manuscript for the MotionCor2 study, formatted to highlight my contributions, which were moved to the supplement in the original publication in Nature Methods. One of the major remaining issues with cryo-EM is sample preparation: preparing thin, freestanding films of frozen-hydrated particles necessarily exposes those particles to air-water interfaces. Many fragile macromolecular complexes denature when exposed to such interfaces, preventing structure determination with cryo-EM. In chapters 2 and 3, I describe my efforts to develop a simple, robust approach to stabilizing fragile macromolecular complexes during the vitrification process. In chapter 2, I develop a method for coating EM grids with an electron-transparent and functionalizable graphene oxide (GO) support film. I demonstrate that such GO grids are compatible with high-resolution structure determination. This work was published in the Journal of Structural Biology in 2018. In chapter 3, I extend this work by functionalizing GO grids with nucleic acids, enabling routine structure determination of uncrosslinked chromatin specimens.
In ongoing work, I used nucleic acid grids to solve high-resolution structures of a highly fragile specimen, the snf2h-nucleosome complex, and analyzed the conformational heterogeneity of the nucleosome substrate. These results were made possible by the nucleic acid grid, because the other major approach for stabilizing chromatin specimens, chemical crosslinking, does not work for this specimen. Perhaps the most fundamental problem with single-particle cryo-EM is the radiation sensitivity of frozen-hydrated macromolecules. To image biological matter with electrons is to destroy it, so obtaining images of undamaged specimens requires very short, highly undersampled exposures. The resulting images are extremely noisy and low in contrast, with most particles barely distinguishable from the background. In chapter 4, I describe a novel computational approach to generating contrast in cryo-EM. Using a recently described machine learning strategy for training a parameterized denoising algorithm, I developed a computer program, restore, that denoises cryo-EM images, greatly enhancing their contrast and interpretability. This program leverages recent advances in computer vision and deep learning that have not yet been widely used in cryo-EM image processing. To characterize the performance of the algorithm on real-world data, I extended conventional metrics for image resolution to measure how an arbitrary transformation affects images at different spatial frequencies. These novel metrics are general and may be useful for characterizing other nonlinear reconstruction algorithms in cryo-EM and medical imaging. Finally, I showed that denoised cryo-EM images retain the high-resolution information required for accurate 3D reconstruction. Denoising can be applied to conventional cryo-EM images and can be reversed whenever necessary. I have made the restore program publicly available and have submitted a manuscript for peer-reviewed publication.
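The abstract mentions extending conventional image-resolution metrics. A widely used conventional metric of this kind is the Fourier ring correlation (FRC). The FRC compares two images ring by ring in Fourier space, giving one correlation value per spatial-frequency band. The minimal sketch below computes the standard FRC for two same-sized square images. It is an illustration under that assumption, not the extended metric developed in the thesis, and the function name is hypothetical.

```python
import numpy as np


def fourier_ring_correlation(img1: np.ndarray, img2: np.ndarray) -> np.ndarray:
    """Standard FRC between two same-sized square images, one value per integer ring.

    Returns rings 0 .. n//2 - 1, i.e. spatial frequencies up to just below Nyquist.
    """
    if img1.shape != img2.shape or img1.shape[0] != img1.shape[1]:
        raise ValueError("expected two square images of the same shape")
    n = img1.shape[0]
    f1 = np.fft.fftshift(np.fft.fft2(img1))
    f2 = np.fft.fftshift(np.fft.fft2(img2))

    # Integer ring index of every Fourier pixel, measured from the centered DC term.
    freq = np.arange(n) - n // 2
    yy, xx = np.meshgrid(freq, freq, indexing="ij")
    ring = np.rint(np.hypot(yy, xx)).astype(int).ravel()

    # Per-ring sums of the cross spectrum and of each power spectrum.
    cross = np.bincount(ring, weights=np.real(f1 * np.conj(f2)).ravel())
    power1 = np.bincount(ring, weights=(np.abs(f1) ** 2).ravel())
    power2 = np.bincount(ring, weights=(np.abs(f2) ** 2).ravel())

    frc = cross / np.sqrt(power1 * power2)
    return frc[: n // 2]
```

In practice, the two inputs are independent measurements of the same object, for example images reconstructed from independent halves of the data. The spatial frequency at which the curve falls below a chosen threshold is then reported as the resolution. An identical pair gives a curve of ones, and independent noise gives values near zero.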


Image denoising with unsupervised, information-theoretic, adaptive filtering

Author: Awate Suyash P.
Publication venue: University of Utah
Publication date: 01/01/2004

Technical report. The problem of denoising images is one of the most important and widely studied problems in image processing and computer vision. Various image filtering strategies based on linear systems, statistics, information theory, and variational calculus have been effective, but they invariably make strong assumptions about the properties of the signal and/or noise. Therefore, they lack the generality to be easily applied to new applications or diverse image collections. This paper describes a novel unsupervised, information-theoretic, adaptive filter (UINTA) that improves the predictability of pixel intensities from their neighborhoods by decreasing the joint entropy between them. In this way, UINTA automatically discovers the statistical properties of the signal and can thereby reduce noise in a wide spectrum of images and applications. The paper describes the formulation required to minimize the joint entropy measure, presents several important practical considerations in estimating image-region statistics, and then presents a series of results and comparisons on both real and synthetic data.
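The joint-entropy idea can be sketched in two small pieces, under simplifying assumptions:

- a Parzen-window (Gaussian kernel) estimate of the entropy of pixel-neighborhood vectors;
- one filtering step that replaces each pixel with a kernel-weighted average of the center intensities of pixels whose neighborhoods look similar.

The second piece is a nonlocal-means-style simplification chosen for illustration. It is not the exact UINTA update from the report. The function names and the parameters sigma and radius are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def neighborhood_vectors(img, radius=1):
    """One flattened (2r+1) x (2r+1) neighborhood per pixel, with edge padding."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    return sliding_window_view(padded, (k, k)).reshape(img.size, k * k)


def pairwise_sq_dists(z):
    """Squared Euclidean distances between all pairs of rows of z (O(N^2) memory)."""
    return ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)


def parzen_joint_entropy(img, sigma, radius=1):
    """Parzen-window (Gaussian kernel) entropy estimate of the neighborhood vectors."""
    z = neighborhood_vectors(img, radius)
    d = z.shape[1]
    log_kernel = (-pairwise_sq_dists(z) / (2 * sigma**2)
                  - 0.5 * d * np.log(2 * np.pi * sigma**2))
    # Log of the mean kernel value per sample, computed with a stable log-sum-exp.
    peak = log_kernel.max(axis=1, keepdims=True)
    log_density = peak[:, 0] + np.log(np.exp(log_kernel - peak).mean(axis=1))
    return -log_density.mean()


def neighborhood_weighted_step(img, sigma, radius=1):
    """Hypothetical simplified step (not the exact UINTA update): each pixel becomes a
    kernel-weighted average of all center intensities, weighted by neighborhood similarity."""
    z = neighborhood_vectors(img, radius)
    center = z[:, z.shape[1] // 2]  # middle entry of each row-major window
    weights = np.exp(-pairwise_sq_dists(z) / (2 * sigma**2))  # self-weight exp(0) = 1
    return ((weights @ center) / weights.sum(axis=1)).reshape(img.shape)


rng = np.random.default_rng(2)
clean = np.zeros((16, 16))
clean[:, 8:] = 1.0  # two flat regions separated by a vertical edge
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
filtered = neighborhood_weighted_step(noisy, sigma=1.0)
```

A flat image has a lower entropy estimate than a noisy one, which is the quantity the report proposes to decrease. The pairwise distances make this sketch quadratic in the number of pixels, so it is only practical for small images. The report's own treatment of estimating image-region statistics is not attempted here.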