
    Audio source separation into the wild

    Get PDF
    This review chapter is dedicated to multichannel audio source separation in real-life environments. We survey some of the major achievements in the field and discuss some of the remaining challenges. We examine several important practical scenarios, e.g. moving sources and/or microphones, varying numbers of sources and sensors, high reverberation levels, spatially diffuse sources, and synchronization problems. Several applications, such as smart assistants, cellular phones, hearing aids and robots, are discussed. Our perspectives on the future of the field are given as concluding remarks of this chapter.

    Separation and Estimation of the Number of Audio Signal Sources Overlapping in Time and Frequency

    Get PDF
    Everyday audio recordings involve mixture signals: music contains a mixture of instruments; in a meeting or conference, there is a mixture of human voices. For such mixtures, automatically separating the sources or estimating their number is a challenging task. A common assumption when processing mixtures in the time-frequency domain is that the sources do not fully overlap. In this work, however, we consider cases where the overlap is severe, for instance when instruments play the same note (unison) or when many people speak concurrently (the "cocktail party" situation), which highlights the need for new representations and more powerful models. To address the problems of source separation and count estimation, we use conventional signal processing techniques as well as deep neural networks (DNNs). We first address source separation for unison instrument mixtures, studying the distinct spectro-temporal modulations caused by vibrato. To exploit these modulations, we developed a method based on time warping, informed by an estimate of the fundamental frequency. For cases where such estimates are not available, we present an unsupervised model inspired by the way humans group time-varying sources (common fate). This contribution comes with a novel representation that improves separation of overlapped and modulated sources in unison mixtures and also improves vocal and accompaniment separation when used as the input to a DNN model.

    We then focus on estimating the number of sources in a mixture, which is important for real-world scenarios. Our work on count estimation was motivated by a study of how humans address this task, which led us to conduct listening experiments confirming that humans are able to correctly estimate the number of sources only up to four. To answer the question of whether machines can perform similarly, we present a DNN architecture trained to estimate the number of concurrent speakers. Our results show improvements over other methods, and the model even outperformed humans on the same task. In both the source separation and count estimation tasks, the key contribution of this thesis is the concept of "modulation", which is important for computationally mimicking human performance. Our proposed Common Fate Transform is an adequate representation for disentangling overlapping signals for separation, and an inspection of our DNN count estimation model revealed that it learns modulation-like intermediate features.
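
    For readers unfamiliar with the Common Fate Transform mentioned above, the sketch below shows one plausible way to build such a modulation representation: take an STFT, tile it into spectro-temporal patches, and apply a 2D DFT to every patch so that components modulating together concentrate in the same modulation bins. The function name `common_fate_transform` and all parameter values are illustrative assumptions, not the thesis implementation.

```python
# A minimal sketch of a Common Fate Transform-style representation, assuming
# a mono signal `x` at sample rate `sr`; patch sizes are illustrative choices.
import numpy as np
from scipy.signal import stft

def common_fate_transform(x, sr, n_fft=1024, hop=256, patch_f=32, patch_t=16):
    # First stage: ordinary STFT of the time-domain signal.
    _, _, X = stft(x, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    F, T = X.shape
    # Trim so the spectrogram tiles exactly into (patch_f x patch_t) patches.
    F_trim, T_trim = F - F % patch_f, T - T % patch_t
    X = X[:F_trim, :T_trim]
    # Second stage: 2D DFT of every spectro-temporal patch; modulations that
    # evolve together (common fate) concentrate in the same modulation bins.
    patches = X.reshape(F_trim // patch_f, patch_f, T_trim // patch_t, patch_t)
    patches = patches.transpose(0, 2, 1, 3)          # (fp, tp, patch_f, patch_t)
    V = np.fft.fft2(patches, axes=(-2, -1))
    return np.abs(V)                                  # 4-D modulation tensor

# Example: vibrato on a 440 Hz tone spreads energy along the modulation axes.
sr = 16000
t = np.arange(sr * 2) / sr
x = np.sin(2 * np.pi * 440 * t + 3 * np.sin(2 * np.pi * 6 * t))
print(common_fate_transform(x, sr).shape)
```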

    Statistical Machine Learning for Modeling and Control of Stochastic Structured Systems

    Get PDF
    Machine learning and its various applications have driven innovation in robotics, synthetic perception, and data analytics. The last decade in particular has seen an explosion of interest in the research and development of artificial intelligence, with successful adoption and deployment in some domains. A significant force behind these advances has been an abundance of data and the evolution of simple computational models and tools with the capacity to scale up to massive learning automata. Monolithic neural networks with billions of parameters that rely on automatic differentiation are a prime example of the significant role efficient computation has played in supercharging the ability of well-established representations to extract intelligent patterns from unstructured data. Nonetheless, despite the strides taken in the digital domains of vision and natural language processing, applications in optimal control and robotics trail significantly behind and have not been able to capitalize as much on the latest trends in machine learning. This discrepancy can be explained by the limited transferability of learning concepts that rely on full differentiability to heavily structured physical and human-interaction environments, not to mention the substantial cost of data generation on real physical systems. These factors severely limit the application scope of loosely structured, over-parameterized data-crunching machines in the mechanical realm of robot learning and control.

    This thesis investigates modeling paradigms of hierarchical and switching systems to tackle some of the issues highlighted above. This research direction is motivated by insights into universal function approximation via local cooperating units and by the promise of inherently regularized representations through explicit structural design. Moreover, we explore ideas from robust optimization that address model mismatch in statistical models and outline how related methods may be used to improve the tractability of state filtering in stochastic hybrid systems.

    In Chapter 2, we consider hierarchical modeling for general regression problems. The presented approach is a generative probabilistic interpretation of local regression techniques that approximate nonlinear functions through a set of local linear or polynomial units. The number of available units is crucial in such models, as it directly balances representational power against parametric complexity. We address this ambiguity by using principles from Bayesian nonparametrics to formulate flexible models that adapt their complexity to the data and can potentially encompass an infinite number of components. To learn these representations, we present two efficient variational inference techniques that scale well with data, and we highlight the advantages of hierarchical infinite local regression models, such as dealing with non-smooth functions, mitigating catastrophic forgetting, and enabling parameter sharing and fast predictions. Finally, we validate this approach on a set of large inverse dynamics datasets and test the learned models in real-world control scenarios.

    Chapter 3 addresses discrete-continuous hybrid modeling and control for stochastic dynamical systems, which implies dealing with time-series data. In this scenario, we develop an automatic system identification technique that decomposes nonlinear systems into hybrid automata and leverages the resulting structure to learn switching feedback control via hierarchical reinforcement learning. In the process, we rely on an augmented closed-loop hidden Markov model architecture that captures time correlations over long horizons and provides a principled Bayesian inference framework for learning hybrid representations and for filtering the hidden discrete states so that control can be applied accordingly. Finally, we embed this structure explicitly into a novel hybrid relative entropy policy search algorithm that optimizes a set of local polynomial feedback controllers and value functions. We validate the overall switching-system perspective by benchmarking the open-loop predictive performance against popular black-box representations, and we provide qualitative empirical results for hybrid reinforcement learning on common nonlinear control tasks.

    In Chapter 4, we attend to a general and fundamental problem in learning for control, namely robustness in data-driven stochastic optimization. The question of sensitivity is a high priority, given the rising popularity of embedding statistical models into stochastic control frameworks. However, data from dynamical, especially mechanical, systems is often scarce due to the high cost of extraction and limited coverage of the state-action space. The result is usually poor models with narrow validity and brittle control laws, particularly in ill-posed, over-parameterized learning settings. We propose to robustify stochastic control by finding the worst-case distribution over the dynamics and optimizing a corresponding robust policy that minimizes the probability of catastrophic failures. We achieve this goal by formulating a two-stage iterative minimax optimization problem that finds the most pessimistic adversary in a trust region around a nominal model and uses it to optimize a robust optimal controller. We test this approach on a set of linear and nonlinear stochastic systems and supply empirical evidence of its practicality. Finally, we provide an outlook on how similar multi-stage distributional optimization techniques can be applied to approximate filtering of stochastic switching systems in order to tackle the exponential explosion in the number of state mixture components.

    In summary, the individual contributions of this thesis are a collection of interconnected principles for structured and robust learning for control. Although many challenges remain, this research lays a foundation for future work on structured learning that strives to combine optimal control and statistical machine learning perspectives for the automatic decomposition and optimization of hierarchical models.
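
    As a concrete illustration of the hybrid-filtering idea in Chapter 3 (inferring the hidden discrete state so that the matching local controller can be applied), the sketch below runs a standard HMM forward recursion over the modes of a switching linear system. It assumes, purely for illustration, that the continuous state is observed directly and that the per-mode dynamics are known; the thesis instead learns these quantities within an augmented closed-loop hidden Markov model.

```python
# A minimal sketch (not the thesis implementation) of filtering the hidden
# discrete mode of a switching linear system, assuming the continuous state
# is observed directly and the per-mode dynamics A_k, Q_k are known.
import numpy as np
from scipy.stats import multivariate_normal

def filter_modes(xs, A, Q, P, pi0):
    """HMM forward recursion over modes given a state trajectory xs (T, d)."""
    T, K = len(xs), len(A)
    alpha = np.zeros((T, K))
    alpha[0] = pi0
    for t in range(1, T):
        # Likelihood of the observed transition under each mode's dynamics.
        lik = np.array([
            multivariate_normal.pdf(xs[t], mean=A[k] @ xs[t - 1], cov=Q[k])
            for k in range(K)
        ])
        alpha[t] = lik * (alpha[t - 1] @ P)   # predict with mode transitions
        alpha[t] /= alpha[t].sum()            # normalize to a posterior
    return alpha

# Two illustrative modes: a slow and a fast damped rotation in 2-D.
A = [np.array([[0.99, -0.05], [0.05, 0.99]]),
     np.array([[0.90, -0.40], [0.40, 0.90]])]
Q = [0.01 * np.eye(2), 0.01 * np.eye(2)]
P = np.array([[0.95, 0.05], [0.05, 0.95]])   # mode transition matrix
rng = np.random.default_rng(0)
xs, z, x = [], 0, np.array([1.0, 0.0])
for t in range(200):
    z = rng.choice(2, p=P[z])
    x = A[z] @ x + rng.multivariate_normal(np.zeros(2), Q[z])
    xs.append(x)
print(filter_modes(np.array(xs), A, Q, P, np.array([0.5, 0.5]))[-1])
```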

    Multimodal Image Fusion and Its Applications.

    Full text link
    Image fusion integrates images from different modalities to provide comprehensive information about the image content, increasing interpretation capability and producing more reliable results. Combining multi-modal images has several advantages, including improved geometric correction, complementary data for better classification, and enhanced features for analysis. This thesis develops the image fusion idea in the context of two domains: material microscopy and biomedical imaging. The proposed methods include image modeling, image indexing, image segmentation, and image registration. The common theme behind all proposed methods is the use of complementary information from multi-modal images to achieve better registration, feature extraction, and detection performance. In material microscopy, we propose an anomaly-driven image fusion framework for material microscopy image analysis and anomaly detection. This framework is based on a probabilistic model that enables us to index, process and characterize the data with systematic and well-developed statistical tools. In biomedical imaging, we focus on the multi-modal registration problem for functional MRI (fMRI) brain images, which improves the performance of brain activation detection.
    PhD, Electrical Engineering: Systems, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120701/1/yuhuic_1.pd
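
    Mutual information is a common similarity measure for multi-modal registration problems of the kind described above; the abstract does not specify the thesis's exact criterion, so the sketch below is only a generic illustration with arbitrary bin counts and synthetic images.

```python
# A minimal sketch of mutual information between two images, a similarity
# measure commonly used for multi-modal registration (the thesis's exact
# criterion is not stated in the abstract); the bin count is illustrative.
import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """Estimate MI from the joint intensity histogram of two aligned images."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    nz = p_ab > 0                              # avoid log(0)
    return float(np.sum(p_ab[nz] * np.log(p_ab[nz] / (p_a @ p_b)[nz])))

# MI peaks when the two modalities are correctly aligned, so a registration
# loop would search over transforms to maximize this score.
rng = np.random.default_rng(1)
a = rng.random((64, 64))
b = np.exp(-a) + 0.05 * rng.random((64, 64))   # nonlinear intensity mapping
print(mutual_information(a, b))
```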

    Sparse Modeling for Image and Vision Processing

    Get PDF
    In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection, that is, to automatically select a simple model among a large collection of them. In signal processing, sparse coding consists of representing data as linear combinations of a few dictionary elements. The corresponding tools have subsequently been widely adopted by several scientific communities, such as neuroscience, bioinformatics, and computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to the data, yielding a compact representation that has been successful in various contexts.
    Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Vision.
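
    To make the sparse coding idea concrete, the sketch below represents a signal as a linear combination of a few atoms of a dictionary by solving a lasso problem with ISTA. The random dictionary and all parameter values are illustrative assumptions; in the monograph's setting the dictionary would be learned from data rather than drawn at random.

```python
# A minimal sketch of sparse coding: represent a signal as a combination of a
# few atoms of a (here random, normally learned) dictionary via ISTA.
import numpy as np

def sparse_code(x, D, lam=0.1, n_iter=200):
    """ISTA for min_a 0.5*||x - D a||^2 + lam*||a||_1."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = a - grad / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft-threshold
    return a

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)             # unit-norm atoms
a_true = np.zeros(256)
a_true[[3, 77, 200]] = [1.0, -0.5, 2.0]    # only three active atoms
x = D @ a_true + 0.01 * rng.standard_normal(64)
a_hat = sparse_code(x, D)
print(np.count_nonzero(np.abs(a_hat) > 1e-3))   # a sparse set of coefficients
```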

    Structure-aware image denoising, super-resolution, and enhancement methods

    Get PDF
    Denoising, super-resolution and structure enhancement are classical image processing tasks whose purpose is to aid the visual analysis of raw digital images. Despite tremendous progress in these fields, certain difficult problems remain open. For example, denoising and super-resolution techniques that possess all of the following properties are very scarce: they must preserve critical structures such as corners, be robust to the type of noise distribution, avoid undesirable artefacts, and also be fast. The area of structure enhancement also has an unresolved issue: very little effort has been put into designing models that can handle anisotropic deformations in the image acquisition process. In this thesis, we design novel methods in the form of partial differential equations, patch-based approaches and variational models to overcome the aforementioned obstacles. In most cases, our methods outperform existing approaches in both quality and speed, despite being applicable to a broader range of practical situations.
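
    As a point of reference for PDE-based, structure-preserving denoising, the sketch below implements classical Perona-Malik diffusion, which diffuses weakly across strong gradients so that edges survive. This is a textbook baseline shown only for illustration, not one of the novel models proposed in the thesis; the parameter values are arbitrary.

```python
# A minimal sketch of edge-preserving PDE denoising (classical Perona-Malik
# diffusion, shown for illustration; the thesis proposes its own PDE models).
import numpy as np

def perona_malik(u, n_iter=50, kappa=0.5, dt=0.2):
    """Explicit scheme: diffuse weakly across strong gradients (edges)."""
    u = u.astype(float).copy()
    for _ in range(n_iter):
        # One-sided differences to the four neighbours.
        dN = np.roll(u, -1, 0) - u
        dS = np.roll(u,  1, 0) - u
        dE = np.roll(u, -1, 1) - u
        dW = np.roll(u,  1, 1) - u
        # Diffusivity decays with gradient magnitude, so edges are preserved.
        g = lambda d: np.exp(-(d / kappa) ** 2)
        u += dt * (g(dN) * dN + g(dS) * dS + g(dE) * dE + g(dW) * dW)
    return u

rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[:, 32:] = 1.0                                  # a sharp vertical edge
noisy = clean + 0.2 * rng.standard_normal(clean.shape)
# Compare mean absolute error before and after diffusion.
print(np.abs(noisy - clean).mean(), np.abs(perona_malik(noisy) - clean).mean())
```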

    The University Defence Research Collaboration In Signal Processing

    Get PDF
    This chapter describes the development of algorithms for the automatic detection of anomalies from multi-dimensional, undersampled and incomplete datasets. The challenge in this work is to identify and classify behaviours as normal or abnormal, safe or threatening, from an irregular and often heterogeneous sensor network. Many defence and civilian applications can be modelled as complex networks of interconnected nodes with unknown or uncertain spatio-temporal relations. Such heterogeneous networks can exhibit dynamic properties, reflecting evolution both in network structure (new nodes appearing and existing nodes disappearing) and in inter-node relations. The UDRC work has addressed not only the detection of anomalies but also the identification of their nature and their statistical characteristics. Normal patterns and changes in behaviour have been incorporated to provide an acceptable balance between true positive rate, false positive rate, performance and computational cost. Data quality measures have been used to ensure that the models of normality are not corrupted by unreliable and ambiguous data. Exploiting the context of each node's activity in a complex network offers an even more efficient anomaly detection mechanism. This has allowed the development of efficient approaches which not only detect anomalies but also go on to classify their behaviour.
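
    The basic pattern underlying such systems, fitting a model of normality and flagging deviations with a threshold that trades off true and false positive rates, can be sketched as follows. The Gaussian model, the class name, and the 99% quantile threshold are illustrative assumptions, not the UDRC algorithms.

```python
# A minimal illustrative sketch (not the UDRC algorithms) of anomaly
# detection: fit a model of "normal" behaviour, then flag observations that
# deviate from it, with the threshold trading off true/false positive rates.
import numpy as np

class GaussianNormalityModel:
    def fit(self, X):
        self.mu = X.mean(axis=0)
        self.cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
        return self

    def score(self, X):
        """Squared Mahalanobis distance to the model of normality."""
        d = X - self.mu
        return np.einsum("ij,jk,ik->i", d, self.cov_inv, d)

rng = np.random.default_rng(0)
normal = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=1000)
model = GaussianNormalityModel().fit(normal)
threshold = np.quantile(model.score(normal), 0.99)   # ~1% false-positive rate
anomaly = np.array([[4.0, -4.0]])
print(model.score(anomaly) > threshold)              # prints [ True]
```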

    Machine Learning and Its Application to Reacting Flows

    Get PDF
    This open access book introduces and explains machine learning (ML) algorithms and techniques developed for statistical inference on complex processes or systems, and their application to simulations of chemically reacting turbulent flows. These two fields, ML and turbulent combustion, each have a large body of work and knowledge of their own, and this book brings them together and explains the complexities and challenges involved in applying ML techniques to simulate and study reacting flows. This is important for the world's total primary energy supply (TPES), since more than 90% of this supply comes from combustion technologies, whose effects on the environment are far from negligible. Although alternative technologies based on renewable energies are emerging, their share of the TPES is currently less than 5%, and a complete paradigm shift would be needed to replace combustion sources. Whether this is practical is an entirely different question, and the answer depends on the respondent. However, a pragmatic analysis suggests that the combustion share of the TPES is likely to be more than 70% even by 2070. Hence, it is prudent to take advantage of ML techniques to improve combustion science and technology so that efficient and "greener" combustion systems that are friendlier to the environment can be designed. The book covers the current state of the art in these two topics and outlines the challenges involved, the merits and drawbacks of using ML for turbulent combustion simulations, and avenues which can be explored to overcome the challenges. The required mathematical equations and background are discussed, with ample references for readers who wish to find further detail. This book is unique in that no other book offers similar coverage of topics, ranging from big data analysis and machine learning algorithms to their applications in combustion science and system design for energy generation.
    • 

    corecore