16 research outputs found
Riemannian geometry applied to BCI classification
ISBN 978-3-642-15994-7, Softcover. International audience. In brain-computer interfaces based on motor imagery, covariance matrices are widely used, both in the computation of spatial filters and in other signal processing methods. Covariance matrices lie in the space of symmetric positive-definite (SPD) matrices and therefore fall within the domain of Riemannian geometry. Using a differential geometry framework, we propose several algorithms that classify covariance matrices in their native space.
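To make the idea of working in the native space of covariance matrices concrete, here is a minimal sketch of the affine-invariant Riemannian distance between two SPD matrices, together with a nearest-reference decision rule. The abstract does not say which metric or classifier the authors use, so the choice of metric, the function names, and the use of fixed per-class reference matrices are assumptions made for illustration.

```python
import numpy as np


def _inv_sqrtm_spd(a):
    """Inverse square root of an SPD matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v / np.sqrt(w)) @ v.T


def riemann_distance(a, b):
    """Affine-invariant distance: sqrt(sum_i log^2 lambda_i(a^-1 b))."""
    a_is = _inv_sqrtm_spd(a)
    # a^{-1/2} b a^{-1/2} is SPD and shares its eigenvalues with a^{-1} b.
    lam = np.linalg.eigvalsh(a_is @ b @ a_is)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


def nearest_reference(cov, references):
    """Label of the reference SPD matrix closest to cov (hypothetical helper).

    references: dict mapping a class label to one SPD matrix for that class.
    """
    return min(references, key=lambda label: riemann_distance(references[label], cov))
```

Unlike the Euclidean distance between matrix entries, this distance is unchanged when both matrices are transformed by the same invertible linear mixing, which is one reason it is attractive for multichannel recordings.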
Effect of power amplifier distortion on signals with an orthogonal subband basis
An orthogonal subband basis is considered, together with the effect of power amplifier nonlinearity on the level of out-of-band emission and on symbol distortion at the receiver after decoding. A method is described for generating and processing the channel signal using an orthogonal subband basis…
Deep Polyphonic ADSR Piano Note Transcription
We investigate a late-fusion approach to piano transcription, combined with a
strong temporal prior in the form of a handcrafted Hidden Markov Model (HMM).
The network architecture under consideration is compact in terms of its number
of parameters and easy to train with gradient descent. The network outputs are
fused over time in the final stage to obtain note segmentations, with an HMM
whose transition probabilities are chosen based on a model of attack, decay,
sustain, release (ADSR) envelopes, commonly used for sound synthesis. The note
segments are then subjected to a final binary decision rule that rejects overly weak
note segment hypotheses. We obtain state-of-the-art results on the MAPS
dataset, and are able to outperform other approaches by a large margin, when
predicting complete note regions from onsets to offsets.
Comment: 5 pages, 2 figures, published as ICASSP'1
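The temporal fusion step described above can be sketched as Viterbi decoding over a small HMM whose states follow the attack, decay, sustain, release cycle. This is only an illustration under stated assumptions: the state set, the transition probabilities in TRANS, and the helper names are invented here and are not the values or code used in the paper.

```python
import numpy as np

STATES = ["silence", "attack", "decay", "sustain", "release"]

# Illustrative transition matrix (rows sum to 1). The values are assumptions.
TRANS = np.array([
    [0.90, 0.10, 0.00, 0.00, 0.00],  # silence -> silence | attack
    [0.00, 0.50, 0.50, 0.00, 0.00],  # attack  -> attack | decay
    [0.00, 0.00, 0.60, 0.30, 0.10],  # decay   -> decay | sustain | release
    [0.00, 0.00, 0.00, 0.80, 0.20],  # sustain -> sustain | release
    [0.30, 0.00, 0.00, 0.00, 0.70],  # release -> silence | release
])


def viterbi(frame_likelihoods, trans=TRANS, start_state=0):
    """Most likely state path given per-frame state likelihoods (T x S)."""
    eps = 1e-12
    log_b = np.log(np.asarray(frame_likelihoods, dtype=float) + eps)
    log_a = np.log(trans + eps)
    n_frames, n_states = log_b.shape
    # Force the path to begin in start_state (silence by default).
    delta = np.full(n_states, np.log(eps))
    delta[start_state] = 0.0
    delta = delta + log_b[0]
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        scores = delta[:, None] + log_a  # scores[i, j]: move from i to j
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(n_states)] + log_b[t]
    # Backtrack from the best final state.
    path = [int(np.argmax(delta))]
    for t in range(n_frames - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


def note_segments(path):
    """(onset_frame, offset_frame) pairs for runs of non-silence states."""
    segments, onset = [], None
    for t, state in enumerate(path):
        if state != 0 and onset is None:
            onset = t
        elif state == 0 and onset is not None:
            segments.append((onset, t))
            onset = None
    if onset is not None:
        segments.append((onset, len(path)))
    return segments
```

In this sketch, frame_likelihoods stands in for the network outputs of a single pitch. A further thresholding step on each returned segment would play the role of the binary decision rule mentioned in the abstract.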
Blind Source Separation Based on Joint Diagonalization in R: The Packages JADE and BSSasymp
Blind source separation (BSS) is a well-known signal processing tool which is used to solve practical data analysis problems in various fields of science. In BSS, we assume that the observed data consists of linear mixtures of latent variables. The mixing system and the distributions of the latent variables are unknown. The aim is to find an estimate of an unmixing matrix which then transforms the observed data back to latent sources. In this paper we present the R packages JADE and BSSasymp. The package JADE offers several BSS methods which are based on joint diagonalization. Package BSSasymp contains functions for computing the asymptotic covariance matrices as well as their data-based estimates for most of the BSS estimators included in package JADE. Several simulated and real datasets are used to illustrate the functions in these two packages.
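As a small illustration of BSS by simultaneous diagonalization, the following sketch implements the classical AMUSE idea in NumPy: whiten the data, then rotate it so that one lagged autocovariance matrix becomes diagonal. It is written in Python rather than R, it is not the code of the JADE or BSSasymp packages, and the function name and its arguments are assumptions made for this example.

```python
import numpy as np


def amuse(x, lag=1):
    """Estimate an unmixing matrix from x of shape (n_samples, n_channels).

    Returns (w, sources) where sources = centered_x @ w.T. The sources are
    recovered only up to order, sign, and scale.
    """
    xc = x - x.mean(axis=0)
    cov = xc.T @ xc / len(xc)
    evals, evecs = np.linalg.eigh(cov)
    whitener = (evecs / np.sqrt(evals)) @ evecs.T  # cov^{-1/2}
    z = xc @ whitener.T
    # Symmetrized autocovariance of the whitened data at the chosen lag.
    lagged = z[:-lag].T @ z[lag:] / (len(z) - lag)
    lagged = (lagged + lagged.T) / 2
    # Its eigenvectors give the rotation that diagonalizes it.
    _, rotation = np.linalg.eigh(lagged)
    w = rotation.T @ whitener
    return w, xc @ w.T
```

The method needs the latent sources to have distinct autocorrelations at the chosen lag. Methods that jointly diagonalize more than two matrices relax this dependence on a single lag.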
Principled methods for mixtures processing
This document is my thesis for the habilitation à diriger des recherches, the French diploma required to fully supervise Ph.D. students. It summarizes the research I did over the last 15 years and also presents the short-term research directions and applications I want to investigate. Regarding my past research, I first describe the work I did on probabilistic audio modeling, including the separation of Gaussian and α-stable stochastic processes. Then, I mention my work on deep learning applied to audio, which rapidly turned into a large effort for community service. Finally, I present my contributions in machine learning, with some works on hardware compressed sensing and probabilistic generative models. My research programme involves a theoretical part that revolves around probabilistic machine learning, and an applied part that concerns the processing of time series arising in both audio and life sciences.
Robust speech recognition with spectrogram factorisation
Communication by speech is intrinsic to humans. Since the breakthrough of mobile devices and wireless communication, digital transmission of speech has become ubiquitous. Similarly, distribution and storage of audio and video data have increased rapidly. However, despite being technically capable of recording and processing audio signals, only a fraction of digital systems and services are actually able to work with spoken input, that is, to operate on the lexical content of speech. One persistent obstacle for practical deployment of automatic speech recognition systems is inadequate robustness against noise and other interferences, which regularly corrupt signals recorded in real-world environments.
Speech and diverse noises are both complex signals, which are not trivially separable. Despite decades of research and a multitude of different approaches, the problem has not been solved to a sufficient extent. Especially the mathematically ill-posed problem of separating multiple sources from a single-channel input requires advanced models and algorithms to be solvable. One promising path is using a composite model of long-context atoms to represent a mixture of non-stationary sources based on their spectro-temporal behaviour. Algorithms derived from the family of non-negative matrix factorisations have been applied to such problems to separate and recognise individual sources like speech.
This thesis describes a set of tools developed for non-negative modelling of audio spectrograms, especially involving speech and real-world noise sources. An overview is provided of the complete framework, starting from model and feature definitions, advancing to factorisation algorithms, and finally describing different routes for separation, enhancement, and recognition tasks. Current issues and their potential solutions are discussed both theoretically and from a practical point of view. The included publications describe factorisation-based recognition systems, which have been evaluated on publicly available speech corpora in order to determine the efficiency of various separation and recognition algorithms. Several variants and system combinations that have been proposed in the literature are also discussed. The work covers a broad span of factorisation-based system components, which together aim at providing a practically viable solution to robust processing and recognition of speech in everyday situations.
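For readers unfamiliar with the factorisation step, the following is a minimal sketch of non-negative matrix factorisation with the standard multiplicative updates for the generalised Kullback-Leibler divergence. The abstract does not state which cost function or update rules the thesis uses, so the divergence, the function names, and the parameters are assumptions made for illustration.

```python
import numpy as np


def nmf_kl(v, rank, n_iter=200, seed=0):
    """Factorise a non-negative matrix v (bins x frames) as w @ h.

    w holds spectral atoms (bins x rank), h their activations (rank x frames).
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = v.shape
    w = rng.random((n_bins, rank)) + 0.1
    h = rng.random((rank, n_frames)) + 0.1
    eps = 1e-12
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        ratio = v / (w @ h + eps)
        h *= (w.T @ ratio) / (w.sum(axis=0)[:, None] + eps)
        ratio = v / (w @ h + eps)
        w *= (ratio @ h.T) / (h.sum(axis=1)[None, :] + eps)
    return w, h


def kl_divergence(v, v_hat, eps=1e-12):
    """Generalised Kullback-Leibler divergence between v and its model v_hat."""
    return float(np.sum(v * np.log((v + eps) / (v_hat + eps)) - v + v_hat))
```

In a separation setting, the atoms in w would typically be grouped by source (for example speech and noise), and each source's share of w @ h would be used to filter the mixture spectrogram.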
Applications in Industry
International audienc
Pitch-Informed Solo and Accompaniment Separation
The topic of this dissertation is the development of a system for
pitch-informed source separation of music signals into solo instrument and
accompaniment. The system is suited to isolating the dominant instruments
from a piece of music, regardless of the type of instrument, the
accompaniment, and the musical style. Only monophonic melody instruments
are considered. The music recordings are monaural, so no additional
information can be gained from the distribution of the instruments in the
stereo panorama.
The developed method uses pitch information as the basis for a sinusoidal
modeling of the spectral properties of the solo instrument from the music
mixture signal. Instead of determining the spectral information per frame,
the proposed method uses tone objects for the separation. Tone-object-based
processing additionally makes it possible to refine note onsets, reduce
transient artifacts, incorporate common amplitude modulation (CAM), and
better estimate non-harmonic elements of the tones. The presented algorithm
for separating solo instrument and accompaniment allows real-time
processing and is therefore relevant for practical use.
An experiment was carried out to better model the relationships between
magnitude, phase, and fine frequency of isolated instrument tones. As a
result, the continuity of the temporal envelope, the inharmonicity of
certain musical instruments, and the evaluation of the phase advance could
be exploited in the presented method. In addition, an algorithm for
separating percussive and harmonic signal components based on the phase
advance was developed. It achieves improved perceptual quality of the
harmonic and percussive signals compared with similar state-of-the-art
methods.
The presented method for sound source separation into solo instrument and
accompaniment was submitted to the SiSEC 2011 and SiSEC 2013 evaluation
campaigns. There, comparable results were achieved with respect to
perceptual evaluation measures. The quality of a reference algorithm was
surpassed on the instrumental dataset described in this dissertation.
As an application scenario for sound source separation into solo and
accompaniment, a listening test was conducted to assess the quality
requirements for source separation in the context of music learning
software. The results of this listening test show that the solo and
accompaniment tracks should be separated according to different quality
criteria. The music learning software Songs2See already integrates the
presented sound source separation in a commercially available application.

This thesis addresses the development of a system for pitch-informed solo
and accompaniment separation capable of separating main instruments from
music accompaniment regardless of the musical genre of the track, or type
of music accompaniment. For the solo instrument, only pitched monophonic
instruments were considered in a single-channel scenario where no panning
or spatial location information is available.
In the proposed method, pitch information is used as an initial stage of a
sinusoidal modeling approach that attempts to estimate the spectral
information of the solo instrument from a given audio mixture. Instead of
estimating the solo instrument on a frame by frame basis, the proposed
method gathers information of tone objects to perform separation.
Tone-based processing allowed the inclusion of novel processing stages for
attack refinement, transient interference reduction, common amplitude
modulation (CAM) of tone objects, and for better estimation of non-harmonic
elements that can occur in musical instrument tones. The proposed solo and
accompaniment algorithm is an efficient method suitable for real-world
applications.
A study was conducted to better model magnitude, frequency, and phase of
isolated musical instrument tones. As a result of this study, temporal
envelope smoothness, inharmonicity of musical instruments, and phase
expectation were exploited in the proposed separation method. Additionally,
an algorithm for harmonic/percussive separation based on phase expectation
was proposed. The algorithm shows improved perceptual quality with respect
to state-of-the-art methods for harmonic/percussive separation.
The proposed solo and accompaniment method obtained perceptual quality
scores comparable to other state-of-the-art algorithms under the SiSEC 2011
and SiSEC 2013 campaigns, and outperformed the comparison algorithm on the
instrumental dataset described in this thesis.
As a use-case of solo and
accompaniment separation, a listening test procedure was conducted to
assess separation quality requirements in the context of music education.
Results from the listening test showed that solo and accompaniment tracks
should be optimized differently to suit quality requirements of music
education. The Songs2See application was presented as commercial music
learning software which includes the proposed solo and accompaniment
separation method.
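The role of pitch information can be illustrated with a much simpler stand-in for the sinusoidal, tone-object-based model described above: a binary time-frequency mask that assigns the bins around each harmonic of the estimated pitch to the solo instrument and everything else to the accompaniment. This is not the thesis's algorithm. The function names, the fixed number of harmonics, and the mask width are assumptions chosen for the example.

```python
import numpy as np


def harmonic_mask(f0_hz, n_fft, sample_rate, n_harmonics=10, half_width_bins=1):
    """Binary mask (n_bins x n_frames) marking bins near harmonics of f0.

    f0_hz: per-frame pitch in Hz, with 0 for frames without a solo pitch.
    """
    n_bins = n_fft // 2 + 1
    mask = np.zeros((n_bins, len(f0_hz)), dtype=bool)
    for t, f0 in enumerate(f0_hz):
        if f0 <= 0:
            continue  # no solo pitch in this frame
        for k in range(1, n_harmonics + 1):
            center = int(round(k * f0 * n_fft / sample_rate))
            if center >= n_bins:
                break
            lo = max(center - half_width_bins, 0)
            hi = min(center + half_width_bins + 1, n_bins)
            mask[lo:hi, t] = True
    return mask


def split_solo_accompaniment(mix_stft, mask):
    """Split a mixture STFT into solo and accompaniment using the mask."""
    return mix_stft * mask, mix_stft * ~mask
```

A frame-wise mask like this ignores attacks, inharmonicity, and amplitude modulation, which are exactly the aspects the tone-object processing in the thesis is designed to handle.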
Multimodal approach for pilot mental state detection based on EEG
The safety of flight operations depends on the cognitive abilities of pilots. In recent years, there has been growing concern about potential accidents caused by the declining mental states of pilots. We have developed a novel multimodal approach for mental state detection in pilots using electroencephalography (EEG) signals. Our approach includes an advanced automated preprocessing pipeline to remove artefacts from the EEG data, a feature extraction method based on Riemannian geometry analysis of the cleaned EEG data, and a hybrid ensemble learning technique that combines the results of several machine learning classifiers. The proposed approach provides improved accuracy compared to existing methods, achieving an accuracy of 86% when tested on cleaned EEG data. The EEG dataset was collected from 18 pilots who participated in flight experiments and was publicly released on NASA’s open portal. This study presents a reliable and efficient solution for detecting mental states in pilots and highlights the potential of EEG signals and ensemble learning algorithms in developing cognitive cockpit systems. The automated preprocessing pipeline, the feature extraction method based on Riemannian geometry analysis, and the hybrid ensemble learning technique set this work apart from previous efforts in the field.
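One common way to turn covariance matrices into feature vectors for ordinary classifiers is tangent-space mapping, sketched below. The abstract does not specify which Riemannian features the authors extract, so this particular mapping, the function names, and the choice of reference matrix are assumptions made for illustration.

```python
import numpy as np


def _spd_apply(a, func):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, v = np.linalg.eigh(a)
    return (v * func(w)) @ v.T


def tangent_vector(cov, ref):
    """Map an SPD matrix to a vector in the tangent space at ref.

    Computes log(ref^{-1/2} cov ref^{-1/2}) and stacks its upper triangle,
    weighting off-diagonal entries by sqrt(2) so that the Euclidean norm of
    the vector equals the affine-invariant distance between cov and ref.
    """
    ref_is = _spd_apply(ref, lambda w: w ** -0.5)
    m = ref_is @ cov @ ref_is
    m = (m + m.T) / 2  # remove round-off asymmetry
    s = _spd_apply(m, np.log)
    rows, cols = np.triu_indices(s.shape[0])
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return s[rows, cols] * weights
```

The resulting vectors, one per EEG epoch, could then be passed to several standard classifiers whose predictions are combined, which is the general shape of an ensemble approach.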
Gaussian processes for source separation and informed coding
Source separation is the task of recovering several signals of which only one or more mixtures are observed. This problem is particularly difficult, and to make separation possible, any additional information known about the sources or the mixture must be taken into account. In this thesis, I propose a general formalism for including such knowledge in separation problems, in which a source is modeled as the realization of a Gaussian process. The approach has many benefits: it generalizes a large part of current methods, it can take many kinds of priors into account, and the model parameters can be estimated efficiently. This theoretical framework is applied to the informed separation of audio sources, where separation is assisted by side information computed ahead of the separation, during a preliminary phase in which both the mixture and the sources are available. Provided this information can be coded efficiently, this enables applications such as karaoke or the manipulation of the individual instruments within a mix at a bitrate far lower than that required to transmit the sources separately. The informed separation problem closely resembles a multichannel coding problem. This analogy places informed separation in a broader theoretical framework in which it becomes a particular coding problem and thus benefits from the classical results of coding theory, which make it possible to optimize performance efficiently.
Source separation consists in recovering different signals that are only observed through their mixtures. To solve this difficult problem, any available prior information about the sources must be used so as to better identify them among all possible solutions.
In this thesis, I propose a general framework that makes it possible to include a wide variety of prior information in source separation. In this framework, the source signals are modeled as the outcomes of independent Gaussian processes, which are powerful and general nonparametric Bayesian models. This approach has many advantages: it permits the separation of sources defined on arbitrary input spaces, it can take many kinds of prior knowledge into account, and it leads to automatic parameter estimation. This theoretical framework is applied to the informed source separation of audio sources. In this setup, side information is computed beforehand on the sources themselves during a so-called encoding stage where both sources and mixtures are available. In a subsequent decoding stage, the sources are recovered using this information and the mixtures only. Provided this information can be encoded efficiently, it enables popular applications such as karaoke or active listening at a very small bitrate compared to separate transmission of the sources. It became clear that informed source separation is closely akin to a multichannel coding problem. With this in mind, it was cast into information theory as a particular source-coding problem, which makes it possible to derive its optimal performance as rate-distortion functions as well as practical coding algorithms achieving these bounds.
PARIS-Télécom ParisTech (751132302) / Sudoc, France
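The core computation behind separating independent Gaussian process sources can be sketched in a few lines. If the mixture is a sum of zero-mean GP sources with covariance matrices K_1, ..., K_J, the posterior mean of source j given the mixture x is K_j (K_1 + ... + K_J)^{-1} x. The kernels, their parameters, and the function names below are assumptions chosen for the example, and the kernels are taken as known rather than estimated as in the thesis.

```python
import numpy as np


def se_kernel(t, length_scale, variance=1.0):
    """Squared-exponential covariance: favors smooth, slowly varying sources."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def periodic_kernel(t, period, length_scale, variance=1.0):
    """Periodic covariance: favors sources that repeat with the given period."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length_scale ** 2)


def separate(x, kernels):
    """Posterior means of additive zero-mean GP sources given their sum x.

    kernels: list of covariance matrices, one per source. Including a
    white-noise kernel (a scaled identity) keeps the linear system well
    conditioned.
    """
    total = sum(kernels)
    alpha = np.linalg.solve(total, x)
    return [k @ alpha for k in kernels]
```

A possible usage, with a slow trend, a periodic component, and white noise as the three sources:

```python
t = np.linspace(0.0, 10.0, 200)
x = np.sin(0.3 * t) + 0.5 * np.sin(2 * np.pi * t)
slow, periodic, noise = separate(
    x, [se_kernel(t, 3.0), periodic_kernel(t, 1.0, 1.0), 0.01 * np.eye(len(t))]
)
```

By construction, the estimates add up to the observed mixture.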