Principled methods for mixtures processing
This document is my thesis for the habilitation à diriger des recherches, the French diploma required to fully supervise Ph.D. students. It summarizes the research I did over the last 15 years and outlines the short-term research directions and applications I want to investigate. Regarding my past research, I first describe my work on probabilistic audio modeling, including the separation of Gaussian and α-stable stochastic processes. Then, I mention my work on deep learning applied to audio, which rapidly turned into a large community-service effort. Finally, I present my contributions in machine learning, with some works on hardware compressed sensing and probabilistic generative models. My research programme involves a theoretical part that revolves around probabilistic machine learning, and an applied part that concerns the processing of time series arising in both audio and life sciences.
Designing a Patient-Centered Clinical Workflow to Assess Cyberbully Experiences of Youths in the U.S. Healthcare System
Cyberbullying, or online harassment, is often defined as repeatedly and intentionally harassing, mistreating, or making fun of others through electronic devices with the aim of scaring, angering, or shaming them [296]. Youths experiencing cyberbullying report higher levels of anxiety and depression, mental distress, suicidal thoughts, and substance abuse than their non-bullied peers [360, 605, 261, 354]. Even though bullying is associated with significant health problems, to date very few youth anti-bullying efforts have been initiated and directed in clinical settings. There is presently no standardized procedure or workflow across health systems for systematically assessing cyberbullying or other equally dangerous online activities among vulnerable groups such as children or adolescents [599]. Therefore, I developed a series of research projects to link digital indicators of cyberbullying or online harassment to clinical practices by advocating design considerations for a patient-centered clinical assessment and workflow that addresses patients’ needs and expectations to ensure quality care. Through this dissertation, I aim to answer these high-level research questions:
RQ1. How does the presence of severe online harassment on online platforms contribute to negative experiences and risky behaviors within vulnerable populations?
RQ2. How efficient is the current mechanism for screening these risky negative online experiences and behaviors, specifically those related to cyberbullying, within at-risk populations such as adolescents in clinical settings?
RQ3. How might evidence of activities and negative harassing experiences on online platforms best be integrated into electronic health records during clinical treatment?
I first explore how harassment is presented on different social media platforms across diverse contexts and cultural norms (studies 1, 2, and 3); next, by analyzing actual patient data, I address current limitations in the clinical screening process, which fails to efficiently address core aspects of cyberbullying and their consequences for adolescent patients (studies 4 and 5); finally, connecting all my findings, I recommend specific design guidelines for a refined screening tool and structured processes for implementing and integrating the screened data into patients’ electronic health records (EHRs), for better assessment and treatment outcomes around cyberbullying in adolescent patients (study 6).
Emotion and Stress Recognition Related Sensors and Machine Learning Technologies
This book includes chapters that present scientific concepts, frameworks, architectures and ideas on sensing technologies and machine learning techniques. These are relevant in tackling the following challenges: (i) the field readiness and use of intrusive sensor systems and devices for capturing biosignals, including EEG sensor systems, ECG sensor systems and electrodermal activity sensor systems; (ii) the quality assessment and management of sensor data; (iii) data preprocessing, noise filtering and calibration concepts for biosignals; (iv) the field readiness and use of nonintrusive sensor technologies, including visual sensors, acoustic sensors, vibration sensors and piezoelectric sensors; (v) emotion recognition using mobile phones and smartwatches; (vi) body area sensor networks for emotion and stress studies; (vii) the use of experimental datasets in emotion recognition, including dataset generation principles and concepts, quality assurance and emotion elicitation material and concepts; (viii) machine learning techniques for robust emotion recognition, including graphical models, neural network methods, deep learning methods, statistical learning and multivariate empirical mode decomposition; (ix) subject-independent emotion and stress recognition concepts and systems, including facial expression-based systems, speech-based systems, EEG-based systems, ECG-based systems, electrodermal activity-based systems, multimodal recognition systems and sensor fusion concepts; and (x) emotion and stress estimation and forecasting from a nonlinear dynamical system perspective.
Singing Voice Separation from Monaural Recordings using Archetypal Analysis
Singing voice separation aims to separate the singing voice signal from the background music signal in music recordings. This task is a cornerstone for numerous Music Information Retrieval (MIR) tasks, including automatic lyric recognition, singer identification, melody extraction and audio remixing. In this thesis, we investigate singing voice separation from monaural recordings by exploiting unsupervised machine learning methods. The motivation behind the employed methods is that the music accompaniment lies in a low-rank subspace because of its repeating patterns, whereas the singing voice has a sparse pattern within the song. To this end, we decompose audio spectrograms as a superposition of low-rank and sparse components, capturing the spectrograms of the background music and the singing voice respectively, using the Robust Principal Component Analysis algorithm. Furthermore, by considering the non-negative nature of the magnitude of audio spectrograms, we develop a variant of Archetypal Analysis with sparsity constraints, aiming to improve the separation. Both methods are evaluated on the MIR-1K dataset, which is designed especially for singing voice separation. Experimental evaluation confirms that both methods perform singing voice separation successfully and achieve a value above 3.0 dB in the GNSDR metric.
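The low-rank plus sparse idea described in this abstract can be illustrated with a minimal sketch. The abstract does not say which RPCA solver the thesis uses, so the code below assumes the common principal component pursuit formulation solved by ADMM, with the frequently used default weight 1/sqrt(max(m, n)). The function names, the parameter defaults and the toy matrix standing in for a magnitude spectrogram are all hypothetical.

```python
import numpy as np

def soft_threshold(x, tau):
    """Element-wise shrinkage, the proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def rpca(M, lam=None, mu=None, n_iter=500, tol=1e-7):
    """Split M into a low-rank part L and a sparse part S with M ~ L + S.

    Assumed solver: ADMM for principal component pursuit (not taken from the thesis).
    """
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # common default sparsity weight
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())  # common default penalty parameter
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                      # dual variable
    norm_M = np.linalg.norm(M)
    for _ in range(n_iter):
        # Low-rank update: singular value thresholding
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # Sparse update: element-wise shrinkage
        S = soft_threshold(M - L + Y / mu, lam / mu)
        # Dual ascent on the constraint M = L + S
        R = M - L - S
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * norm_M:
            break
    return L, S

# Toy stand-in for a magnitude spectrogram (hypothetical data):
# a rank-1 "accompaniment" plus a few large, isolated "vocal" entries.
rng = np.random.default_rng(0)
background = np.outer(rng.random(40), rng.random(60))
vocals = np.zeros((40, 60))
vocals[rng.random((40, 60)) < 0.05] = 5.0
M = background + vocals

L, S = rpca(M)
rel_residual = np.linalg.norm(M - L - S) / np.linalg.norm(M)
print(f"relative residual: {rel_residual:.2e}")
```

In a separation pipeline of this kind, M would typically be the magnitude of a short-time Fourier transform, with L read as the accompaniment estimate and S as the voice estimate; how the thesis maps the components back to audio is not stated in the abstract.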
Proceedings of the 19th Sound and Music Computing Conference
Proceedings of the 19th Sound and Music Computing Conference - June 5-12, 2022 - Saint-Étienne (France).
https://smc22.grame.f
Adaptation of speech recognition systems to selected real-world deployment conditions
This habilitation thesis deals with the adaptation of automatic speech
recognition (ASR) systems to selected real-world deployment conditions.
It is presented in the form of a collection of twelve articles
dealing with this task; I am the main author or a co-author of these
articles. They were published during my work on several consecutive
research projects. I participated in these projects both as a member
of the research team and as an investigator or
co-investigator.
These articles can be divided into three main groups according to
their topics. They have in common the effort to adapt a particular
ASR system to a specific factor or deployment condition that affects
its function or accuracy.
The first group of articles is focused on an unsupervised speaker
adaptation task, where the ASR system adapts its parameters to
the specific voice characteristics of one particular speaker. The second
part deals with a) methods allowing the system to identify
non-speech events in the input, and b) the related task of recognizing
speech with non-speech events, particularly music, in the
background. Finally, the third part is devoted to the methods
that allow the transcription of an audio signal containing multilingual
utterances. It includes a) approaches for adapting the existing
recognition system to a new language and b) methods for identification
of the language from the audio signal.
The two mentioned identification tasks are in particular investigated
under the demanding and less explored frame-wise scenario,
which is the only one suitable for processing on-line data streams.
Signal processing algorithms based on Non-negative Matrix Factorization applied to the separation, detection and classification of wheezes in single-channel respiratory audio signals
Auscultation is the first clinical examination that a physician performs to evaluate the condition of the respiratory system, because it is non-invasive, low-cost, easy to perform and safe for the patient. However, the diagnosis derived from auscultation remains subjective, since it depends on each physician's ability, experience and training in listening to and interpreting respiratory audio signals. As a result, a high percentage of diagnoses are erroneous, which endangers patients' health and increases costs for health centres. This thesis proposes new methods based on Non-negative Matrix Factorization applied to the separation, detection and classification of wheezing sounds, in order to give the physician a complementary source of information that helps improve the reliability of the specialist's diagnosis. Tesis Univ. Jaén. Departamento INGENIERÍA DE TELECOMUNICACIÓ
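As a rough illustration of how Non-negative Matrix Factorization can pull apart overlapping spectral patterns, here is a minimal sketch. The abstract does not specify the cost function, constraints or NMF variant used in the thesis, so this assumes the plain Euclidean cost with the classic multiplicative updates. The function name, the parameter defaults and the synthetic "breathing plus wheeze" spectrogram are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (freq x time) as V ~ W @ H.

    Assumed variant: Euclidean cost with multiplicative updates
    (not taken from the thesis).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps    # spectral templates
    H = rng.random((rank, n_frames)) + eps  # temporal activations
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative by construction
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy spectrogram: a broadband "breathing" template that is
# always active, plus a narrowband "wheeze" template active in a few frames.
n_freq, n_frames = 32, 40
W_true = np.zeros((n_freq, 2))
W_true[:, 0] = np.exp(-np.arange(n_freq) / 10.0)  # broadband template
W_true[12:15, 1] = 1.0                            # narrowband template
H_true = np.zeros((2, n_frames))
H_true[0] = 1.0 + 0.5 * np.sin(2 * np.pi * np.arange(n_frames) / 20.0)
H_true[1, 10:20] = 1.0                            # wheeze present in frames 10..19
V = W_true @ H_true

W, H = nmf(V, rank=2)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In a setting like the one described, each column of W paired with its row of H gives one component spectrogram, and deciding which components correspond to wheezes is the detection and classification step; how the thesis makes that decision is not described in the abstract.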