434 research outputs found

    Speech Enhancement Using an Iterative Posterior NMF

    Get PDF
    Over the years, miscellaneous methods for speech enhancement have been proposed, such as spectral subtraction (SS) and minimum mean square error (MMSE) estimators. These methods do not require any prior knowledge about the speech and noise signals nor any training stage beforehand, so they are highly flexible and allow implementation in various situations. However, these algorithms usually assume that the noise is stationary and are thus not good at dealing with nonstationary noise types, especially under low signal-to-noise (SNR) conditions. To overcome the drawbacks of the above methods, nonnegative matrix factorization (NMF) is introduced. NMF approach is more robust to nonstationary noise. In this chapter, we are actually interested in the application of speech enhancement using NMF approach. A speech enhancement method based on regularized nonnegative matrix factorization (NMF) for nonstationary Gaussian noise is proposed. The spectral components of speech and noise are modeled as Gamma and Rayleigh, respectively. We propose to adaptively estimate the sufficient statistics of these distributions to obtain a natural regularization of the NMF criterion

    MANIFOLD REPRESENTATIONS OF MUSICAL SIGNALS AND GENERATIVE SPACES

    Get PDF
    Tra i diversi campi di ricerca nell\u2019ambito dell\u2019informatica musicale, la sintesi e la generazione di segnali audio incarna la pluridisciplinalita\u300 di questo settore, nutrendo insieme le pratiche scientifiche e musicale dalla sua creazione. Inerente all\u2019informatica dalla sua creazione, la generazione audio ha ispirato numerosi approcci, evolvendo colle pratiche musicale e gli progressi tecnologici e scientifici. Inoltre, alcuni processi di sintesi permettono anche il processo inverso, denominato analisi, in modo che i parametri di sintesi possono anche essere parzialmente o totalmente estratti dai suoni, dando una rappresentazione alternativa ai segnali analizzati. Per di piu\u300, la recente ascesa dei algoritmi di l\u2019apprendimento automatico ha vivamente interrogato il settore della ricerca scientifica, fornendo potenti data-centered metodi che sollevavano diversi epistemologici interrogativi, nonostante i sui efficacia. Particolarmente, un tipo di metodi di apprendimento automatico, denominati modelli generativi, si concentrano sulla generazione di contenuto originale usando le caratteristiche che hanno estratti dei dati analizzati. In tal caso, questi modelli non hanno soltanto interrogato i precedenti metodi di generazione, ma anche sul modo di integrare questi algoritmi nelle pratiche artistiche. Mentre questi metodi sono progressivamente introdotti nel settore del trattamento delle immagini, la loro applicazione per la sintesi di segnali audio e ancora molto marginale. In questo lavoro, il nostro obiettivo e di proporre un nuovo metodo di audio sintesi basato su questi nuovi tipi di generativi modelli, rafforazti dalle nuove avanzati dell\u2019apprendimento automatico. Al primo posto, facciamo una revisione dei approcci esistenti nei settori dei sistemi generativi e di sintesi sonore, focalizzando sul posto di nostro lavoro rispetto a questi disciplini e che cosa possiamo aspettare di questa collazione. In seguito, studiamo in maniera piu\u300 precisa i modelli generativi, e come possiamo utilizzare questi recenti avanzati per l\u2019apprendimento di complesse distribuzione di suoni, in un modo che sia flessibile e nel flusso creativo del utente. Quindi proponiamo un processo di inferenza / generazione, il quale rifletta i processi di analisi/sintesi che sono molto usati nel settore del trattamento del segnale audio, usando modelli latenti, che sono basati sull\u2019utilizzazione di un spazio continuato di alto livello, che usiamo per controllare la generazione. Studiamo dapprima i risultati preliminari ottenuti con informazione spettrale estratte da diversi tipi di dati, che valutiamo qualitativamente e quantitativamente. Successiva- mente, studiamo come fare per rendere questi metodi piu\u300 adattati ai segnali audio, fronteggiando tre diversi aspetti. Primo, proponiamo due diversi metodi di regolarizzazione di questo generativo spazio che sono specificamente sviluppati per l\u2019audio : una strategia basata sulla traduzione segnali / simboli, e una basata su vincoli percettivi. Poi, proponiamo diversi metodi per fronteggiare il aspetto temporale dei segnali audio, basati sull\u2019estrazione di rappresentazioni multiscala e sulla predizione, che permettono ai generativi spazi ottenuti di anche modellare l\u2019aspetto dinamico di questi segnali. Per finire, cambiamo il nostro approccio scientifico per un punto di visto piu\u301 ispirato dall\u2019idea di ricerca e creazione. Primo, descriviamo l\u2019architettura e il design della nostra libreria open-source, vsacids, sviluppata per permettere a esperti o non-esperti musicisti di provare questi nuovi metodi di sintesi. Poi, proponiamo una prima utilizzazione del nostro modello con la creazione di una performance in real- time, chiamata \ue6go, basata insieme sulla nostra libreria vsacids e sull\u2019uso di une agente di esplorazione, imparando con rinforzo nel corso della composizione. Finalmente, tramo dal lavoro presentato alcuni conclusioni sui diversi modi di migliorare e rinforzare il metodo di sintesi proposto, nonche\u301 eventuale applicazione artistiche.Among the diverse research fields within computer music, synthesis and generation of audio signals epitomize the cross-disciplinarity of this domain, jointly nourishing both scientific and artistic practices since its creation. Inherent in computer music since its genesis, audio generation has inspired numerous approaches, evolving both with musical practices and scientific/technical advances. Moreover, some syn- thesis processes also naturally handle the reverse process, named analysis, such that synthesis parameters can also be partially or totally extracted from actual sounds, and providing an alternative representation of the analyzed audio signals. On top of that, the recent rise of machine learning algorithms earnestly questioned the field of scientific research, bringing powerful data-centred methods that raised several epistemological questions amongst researchers, in spite of their efficiency. Especially, a family of machine learning methods, called generative models, are focused on the generation of original content using features extracted from an existing dataset. In that case, such methods not only questioned previous approaches in generation, but also the way of integrating this methods into existing creative processes. While these new generative frameworks are progressively introduced in the domain of image generation, the application of such generative techniques in audio synthesis is still marginal. In this work, we aim to propose a new audio analysis-synthesis framework based on these modern generative models, enhanced by recent advances in machine learning. We first review existing approaches, both in sound synthesis and in generative machine learning, and focus on how our work inserts itself in both practices and what can be expected from their collation. Subsequently, we focus a little more on generative models, and how modern advances in the domain can be exploited to allow us learning complex sound distributions, while being sufficiently flexible to be integrated in the creative flow of the user. We then propose an inference / generation process, mirroring analysis/synthesis paradigms that are natural in the audio processing domain, using latent models that are based on a continuous higher-level space, that we use to control the generation. We first provide preliminary results of our method applied on spectral information, extracted from several datasets, and evaluate both qualitatively and quantitatively the obtained results. Subsequently, we study how to make these methods more suitable for learning audio data, tackling successively three different aspects. First, we propose two different latent regularization strategies specifically designed for audio, based on and signal / symbol translation and perceptual constraints. Then, we propose different methods to address the inner temporality of musical signals, based on the extraction of multi-scale representations and on prediction, that allow the obtained generative spaces that also model the dynamics of the signal. As a last chapter, we swap our scientific approach to a more research & creation-oriented point of view: first, we describe the architecture and the design of our open-source library, vsacids, aiming to be used by expert and non-expert music makers as an integrated creation tool. Then, we propose an first musical use of our system by the creation of a real-time performance, called aego, based jointly on our framework vsacids and an explorative agent using reinforcement learning to be trained during the performance. Finally, we draw some conclusions on the different manners to improve and reinforce the proposed generation method, as well as possible further creative applications.A\u300 travers les diffe\u301rents domaines de recherche de la musique computationnelle, l\u2019analysie et la ge\u301ne\u301ration de signaux audio sont l\u2019exemple parfait de la trans-disciplinarite\u301 de ce domaine, nourrissant simultane\u301ment les pratiques scientifiques et artistiques depuis leur cre\u301ation. Inte\u301gre\u301e a\u300 la musique computationnelle depuis sa cre\u301ation, la synthe\u300se sonore a inspire\u301 de nombreuses approches musicales et scientifiques, e\u301voluant de pair avec les pratiques musicales et les avance\u301es technologiques et scientifiques de son temps. De plus, certaines me\u301thodes de synthe\u300se sonore permettent aussi le processus inverse, appele\u301 analyse, de sorte que les parame\u300tres de synthe\u300se d\u2019un certain ge\u301ne\u301rateur peuvent e\u302tre en partie ou entie\u300rement obtenus a\u300 partir de sons donne\u301s, pouvant ainsi e\u302tre conside\u301re\u301s comme une repre\u301sentation alternative des signaux analyse\u301s. Paralle\u300lement, l\u2019inte\u301re\u302t croissant souleve\u301 par les algorithmes d\u2019apprentissage automatique a vivement questionne\u301 le monde scientifique, apportant de puissantes me\u301thodes d\u2019analyse de donne\u301es suscitant de nombreux questionnements e\u301piste\u301mologiques chez les chercheurs, en de\u301pit de leur effectivite\u301 pratique. En particulier, une famille de me\u301thodes d\u2019apprentissage automatique, nomme\u301e mode\u300les ge\u301ne\u301ratifs, s\u2019inte\u301ressent a\u300 la ge\u301ne\u301ration de contenus originaux a\u300 partir de caracte\u301ristiques extraites directement des donne\u301es analyse\u301es. Ces me\u301thodes n\u2019interrogent pas seulement les approches pre\u301ce\u301dentes, mais aussi sur l\u2019inte\u301gration de ces nouvelles me\u301thodes dans les processus cre\u301atifs existants. Pourtant, alors que ces nouveaux processus ge\u301ne\u301ratifs sont progressivement inte\u301gre\u301s dans le domaine la ge\u301ne\u301ration d\u2019image, l\u2019application de ces techniques en synthe\u300se audio reste marginale. Dans cette the\u300se, nous proposons une nouvelle me\u301thode d\u2019analyse-synthe\u300se base\u301s sur ces derniers mode\u300les ge\u301ne\u301ratifs, depuis renforce\u301s par les avance\u301es modernes dans le domaine de l\u2019apprentissage automatique. Dans un premier temps, nous examinerons les approches existantes dans le domaine des syste\u300mes ge\u301ne\u301ratifs, sur comment notre travail peut s\u2019inse\u301rer dans les pratiques de synthe\u300se sonore existantes, et que peut-on espe\u301rer de l\u2019hybridation de ces deux approches. Ensuite, nous nous focaliserons plus pre\u301cise\u301ment sur comment les re\u301centes avance\u301es accomplies dans ce domaine dans ce domaine peuvent e\u302tre exploite\u301es pour l\u2019apprentissage de distributions sonores complexes, tout en e\u301tant suffisamment flexibles pour e\u302tre inte\u301gre\u301es dans le processus cre\u301atif de l\u2019utilisateur. Nous proposons donc un processus d\u2019infe\u301rence / g\ue9n\ue9ration, refle\u301tant les paradigmes d\u2019analyse-synthe\u300se existant dans le domaine de ge\u301ne\u301ration audio, base\u301 sur l\u2019usage de mode\u300les latents continus que l\u2019on peut utiliser pour contro\u302ler la ge\u301ne\u301ration. Pour ce faire, nous e\u301tudierons de\u301ja\u300 les re\u301sultats pre\u301liminaires obtenus par cette me\u301thode sur l\u2019apprentissage de distributions spectrales, prises d\u2019ensembles de donne\u301es diversifie\u301s, en adoptant une approche a\u300 la fois quantitative et qualitative. Ensuite, nous proposerons d\u2019ame\u301liorer ces me\u301thodes de manie\u300re spe\u301cifique a\u300 l\u2019audio sur trois aspects distincts. D\u2019abord, nous proposons deux strate\u301gies de re\u301gularisation diffe\u301rentes pour l\u2019analyse de signaux audio : une base\u301e sur la traduction signal/ symbole, ainsi qu\u2019une autre base\u301e sur des contraintes perceptives. Nous passerons par la suite a\u300 la dimension temporelle de ces signaux audio, proposant de nouvelles me\u301thodes base\u301es sur l\u2019extraction de repre\u301sentations temporelles multi-e\u301chelle et sur une ta\u302che supple\u301mentaire de pre\u301diction, permettant la mode\u301lisation de caracte\u301ristiques dynamiques par les espaces ge\u301ne\u301ratifs obtenus. En dernier lieu, nous passerons d\u2019une approche scientifique a\u300 une approche plus oriente\u301e vers un point de vue recherche & cre\u301ation. Premie\u300rement, nous pre\u301senterons notre librairie open-source, vsacids, visant a\u300 e\u302tre employe\u301e par des cre\u301ateurs experts et non-experts comme un outil inte\u301gre\u301. Ensuite, nous proposons une premie\u300re utilisation musicale de notre syste\u300me par la cre\u301ation d\u2019une performance temps re\u301el, nomme\u301e \ue6go, base\u301e a\u300 la fois sur notre librarie et sur un agent d\u2019exploration appris dynamiquement par renforcement au cours de la performance. Enfin, nous tirons les conclusions du travail accompli jusqu\u2019a\u300 maintenant, concernant les possibles ame\u301liorations et de\u301veloppements de la me\u301thode de synthe\u300se propose\u301e, ainsi que sur de possibles applications cre\u301atives

    Nonparametric Bayesian Dereverberation of Power Spectrograms Based on Infinite-Order Autoregressive Processes

    Get PDF
    This paper describes a monaural audio dereverberation method that operates in the power spectrogram domain. The method is robust to different kinds of source signals such as speech or music. Moreover, it requires little manual intervention, including the complexity of room acoustics. The method is based on a non-conjugate Bayesian model of the power spectrogram. It extends the idea of multi-channel linear prediction to the power spectrogram domain, and formulates a model of reverberation as a non-negative, infinite-order autoregressive process. To this end, the power spectrogram is interpreted as a histogram count data, which allows a nonparametric Bayesian model to be used as the prior for the autoregressive process, allowing the effective number of active components to grow, without bound, with the complexity of data. In order to determine the marginal posterior distribution, a convergent algorithm, inspired by the variational Bayes method, is formulated. It employs the minorization-maximization technique to arrive at an iterative, convergent algorithm that approximates the marginal posterior distribution. Both objective and subjective evaluations show advantage over other methods based on the power spectrum. We also apply the method to a music information retrieval task and demonstrate its effectiveness

    Automatic transcription of polyphonic music exploiting temporal evolution

    Get PDF
    PhDAutomatic music transcription is the process of converting an audio recording into a symbolic representation using musical notation. It has numerous applications in music information retrieval, computational musicology, and the creation of interactive systems. Even for expert musicians, transcribing polyphonic pieces of music is not a trivial task, and while the problem of automatic pitch estimation for monophonic signals is considered to be solved, the creation of an automated system able to transcribe polyphonic music without setting restrictions on the degree of polyphony and the instrument type still remains open. In this thesis, research on automatic transcription is performed by explicitly incorporating information on the temporal evolution of sounds. First efforts address the problem by focusing on signal processing techniques and by proposing audio features utilising temporal characteristics. Techniques for note onset and offset detection are also utilised for improving transcription performance. Subsequent approaches propose transcription models based on shift-invariant probabilistic latent component analysis (SI-PLCA), modeling the temporal evolution of notes in a multiple-instrument case and supporting frequency modulations in produced notes. Datasets and annotations for transcription research have also been created during this work. Proposed systems have been privately as well as publicly evaluated within the Music Information Retrieval Evaluation eXchange (MIREX) framework. Proposed systems have been shown to outperform several state-of-the-art transcription approaches. Developed techniques have also been employed for other tasks related to music technology, such as for key modulation detection, temperament estimation, and automatic piano tutoring. Finally, proposed music transcription models have also been utilized in a wider context, namely for modeling acoustic scenes

    Audio part mixture alignment based on hierarchical nonparametric Bayesian model of musical audio sequence collection

    Full text link

    Learning Identifiable Representations: Independent Influences and Multiple Views

    Get PDF
    Intelligent systems, whether biological or artificial, perceive unstructured information from the world around them: deep neural networks designed for object recognition receive collections of pixels as inputs; living beings capture visual stimuli through photoreceptors that convert incoming light into electrical signals. Sophisticated signal processing is required to extract meaningful features (e.g., the position, dimension, and colour of objects in an image) from these inputs: this motivates the field of representation learning. But what features should be deemed meaningful, and how to learn them? We will approach these questions based on two metaphors. The first one is the cocktail-party problem, where a number of conversations happen in parallel in a room, and the task is to recover (or separate) the voices of the individual speakers from recorded mixtures—also termed blind source separation. The second one is what we call the independent-listeners problem: given two listeners in front of some loudspeakers, the question is whether, when processing what they hear, they will make the same information explicit, identifying similar constitutive elements. The notion of identifiability is crucial when studying these problems, as it specifies suitable technical assumptions under which representations are uniquely determined, up to tolerable ambiguities like latent source reordering. A key result of this theory is that, when the mixing is nonlinear, the model is provably non-identifiable. A first question is, therefore, under what additional assumptions (ideally as mild as possible) the problem becomes identifiable; a second one is, what algorithms can be used to estimate the model. The contributions presented in this thesis address these questions and revolve around two main principles. The first principle is to learn representation where the latent components influence the observations independently. Here the term “independently” is used in a non-statistical sense—which can be loosely thought of as absence of fine-tuning between distinct elements of a generative process. The second principle is that representations can be learned from paired observations or views, where mixtures of the same latent variables are observed, and they (or a subset thereof) are perturbed in one of the views—also termed multi-view setting. I will present work characterizing these two problem settings, studying their identifiability and proposing suitable estimation algorithms. Moreover, I will discuss how the success of popular representation learning methods may be explained in terms of the principles above and describe an application of the second principle to the statistical analysis of group studies in neuroimaging

    How can humans leverage machine learning? From Medical Data Wrangling to Learning to Defer to Multiple Experts

    Get PDF
    Mención Internacional en el título de doctorThe irruption of the smartphone into everyone’s life and the ease with which we digitise or record any data supposed an explosion of quantities of data. Smartphones, equipped with advanced cameras and sensors, have empowered individuals to capture moments and contribute to the growing pool of data. This data-rich landscape holds great promise for research, decision-making, and personalized applications. By carefully analyzing and interpreting this wealth of information, valuable insights, patterns, and trends can be uncovered. However, big data is worthless in a vacuum. Its potential value is unlocked only when leveraged to drive decision-making. In recent times we have been participants of the outburst of artificial intelligence: the development of computer systems and algorithms capable of perceiving, reasoning, learning, and problem-solving, emulating certain aspects of human cognitive abilities. Nevertheless, our focus tends to be limited, merely skimming the surface of the problem, while the reality is that the application of machine learning models to data introduces is usually fraught. More specifically, there are two crucial pitfalls frequently neglected in the field of machine learning: the quality of the data and the erroneous assumption that machine learning models operate autonomously. These two issues have established the foundation for the motivation driving this thesis, which strives to offer solutions to two major associated challenges: 1) dealing with irregular observations and 2) learning when and who should we trust. The first challenge originates from our observation that the majority of machine learning research primarily concentrates on handling regular observations, neglecting a crucial technological obstacle encountered in practical big-data scenarios: the aggregation and curation of heterogeneous streams of information. Before applying machine learning algorithms, it is crucial to establish robust techniques for handling big data, as this specific aspect presents a notable bottleneck in the creation of robust algorithms. Data wrangling, which encompasses the extraction, integration, and cleaning processes necessary for data analysis, plays a crucial role in this regard. Therefore, the first objective of this thesis is to tackle the frequently disregarded challenge of addressing irregularities within the context of medical data. We will focus on three specific aspects. Firstly, we will tackle the issue of missing data by developing a framework that facilitates the imputation of missing data points using relevant information derived from alternative data sources or past observations. Secondly, we will move beyond the assumption of homogeneous observations, where only one statistical data type (such as Gaussian) is considered, and instead, work with heterogeneous observations. This means that different data sources can be represented by various statistical likelihoods, such as Gaussian, Bernoulli, categorical, etc. Lastly, considering the temporal enrichment of todays collected data and our focus on medical data, we will develop a novel algorithm capable of capturing and propagating correlations among different data streams over time. All these three problems are addressed in our first contribution which involves the development of a novel method based on Deep Generative Models (DGM) using Variational Autoencoders (VAE). The proposed model, the Sequential Heterogeneous Incomplete VAE (Shi- VAE), enables the aggregation of multiple heterogeneous data streams in a modular manner, taking into consideration the presence of potential missing data. To demonstrate the feasibility of our approach, we present proof-of-concept results obtained from a real database generated through continuous passive monitoring of psychiatric patients. Our second challenge relates to the misbelief that machine learning algorithms can perform independently. However, this notion that AI systems can solely account for automated decisionmaking, especially in critical domains such as healthcare, is far from reality. Our focus now shifts towards a specific scenario where the algorithm has the ability to make predictions independently or alternatively defer the responsibility to a human expert. The purpose of including the human is not to obtain jsut better performance, but also more reliable and trustworthy predictions we can rely on. In reality, however, important decisions are not made by one person but are usually committed by an ensemble of human experts. With this in mind, two important questions arise: 1) When should the human or the machine bear responsibility and 2) among the experts, who should we trust? To answer the first question, we will employ a recent theory known as Learning to defer (L2D). In L2D we are not only interested in abstaining from prediction but also in understanding the humans confidence for making such prediction. thus deferring only when the human is more likely to be correct. The second question about who to defer among a pool of experts has not been yet answered in the L2D literature, and this is what our contributions aim to provide. First, we extend the two yet proposed consistent surrogate losses in the L2D literature to the multiple-expert setting. Second, we study the frameworks ability to estimate the probability that a given expert correctly predicts and assess whether the two surrogate losses are confidence calibrated. Finally, we propose a conformal inference technique that chooses a subset of experts to query when the system defers. Ensembling experts based on confidence levels is vital to optimize human-machine collaboration. In conclusion, this doctoral thesis has investigated two cases where humans can leverage the power of machine learning: first, as a tool to assist in data wrangling and data understanding problems and second, as a collaborative tool where decision-making can be automated by the machine or delegated to human experts, fostering more transparent and trustworthy solutions.La irrupción de los smartphones en la vida de todos y la facilidad con la que digitalizamos o registramos cualquier situación ha supuesto una explosión en la cantidad de datos. Los teléfonos, equipados con cámaras y sensores avanzados, han contribuido a que las personas puedann capturar más momentos, favoreciendo así el creciente conjunto de datos. Este panorama repleto de datos aporta un gran potencial de cara a la investigación, la toma de decisiones y las aplicaciones personalizadas. Mediante el análisis minucioso y una cuidada interpretación de esta abundante información, podemos descubrir valiosos patrones, tendencias y conclusiones Sin embargo, este gran volumen de datos no tiene valor por si solo. Su potencial se desbloquea solo cuando se aprovecha para impulsar la toma de decisiones. En tiempos recientes, hemos sido testigos del auge de la inteligencia artificial: el desarrollo de sistemas informáticos y algoritmos capaces de percibir, razonar, aprender y resolver problemas, emulando ciertos aspectos de las capacidades cognitivas humanas. No obstante, solemos centrarnos solo en la superficie del problema mientras que la realidad es que la aplicación de modelos de aprendizaje automático a los datos presenta desafíos significativos. Concretamente, se suelen pasar por alto dos problemas cruciales en el campo del aprendizaje automático: la calidad de los datos y la suposición errónea de que los modelos de aprendizaje automático pueden funcionar de manera autónoma. Estos dos problemas han sido el fundamento de la motivación que impulsa esta tesis, que se esfuerza en ofrecer soluciones a dos desafíos importantes asociados: 1) lidiar con datos irregulares y 2) aprender cuándo y en quién debemos confiar. El primer desafío surge de nuestra observación de que la mayoría de las investigaciones en aprendizaje automático se centran principalmente en manejar datos regulares, descuidando un obstáculo tecnológico crucial que se encuentra en escenarios prácticos con gran cantidad de datos: la agregación y el curado de secuencias heterogéneas. Antes de aplicar algoritmos de aprendizaje automático, es crucial establecer técnicas robustas para manejar estos datos, ya que est problemática representa un cuello de botella claro en la creación de algoritmos robustos. El procesamiento de datos (en concreto, nos centraremos en el término inglés data wrangling), que abarca los procesos de extracción, integración y limpieza necesarios para el análisis de datos, desempeña un papel crucial en este sentido. Por lo tanto, el primer objetivo de esta tesis es abordar el desafío normalmente paso por alto de tratar datos irregulare. Específicamente, bajo el contexto de datos médicos. Nos centraremos en tres aspectos principales. En primer lugar, abordaremos el problema de los datos perdidos mediante el desarrollo de un marco que facilite la imputación de estos datos perdidos utilizando información relevante obtenida de fuentes de datos de diferente naturalaeza u observaciones pasadas. En segundo lugar, iremos más allá de la suposición de lidiar con observaciones homogéneas, donde solo se considera un tipo de dato estadístico (como Gaussianos) y, en su lugar, trabajaremos con observaciones heterogéneas. Esto significa que diferentes fuentes de datos pueden estar representadas por diversas distribuciones de probabilidad, como Gaussianas, Bernoulli, categóricas, etc. Por último, teniendo en cuenta el enriquecimiento temporal de los datos hoy en día y nuestro enfoque directo sobre los datos médicos, propondremos un algoritmo innovador capaz de capturar y propagar la correlación entre diferentes flujos de datos a lo largo del tiempo. Todos estos tres problemas se abordan en nuestra primera contribución, que implica el desarrollo de un método basado en Modelos Generativos Profundos (Deep Genarative Model en inglés) utilizando Autoencoders Variacionales (Variational Autoencoders en ingés). El modelo propuesto, Sequential Heterogeneous Incomplete VAE (Shi-VAE), permite la agregación de múltiples flujos de datos heterogéneos de manera modular, teniendo en cuenta la posible presencia de datos perdidos. Para demostrar la viabilidad de nuestro enfoque, presentamos resultados de prueba de concepto obtenidos de una base de datos real generada a través del monitoreo continuo pasivo de pacientes psiquiátricos. Nuestro segundo desafío está relacionado con la creencia errónea de que los algoritmos de aprendizaje automático pueden funcionar de manera independiente. Sin embargo, esta idea de que los sistemas de inteligencia artificial pueden ser los únicos responsables en la toma de decisione, especialmente en dominios críticos como la atención médica, está lejos de la realidad. Ahora, nuestro enfoque se centra en un escenario específico donde el algoritmo tiene la capacidad de realizar predicciones de manera independiente o, alternativamente, delegar la responsabilidad en un experto humano. La inclusión del ser humano no solo tiene como objetivo obtener un mejor rendimiento, sino también obtener predicciones más transparentes y seguras en las que podamos confiar. En la realidad, sin embargo, las decisiones importantes no las toma una sola persona, sino que generalmente son el resultado de la colaboración de un conjunto de expertos. Con esto en mente, surgen dos preguntas importantes: 1) ¿Cuándo debe asumir la responsabilidad el ser humano o cuándo la máquina? y 2) de entre los expertos, ¿en quién debemos confiar? Para responder a la primera pregunta, emplearemos una nueva teoría llamada Learning to defer (L2D). En L2D, no solo estamos interesados en abstenernos de hacer predicciones, sino también en comprender cómo de seguro estará el experto para hacer dichas predicciones, diferiendo solo cuando el humano sea más probable en predecir correcatmente. La segunda pregunta sobre a quién deferir entre un conjunto de expertos aún no ha sido respondida en la literatura de L2D, y esto es precisamente lo que nuestras contribuciones pretenden proporcionar. En primer lugar, extendemos las dos primeras surrogate losses consistentes propuestas hasta ahora en la literatura de L2D al contexto de múltiples expertos. En segundo lugar, estudiamos la capacidad de estos modelos para estimar la probabilidad de que un experto dado haga predicciones correctas y evaluamos si estas surrogate losses están calibradas en términos de confianza. Finalmente, proponemos una técnica de conformal inference que elige un subconjunto de expertos para consultar cuando el sistema decide diferir. Esta combinación de expertos basada en los respectivos niveles de confianza es fundamental para optimizar la colaboración entre humanos y máquinas En conclusión, esta tesis doctoral ha investigado dos casos en los que los humanos pueden aprovechar el poder del aprendizaje automático: primero, como herramienta para ayudar en problemas de procesamiento y comprensión de datos y, segundo, como herramienta colaborativa en la que la toma de decisiones puede ser automatizada para ser realizada por la máquina o delegada a expertos humanos, fomentando soluciones más transparentes y seguras.Programa de Doctorado en Multimedia y Comunicaciones por la Universidad Carlos III de Madrid y la Universidad Rey Juan CarlosPresidente: Joaquín Míguez Arenas.- Secretario: Juan José Murillo Fuentes.- Vocal: Mélanie Natividad Fernández Pradie
    corecore