64 research outputs found

    Bayesian Lower Bounds for Dense or Sparse (Outlier) Noise in the RMT Framework

    Robust estimation is an important and timely research subject. In this paper, we investigate performance lower bounds on the mean-square-error (MSE) of any estimator for the Bayesian linear model, corrupted by noise distributed according to an i.i.d. Student's t-distribution. This class of priors, parametrized by its degrees of freedom, is relevant for modeling either dense or sparse (accounting for outliers) noise. Using the hierarchical Normal-Gamma representation of the Student's t-distribution, the Van Trees Bayesian Cramér-Rao bound (BCRB) on the amplitude parameters is derived. Furthermore, the random matrix theory (RMT) framework is assumed, i.e., the number of measurements and the number of unknown parameters grow jointly to infinity with an asymptotic finite ratio. Using some powerful results from RMT, closed-form expressions of the BCRB are derived and studied. Finally, we propose a framework to fairly compare two models corrupted by noises with different degrees of freedom for a fixed common target signal-to-noise ratio (SNR). In particular, we focus our effort on the comparison of the BCRBs associated with two models corrupted by a sparse noise promoting outliers and a dense (Gaussian) noise, respectively.
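The hierarchical Normal-Gamma representation mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's code: the function names `sample_student_t` and `empirical_variance` are hypothetical, and the noise is assumed to have unit scale. It draws a Gamma-distributed precision and then a Gaussian with that precision, which yields a Student's t variate.

```python
import math
import random


def sample_student_t(nu, rng):
    """Draw one unit-scale Student's t variate with nu degrees of freedom
    through the Normal-Gamma hierarchy (illustrative helper, not from the paper)."""
    # Precision tau ~ Gamma(shape=nu/2, rate=nu/2).
    # random.gammavariate takes a scale parameter, so scale = 2/nu.
    tau = rng.gammavariate(nu / 2.0, 2.0 / nu)
    # Conditionally on tau, the noise sample is Gaussian with variance 1/tau.
    return rng.gauss(0.0, 1.0) / math.sqrt(tau)


def empirical_variance(nu, n, seed=0):
    """Unbiased sample variance of n hierarchical draws (illustrative helper)."""
    rng = random.Random(seed)
    xs = [sample_student_t(nu, rng) for _ in range(n)]
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)
```

For nu > 2 the variance of the resulting noise is nu / (nu - 2), so a small nu gives heavy tails (sparse, outlier-prone noise), while a large nu approaches the dense Gaussian case with variance close to 1.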

    Deep Grassmann Manifold Optimization for Computer Vision

    In this work, we propose methods that advance four areas in the field of computer vision: dimensionality reduction, deep feature embeddings, visual domain adaptation, and deep neural network compression. We combine concepts from the fields of manifold geometry and deep learning to develop cutting-edge methods in each of these areas. Each of the methods proposed in this work achieves state-of-the-art results in our experiments. We propose the Proxy Matrix Optimization (PMO) method for optimization over orthogonal matrix manifolds, such as the Grassmann manifold. This optimization technique is designed to be highly flexible, enabling it to be leveraged in many situations where traditional manifold optimization methods cannot be used. We first use PMO in the field of dimensionality reduction, where we propose an iterative optimization approach to Principal Component Analysis (PCA) in a framework called Proxy Matrix optimization based PCA (PM-PCA). We also demonstrate how PM-PCA can be used to solve the general $L_p$-PCA problem, a variant of PCA that uses arbitrary fractional norms, which can be more robust to outliers. We then present Cascaded Projection (CaP), a method which uses tensor compression based on PMO to reduce the number of filters in deep neural networks. This, in turn, reduces the number of computational operations required to process each image with the network. Cascaded Projection is the first end-to-end trainable method for network compression that uses standard backpropagation to learn the optimal tensor compression. In the area of deep feature embeddings, we introduce Deep Euclidean Feature Representations through Adaptation on the Grassmann manifold (DEFRAG), which leverages PMO. The DEFRAG method improves the feature embeddings learned by deep neural networks through the use of auxiliary loss functions and Grassmann manifold optimization. Lastly, in the area of visual domain adaptation, we propose the Manifold-Aligned Label Transfer for Domain Adaptation (MALT-DA) to transfer knowledge from samples in a known domain to an unknown domain based on cross-domain cluster correspondences.

    Improvement of sample classification and metabolite profiling in 1H-NMR by a machine learning-based modelling of signal parameters

    NMR is an analytical platform used to quantify the metabolites present in metabolomics samples. 1H-NMR spectra show multiple metabolite signals, each one with three parameters (chemical shift, half bandwidth, intensity) which can show reactivity to the sample conditions. This reactivity is a challenge for the optimization of the lineshape fitting of spectra necessary to perform the automatic metabolite profiling of samples. The aim of this PhD thesis was to explore the use of trending machine learning (ML)-based techniques and of robust ML-based workflows to model and then exploit the information present in the different parameters collected for each signal during the metabolite profiling of 1H-NMR datasets. In particular, the applications considered were the enhanced classification of samples in metabolomics studies and the enhancement of the quality of automatic profiling in 1H-NMR datasets. In addition to the achievement of these goals, additional achievements (e.g., the generation of a new open-source tool able to solve challenges in the profiling of complex matrices) were also obtained.

    MANIFOLD REPRESENTATIONS OF MUSICAL SIGNALS AND GENERATIVE SPACES

    Among the diverse research fields within computer music, synthesis and generation of audio signals epitomize the cross-disciplinarity of this domain, jointly nourishing both scientific and artistic practices since its creation. Inherent in computer music since its genesis, audio generation has inspired numerous approaches, evolving both with musical practices and scientific/technical advances. Moreover, some synthesis processes also naturally handle the reverse process, named analysis, such that synthesis parameters can also be partially or totally extracted from actual sounds, providing an alternative representation of the analyzed audio signals. On top of that, the recent rise of machine learning algorithms has earnestly questioned the field of scientific research, bringing powerful data-centred methods that raised several epistemological questions amongst researchers, in spite of their efficiency. In particular, a family of machine learning methods, called generative models, focuses on the generation of original content using features extracted from an existing dataset. Such methods question not only previous approaches to generation, but also the way of integrating these methods into existing creative processes. While these new generative frameworks are progressively being introduced in the domain of image generation, the application of such generative techniques in audio synthesis is still marginal. In this work, we aim to propose a new audio analysis-synthesis framework based on these modern generative models, enhanced by recent advances in machine learning. We first review existing approaches, both in sound synthesis and in generative machine learning, and focus on how our work inserts itself in both practices and what can be expected from their collation. Subsequently, we focus more closely on generative models, and on how modern advances in the domain can be exploited to allow us to learn complex sound distributions, while being sufficiently flexible to be integrated in the creative flow of the user. We then propose an inference/generation process, mirroring analysis/synthesis paradigms that are natural in the audio processing domain, using latent models that are based on a continuous higher-level space, which we use to control the generation. We first provide preliminary results of our method applied to spectral information, extracted from several datasets, and evaluate the obtained results both qualitatively and quantitatively. Subsequently, we study how to make these methods more suitable for learning audio data, successively tackling three different aspects. First, we propose two different latent regularization strategies specifically designed for audio, based on signal/symbol translation and on perceptual constraints. Then, we propose different methods to address the inner temporality of musical signals, based on the extraction of multi-scale representations and on prediction, which allow the obtained generative spaces to also model the dynamics of the signal. In the last chapter, we switch from our scientific approach to a more research & creation-oriented point of view: first, we describe the architecture and the design of our open-source library, vsacids, intended to be used by expert and non-expert music makers as an integrated creation tool. Then, we propose a first musical use of our system through the creation of a real-time performance, called ægo, based jointly on our framework vsacids and an explorative agent trained by reinforcement learning during the performance. Finally, we draw some conclusions on the different ways to improve and reinforce the proposed generation method, as well as on possible further creative applications.

    Information Theory and Machine Learning

    The recent successes of machine learning, especially regarding systems based on deep neural networks, have encouraged further research activities and raised a new set of challenges in understanding and designing complex machine learning algorithms. New applications require learning algorithms to be distributed, have transferable learning results, use computation resources efficiently, converge quickly in online settings, have performance guarantees, satisfy fairness or privacy constraints, incorporate domain knowledge on model structures, etc. A new wave of developments in statistical learning theory and information theory has set out to address these challenges. This Special Issue, "Machine Learning and Information Theory", aims to collect recent results in this direction, reflecting a diverse spectrum of visions and efforts to extend conventional theories and develop analysis tools for these complex machine learning systems.

    Addressing the challenges of crude oil processing utilising chemometric approaches

    Eng D Thesis. Throughout the hydrocarbon supply chain, process optimisation is driven by the desire to maximise profit margins. In the global refining marketplace, the biggest cost is crude oil and, to improve margins, increasing use of non-conventional crude oils (also called opportunity crudes) lowers the cost of the crude blend. Opportunity crudes are selected based on market forces; for example, in North America the production booms in shale oil and tar sands have provided ample amounts of new low-cost oils which refineries are buying and processing. However, as these oils are new to the marketplace, many refineries have never processed them before, which brings about challenges. These are mainly a lack of understanding of the quality of the crude oil being processed (shale oils, for example, can come from many thousands of wells) and of how these oils interact with the more conventional refinery feedstocks (such as Brent or West Texas Intermediate). The Eng.D project was carried out in collaboration with Intertek Group plc, a multinational corporate organisation consisting of more than 42,000 employees in over 1,000 locations in over 100 countries across the globe, and was aimed at developing solutions to address crude oil processing problems. The issues covered over the course of the project fall into the areas of: enhancing understanding of crude oil quality, addressing issues of hydrocarbon blend stability arising from blending, and better utilisation of process data to promote efficiency and facilitate process troubleshooting. As such, the Eng.D project was firstly concerned with developing a robust chemometric model, based on Near Infrared spectra, for use in a major Asian refinery. Once built and tuned, this model was ultimately used to predict physical properties (such as density, sulphur content and distillation properties) of every crude oil delivery and also online in the refinery for frequent prediction of crude oil blend properties. The second project was then aimed at solving refinery issues of the deposition of undesirable material (such as wax and asphaltenes) in pipes and process units. The research carried out during the course of the Eng.D project resulted in a patented approach to characterise these issues and provide refineries with strategies to mitigate the problems. This approach is not limited to crude oils but can be applied to any blended hydrocarbon streams, and it detects the precipitation of undesirable material using Near Infrared spectroscopy and microscopy. This approach has now been applied to solving problems of blending crude oils in refineries and offshore, heavy fuel oils, shale oils and marine fuels. Finally, the application of smart data analytics in an upstream installation was investigated. The objective of this application was to provide a customer with process troubleshooting for a historical recurring pump failure issue. To achieve this, the root cause of the issue first needed to be identified and then a solution developed. EPSRC (Grant Number EP/GO 37620/1) and Intertek Group plc