22 research outputs found

    Dynamic Machine Learning with Least Square Objectives

    As of the writing of this thesis, machine learning has become one of the most active research fields, drawing interest from a variety of disciplines including computer science, statistics, engineering, and medicine. The main idea behind learning from data is that, when an analytical model explaining the observations is hard to find (often in contrast to models in physics such as Newton's laws), a statistical approach can be taken in which one or more candidate models are tuned using data. Since the early 2000s this challenge has grown in two ways: (i) the amount of collected data has grown massively due to the proliferation of digital media, and (ii) the data has become more complex. One example of the latter is high-dimensional datasets, which can correspond to dyadic interactions between two large groups (such as the customer and product information a retailer collects) or to high-resolution image and video recordings. Another important issue is the study of dynamic data, which exhibits dependence on time. Virtually all datasets fall into this category, since all data collection is performed over time; I use the term dynamic to refer to systems with an explicit temporal dependence. A traditional example is target tracking from the signal processing literature, where the position of a target is modeled using Newton's laws of motion, which relate it to time via the target's velocity and acceleration.

    Dynamic data, as defined above, poses two important challenges. First, the learning setup differs from the standard theoretical setup, known as Probably Approximately Correct (PAC) learning. PAC learning bounds are derived under the assumption that the data points are sampled independently and identically from a distribution generating the data, whereas dynamic systems produce correlated outputs; the learning systems we use should take this difference into consideration. Second, because the system is dynamic, it may be necessary to perform the learning online, in which case learning has to be done in a single pass over the data. Typical applications include target tracking and electricity usage forecasting.

    In this thesis I investigate several important dynamic and online learning problems, developing novel tools to address the shortcomings of previous solutions in the literature. The work is divided into three parts. The first part is about matrix factorization for time series analysis and consists of two chapters. In the first chapter, matrix factorization is used within a Bayesian framework to model time-varying dyadic interactions, with examples in predicting user-movie ratings and stock prices. In the second chapter, a matrix factorization that uses autoregressive models to forecast future values of multivariate time series is proposed, with applications in predicting electricity usage and traffic conditions.

    Inspired by the machinery used in the first part, the second part is about nonlinear Kalman filtering, where a hidden state is estimated over time from observations. The nonlinearity of the system generating the observations is the main challenge here; a divergence minimization approach is used to unify seemingly unrelated methods in the literature and to propose new ones. This has applications in target tracking and options pricing.
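    To make the filtering setting concrete, here is a minimal sketch of one step of a classical extended Kalman filter (a standard baseline for nonlinear state estimation, not the divergence-minimization framework proposed in the thesis); the dynamics f, observation map h, their Jacobians, and the toy tracking model below are illustrative assumptions.

```python
import numpy as np

def ekf_step(x, P, z, f, h, F_jac, H_jac, Q, R):
    """One predict/update cycle of an extended Kalman filter.

    x, P : current state estimate and its covariance
    z    : new observation
    f, h : (nonlinear) transition and observation functions
    F_jac, H_jac : their Jacobians, evaluated at a point
    Q, R : process and observation noise covariances
    """
    # Predict: push the state through the nonlinear dynamics,
    # linearizing to propagate the covariance.
    x_pred = f(x)
    F = F_jac(x)
    P_pred = F @ P @ F.T + Q

    # Update: correct the prediction with the new observation.
    H = H_jac(x_pred)
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy usage: near-constant-velocity target with a nonlinear range sensor
# placed 10 units off the target's track (all numbers made up).
dt = 1.0
f = lambda x: np.array([x[0] + dt * x[1], x[1]])         # position, velocity
F_jac = lambda x: np.array([[1.0, dt], [0.0, 1.0]])
h = lambda x: np.array([np.hypot(x[0], 10.0)])           # range observation
H_jac = lambda x: np.array([[x[0] / np.hypot(x[0], 10.0), 0.0]])
Q, R = 0.01 * np.eye(2), np.array([[0.1]])
x, P = np.array([0.0, 1.0]), np.eye(2)
x, P = ekf_step(x, P, np.array([10.2]), f, h, F_jac, H_jac, Q, R)
```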
    The third and last part is about cost-sensitive learning, where a novel method for maximizing the area under the receiver operating characteristic (ROC) curve is proposed. The method has theoretical guarantees and favorable sample complexity. It is tested on a variety of benchmark datasets and also has applications in online advertising.
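    As an illustration of the AUC maximization problem, the following sketch minimizes a standard pairwise logistic surrogate of 1 - AUC for a linear scorer: the AUC equals the probability that a random positive example is scored above a random negative one, so every positive/negative pair with a small or negative score difference is penalized. This is a generic textbook formulation, not the thesis's specific method, and the data below is synthetic.

```python
import numpy as np

def auc_surrogate(w, X_pos, X_neg):
    """Pairwise logistic surrogate of 1 - AUC and its gradient for scores x @ w."""
    diffs = (X_pos @ w)[:, None] - (X_neg @ w)[None, :]  # all pair score margins
    loss = np.mean(np.logaddexp(0.0, -diffs))            # log(1 + e^{-margin})
    sig = 1.0 / (1.0 + np.exp(diffs))                    # sigmoid(-margin)
    grad = (X_neg.T @ sig.sum(axis=0) - X_pos.T @ sig.sum(axis=1)) / diffs.size
    return loss, grad

# Toy usage: full-batch gradient descent on two Gaussian clouds.
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 1.0, size=(50, 3))
X_neg = rng.normal(-1.0, 1.0, size=(60, 3))
w = np.zeros(3)
for _ in range(200):
    loss, grad = auc_surrogate(w, X_pos, X_neg)
    w -= 0.5 * grad
```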

    De l'apprentissage faiblement supervisé au catalogage en ligne (From Weakly Supervised Learning to Online Cataloging)

    Applied mathematics and machine computation have raised a lot of hope since the recent successes of supervised learning. Many practitioners in industry have been trying to switch from their old paradigms to machine learning. Interestingly, those data scientists spend more time scraping, annotating, and cleaning data than fine-tuning models. This thesis is motivated by the following question: can we derive a more generic framework than that of supervised learning in order to learn from cluttered data? The question is approached through the lens of weakly supervised learning, assuming that the bottleneck of data collection lies in annotation. We model weak supervision as giving, for each input, a set of target candidates rather than a unique target. We argue that one should look for an "optimistic" function that matches most of the observations, which allows us to derive a principle to disambiguate partial labels. We also discuss the advantage of incorporating unsupervised learning techniques into our framework, in particular manifold regularization approached through diffusion techniques, for which we derive a new algorithm that scales better with the input dimension than the baseline method. Finally, we switch from passive to active weakly supervised learning, introducing the "active labeling" framework, in which a practitioner can query weak information about chosen data. Among other things, we leverage the fact that full supervision is not needed to access stochastic gradients and perform stochastic gradient descent.
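    A minimal sketch of how the "optimistic" principle can disambiguate partial labels, assuming a multiclass scorer: the infimum loss only charges the model for its best-scoring candidate label, so one can alternate between the imputation step below and ordinary supervised refitting on the imputed labels. This is a simplified alternating-minimization baseline, not the thesis's exact algorithm; all names are placeholders.

```python
import numpy as np

def impute_optimistic(scores, candidate_sets):
    """One disambiguation step under the infimum ('optimistic') loss.

    scores         : (n, k) current model scores over k classes
    candidate_sets : per-example sets of admissible labels

    Each example is assigned the candidate label the current model
    already scores highest, i.e. the label achieving the infimum loss.
    """
    y = np.empty(len(candidate_sets), dtype=int)
    for i, cands in enumerate(candidate_sets):
        cands = sorted(cands)
        y[i] = cands[int(np.argmax(scores[i, cands]))]
    return y

# Toy usage: 2 examples, 3 classes, each with two candidate labels.
scores = np.array([[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]])
print(impute_optimistic(scores, [{0, 1}, {1, 2}]))  # -> [1 2]
```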

    Structure-Preserving Model Reduction of Physical Network Systems

    This paper considers physical network systems where the energy storage is naturally associated with the nodes of the graph, while the edges of the graph correspond to static couplings. The first sections deal with the linear case, covering examples such as mass-damper and hydraulic systems, which have a structure similar to symmetric consensus dynamics. The last section is concerned with a specific class of nonlinear physical network systems, namely detailed-balanced chemical reaction networks governed by mass action kinetics. In both the linear and the nonlinear case, the structure of the dynamics is similar, being based on a weighted Laplacian matrix together with an energy function capturing the energy storage at the nodes. We discuss two methods for structure-preserving model reduction. The first is clustering: aggregating the nodes of the underlying graph to obtain a reduced graph. The second is based on neglecting the energy storage at some of the nodes and subsequently eliminating those nodes (known as Kron reduction).
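    Both reduction maps have simple matrix expressions. Below is a minimal numerical sketch, assuming a symmetric weighted Laplacian L: Kron reduction is the Schur complement of the block of eliminated nodes, and clustering corresponds to projection by a 0/1 aggregation matrix. The function names and the partitioning are illustrative, not from the paper.

```python
import numpy as np

def kron_reduce(L, keep):
    """Kron reduction of a weighted graph Laplacian.

    L    : (n, n) symmetric weighted Laplacian
    keep : indices of the nodes to retain

    The remaining nodes are eliminated via the Schur complement of
    their block; the result is again a weighted Laplacian on `keep`.
    """
    drop = np.setdiff1d(np.arange(L.shape[0]), keep)
    L_kk = L[np.ix_(keep, keep)]
    L_kd = L[np.ix_(keep, drop)]
    L_dd = L[np.ix_(drop, drop)]
    return L_kk - L_kd @ np.linalg.solve(L_dd, L_kd.T)

def cluster_reduce(L, clusters):
    """Clustering-based reduction: project L with the 0/1 aggregation map."""
    P = np.zeros((L.shape[0], len(clusters)))
    for j, nodes in enumerate(clusters):
        P[nodes, j] = 1.0
    return P.T @ L @ P

# Toy usage: unit-weight path graph a-b-c.
L = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]])
print(kron_reduce(L, [0, 2]))          # series weight 1/2 between a and c
print(cluster_reduce(L, [[0, 1], [2]]))  # aggregate {a, b} into one node
```

    For a connected graph the eliminated block is nonsingular, so the Schur complement is well defined; in the toy example above, eliminating the middle node of a unit-weight path returns the familiar series combination with edge weight 1/2.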