8 research outputs found

    Contributions to probabilistic non-negative matrix factorization - Maximum marginal likelihood estimation and Markovian temporal models

    Non-negative matrix factorization (NMF) has become a popular dimensionality reduction technique, and has found applications in many different fields, such as audio signal processing, hyperspectral imaging, or recommender systems. In its simplest form, NMF aims at finding an approximation of a non-negative data matrix (i.e., with non-negative entries) as the product of two non-negative matrices, called the factors. One of these two matrices can be interpreted as a dictionary of characteristic patterns of the data, and the other one as activation coefficients of these patterns. This low-rank approximation is traditionally retrieved by optimizing a measure of fit between the data matrix and its approximation. As it turns out, for many choices of measures of fit, the problem can be shown to be equivalent to the joint maximum likelihood estimation of the factors under a certain statistical model describing the data. This leads us to an alternative paradigm for NMF, where the learning task revolves around probabilistic models whose observation density is parametrized by the product of non-negative factors. This general framework, coined probabilistic NMF, encompasses many well-known latent variable models of the literature, such as models for count data. In this thesis, we consider specific probabilistic NMF models in which a prior distribution is assumed on the activation coefficients, but the dictionary remains a deterministic variable. The objective is then to maximize the marginal likelihood in these semi-Bayesian NMF models, i.e., the integrated joint likelihood over the activation coefficients. This amounts to learning the dictionary only; the activation coefficients may be inferred in a second step if necessary. We proceed to study in greater depth the properties of this estimation process. In particular, two scenarios are considered. In the first one, we assume the independence of the activation coefficients sample-wise. Previous experimental work showed that dictionaries learned with this approach exhibited a tendency to automatically regularize the number of components, a favorable property which was left unexplained. In the second one, we lift this standard assumption, and consider instead Markov structures to add statistical correlation to the model, in order to better analyze temporal data.
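The basic NMF setting described in this abstract (a non-negative matrix approximated by a dictionary times activation coefficients, fitted by optimizing a measure of fit) can be illustrated with a minimal sketch. It is not the thesis's marginal-likelihood estimator: it assumes the generalized Kullback-Leibler divergence as the measure of fit and uses the standard multiplicative updates, a choice known to coincide with joint maximum likelihood under a Poisson observation model. The names `nmf_kl`, `kl_divergence`, `W`, and `H` are illustrative assumptions, not taken from the source.

```python
import numpy as np

def kl_divergence(V, WH, eps=1e-12):
    """Generalized Kullback-Leibler divergence D(V | WH), the measure of fit."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))

def nmf_kl(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Approximate a non-negative matrix V (F x N) as W @ H.

    W (F x rank) plays the role of the dictionary and H (rank x N) the
    activation coefficients. Multiplicative updates keep both factors
    non-negative. Assumption: KL divergence as the fit measure.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, N)) + eps
    for _ in range(n_iter):
        # Update activations with the dictionary held fixed.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update dictionary with the activations held fixed.
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = rng.random((6, 2)) @ rng.random((2, 8))  # synthetic rank-2 data
    W, H = nmf_kl(V, rank=2)
    print("fit:", kl_divergence(V, W @ H))
```

Both factors are learned jointly here; in the semi-Bayesian setting of the abstract only the dictionary would be estimated, with the activation coefficients integrated out.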

    Fairness and Flexibility in Sport Scheduling

    A Ranking Model Motivated by Nonnegative Matrix Factorization with Applications to Tennis Tournaments

    We propose a novel ranking model that combines the Bradley-Terry-Luce probability model with a nonnegative matrix factorization framework to model and uncover the presence of latent variables that influence the performance of top tennis players. We derive an efficient, provably convergent, and numerically stable majorization-minimization-based algorithm to maximize the likelihood of datasets under the proposed statistical model. The model is tested on datasets involving the outcomes of matches between 20 top male and female tennis players over 14 major tournaments for men (including the Grand Slams and the ATP Masters 1000) and 16 major tournaments for women over the past 10 years. Our model automatically infers that the surface of the court (e.g., clay or hard court) is a key determinant of the performances of male players, but less so for females. Top players on various surfaces over this longitudinal period are also identified in an objective manner.
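For orientation, the Bradley-Terry-Luce building block mentioned in this abstract can be sketched on its own. The code below is not the authors' NMF-based model or their algorithm: it is the classical majorization-minimization iteration for the plain Bradley-Terry-Luce model, in which player i beats player j with probability skill_i / (skill_i + skill_j). The function name `btl_mm` and the toy win counts are assumptions made for illustration, and the sketch assumes every player has at least one win.

```python
import numpy as np

def btl_mm(wins, n_iter=500, tol=1e-10):
    """Maximum likelihood skills for the plain Bradley-Terry-Luce model.

    wins[i, j] is the number of times player i beat player j (zero
    diagonal). Returns skills normalized to sum to 1. Assumption: every
    player has won at least once, so all skills stay strictly positive.
    """
    wins = np.asarray(wins, dtype=float)
    n = wins.shape[0]
    games = wins + wins.T            # matches played between each pair
    total_wins = wins.sum(axis=1)    # total wins per player
    skill = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        # MM step: wins_i / sum_j games_ij / (skill_i + skill_j)
        denom = (games / (skill[:, None] + skill[None, :])).sum(axis=1)
        new_skill = total_wins / denom
        new_skill /= new_skill.sum()
        converged = np.max(np.abs(new_skill - skill)) < tol
        skill = new_skill
        if converged:
            break
    return skill

if __name__ == "__main__":
    # Hypothetical results: three players, ten matches per pair.
    wins = [[0, 7, 9],
            [3, 0, 6],
            [1, 4, 0]]
    print(btl_mm(wins))
```

In the model described above, a single skill per player would be replaced by skills that depend on latent factors such as court surface, which is where the matrix factorization enters.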