
    Expectile Matrix Factorization for Skewed Data Analysis

    Matrix factorization is a popular approach to solving matrix estimation problems based on partial observations. Existing matrix factorization is based on least squares and aims to yield a low-rank matrix to interpret the conditional sample means given the observations. However, in many real applications with skewed and extreme data, least squares cannot explain their central tendency or tail distributions, yielding undesired estimates. In this paper, we propose expectile matrix factorization by introducing asymmetric least squares, a key concept in expectile regression analysis, into the matrix factorization framework. We propose an efficient algorithm to solve the new problem based on alternating minimization and quadratic programming. We prove that our algorithm converges to a global optimum and exactly recovers the true underlying low-rank matrices when noise is zero. For synthetic data with skewed noise and a real-world dataset containing web service response times, the proposed scheme achieves lower recovery errors than the existing matrix factorization method based on least squares in a wide range of settings. Comment: 8 page main text with 5 page supplementary documents, published in AAAI 201
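
The asymmetric least squares idea mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function names, the entrywise observation mask, and the parameter `tau` are assumptions introduced here, and the alternating minimization and quadratic programming steps are not reproduced.

```python
import numpy as np

def expectile_loss(residuals, tau=0.5):
    """Asymmetric least squares: squared residuals weighted by tau when
    non-negative and by (1 - tau) when negative. tau = 0.5 gives half of
    the ordinary sum of squares."""
    residuals = np.asarray(residuals, dtype=float)
    weights = np.where(residuals >= 0, tau, 1.0 - tau)
    return float(np.sum(weights * residuals ** 2))

def expectile_mf_loss(M, X, Y, mask, tau=0.5):
    """Hypothetical objective for a factorization M ~ X @ Y.T, evaluated
    only on entries where mask is True (assumed observation model)."""
    residuals = (M - X @ Y.T)[mask]
    return expectile_loss(residuals, tau)
```

With `tau` above 0.5, under-predictions (positive residuals) are penalized more heavily than over-predictions, which pulls the fit toward the upper tail of a skewed distribution.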

    Bayesian Cointegrated Vector Autoregression models incorporating Alpha-stable noise for inter-day price movements via Approximate Bayesian Computation

    We consider a statistical model for pairs of traded assets, based on a Cointegrated Vector Auto Regression (CVAR) Model. We extend standard CVAR models to incorporate estimation of model parameters in the presence of price series level shifts which are not accurately modeled in the standard Gaussian error correction model (ECM) framework. This involves developing a novel matrix variate Bayesian CVAR mixture model comprising Gaussian errors intra-day and Alpha-stable errors inter-day in the ECM framework. To achieve this we derive a novel conjugate posterior model for the Scaled Mixtures of Normals (SMiN CVAR) representation of Alpha-stable inter-day innovations. These results are generalized to asymmetric models for the innovation noise at inter-day boundaries, allowing for skewed Alpha-stable models. Our proposed model and sampling methodology are general, incorporating the current literature on Gaussian models as a special subclass and also allowing for price series level shifts either at random estimated time points or at time points known a priori. We focus analysis on regularly observed non-Gaussian level shifts, such as those at the close and open of markets, that can have a significant effect on estimation performance in statistical models that fail to account for them. We compare the estimation accuracy of our model and estimation approach to standard frequentist and Bayesian procedures for CVAR models when non-Gaussian price series level shifts are present in the individual series, such as at inter-day boundaries. We fit a bi-variate Alpha-stable model to the inter-day jumps and model the effect of such jumps on estimation of matrix-variate CVAR model parameters using the likelihood-based Johansen procedure and Bayesian estimation. We illustrate our model and the corresponding estimation procedures we develop on both synthetic and actual data. Comment: 30 page

    An Oracle Inequality for Quasi-Bayesian Non-Negative Matrix Factorization

    The aim of this paper is to provide some theoretical understanding of quasi-Bayesian aggregation methods for non-negative matrix factorization. We derive an oracle inequality for an aggregated estimator. This result holds for a very general class of prior distributions and shows how the prior affects the rate of convergence. Comment: This is the corrected version of the published paper P. Alquier, B. Guedj, An Oracle Inequality for Quasi-Bayesian Non-negative Matrix Factorization, Mathematical Methods of Statistics, 2017, vol. 26, no. 1, pp. 55-67. Since then Arnak Dalalyan (ENSAE) found a mistake in the proofs. We fixed the mistake at the price of a slightly different logarithmic term in the boun

    Latitude: A Model for Mixed Linear-Tropical Matrix Factorization

    Nonnegative matrix factorization (NMF) is one of the most frequently used matrix factorization models in data analysis. A significant reason for the popularity of NMF is its interpretability and the 'parts of whole' interpretation of its components. Recently, max-times, or subtropical, matrix factorization (SMF) has been introduced as an alternative model with an equally interpretable 'winner takes it all' interpretation. In this paper we propose a new mixed linear-tropical model, and a new algorithm, called Latitude, that combines NMF and SMF, being able to smoothly alternate between the two. In our model, the data is modeled using the latent factors and latent parameters that control whether the factors are interpreted as NMF or SMF features, or their mixtures. We present an algorithm for our novel matrix factorization. Our experiments show that our algorithm improves over both baselines, and can yield interpretable results that reveal more of the latent structure than either NMF or SMF alone. Comment: 14 pages, 6 figures. To appear in 2018 SIAM International Conference on Data Mining (SDM '18). For the source code, see https://people.mpi-inf.mpg.de/~pmiettin/linear-tropical
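
The two products that this abstract contrasts can be illustrated with a short sketch. It shows only the standard (NMF-style) product and the max-times (subtropical) product on nonnegative factors; it does not reproduce how Latitude mixes them, which the abstract does not specify. The function names are hypothetical.

```python
import numpy as np

def linear_product(B, C):
    """Standard product: entry (i, j) sums B[i, k] * C[k, j] over k,
    so every component contributes ('parts of whole')."""
    return B @ C

def max_times_product(B, C):
    """Max-times product: entry (i, j) is the maximum over k of
    B[i, k] * C[k, j], so only the largest component counts
    ('winner takes it all')."""
    return np.max(B[:, :, None] * C[None, :, :], axis=1)

B = np.array([[1.0, 2.0], [3.0, 0.0]])
C = np.array([[1.0, 0.0], [2.0, 1.0]])
linear = linear_product(B, C)        # [[5, 2], [3, 0]]
subtropical = max_times_product(B, C)  # [[4, 2], [3, 0]]
```

For nonnegative factors, each max-times entry is at most the corresponding linear entry, since a maximum of nonnegative terms never exceeds their sum.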