Expectile Matrix Factorization for Skewed Data Analysis
Matrix factorization is a popular approach to solving matrix estimation
problems based on partial observations. Existing matrix factorization is based
on least squares and aims to yield a low-rank matrix to interpret the
conditional sample means given the observations. However, in many real
applications with skewed and extreme data, least squares cannot explain their
central tendency or tail distributions, yielding undesired estimates. In this
paper, we propose \emph{expectile matrix factorization} by introducing
asymmetric least squares, a key concept in expectile regression analysis, into
the matrix factorization framework. We propose an efficient algorithm to solve
the new problem based on alternating minimization and quadratic programming. We
prove that our algorithm converges to a global optimum and exactly recovers the
true underlying low-rank matrices when noise is zero. For synthetic data with
skewed noise and a real-world dataset containing web service response times,
the proposed scheme achieves lower recovery errors than the existing matrix
factorization method based on least squares in a wide range of settings.
Comment: 8 page main text with 5 page supplementary documents, published in
AAAI 201
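The asymmetric least squares idea in the abstract above can be illustrated with a minimal sketch. It assumes the standard expectile-regression loss, which weights squared positive residuals by tau and squared negative residuals by 1 - tau; the function names `expectile_loss` and `sample_expectile` are hypothetical, and this is not the paper's factorization algorithm, only the scalar loss such a method would build on.

```python
import numpy as np

def expectile_loss(residual, tau=0.5):
    """Asymmetric least squares loss (standard expectile-regression form).

    Positive residuals are weighted by tau, negative ones by 1 - tau.
    With tau = 0.5 this is half the ordinary sum of squares.
    """
    residual = np.asarray(residual, dtype=float)
    weights = np.where(residual >= 0, tau, 1.0 - tau)
    return float(np.sum(weights * residual ** 2))

def sample_expectile(y, tau=0.5, iters=100):
    """Tau-expectile of a sample via a simple fixed-point iteration.

    Each step is a weighted mean whose weights depend on which side of
    the current estimate every observation falls.
    """
    y = np.asarray(y, dtype=float)
    m = y.mean()
    for _ in range(iters):
        w = np.where(y >= m, tau, 1.0 - tau)
        m = np.sum(w * y) / np.sum(w)
    return float(m)

if __name__ == "__main__":
    skewed = np.array([0.0, 1.0, 10.0])  # illustrative data, not from the paper
    print(sample_expectile(skewed, tau=0.5))  # coincides with the sample mean
    print(sample_expectile(skewed, tau=0.9))  # pulled toward the upper tail
```

Setting tau above 0.5 penalizes underestimation more heavily, so the fitted value moves toward the upper tail; this is the sense in which expectiles describe skewed data beyond the conditional mean.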
Bayesian Cointegrated Vector Autoregression models incorporating Alpha-stable noise for inter-day price movements via Approximate Bayesian Computation
We consider a statistical model for pairs of traded assets, based on a
Cointegrated Vector Auto Regression (CVAR) Model. We extend standard CVAR
models to incorporate estimation of model parameters in the presence of price
series level shifts which are not accurately modeled in the standard Gaussian
error correction model (ECM) framework. This involves developing a novel matrix
variate Bayesian CVAR mixture model comprised of Gaussian errors intra-day and
Alpha-stable errors inter-day in the ECM framework. To achieve this we derive a
novel conjugate posterior model for the Scaled Mixtures of Normals (SMiN CVAR)
representation of Alpha-stable inter-day innovations. These results are
generalized to asymmetric models for the innovation noise at inter-day
boundaries allowing for skewed Alpha-stable models.
Our proposed model and sampling methodology are general, incorporating the
current literature on Gaussian models as a special subclass and also allowing
for price series level shifts either at random estimated time points or known a
priori time points. We focus analysis on regularly observed non-Gaussian level
shifts that can have significant effect on estimation performance in
statistical models failing to account for such level shifts, such as at the
close and open of markets. We compare the estimation accuracy of our model and
estimation approach to standard frequentist and Bayesian procedures for CVAR
models when non-Gaussian price series level shifts are present in the
individual series, such as inter-day boundaries. We fit a bi-variate
Alpha-stable model to the inter-day jumps and model the effect of such jumps on
estimation of matrix-variate CVAR model parameters using the likelihood based
Johansen procedure and a Bayesian estimation. We illustrate our model and the
corresponding estimation procedures we develop on both synthetic and actual
data.
Comment: 30 page
An Oracle Inequality for Quasi-Bayesian Non-Negative Matrix Factorization
The aim of this paper is to provide some theoretical understanding of
quasi-Bayesian aggregation methods for non-negative matrix factorization. We derive
an oracle inequality for an aggregated estimator. This result holds for a very
general class of prior distributions and shows how the prior affects the rate
of convergence.
Comment: This is the corrected version of the published paper P. Alquier, B.
Guedj, An Oracle Inequality for Quasi-Bayesian Non-negative Matrix
Factorization, Mathematical Methods of Statistics, 2017, vol. 26, no. 1, pp.
55-67. Since then Arnak Dalalyan (ENSAE) found a mistake in the proofs. We
fixed the mistake at the price of a slightly different logarithmic term in
the boun
Latitude: A Model for Mixed Linear-Tropical Matrix Factorization
Nonnegative matrix factorization (NMF) is one of the most frequently used
matrix factorization models in data analysis. A significant reason for the
popularity of NMF is its interpretability and the `parts of whole'
interpretation of its components. Recently, max-times, or subtropical, matrix
factorization (SMF) has been introduced as an alternative model with equally
interpretable `winner takes it all' interpretation. In this paper we propose a
new mixed linear--tropical model, and a new algorithm, called Latitude, that
combines NMF and SMF, being able to smoothly alternate between the two. In our
model, the data is modeled using the latent factors and latent parameters that
control whether the factors are interpreted as NMF or SMF features, or their
mixtures. We present an algorithm for our novel matrix factorization. Our
experiments show that our algorithm improves over both baselines, and can yield
interpretable results that reveal more of the latent structure than either NMF
or SMF alone.
Comment: 14 pages, 6 figures. To appear in 2018 SIAM International Conference
on Data Mining (SDM '18). For the source code, see
https://people.mpi-inf.mpg.de/~pmiettin/linear-tropical
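To make the contrast between the two component models in the abstract above concrete, here is a minimal sketch of the ordinary (NMF-style) matrix product next to the max-times (subtropical) product, in which each entry is the largest single term rather than the sum of terms. The function names are hypothetical, and the way Latitude mixes the two is not specified in the abstract, so it is not reproduced here.

```python
import numpy as np

def linear_product(B, C):
    """Ordinary matrix product: entry (i, j) is sum_k B[i, k] * C[k, j]."""
    return np.asarray(B, dtype=float) @ np.asarray(C, dtype=float)

def max_times_product(B, C):
    """Max-times product: entry (i, j) is max_k B[i, k] * C[k, j]."""
    B = np.asarray(B, dtype=float)
    C = np.asarray(C, dtype=float)
    # Build all rank-1 terms with shape (rows, rank, cols), then reduce over rank.
    terms = B[:, :, None] * C[None, :, :]
    return terms.max(axis=1)

if __name__ == "__main__":
    B = [[1.0, 2.0], [3.0, 0.0]]  # illustrative nonnegative factors
    C = [[1.0, 0.0], [2.0, 1.0]]
    print(linear_product(B, C))     # contributions add up ("parts of whole")
    print(max_times_product(B, C))  # only the largest term survives ("winner takes it all")
```

For nonnegative factors the max-times entry never exceeds the corresponding linear entry, since the maximum of nonnegative terms is bounded by their sum.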