28 research outputs found

    Fast Non-Bayesian Poisson Factorization for Implicit-Feedback Recommendations

    This work explores non-negative matrix factorization based on regularized Poisson models for recommender systems with implicit-feedback data. The properties of the Poisson likelihood allow a shortcut for very fast computation and optimization over zero-valued elements when the latent-factor matrices are non-negative, making it a more suitable approach than squared loss for very sparse inputs such as implicit-feedback data. A simple and embarrassingly parallel optimization approach based on proximal gradients is presented, which in large datasets converges 2-3 orders of magnitude faster than its Bayesian counterpart (Hierarchical Poisson Factorization) fit through variational inference techniques, and 1 order of magnitude faster than implicit-ALS fit with the Conjugate Gradient method.
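
The zero-value shortcut mentioned in this abstract can be illustrated with a small sketch. It assumes the standard unregularized Poisson negative log-likelihood with rate equal to the dot product of a user factor and an item factor; the function names, toy matrices, and the omission of the regularization term are illustrative assumptions, not the paper's code. Because the sum of all rates factorizes as (sum of user factors) · (sum of item factors), only the non-zero entries need to be visited for the logarithm term.

```python
import math

def poisson_nll_dense(X, A, B):
    """Negative Poisson log-likelihood (dropping the constant log x! term),
    computed by visiting every cell of X, including the zeros."""
    total = 0.0
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            lam = sum(a * b for a, b in zip(A[i], B[j]))
            total += lam
            if x > 0:
                total -= x * math.log(lam)
    return total

def poisson_nll_sparse(nonzeros, A, B):
    """Same quantity, visiting only the non-zero entries.
    Uses sum_ij <a_i, b_j> = <sum_i a_i, sum_j b_j>."""
    k = len(A[0])
    sum_A = [sum(a[f] for a in A) for f in range(k)]
    sum_B = [sum(b[f] for b in B) for f in range(k)]
    total = sum(sa * sb for sa, sb in zip(sum_A, sum_B))
    for i, j, x in nonzeros:
        lam = sum(a * b for a, b in zip(A[i], B[j]))
        total -= x * math.log(lam)
    return total

# Hypothetical toy data: 3 users, 4 items, 2 latent factors (all positive).
A = [[0.5, 1.0], [0.2, 0.3], [1.5, 0.1]]
B = [[0.4, 0.6], [1.0, 0.2], [0.3, 0.9], [0.7, 0.7]]
X = [[0, 2, 0, 0], [0, 0, 0, 1], [3, 0, 0, 0]]
nonzeros = [(i, j, x) for i, row in enumerate(X)
            for j, x in enumerate(row) if x > 0]

dense = poisson_nll_dense(X, A, B)
sparse = poisson_nll_sparse(nonzeros, A, B)
```

The two values agree up to floating-point error, while the second function does work proportional to the number of non-zero entries plus the size of the factor matrices rather than to the full size of X.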

    Deep Learning in Finance

    We explore the use of deep learning hierarchical models for problems in financial prediction and classification. Financial prediction problems -- such as those presented in designing and pricing securities, constructing portfolios, and risk management -- often involve large data sets with complex data interactions that currently are difficult or impossible to specify in a full economic model. Applying deep learning methods to these problems can produce more useful results than standard methods in finance. In particular, deep learning can detect and exploit interactions in the data that are, at least currently, invisible to any existing financial economic theory. Comment: 20 pages, 5 figures.

    Sparse Regularization in Marketing and Economics

    Sparse alpha-norm regularization has many data-rich applications in Marketing and Economics. Alpha-norm, in contrast to lasso and ridge regularization, jumps to a sparse solution. This feature is attractive for ultra high-dimensional problems that occur in demand estimation and forecasting. The alpha-norm objective is nonconvex and requires coordinate descent and proximal operators to find the sparse solution. We study a typical marketing demand forecasting problem, grocery store sales for salty snacks, that has many dummy variables as controls. The key predictors of demand include price, equivalized volume, promotion, flavor, scent, and brand effects. A comparison with many commonly used machine learning methods shows that alpha-norm regularization achieves its goal of providing accurate out-of-sample estimates for the promotion lift effects. Finally, we conclude with directions for future research.

    Generalized Linear Model Regression under Distance-to-set Penalties

    Estimation in generalized linear models (GLM) is complicated by the presence of constraints. One can handle constraints by maximizing a penalized log-likelihood. Penalties such as the lasso are effective in high dimensions, but often lead to unwanted shrinkage. This paper explores instead penalizing the squared distance to constraint sets. Distance penalties are more flexible than algebraic and regularization penalties, and avoid the drawback of shrinkage. To optimize distance penalized objectives, we make use of the majorization-minimization principle. Resulting algorithms constructed within this framework are amenable to acceleration and come with global convergence guarantees. Applications to shape constraints, sparse regression, and rank-restricted matrix regression on synthetic and real data showcase strong empirical performance, even under non-convex constraints. Comment: 5 figures.
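
The majorization-minimization idea named in this abstract can be written out in a few lines. The notation below is an assumption made for illustration and is not taken from the paper: $\ell(\beta)$ is the GLM log-likelihood, $C$ the constraint set, $P_C$ a Euclidean projection onto $C$, and $\rho > 0$ a penalty weight.

```latex
% Distance-penalized objective (notation assumed for illustration)
f(\beta) = -\ell(\beta) + \frac{\rho}{2}\,\operatorname{dist}(\beta, C)^2,
\qquad
\operatorname{dist}(\beta, C)^2 = \min_{c \in C} \lVert \beta - c \rVert_2^2 .

% Since P_C(\beta_k) \in C, the squared distance is bounded above by
\operatorname{dist}(\beta, C)^2 \le \lVert \beta - P_C(\beta_k) \rVert_2^2 ,
\quad \text{with equality at } \beta = \beta_k .

% Surrogate that majorizes f at the current iterate \beta_k
g(\beta \mid \beta_k) = -\ell(\beta) + \frac{\rho}{2}\,\lVert \beta - P_C(\beta_k) \rVert_2^2 .

% MM update and the resulting descent property
\beta_{k+1} = \arg\min_{\beta}\; g(\beta \mid \beta_k),
\qquad
f(\beta_{k+1}) \le g(\beta_{k+1} \mid \beta_k) \le g(\beta_k \mid \beta_k) = f(\beta_k) .
```

Each surrogate is a ridge-like penalized likelihood centred at the projection, so the constraint set enters only through $P_C$; this is why the bound remains usable when $C$ is non-convex, provided some projection onto it can be computed.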

    Bayesian $l_0$-regularized Least Squares

    Bayesian $l_0$-regularized least squares is a variable selection technique for high dimensional predictors. The challenge is optimizing a non-convex objective function via search over model space consisting of all possible predictor combinations. Spike-and-slab (a.k.a. Bernoulli-Gaussian) priors are the gold standard for Bayesian variable selection, with a caveat of computational speed and scalability. Single Best Replacement (SBR) provides a fast scalable alternative. We provide a link between Bayesian regularization and proximal updating, which provides an equivalence between finding a posterior mode and a posterior mean with a different regularization prior. This allows us to use SBR to find the spike-and-slab estimator. To illustrate our methodology, we provide simulation evidence and a real data example on the statistical properties and computational efficiency of SBR versus direct posterior sampling using spike-and-slab priors. Finally, we conclude with directions for future research. Comment: 22 pages, 6 figures, 1 table.
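
As background for the proximal updating mentioned in this abstract, the sketch below shows the textbook proximal operator of an $l_0$ penalty (hard thresholding) next to that of an $l_1$ penalty (soft thresholding). This is not the SBR algorithm and is not taken from the paper; the function names and the toy input are illustrative assumptions.

```python
import math

def prox_l0(y, lam):
    """Coordinate-wise minimizer of 0.5*(x - v)**2 + lam*[x != 0].
    Keeping v costs lam, zeroing it costs 0.5*v**2, so v survives
    only when |v| > sqrt(2*lam) (hard thresholding)."""
    threshold = math.sqrt(2.0 * lam)
    return [v if abs(v) > threshold else 0.0 for v in y]

def prox_l1(y, lam):
    """Coordinate-wise minimizer of 0.5*(x - v)**2 + lam*|x|
    (soft thresholding): every survivor is shrunk toward zero by lam."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in y]

# Hypothetical input vector.
y = [3.0, -0.5, 1.2, -2.0]
hard = prox_l0(y, lam=1.0)  # large entries kept unshrunk, small ones zeroed
soft = prox_l1(y, lam=1.0)  # all entries shrunk by 1.0 toward zero
```

The contrast shows why an $l_0$ penalty selects variables without shrinking the retained coefficients, whereas the $l_1$ penalty biases every retained coefficient toward zero.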

    Sparse Group Inductive Matrix Completion

    We consider the problem of matrix completion with side information (inductive matrix completion). In real-world applications many side-channel features are typically non-informative, making feature selection an important part of the problem. We incorporate feature selection into inductive matrix completion by proposing a matrix factorization framework with group-lasso regularization on side feature parameter matrices. We demonstrate that the theoretical sample complexity for the proposed method is much lower than that of its competitors in sparse problems, and propose an efficient optimization algorithm for the resulting low-rank matrix completion problem with sparsifying regularizers. Experiments on synthetic and real-world datasets show that the proposed approach outperforms other methods.

    Horseshoe Regularization for Feature Subset Selection

    Feature subset selection arises in many high-dimensional applications of statistics, such as compressed sensing and genomics. The $\ell_0$ penalty is ideal for this task, the caveat being it requires the NP-hard combinatorial evaluation of all models. A recent area of considerable interest is to develop efficient algorithms to fit models with a non-convex $\ell_\gamma$ penalty for $\gamma \in (0,1)$, which results in sparser models than the convex $\ell_1$ or lasso penalty, but is harder to fit. We propose an alternative, termed the horseshoe regularization penalty for feature subset selection, and demonstrate its theoretical and computational advantages. The distinguishing feature from existing non-convex optimization approaches is a full probabilistic representation of the penalty as the negative of the logarithm of a suitable prior, which in turn enables efficient expectation-maximization and local linear approximation algorithms for optimization and MCMC for uncertainty quantification. In synthetic and real data, the resulting algorithms provide better statistical performance, and the computation requires a fraction of the time of state-of-the-art non-convex solvers.

    Rendition: Reclaiming what a black box takes away

    The premise of our work is deceptively familiar: A black box $f(\cdot)$ has altered an image $\mathbf{x} \rightarrow f(\mathbf{x})$. Recover the image $\mathbf{x}$. This black box might be any number of simple or complicated things: a linear or non-linear filter, some app on your phone, etc. The latter is a good canonical example for the problem we address: Given only "the app" and an image produced by the app, find the image that was fed to the app. You can run the given image (or any other image) through the app as many times as you like, but you cannot look inside the (code for the) app to see how it works. At first blush, the problem sounds a lot like a standard inverse problem, but it is not in the following sense: While we have access to the black box $f(\cdot)$ and can run any image through it and observe the output, we do not know how the black box alters the image. Therefore we have no explicit form or model of $f(\cdot)$. Nor are we necessarily interested in the internal workings of the black box. We are simply happy to reverse its effect on a particular image, to whatever extent possible. This is what we call the "rendition" (rather than restoration) problem, as it does not fit the mold of an inverse problem (blind or otherwise). We describe general conditions under which rendition is possible, and provide a remarkably simple algorithm that works for both contractive and expansive black box operators. The principal and novel take-away message from our work is this surprising fact: One simple algorithm can reliably undo a wide class of (not too violent) image distortions. A higher quality pdf of this paper is available at http://www.milanfar.or

    A Statistical Theory of Deep Learning via Proximal Splitting

    In this paper we develop a statistical theory and an implementation of deep learning models. We show that an elegant variable splitting scheme for the alternating direction method of multipliers optimises a deep learning objective. We allow for non-smooth non-convex regularisation penalties to induce sparsity in parameter weights. We provide a link between traditional shallow layer statistical models such as principal component and sliced inverse regression and deep layer models. We also define the degrees of freedom of a deep learning predictor and a predictive MSE criterion to perform model selection for comparing architecture designs. We focus on deep multiclass logistic learning although our methods apply more generally. Our results suggest an interesting and previously under-exploited relationship between deep learning and proximal splitting techniques. To illustrate our methodology, we provide a multi-class logit classification analysis of Fisher's Iris data where we illustrate the convergence of our algorithm. Finally, we conclude with directions for future research.

    Regularizing Bayesian Predictive Regressions

    We show that regularizing Bayesian predictive regressions provides a framework for prior sensitivity analysis. We develop a procedure that jointly regularizes expectations and variance-covariance matrices using a pair of shrinkage priors. Our methodology applies directly to vector autoregressions (VAR) and seemingly unrelated regressions (SUR). The regularization path provides a prior sensitivity diagnostic. By exploiting a duality between regularization penalties and predictive prior distributions, we reinterpret two classic Bayesian analyses of macro-finance studies: equity premium predictability and forecasting macroeconomic growth rates. We find there exist plausible prior specifications for predictability in excess S&P 500 index returns using book-to-market ratios, CAY (consumption, wealth, income ratio), and T-bill rates. We evaluate the forecasts using a market-timing strategy, and we show the optimally regularized solution outperforms a buy-and-hold approach. A second empirical application involves forecasting industrial production, inflation, and consumption growth rates, and demonstrates the feasibility of our approach.