Mixed Matrix Completion in Complex Survey Sampling under Heterogeneous Missingness
Modern surveys with large sample sizes and growing mixed-type questionnaires
require robust and scalable analysis methods. In this work, we consider
recovering a mixed dataframe matrix, obtained by complex survey sampling, with
entries following different canonical exponential distributions and subject to
heterogeneous missingness. To tackle this challenging task, we propose a
two-stage procedure: in the first stage, we model the entry-wise missing
mechanism by logistic regression, and in the second stage, we complete the
target parameter matrix by maximizing a weighted log-likelihood with a low-rank
constraint. We propose a fast and scalable estimation algorithm that achieves
sublinear convergence, and the upper bound for the estimation error of the
proposed method is rigorously derived. Experimental results support our
theoretical claims, and the proposed estimator shows its merits compared to
other existing methods. The proposed method is applied to analyze the National
Health and Nutrition Examination Survey data.
Comment: Journal of Computational and Graphical Statistics, 202
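The two-stage procedure described above can be illustrated with a minimal sketch. Everything concrete here is a simplifying assumption for illustration: Gaussian entries (the paper covers general canonical exponential-family entries), hypothetical entry-level covariates X_cov feeding the logistic missingness model, and a plain projected-gradient solver with hard SVD truncation in place of the authors' scalable algorithm.

```python
import numpy as np

def two_stage_completion(Y, mask, X_cov, rank=5, n_iter=200):
    """Two-stage sketch: (1) logistic regression for the entry-wise
    missingness probabilities, (2) inverse-probability-weighted
    least-squares completion with a hard low-rank constraint.
    Gaussian entries and a projected-gradient solver are simplifying
    assumptions; X_cov (n x m x d) holds hypothetical covariates."""
    n, m = Y.shape
    # Stage 1: fit P(entry observed) by gradient ascent on the
    # logistic log-likelihood over all n*m entries.
    Xf = X_cov.reshape(-1, X_cov.shape[-1])
    r = mask.ravel().astype(float)
    w = np.zeros(Xf.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xf @ w))
        w += 0.5 * Xf.T @ (r - p) / len(r)
    probs = (1.0 / (1.0 + np.exp(-Xf @ w))).reshape(n, m)

    # Stage 2: weighted completion.  Observed residuals are reweighted
    # by 1/probs (inverse propensity), and each iterate is projected
    # back onto the rank-`rank` set by truncated SVD.
    weights = mask / np.clip(probs, 1e-3, 1.0)
    Yobs = np.where(mask, Y, 0.0)
    M = Yobs.copy()
    for _ in range(n_iter):
        M = M - 0.5 * weights * (M - Yobs)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        M = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return M
```

With an intercept-only missingness model this reduces to uniform inverse-propensity weighting; heterogeneous covariates make the weights entry-specific, which is the point of the first stage.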
Rank-One Matrix Completion with Automatic Rank Estimation via L1-Norm Regularization
Completing a matrix from a small subset of its entries, i.e., matrix completion, is a challenging problem arising in many real-world applications, such as machine learning and computer vision. One popular approach to the matrix completion problem is based on low-rank decomposition/factorization. Low-rank decomposition-based methods often require a prespecified rank, which is difficult to determine in practice. In this paper, we propose a novel low-rank decomposition-based matrix completion method with automatic rank estimation. Our method is based on rank-one approximation, in which a matrix is represented as a weighted sum of a set of rank-one matrices. To automatically determine the rank of an incomplete matrix, we impose L1-norm regularization on the weight vector and simultaneously minimize the reconstruction error. After obtaining the rank, we remove the L1-norm regularizer and refine the recovery results. With a correctly estimated rank, we can obtain the optimal solution under certain conditions. Experimental results on both synthetic and real-world data demonstrate that the proposed method not only performs well in rank estimation but also achieves better recovery accuracy than competing methods.
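The rank-one-sum idea can be sketched as follows. As a simplification, the rank-one atoms are taken from the SVD of the zero-filled matrix rather than learned jointly as in the paper, and the weight vector is fit by ISTA (proximal gradient with soft-thresholding), so the L1 penalty zeroes out weights exactly; the number of surviving atoms serves as the rank estimate, and a final unpenalized refit mirrors the paper's refinement step.

```python
import numpy as np

def rank_one_l1_completion(Y, mask, K=10, lam=0.5, n_iter=300):
    """Sketch of rank-one-sum completion with L1 weight regularization.
    The rank-one atoms u_k v_k^T come from the SVD of the zero-filled
    matrix (a simplification; the paper learns them), and the weights
    are fit by ISTA so that small weights shrink exactly to zero."""
    Y0 = np.where(mask, Y, 0.0)
    U, s, Vt = np.linalg.svd(Y0, full_matrices=False)
    atoms = [np.outer(U[:, k], Vt[k]) for k in range(min(K, len(s)))]
    A = np.stack([a[mask] for a in atoms], axis=1)   # observed entries only
    y = Y[mask]
    theta = np.zeros(A.shape[1])
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)         # 1/L for ISTA
    for _ in range(n_iter):
        z = theta - step * (A.T @ (A @ theta - y))   # gradient step
        theta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    sel = np.abs(theta) > 1e-8
    rank_est = int(sel.sum())
    # Refinement: refit the surviving atoms without the L1 penalty.
    if sel.any():
        theta_ref, *_ = np.linalg.lstsq(A[:, sel], y, rcond=None)
        theta = np.zeros_like(theta)
        theta[sel] = theta_ref
    M = sum(t * a for t, a in zip(theta, atoms))
    return M, rank_est
```

Because the atoms are fixed up front, this sketch only captures the selection-then-refit mechanism, not the paper's simultaneous optimization of atoms and weights.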
Flexible And Robust Iterative Methods For The Partial Singular Value Decomposition
The Singular Value Decomposition (SVD) is one of the most fundamental matrix factorizations in linear algebra. As a generalization of the eigenvalue decomposition, the SVD is essential for a wide variety of fields including statistics, signal and image processing, chemistry, quantum physics and even weather prediction. The methods for numerically computing the SVD mostly fall under three main categories: direct, iterative, and streaming. Direct methods focus on solving the SVD in its entirety, making them suitable for smaller dense matrices where the computation cost is tractable. On the other end of the spectrum, streaming methods were created to provide an online algorithm that computes an approximate SVD as data is created or read in over time. Consequently, they can also work on extremely large datasets that cannot fit within memory. To do this, they attempt to obtain only a few singular values and rely on probabilistic guarantees which limit their overall accuracy. Iterative SVD solvers fill in the large gap between these two extremes by providing accurate solutions for a subset of singular values on large (often sparse) matrices. In this dissertation, we focus on the development of flexible and robust iterative SVD solvers that provide fast convergence to high precision. We first introduce a novel iterative solver based on the Golub-Kahan and Davidson methods named GKD. GKD efficiently provides high-precision SVD solutions for large sparse matrices as demonstrated through comparisons with the PRIMME software package. Then, we investigate the use of flexible stopping criteria for GKD and other SVD solvers that are tailored to specific applications. Finally, we analyze the effect of SVD stopping criteria on matrix completion algorithms.
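The partial-SVD setting above can be illustrated with a standard iterative solver. SciPy's ARPACK-backed `svds` stands in for GKD here (it is not the dissertation's solver), and its `tol` argument plays the role of the stopping criterion whose downstream effects the dissertation analyzes.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Iteratively compute the 5 largest singular triplets of a large,
# very sparse matrix -- the regime where direct dense SVD is wasteful.
A = sparse_random(1000, 400, density=0.01, random_state=0, format="csr")
U, s, Vt = svds(A, k=5, tol=1e-8)

# svds returns singular values in ascending order; sort descending.
order = np.argsort(s)[::-1]
U, s, Vt = U[:, order], s[order], Vt[order]

# Accuracy check: each triplet should satisfy A v = sigma u,
# so the per-triplet residual norms should be near machine precision.
residuals = np.linalg.norm(A @ Vt.T - U * s, axis=0)
```

Tightening or loosening `tol` trades iterations against triplet accuracy, which is exactly the kind of application-tailored stopping criterion the dissertation studies for matrix completion.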
Frameworks for Learning from Multiple Tasks
In this thesis we study different machine learning frameworks for learning multiple
tasks together. Depending on the motivations and goals of each learning framework,
we investigate their computational and statistical properties from both a theoretical
and an experimental standpoint.
The first problem we tackle is low-rank matrix learning, a popular model
assumption in multitask learning (MTL). Trace norm regularization is a widely used
approach for learning such models. A standard optimization strategy is based on
formulating the problem as one of low-rank matrix factorization, which, however,
leads to a non-convex problem. We show that it is possible to characterize the
critical points of the non-convex problem, which allows us to provide an efficient
criterion for determining whether a critical point is also a global minimizer. We
extend this analysis to the case in which the objective is nonsmooth.
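A minimal numerical illustration of the factored approach, assuming a matrix-completion loss: the trace-norm problem is optimized over a factorization W = UVᵀ by plain gradient descent (not the thesis's algorithm), and a critical point is then certified by checking that the operator norm of the loss gradient at W is at most the regularization parameter, which is the flavor of criterion described above.

```python
import numpy as np

def factored_trace_norm(Y, mask, r=5, lam=1.0, n_iter=4000, lr=0.01):
    """Sketch: minimize 0.5*||mask*(U V' - Y)||_F^2
                        + 0.5*lam*(||U||_F^2 + ||V||_F^2),
    the standard factored surrogate of trace-norm regularization.
    The factored problem is non-convex; a critical point is certified
    globally optimal when the spectral norm of the loss gradient at
    W = U V' does not exceed lam (up to a numerical tolerance)."""
    n, m = Y.shape
    rng = np.random.default_rng(1)
    U = 0.1 * rng.normal(size=(n, r))
    V = 0.1 * rng.normal(size=(m, r))
    for _ in range(n_iter):
        R = mask * (U @ V.T - Y)          # loss gradient at W = U V'
        U, V = U - lr * (R @ V + lam * U), V - lr * (R.T @ U + lam * V)
    R = mask * (U @ V.T - Y)
    certified = np.linalg.norm(R, 2) <= lam * 1.05
    return U @ V.T, certified
```

At a spurious critical point the spectral norm of the gradient typically exceeds lam, so the check distinguishes global minimizers without solving the convex problem directly.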
The goal of the second problem we worked on is to infer a learning algorithm that
works well on a class of tasks sampled from an unknown meta-distribution. As
an extension of MTL, our goal here is to train on a set of tasks and perform well
on future, unseen tasks. We consider a scenario in which the tasks are presented
sequentially, without keeping any of their information in memory. We study the
statistical properties of the proposed algorithm and prove non-asymptotic bounds
on the excess transfer risk.
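The sequential setting can be sketched with a hypothetical online scheme: each incoming task is solved by ridge regression biased toward a running meta-parameter, which is then nudged toward that task's solution and the task is discarded. Both the biased-regularization inner solver and the heuristic meta-update are illustrative assumptions, not the thesis's algorithm.

```python
import numpy as np

def online_meta_ridge(tasks, lam=1.0, meta_lr=0.1):
    """Process tasks one at a time without storing them.  Each task
    (X, y) is solved by ridge regression biased toward the current
    meta-parameter h; h is then moved toward the task's solution.
    This is a hypothetical sketch of a learning-to-learn scheme."""
    d = tasks[0][0].shape[1]
    h = np.zeros(d)
    for X, y in tasks:
        n = len(y)
        # Inner solver: w = argmin ||X w - y||^2 / n + lam * ||w - h||^2
        A = X.T @ X / n + lam * np.eye(d)
        w = np.linalg.solve(A, X.T @ y / n + lam * h)
        # Heuristic meta-update: pull h toward this task's solution.
        h = h - meta_lr * lam * (h - w)
        # The task data is now discarded -- nothing is kept in memory.
    return h
```

If the tasks share a common underlying weight vector plus task-specific noise, the meta-parameter drifts toward that common vector, so later tasks start from a better bias, which is the intuition behind bounding the excess transfer risk.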
Lastly, a common practice in machine learning is to concatenate many different
datasets and apply a learning algorithm to the resulting dataset. However, training
on a collection of heterogeneous datasets can cause issues due to the presence of
bias. In this thesis we derive an MTL framework that can jointly learn subcategories
within a dataset and undo the inherent bias existing within each of them.