Search CORE

190 research outputs found

Computing Large-Scale Matrix and Tensor Decomposition with Structured Factors: A Unified Nonconvex Optimization Perspective

Author: De Lathauwer Lieven
Fu Xiao
Gillis Nicolas
Huang Kejun
Vervliet Nico
Publication venue
Publication date: 05/08/2020
Field of study

The proposed article aims at offering a comprehensive tutorial for the computational aspects of structured matrix and tensor factorization. Unlike existing tutorials that mainly focus on {\it algorithmic procedures} for a small set of problems, e.g., nonnegativity or sparsity-constrained factorization, we take a {\it top-down} approach: we start with general optimization theory (e.g., inexact and accelerated block coordinate descent, stochastic optimization, and Gauss-Newton methods) that covers a wide range of factorization problems with diverse constraints and regularization terms of engineering interest. Then, we go `under the hood' to showcase specific algorithm design under these introduced principles. We pay a particular attention to recent algorithmic developments in structured tensor and matrix factorization (e.g., random sketching and adaptive step size based stochastic optimization and structure-exploiting second-order algorithms), which are the state of the art---yet much less touched upon in the literature compared to {\it block coordinate descent} (BCD)-based methods. We expect that the article to have an educational values in the field of structured factorization and hope to stimulate more research in this important and exciting direction.Comment: Final Version; to appear in IEEE Signal Processing Magazine; title revised to comply with the journal's rul

arXiv.org e-Print Archive

New Approaches in Multi-View Clustering

Author: Chen Chuan
Chen Zitai
Li Rui
Qian Hui
Ye Fanghua
Zheng Zibin
Publication venue: 'IntechOpen'
Publication date: 01/08/2018
Field of study

Many real-world datasets can be naturally described by multiple views. Due to this, multi-view learning has drawn much attention from both academia and industry. Compared to single-view learning, multi-view learning has demonstrated plenty of advantages. Clustering has long been serving as a critical technique in data mining and machine learning. Recently, multi-view clustering has achieved great success in various applications. To provide a comprehensive review of the typical multi-view clustering methods and their corresponding recent developments, this chapter summarizes five kinds of popular clustering methods and their multi-view learning versions, which include k-means, spectral clustering, matrix factorization, tensor decomposition, and deep learning. These clustering methods are the most widely employed algorithms for single-view data, and lots of efforts have been devoted to extending them for multi-view clustering. Besides, many other multi-view clustering methods can be unified into the frameworks of these five methods. To promote further research and development of multi-view clustering, some popular and open datasets are summarized in two categories. Furthermore, several open issues that deserve more exploration are pointed out in the end

IntechOpen

Characterizing Variability of Modular Brain Connectivity with Constrained Principal Component Analysis

Author: Hirayama Jun-ichiro
Hyvarinen Aapo
Kawanabe Motoaki
Kiviniemi Vesa
Yamashita Okito
Publication venue
Publication date: 01/01/2016
Field of study

Characterizing the variability of resting-state functional brain connectivity across subjects and/or over time has recently attracted much attention. Principal component analysis (PCA) serves as a fundamental statistical technique for such analyses. However, performing PCA on high-dimensional connectivity matrices yields complicated "eigenconnectivity" patterns, for which systematic interpretation is a challenging issue. Here, we overcome this issue with a novel constrained PCA method for connectivity matrices by extending the idea of the previously proposed orthogonal connectivity factorization method. Our new method, modular connectivity factorization (MCF), explicitly introduces the modularity of brain networks as a parametric constraint on eigenconnectivity matrices. In particular, MCF analyzes the variability in both intra-and inter-module connectivities, simultaneously finding network modules in a principled, data-driven manner. The parametric constraint provides a compact module based visualization scheme with which the result can be intuitively interpreted. We develop an optimization algorithm to solve the constrained PCA problem and validate our method in simulation studies and with a resting-state functional connectivity MRI dataset of 986 subjects. The results show that the proposed MCF method successfully reveals the underlying modular eigenconnectivity patterns in more general situations and is a promising alternative to existing methods.Peer reviewe

Directory of Open Access Journals

PubMed Central

University of Oulu Repository - Jultika

Helsingin yliopiston digitaalinen arkisto

FigShare

Manifold Elastic Net: A Unified Framework for Sparse Dimension Reduction

Author: A D’aspremont
B Efron
C Fyfe
CHQ Ding
CM Bishop
D Cai
D Tao
D Tao
D Tao
Dacheng Tao
DB Graham
DL Donoho
E Candes
EJ Candes
GH Golub
GM James
H Hotelling
H Zou
H Zou
H Zou
H Zou
HP Kriegel
J Fan
J Fan
J Lv
JB Tenenbaum
M Belkin
M Belkin
PJ Phillips
PN Belhumeur
R Tibshirani
RA Fisher
S Yan
ST Roweis
T Li
T Zhang
Tianyi Zhou
X He
X Li
Xindong Wu
Z Zhang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/07/2010
Field of study

It is difficult to find the optimal sparse solution of a manifold learning based dimensionality reduction algorithm. The lasso or the elastic net penalized manifold learning based dimensionality reduction is not directly a lasso penalized least square problem and thus the least angle regression (LARS) (Efron et al. \cite{LARS}), one of the most popular algorithms in sparse learning, cannot be applied. Therefore, most current approaches take indirect ways or have strict settings, which can be inconvenient for applications. In this paper, we proposed the manifold elastic net or MEN for short. MEN incorporates the merits of both the manifold learning based dimensionality reduction and the sparse learning based dimensionality reduction. By using a series of equivalent transformations, we show MEN is equivalent to the lasso penalized least square problem and thus LARS is adopted to obtain the optimal sparse solution of MEN. In particular, MEN has the following advantages for subsequent classification: 1) the local geometry of samples is well preserved for low dimensional data representation, 2) both the margin maximization and the classification error minimization are considered for sparse projection calculation, 3) the projection matrix of MEN improves the parsimony in computation, 4) the elastic net penalty reduces the over-fitting problem, and 5) the projection matrix of MEN can be interpreted psychologically and physiologically. Experimental evidence on face recognition over various popular datasets suggests that MEN is superior to top level dimensionality reduction algorithms.Comment: 33 pages, 12 figure

arXiv.org e-Print Archive

CiteSeerX

Crossref

OPUS - University of Technology Sydney

Finding a low-rank basis in a matrix subspace

Author: Nakatsukasa Yuji
Soma Tasuku
Uschmajew André
Publication venue
Publication date: 01/01/2016
Field of study

For a given matrix subspace, how can we find a basis that consists of low-rank matrices? This is a generalization of the sparse vector problem. It turns out that when the subspace is spanned by rank-1 matrices, the matrices can be obtained by the tensor CP decomposition. For the higher rank case, the situation is not as straightforward. In this work we present an algorithm based on a greedy process applicable to higher rank problems. Our algorithm first estimates the minimum rank by applying soft singular value thresholding to a nuclear norm relaxation, and then computes a matrix with that rank using the method of alternating projections. We provide local convergence results, and compare our algorithm with several alternative approaches. Applications include data compression beyond the classical truncated SVD, computing accurate eigenvectors of a near-multiple eigenvalue, image separation and graph Laplacian eigenproblems

arXiv.org e-Print Archive

OPUS Augsburg

Crossref

Oxford University Research Archive

Modeling and Optimization for Big Data Analytics

Author: Giannakis Georgios B.
Mateos Gonzalo
Slavakis Konstantinos
Publication venue
Publication date: 01/01/2014
Field of study

Digital Repository of Hellenic Managing Authority of the Operational Programme "Education and Lifelong Learning" (EDULLL)

Classical Algorithms from Quantum and Arthur-Merlin Communication Protocols

Author: Chen Lijie
Wang Ruosong
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 10th Innovations in Theoretical Computer Science Conference (ITCS 2019)
Publication date: 01/01/2018
Field of study

In recent years, the polynomial method from circuit complexity has been applied to several fundamental problems and obtains the state-of-the-art running times (e.g., R. Williams\u27s n^3 / 2^{Omega(sqrt{log n})} time algorithm for APSP). As observed in [Alman and Williams, STOC 2017], almost all applications of the polynomial method in algorithm design ultimately rely on certain (probabilistic) low-rank decompositions of the computation matrices corresponding to key subroutines. They suggest that making use of low-rank decompositions directly could lead to more powerful algorithms, as the polynomial method is just one way to derive such a decomposition. Inspired by their observation, in this paper, we study another way of systematically constructing low-rank decompositions of matrices which could be used by algorithms - communication protocols. Since their introduction, it is known that various types of communication protocols lead to certain low-rank decompositions (e.g., P protocols/rank, BQP protocols/approximate rank). These are usually interpreted as approaches for proving communication lower bounds, while in this work we explore the other direction. We have the following two generic algorithmic applications of communication protocols: - Quantum Communication Protocols and Deterministic Approximate Counting. Our first connection is that a fast BQP communication protocol for a function f implies a fast deterministic additive approximate counting algorithm for a related pair counting problem. Applying known BQP communication protocols, we get fast deterministic additive approximate counting algorithms for Count-OV (#OV), Sparse Count-OV and Formula of SYM circuits. In particular, our approximate counting algorithm for #OV runs in near-linear time for all dimensions d = o(log^2 n). Previously, even no truly-subquadratic time algorithm was known for d = omega(log n). - Arthur-Merlin Communication Protocols and Faster Satisfying-Pair Algorithms. Our second connection is that a fast AM^{cc} protocol for a function f implies a faster-than-bruteforce algorithm for f-Satisfying-Pair. Using the classical Goldwasser-Sisper AM protocols for approximating set size, we obtain a new algorithm for approximate Max-IP_{n,c log n} in time n^{2 - 1/O(log c)}, matching the state-of-the-art algorithms in [Chen, CCC 2018]. We also apply our second connection to shed some light on long-standing open problems in communication complexity. We show that if the Longest Common Subsequence (LCS) problem admits a fast (computationally efficient) AM^{cc} protocol (polylog(n) complexity), then polynomial-size Formula-SAT admits a 2^{n - n^{1-delta}} time algorithm for any constant delta > 0, which is conjectured to be unlikely by a recent work [Abboud and Bringmann, ICALP 2018]. The same holds even for a fast (computationally efficient) PH^{cc} protocol

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Privacy-Preserving Tensor Factorization for Collaborative Health Data Analysis

Author: Ho Joyce C.
Jiang Xiaoqian
Lou Jian
Ma Jing
Xiong Li
Zhang Qiuchen
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/11/2019
Field of study

Tensor factorization has been demonstrated as an efficient approach for computational phenotyping, where massive electronic health records (EHRs) are converted to concise and meaningful clinical concepts. While distributing the tensor factorization tasks to local sites can avoid direct data sharing, it still requires the exchange of intermediary results which could reveal sensitive patient information. Therefore, the challenge is how to jointly decompose the tensor under rigorous and principled privacy constraints, while still support the model's interpretability. We propose DPFact, a privacy-preserving collaborative tensor factorization method for computational phenotyping using EHR. It embeds advanced privacy-preserving mechanisms with collaborative learning. Hospitals can keep their EHR database private but also collaboratively learn meaningful clinical concepts by sharing differentially private intermediary results. Moreover, DPFact solves the heterogeneous patient population using a structured sparsity term. In our framework, each hospital decomposes its local tensors, and sends the updated intermediary results with output perturbation every several iterations to a semi-trusted server which generates the phenotypes. The evaluation on both real-world and synthetic datasets demonstrated that under strict privacy constraints, our method is more accurate and communication-efficient than state-of-the-art baseline methods

arXiv.org e-Print Archive

Crossref