Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations
Non-negative matrix factorization is a basic tool for decomposing data into
the feature and weight matrices under non-negativity constraints, and in
practice is often solved in the alternating minimization framework. However, it
is unclear whether such algorithms can recover the ground-truth feature matrix
when the weights for different features are highly correlated, which is common
in applications. This paper proposes a simple and natural alternating gradient
descent based algorithm, and shows that with a mild initialization it provably
recovers the ground-truth in the presence of strong correlations. In most
interesting cases, the correlation can be of the same order as the highest
possible. Our analysis also reveals several favorable features of the algorithm,
including robustness to noise. We complement our theoretical results with empirical
studies on semi-synthetic datasets, demonstrating its advantage over several
popular methods in recovering the ground-truth.
Comment: Accepted to the International Conference on Machine Learning (ICML), 201
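As a rough illustration of the alternating scheme discussed in the abstract above, here is a minimal Python sketch of alternating projected gradient descent for NMF; the step size, iteration count, and random non-negative initialization are illustrative assumptions, not the paper's exact algorithm or its mild initialization.

```python
import numpy as np

def alternating_gd_nmf(Y, r, steps=200, lr=1e-2, seed=0):
    """Minimal sketch: alternately take gradient steps on the feature matrix A
    and the weight matrix W for Y ~ A @ W, projecting onto the non-negative
    orthant after each step."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    A = np.abs(rng.standard_normal((m, r)))    # non-negative random init (stand-in for a mild initialization)
    W = np.abs(rng.standard_normal((r, n)))
    for _ in range(steps):
        R = A @ W - Y                           # residual with W fixed
        A = np.maximum(A - lr * R @ W.T, 0.0)   # gradient step on A, then project
        R = A @ W - Y                           # residual with A fixed
        W = np.maximum(W - lr * A.T @ R, 0.0)   # gradient step on W, then project
    return A, W
```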
The adaptive Crouzeix-Raviart element method for convection-diffusion eigenvalue problems
Convection-diffusion eigenvalue problems have attracted growing attention from the
computational mathematics and physics communities in recent years. In this paper,
we consider the a posteriori error
analysis and the adaptive algorithm of the Crouzeix-Raviart nonconforming
element method for the convection-diffusion eigenvalue problems. We give the
corresponding a posteriori error estimators, and prove their reliability and
efficiency. Finally, the numerical results validate the theoretical analysis
and show that the algorithm presented in this paper is efficient.
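The adaptive algorithm in such nonconforming element methods typically follows a solve, estimate, mark, refine loop driven by the a posteriori estimators. The Python sketch below only illustrates that generic structure; the callables solve_eigenproblem, local_estimators, and refine, and the Dörfler-style marking parameter theta, are hypothetical placeholders rather than the paper's specific estimators.

```python
def adaptive_eigen_loop(mesh, solve_eigenproblem, local_estimators, refine,
                        theta=0.5, tol=1e-6, max_iter=30):
    """Generic adaptive loop: solve -> estimate -> mark -> refine.
    The three callables are supplied by the caller (hypothetical placeholders)."""
    lam, u = None, None
    for _ in range(max_iter):
        lam, u = solve_eigenproblem(mesh)            # discrete eigenpair on current mesh
        eta = local_estimators(mesh, lam, u)         # {element: local error indicator}
        total_sq = sum(e ** 2 for e in eta.values())
        if total_sq ** 0.5 < tol:
            break
        # Doerfler marking: smallest set of elements carrying a theta fraction of the error
        marked, acc = [], 0.0
        for elem, e in sorted(eta.items(), key=lambda kv: -kv[1]):
            marked.append(elem)
            acc += e ** 2
            if acc >= theta * total_sq:
                break
        mesh = refine(mesh, marked)                  # locally refine the marked elements
    return lam, u, mesh
```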
A Decomposition-Based Many-Objective Evolutionary Algorithm with Local Iterative Update
Existing studies have shown that the conventional multi-objective
evolutionary algorithms (MOEAs) based on decomposition may lose the population
diversity when solving some many-objective optimization problems. In this
paper, a simple decomposition-based MOEA with local iterative update (LIU) is
proposed. The LIU strategy has two features that are expected to drive the
population to approximate the Pareto Front with good distribution. One is that
only the worst solution in the current neighborhood is replaced by the newly
generated offspring, preventing the population from being occupied by copies of
a few individuals. The other is that its iterative process helps to assign
better solutions to subproblems, which is beneficial to make full use of the
similarity of solutions to neighboring subproblems and explore local areas in
the search space. In addition, the time complexity of the proposed algorithm is
the same as that of MOEA/D, and lower than that of other known MOEAs, since it
considers only individuals within the current neighborhood at each update. The
algorithm is compared with several of the best MOEAs on problems chosen from
two well-known test suites, DTLZ and WFG. Experimental results demonstrate that only
a handful of running instances of the algorithm on DTLZ4 lose their population
diversity. Moreover, the algorithm wins in most of the test instances in
terms of both running time and solution quality, indicating that it is very
effective in solving MaOPs.
Comment: arXiv admin note: text overlap with arXiv:1803.0628
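A minimal Python sketch of the local iterative update (LIU) replacement rule described above; the weighted Tchebycheff aggregation, the neighborhood data structures, and the acceptance test are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def tchebycheff(f, weight, z_star):
    """Weighted Tchebycheff aggregation of an objective vector f."""
    return np.max(weight * np.abs(f - z_star))

def local_iterative_update(pop, objs, weights, neighbors, child, child_obj, z_star, i):
    """Sketch of the LIU idea: the new offspring replaces only the *worst*
    solution in the current neighborhood (by aggregated fitness), rather than
    every neighbor it improves on, which limits duplicated individuals."""
    nbrs = neighbors[i]                                    # subproblems near subproblem i
    # the neighbor whose current solution is worst for its own subproblem
    worst = max(nbrs, key=lambda j: tchebycheff(objs[j], weights[j], z_star))
    if tchebycheff(child_obj, weights[worst], z_star) < tchebycheff(objs[worst], weights[worst], z_star):
        pop[worst], objs[worst] = child, child_obj         # swap out at most one solution
    return pop, objs
```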
Learning Mixtures of Linear Regressions with Nearly Optimal Complexity
Mixtures of Linear Regressions (MLR) is an important mixture model with many
applications. In this model, each observation is generated from one of the
several unknown linear regression components, where the identity of the
generated component is also unknown. Previous works either assume strong
assumptions on the data distribution or have high complexity. This paper
proposes a fixed parameter tractable algorithm for the problem under general
conditions, which achieves global convergence and the sample complexity scales
nearly linearly in the dimension. In particular, different from previous works
that require the data to be drawn from the standard Gaussian, the algorithm allows
the data to come from Gaussians with different covariances. When the condition number
of the covariances and the number of components are fixed, the algorithm has
nearly optimal sample complexity as well as nearly optimal computational
complexity in the dimension of the data space. To the best of our knowledge,
this approach provides the first such recovery guarantee for this general setting.
Comment: Fix some typesetting issue in v
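For context, here is a minimal Python sketch of the Mixtures of Linear Regressions generative model itself (not the paper's recovery algorithm); the uniform mixing weights, diagonal per-component covariances, and noise level are illustrative assumptions.

```python
import numpy as np

def sample_mlr(n, d, k, noise=0.1, seed=0):
    """Generate n observations from a mixture of k linear regression components
    in dimension d: pick a latent component, then y = <beta, x> + noise.
    The component identity is hidden from the learner."""
    rng = np.random.default_rng(seed)
    betas = rng.standard_normal((k, d))                     # unknown regression vectors
    # each component may have its own (non-identity) covariance
    covs = [np.diag(rng.uniform(0.5, 2.0, size=d)) for _ in range(k)]
    X, y = np.empty((n, d)), np.empty(n)
    for i in range(n):
        z = rng.integers(k)                                 # latent component identity
        X[i] = rng.multivariate_normal(np.zeros(d), covs[z])
        y[i] = X[i] @ betas[z] + noise * rng.standard_normal()
    return X, y, betas
```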
An efficient quantum search engine on unsorted database
We consider the problem of finding one or more desired items out of an
unsorted database. Patel has shown that if the database permits quantum
queries, then mere digitization is sufficient for efficient search for one
desired item. The algorithm he presented, called the factorized quantum search
algorithm, can locate the desired item in an unsorted database using
queries to factorized oracles. However, the algorithm requires that
all the property values must be distinct from each other. In this paper, we
discuss how to make a database satisfy the requirements, and present a quantum
search engine based on the algorithm. Our goal is achieved by introducing
auxiliary files for the property values that are not distinct, and converting
every complex query request into a sequence of calls to factorized quantum
search algorithm. The query complexity of our algorithm is expressed in terms
of P, the number of potential simple query requests in the complex query
request; Q, the maximum number of calls to the factorized quantum search
algorithm among the simple queries; and M, the number of auxiliary files for
the property on which our algorithm searches for desired items. This implies
that managing an unsorted database on an actual quantum computer is possible
and efficient.
Comment: 7 pages, 1 figure
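A small classical Python sketch of the preprocessing idea described above: making non-distinct property values distinct and recording auxiliary files that map back to the original records. The data layout and names are illustrative assumptions, not the paper's construction.

```python
from collections import defaultdict

def build_auxiliary_index(records, prop):
    """Sketch: make the values of one property distinct by appending an
    occurrence counter, and keep an auxiliary mapping from each original value
    to its distinct keys. A complex query can then be expanded into a sequence
    of simple look-ups, one per distinct key."""
    counter = defaultdict(int)
    distinct_keys = {}             # record id -> distinct (value, occurrence) key
    auxiliary = defaultdict(list)  # original value -> list of distinct keys
    for rid, rec in records.items():
        v = rec[prop]
        key = (v, counter[v])      # (value, occurrence index) is unique
        counter[v] += 1
        distinct_keys[rid] = key
        auxiliary[v].append(key)
    return distinct_keys, auxiliary

# usage sketch: a query for value v expands into one simple search per key in auxiliary[v]
```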
Recovery guarantee of weighted low-rank approximation via alternating minimization
Many applications require recovering a ground truth low-rank matrix from
noisy observations of the entries, which in practice is typically formulated as
a weighted low-rank approximation problem and solved by non-convex optimization
heuristics such as alternating minimization. In this paper, we provide a provable
recovery guarantee for weighted low-rank approximation via a simple alternating
minimization algorithm. In particular, for a natural class of matrices and weights and
without any assumption on the noise, we bound the spectral norm of the
difference between the recovered matrix and the ground truth, by the spectral
norm of the weighted noise plus an additive error that decreases exponentially
with the number of rounds of alternating minimization, from either
initialization by SVD or, more importantly, random initialization. These
provide the first theoretical results for weighted low-rank approximation via alternating
minimization with non-binary deterministic weights, significantly generalizing
those for matrix completion, the special case with binary weights, since our
assumptions are similar or weaker than those made in existing works.
Furthermore, this is achieved by a very simple algorithm that improves the
vanilla alternating minimization with a simple clipping step.
The key technical challenge is that under non-binary deterministic weights,
na\"ive alternating steps will destroy the incoherence and spectral properties
of the intermediate solutions, which are needed for making progress towards the
ground truth. We show that the properties only need to hold in an average sense
and can be achieved by the clipping step.
We further provide an alternating algorithm that uses a whitening step that
keeps the properties via SDP and Rademacher rounding and thus requires weaker
assumptions. This technique can potentially be applied in some other
applications and is of independent interest.
Comment: 40 pages. Updated with the ICML 2016 camera-ready version, together
with an additional algorithm which needs fewer assumptions in the Appendix
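A minimal Python sketch in the spirit of the weighted alternating minimization with a clipping step discussed above; the row-wise weighted least-squares updates, random initialization, and entrywise clipping threshold are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def weighted_alt_min(M, W, r, rounds=50, clip=None, seed=0):
    """Sketch: alternately solve weighted least squares for U and V in M ~ U @ V.T,
    then clip the entries of the iterates (a simplified stand-in for the clipping
    step that controls incoherence)."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.standard_normal((m, r)) / np.sqrt(r)       # random initialization
    V = rng.standard_normal((n, r)) / np.sqrt(r)
    for _ in range(rounds):
        for i in range(m):                              # update each row of U with V fixed
            D = np.diag(W[i])
            U[i] = np.linalg.lstsq(D @ V, D @ M[i], rcond=None)[0]
        for j in range(n):                              # update each row of V with U fixed
            D = np.diag(W[:, j])
            V[j] = np.linalg.lstsq(D @ U, D @ M[:, j], rcond=None)[0]
        if clip is not None:                            # simple entrywise clipping step
            U = np.clip(U, -clip, clip)
            V = np.clip(V, -clip, clip)
    return U, V
```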
Generalizing Word Embeddings using Bag of Subwords
We approach the problem of generalizing pre-trained word embeddings beyond
fixed-size vocabularies without using additional contextual information. We
propose a subword-level word vector generation model that views words as bags
of character n-grams. The model is simple, fast to train, and provides good
vectors for rare or unseen words. Experiments show that our model achieves
state-of-the-art performance on the English word similarity task and on joint
prediction of part-of-speech tags and morphosyntactic attributes in 23
languages, suggesting our model's ability to capture the relationship between
words' textual representations and their embeddings.
Comment: Accepted to EMNLP 201
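A rough Python sketch of the bag-of-character-n-grams idea: the vector of a rare or unseen word is composed from the embeddings of its character n-grams. The n-gram range, boundary markers, and simple averaging are illustrative assumptions, not the paper's trained model.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of a word, with boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(w) - n + 1)]

def subword_vector(word, ngram_embeddings, dim=300):
    """Sketch: the vector of an unseen word is the average of the embeddings of
    its character n-grams (n-grams missing from the table are skipped)."""
    vecs = [ngram_embeddings[g] for g in char_ngrams(word) if g in ngram_embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```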
Why are deep nets reversible: A simple theory, with implications for training
Generative models for deep learning are promising both for improving
understanding of the model and for yielding training methods that require fewer
labeled samples.
Recent works use generative model approaches to produce the deep net's input
given the value of a hidden layer several levels above. However, there is no
accompanying "proof of correctness" for the generative model, showing that the
feedforward deep net is the correct inference method for recovering the hidden
layer given the input. Furthermore, these models are complicated.
The current paper takes a more theoretical tack. It presents a very simple
generative model for RELU deep nets, with the following characteristics: (i)
The generative model is just the reverse of the feedforward net: if the forward
transformation at a layer is then the reverse transformation is .
(This can be seen as an explanation of the old weight tying idea for denoising
autoencoders.) (ii) Its correctness can be proven under a clean theoretical
assumption: the edge weights in real-life deep nets behave like random numbers.
Under this assumption (which is experimentally tested on real-life nets like
AlexNet), it is formally proved that the feedforward net is a correct inference
method for recovering the hidden layer.
The generative model suggests a simple modification for training: use the
generative model to produce synthetic data with labels and include it in the
training set. Experiments are presented that support this theory of random-like
deep nets and show that it helps the training.
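A minimal Python sketch of the reverse (generative) pass described above, assuming the transposed-weight form of the reverse transformation; the layer shapes and the way synthetic pairs would be folded into training are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, weights):
    """Feedforward pass: each layer applies its weight matrix followed by the ReLU."""
    h = x
    for W in weights:
        h = relu(W @ h)
    return h

def reverse_generate(h_top, weights):
    """Sketch of the generative model: run the net in reverse, applying the
    transpose of each weight matrix followed by the ReLU, to produce a synthetic
    input whose hidden representation should resemble h_top."""
    x = h_top
    for W in reversed(weights):
        x = relu(W.T @ x)
    return x

# usage sketch: sample a top-layer code, generate a synthetic input with
# reverse_generate, and include the (input, label-from-code) pair in the training set
```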
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
The fundamental learning theory behind neural networks remains largely open.
What classes of functions can neural networks actually learn? Why doesn't the
trained network overfit when it is overparameterized?
In this work, we prove that overparameterized neural networks can learn some
notable concept classes, including two- and three-layer networks with fewer
parameters and smooth activations. Moreover, the learning can be simply done by
SGD (stochastic gradient descent) or its variants in polynomial time using
polynomially many samples. The sample complexity can also be almost independent
of the number of parameters in the network.
On the technique side, our analysis goes beyond the so-called NTK (neural
tangent kernel) linearization of neural networks in prior works. We establish a
new notion of quadratic approximation of the neural network (that can be viewed
as a second-order variant of NTK), and connect it to the SGD theory of escaping
saddle points.
Comment: V1/V2/V3/V4 polish writing, V5 adds experiments, V6 reflects our
camera-ready version
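A minimal Python sketch of the setting discussed above: an overparameterized two-layer ReLU network trained by plain SGD on squared loss. The width, learning rate, and the choice to train only the first layer are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def train_two_layer_sgd(X, y, width=4096, lr=1e-2, epochs=10, seed=0):
    """Sketch: an overparameterized two-layer ReLU network (width much larger
    than the sample size) trained by plain SGD on squared loss, with only the
    first layer trained (a common simplification in this line of theory)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((width, d)) / np.sqrt(d)            # random first layer
    a = rng.choice([-1.0, 1.0], size=width) / np.sqrt(width)    # fixed output weights
    for _ in range(epochs):
        for i in rng.permutation(n):
            x, target = X[i], y[i]
            z = W @ x
            pred = a @ np.maximum(z, 0.0)
            # gradient of 0.5 * (pred - target)^2 with respect to W
            grad = (pred - target) * np.outer(a * (z > 0), x)
            W -= lr * grad
    return W, a
```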
Distributed k-Means and k-Median Clustering on General Topologies
This paper provides new algorithms for distributed clustering for two popular
center-based objectives, k-median and k-means. These algorithms have provable
guarantees and improve communication complexity over existing approaches.
Following a classic approach in clustering by \cite{har2004coresets}, we reduce
the problem of finding a clustering with low cost to the problem of finding a
coreset of small size. We provide a distributed method for constructing a
global coreset which improves over the previous methods by reducing the
communication complexity, and which works over general communication
topologies. Experimental results on large scale data sets show that this
approach outperforms other coreset-based distributed clustering algorithms.
Comment: Corrected Theorem 4 in the appendix
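A rough Python sketch of the coreset-based distributed clustering pattern described above: each node builds a small weighted coreset of its local data, only the coresets are communicated, and the coordinator clusters their union. The sensitivity-style sampling, coreset sizes, and use of scikit-learn's KMeans are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from sklearn.cluster import KMeans

def local_coreset(points, k, size, seed=0):
    """Sketch: cluster locally, sample points with probability proportional to
    their cost against the local centers (plus a base term), and attach
    inverse-probability weights so clustering costs are roughly preserved."""
    rng = np.random.default_rng(seed)
    centers = KMeans(n_clusters=k, n_init=5, random_state=seed).fit(points).cluster_centers_
    d2 = np.min(((points[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    prob = d2 + d2.mean()                       # base term keeps every probability positive
    prob = prob / prob.sum()
    idx = rng.choice(len(points), size=size, replace=True, p=prob)
    weights = 1.0 / (size * prob[idx])
    return points[idx], weights

def distributed_kmeans(node_data, k, size_per_node=200):
    """Each node sends only its small weighted coreset; the coordinator runs
    weighted k-means on the union, so communication does not grow with the data size."""
    parts = [local_coreset(P, k, size_per_node, seed=i) for i, P in enumerate(node_data)]
    C = np.vstack([p for p, _ in parts])
    w = np.concatenate([w for _, w in parts])
    return KMeans(n_clusters=k, n_init=10).fit(C, sample_weight=w).cluster_centers_
```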