Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations
Non-negative matrix factorization is a basic tool for decomposing data into
the feature and weight matrices under non-negativity constraints, and in
practice is often solved in the alternating minimization framework. However, it
is unclear whether such algorithms can recover the ground-truth feature matrix
when the weights for different features are highly correlated, which is common
in applications. This paper proposes a simple and natural alternating gradient
descent based algorithm, and shows that with a mild initialization it provably
recovers the ground-truth in the presence of strong correlations. In most
interesting cases, the correlation can be of the same order as the highest
possible. Our analysis also reveals several favorable features of the algorithm,
including robustness to noise. We complement our theoretical results with empirical
studies on semi-synthetic datasets, demonstrating its advantage over several
popular methods in recovering the ground-truth.
Comment: Accepted to the International Conference on Machine Learning (ICML), 201
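As a rough illustration of the alternating scheme discussed in the abstract above, here is a minimal Python sketch of alternating projected gradient descent for NMF; the step size, iteration count, and random non-negative initialization are illustrative assumptions, not the paper's exact algorithm or its mild initialization.

```python
import numpy as np

def alternating_gd_nmf(Y, r, steps=200, lr=1e-2, seed=0):
    """Minimal sketch: alternately take gradient steps on the feature matrix A
    and the weight matrix W for Y ~ A @ W, projecting onto the non-negative
    orthant after each step."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    A = np.abs(rng.standard_normal((m, r)))    # non-negative random init (stand-in for a mild initialization)
    W = np.abs(rng.standard_normal((r, n)))
    for _ in range(steps):
        R = A @ W - Y                           # residual with W fixed
        A = np.maximum(A - lr * R @ W.T, 0.0)   # gradient step on A, then project
        R = A @ W - Y                           # residual with A fixed
        W = np.maximum(W - lr * A.T @ R, 0.0)   # gradient step on W, then project
    return A, W
```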
The adaptive Crouzeix-Raviart element method for convection-diffusion eigenvalue problems
Convection-diffusion eigenvalue problems have attracted growing attention from the
computational mathematics and physics communities in recent years. In this paper,
we consider the a posteriori error
analysis and the adaptive algorithm of the Crouzeix-Raviart nonconforming
element method for the convection-diffusion eigenvalue problems. We give the
corresponding a posteriori error estimators, and prove their reliability and
efficiency. Finally, the numerical results validate the theoretical analysis
and show that the algorithm presented in this paper is efficient.
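The adaptive algorithm in such nonconforming element methods typically follows a solve, estimate, mark, refine loop driven by the a posteriori estimators. The Python sketch below only illustrates that generic structure; the callables solve_eigenproblem, local_estimators, and refine, and the Dörfler-style marking parameter theta, are hypothetical placeholders rather than the paper's specific estimators.

```python
def adaptive_eigen_loop(mesh, solve_eigenproblem, local_estimators, refine,
                        theta=0.5, tol=1e-6, max_iter=30):
    """Generic adaptive loop: solve -> estimate -> mark -> refine.
    The three callables are supplied by the caller (hypothetical placeholders)."""
    lam, u = None, None
    for _ in range(max_iter):
        lam, u = solve_eigenproblem(mesh)            # discrete eigenpair on current mesh
        eta = local_estimators(mesh, lam, u)         # {element: local error indicator}
        total_sq = sum(e ** 2 for e in eta.values())
        if total_sq ** 0.5 < tol:
            break
        # Doerfler marking: smallest set of elements carrying a theta fraction of the error
        marked, acc = [], 0.0
        for elem, e in sorted(eta.items(), key=lambda kv: -kv[1]):
            marked.append(elem)
            acc += e ** 2
            if acc >= theta * total_sq:
                break
        mesh = refine(mesh, marked)                  # locally refine the marked elements
    return lam, u, mesh
```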
A Decomposition-Based Many-Objective Evolutionary Algorithm with Local Iterative Update
Existing studies have shown that the conventional multi-objective
evolutionary algorithms (MOEAs) based on decomposition may lose the population
diversity when solving some many-objective optimization problems. In this
paper, a simple decomposition-based MOEA with local iterative update (LIU) is
proposed. The LIU strategy has two features that are expected to drive the
population to approximate the Pareto Front with good distribution. One is that
only the worst solution in the current neighborhood is replaced by the newly
generated offspring, preventing the population from being occupied by copies of
a few individuals. The other is that its iterative process helps to assign
better solutions to subproblems, which is beneficial to make full use of the
similarity of solutions to neighboring subproblems and explore local areas in
the search space. In addition, the time complexity of the proposed algorithm is
the same as that of MOEA/D, and lower than that of other known MOEAs, since it
considers only individuals within the current neighborhood at each update. The
algorithm is compared with several of the best MOEAs on problems chosen from
two well-known test suites, DTLZ and WFG. Experimental results demonstrate that only
a handful of running instances of the algorithm on DTLZ4 lose their population
diversity. Moreover, the algorithm wins in most of the test instances in
terms of both running time and solution quality, indicating that it is very
effective in solving MaOPs.
Comment: arXiv admin note: text overlap with arXiv:1803.0628
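A minimal Python sketch of the local iterative update (LIU) replacement rule described above; the weighted Tchebycheff aggregation, the neighborhood data structures, and the acceptance test are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def tchebycheff(f, weight, z_star):
    """Weighted Tchebycheff aggregation of an objective vector f."""
    return np.max(weight * np.abs(f - z_star))

def local_iterative_update(pop, objs, weights, neighbors, child, child_obj, z_star, i):
    """Sketch of the LIU idea: the new offspring replaces only the *worst*
    solution in the current neighborhood (by aggregated fitness), rather than
    every neighbor it improves on, which limits duplicated individuals."""
    nbrs = neighbors[i]                                    # subproblems near subproblem i
    # the neighbor whose current solution is worst for its own subproblem
    worst = max(nbrs, key=lambda j: tchebycheff(objs[j], weights[j], z_star))
    if tchebycheff(child_obj, weights[worst], z_star) < tchebycheff(objs[worst], weights[worst], z_star):
        pop[worst], objs[worst] = child, child_obj         # swap out at most one solution
    return pop, objs
```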
Learning Mixtures of Linear Regressions with Nearly Optimal Complexity
Mixtures of Linear Regressions (MLR) is an important mixture model with many
applications. In this model, each observation is generated from one of the
several unknown linear regression components, where the identity of the
generated component is also unknown. Previous works either assume strong
assumptions on the data distribution or have high complexity. This paper
proposes a fixed parameter tractable algorithm for the problem under general
conditions, which achieves global convergence and the sample complexity scales
nearly linearly in the dimension. In particular, different from previous works
that require the data to be drawn from the standard Gaussian, the algorithm allows
the data to come from Gaussians with different covariances. When the condition number
of the covariances and the number of components are fixed, the algorithm has
nearly optimal sample complexity as well as nearly optimal computational
complexity in the dimension of the data space. To the best of our knowledge,
this approach provides the first such recovery guarantee for this general setting.
Comment: Fix some typesetting issue in v
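For context, here is a minimal Python sketch of the Mixtures of Linear Regressions generative model itself (not the paper's recovery algorithm); the uniform mixing weights, diagonal per-component covariances, and noise level are illustrative assumptions.

```python
import numpy as np

def sample_mlr(n, d, k, noise=0.1, seed=0):
    """Generate n observations from a mixture of k linear regression components
    in dimension d: pick a latent component, then y = <beta, x> + noise.
    The component identity is hidden from the learner."""
    rng = np.random.default_rng(seed)
    betas = rng.standard_normal((k, d))                     # unknown regression vectors
    # each component may have its own (non-identity) covariance
    covs = [np.diag(rng.uniform(0.5, 2.0, size=d)) for _ in range(k)]
    X, y = np.empty((n, d)), np.empty(n)
    for i in range(n):
        z = rng.integers(k)                                 # latent component identity
        X[i] = rng.multivariate_normal(np.zeros(d), covs[z])
        y[i] = X[i] @ betas[z] + noise * rng.standard_normal()
    return X, y, betas
```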
An efficient quantum search engine on unsorted database
We consider the problem of finding one or more desired items out of an
unsorted database. Patel has shown that if the database permits quantum
queries, then mere digitization is sufficient for efficient search for one
desired item. The algorithm he presented, called the factorized quantum search
algorithm, can locate the desired item in an unsorted database using
queries to factorized oracles. However, the algorithm requires that
all the property values must be distinct from each other. In this paper, we
discuss how to make a database satisfy the requirements, and present a quantum
search engine based on the algorithm. Our goal is achieved by introducing
auxiliary files for the property values that are not distinct, and converting
every complex query request into a sequence of calls to factorized quantum
search algorithm. The query complexity of our algorithm is expressed in terms
of P, the number of potential simple query requests in the complex query
request; Q, the maximum number of calls to the factorized quantum search
algorithm among the simple queries; and M, the number of auxiliary files for
the property on which our algorithm searches for desired items. This implies
that managing an unsorted database on an actual quantum computer is possible
and efficient.
Comment: 7 pages, 1 figure
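A small classical Python sketch of the preprocessing idea described above: making non-distinct property values distinct and recording auxiliary files that map back to the original records. The data layout and names are illustrative assumptions, not the paper's construction.

```python
from collections import defaultdict

def build_auxiliary_index(records, prop):
    """Sketch: make the values of one property distinct by appending an
    occurrence counter, and keep an auxiliary mapping from each original value
    to its distinct keys. A complex query can then be expanded into a sequence
    of simple look-ups, one per distinct key."""
    counter = defaultdict(int)
    distinct_keys = {}             # record id -> distinct (value, occurrence) key
    auxiliary = defaultdict(list)  # original value -> list of distinct keys
    for rid, rec in records.items():
        v = rec[prop]
        key = (v, counter[v])      # (value, occurrence index) is unique
        counter[v] += 1
        distinct_keys[rid] = key
        auxiliary[v].append(key)
    return distinct_keys, auxiliary

# usage sketch: a query for value v expands into one simple search per key in auxiliary[v]
```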
Recovery guarantee of weighted low-rank approximation via alternating minimization
Many applications require recovering a ground truth low-rank matrix from
noisy observations of the entries, which in practice is typically formulated as
a weighted low-rank approximation problem and solved by non-convex optimization
heuristics such as alternating minimization. In this paper, we provide a provable
recovery guarantee for weighted low-rank approximation via a simple alternating
minimization algorithm. In particular, for a natural class of matrices and weights and
without any assumption on the noise, we bound the spectral norm of the
difference between the recovered matrix and the ground truth, by the spectral
norm of the weighted noise plus an additive error that decreases exponentially
with the number of rounds of alternating minimization, from either
initialization by SVD or, more importantly, random initialization. These
provide the first theoretical results for weighted low-rank approximation via alternating
minimization with non-binary deterministic weights, significantly generalizing
those for matrix completion, the special case with binary weights, since our
assumptions are similar or weaker than those made in existing works.
Furthermore, this is achieved by a very simple algorithm that improves the
vanilla alternating minimization with a simple clipping step.
The key technical challenge is that under non-binary deterministic weights,
na\"ive alternating steps will destroy the incoherence and spectral properties
of the intermediate solutions, which are needed for making progress towards the
ground truth. We show that the properties only need to hold in an average sense
and can be achieved by the clipping step.
We further provide an alternating algorithm that uses a whitening step that
keeps the properties via SDP and Rademacher rounding and thus requires weaker
assumptions. This technique can potentially be applied in some other
applications and is of independent interest.
Comment: 40 pages. Updated with the ICML 2016 camera-ready version, together
with an additional algorithm which needs fewer assumptions in the Appendix
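A minimal Python sketch in the spirit of the weighted alternating minimization with a clipping step discussed above; the row-wise weighted least-squares updates, random initialization, and entrywise clipping threshold are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def weighted_alt_min(M, W, r, rounds=50, clip=None, seed=0):
    """Sketch: alternately solve weighted least squares for U and V in M ~ U @ V.T,
    then clip the entries of the iterates (a simplified stand-in for the clipping
    step that controls incoherence)."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.standard_normal((m, r)) / np.sqrt(r)       # random initialization
    V = rng.standard_normal((n, r)) / np.sqrt(r)
    for _ in range(rounds):
        for i in range(m):                              # update each row of U with V fixed
            D = np.diag(W[i])
            U[i] = np.linalg.lstsq(D @ V, D @ M[i], rcond=None)[0]
        for j in range(n):                              # update each row of V with U fixed
            D = np.diag(W[:, j])
            V[j] = np.linalg.lstsq(D @ U, D @ M[:, j], rcond=None)[0]
        if clip is not None:                            # simple entrywise clipping step
            U = np.clip(U, -clip, clip)
            V = np.clip(V, -clip, clip)
    return U, V
```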
Generalizing Word Embeddings using Bag of Subwords
We approach the problem of generalizing pre-trained word embeddings beyond
fixed-size vocabularies without using additional contextual information. We
propose a subword-level word vector generation model that views words as bags
of character n-grams. The model is simple, fast to train, and provides good
vectors for rare or unseen words. Experiments show that our model achieves
state-of-the-art performance on the English word similarity task and on joint
prediction of part-of-speech tags and morphosyntactic attributes in 23
languages, suggesting our model's ability to capture the relationship between
words' textual representations and their embeddings.
Comment: Accepted to EMNLP 201
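A rough Python sketch of the bag-of-character-n-grams idea: the vector of a rare or unseen word is composed from the embeddings of its character n-grams. The n-gram range, boundary markers, and simple averaging are illustrative assumptions, not the paper's trained model.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of a word, with boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(w) - n + 1)]

def subword_vector(word, ngram_embeddings, dim=300):
    """Sketch: the vector of an unseen word is the average of the embeddings of
    its character n-grams (n-grams missing from the table are skipped)."""
    vecs = [ngram_embeddings[g] for g in char_ngrams(word) if g in ngram_embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```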
Why are deep nets reversible: A simple theory, with implications for training
Generative models for deep learning are promising both for improving
understanding of the model and for yielding training methods that require fewer
labeled samples.
Recent works use generative model approaches to produce the deep net's input
given the value of a hidden layer several levels above. However, there is no
accompanying "proof of correctness" for the generative model, showing that the
feedforward deep net is the correct inference method for recovering the hidden
layer given the input. Furthermore, these models are complicated.
The current paper takes a more theoretical tack. It presents a very simple
generative model for RELU deep nets, with the following characteristics: (i)
The generative model is just the reverse of the feedforward net: if the forward
transformation at a layer is then the reverse transformation is .
(This can be seen as an explanation of the old weight tying idea for denoising
autoencoders.) (ii) Its correctness can be proven under a clean theoretical
assumption: the edge weights in real-life deep nets behave like random numbers.
Under this assumption (which is experimentally tested on real-life nets like
AlexNet), it is formally proved that the feedforward net is a correct inference
method for recovering the hidden layer.
The generative model suggests a simple modification for training: use the
generative model to produce synthetic data with labels and include it in the
training set. Experiments are presented that support this theory of random-like
deep nets and show that it helps the training.
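A minimal Python sketch of the reverse (generative) pass described above, assuming the transposed-weight form of the reverse transformation; the layer shapes and the way synthetic pairs would be folded into training are illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, weights):
    """Feedforward pass: each layer applies its weight matrix followed by the ReLU."""
    h = x
    for W in weights:
        h = relu(W @ h)
    return h

def reverse_generate(h_top, weights):
    """Sketch of the generative model: run the net in reverse, applying the
    transpose of each weight matrix followed by the ReLU, to produce a synthetic
    input whose hidden representation should resemble h_top."""
    x = h_top
    for W in reversed(weights):
        x = relu(W.T @ x)
    return x

# usage sketch: sample a top-layer code, generate a synthetic input with
# reverse_generate, and include the (input, label-from-code) pair in the training set
```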
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers
The fundamental learning theory behind neural networks remains largely open.
What classes of functions can neural networks actually learn? Why doesn't the
trained network overfit when it is overparameterized?
In this work, we prove that overparameterized neural networks can learn some
notable concept classes, including two- and three-layer networks with fewer
parameters and smooth activations. Moreover, the learning can be simply done by
SGD (stochastic gradient descent) or its variants in polynomial time using
polynomially many samples. The sample complexity can also be almost independent
of the number of parameters in the network.
On the technique side, our analysis goes beyond the so-called NTK (neural
tangent kernel) linearization of neural networks in prior works. We establish a
new notion of quadratic approximation of the neural network (that can be viewed
as a second-order variant of NTK), and connect it to the SGD theory of escaping
saddle points.
Comment: V1/V2/V3/V4 polish writing, V5 adds experiments, V6 reflects our
camera-ready version
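A minimal Python sketch of the setting discussed above: an overparameterized two-layer ReLU network trained by plain SGD on squared loss. The width, learning rate, and the choice to train only the first layer are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def train_two_layer_sgd(X, y, width=4096, lr=1e-2, epochs=10, seed=0):
    """Sketch: an overparameterized two-layer ReLU network (width much larger
    than the sample size) trained by plain SGD on squared loss, with only the
    first layer trained (a common simplification in this line of theory)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((width, d)) / np.sqrt(d)            # random first layer
    a = rng.choice([-1.0, 1.0], size=width) / np.sqrt(width)    # fixed output weights
    for _ in range(epochs):
        for i in rng.permutation(n):
            x, target = X[i], y[i]
            z = W @ x
            pred = a @ np.maximum(z, 0.0)
            # gradient of 0.5 * (pred - target)^2 with respect to W
            grad = (pred - target) * np.outer(a * (z > 0), x)
            W -= lr * grad
    return W, a
```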
Distributed k-Means and k-Median Clustering on General Topologies
This paper provides new algorithms for distributed clustering for two popular
center-based objectives, k-median and k-means. These algorithms have provable
guarantees and improve communication complexity over existing approaches.
Following a classic approach in clustering by \cite{har2004coresets}, we reduce
the problem of finding a clustering with low cost to the problem of finding a
coreset of small size. We provide a distributed method for constructing a
global coreset which improves over the previous methods by reducing the
communication complexity, and which works over general communication
topologies. Experimental results on large scale data sets show that this
approach outperforms other coreset-based distributed clustering algorithms.
Comment: Corrected Theorem 4 in the appendix
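A rough Python sketch of the coreset-based distributed clustering pattern described above: each node builds a small weighted coreset of its local data, only the coresets are communicated, and the coordinator clusters their union. The sensitivity-style sampling, coreset sizes, and use of scikit-learn's KMeans are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from sklearn.cluster import KMeans

def local_coreset(points, k, size, seed=0):
    """Sketch: cluster locally, sample points with probability proportional to
    their cost against the local centers (plus a base term), and attach
    inverse-probability weights so clustering costs are roughly preserved."""
    rng = np.random.default_rng(seed)
    centers = KMeans(n_clusters=k, n_init=5, random_state=seed).fit(points).cluster_centers_
    d2 = np.min(((points[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    prob = d2 + d2.mean()                       # base term keeps every probability positive
    prob = prob / prob.sum()
    idx = rng.choice(len(points), size=size, replace=True, p=prob)
    weights = 1.0 / (size * prob[idx])
    return points[idx], weights

def distributed_kmeans(node_data, k, size_per_node=200):
    """Each node sends only its small weighted coreset; the coordinator runs
    weighted k-means on the union, so communication does not grow with the data size."""
    parts = [local_coreset(P, k, size_per_node, seed=i) for i, P in enumerate(node_data)]
    C = np.vstack([p for p, _ in parts])
    w = np.concatenate([w for _, w in parts])
    return KMeans(n_clusters=k, n_init=10).fit(C, sample_weight=w).cluster_centers_
```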