761 research outputs found

    Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations

    Non-negative matrix factorization is a basic tool for decomposing data into feature and weight matrices under non-negativity constraints, and in practice it is often solved in the alternating minimization framework. However, it is unclear whether such algorithms can recover the ground-truth feature matrix when the weights for different features are highly correlated, which is common in applications. This paper proposes a simple and natural algorithm based on alternating gradient descent, and shows that with a mild initialization it provably recovers the ground truth in the presence of strong correlations. In most interesting cases, the correlation can be of the same order as the highest possible. Our analysis also reveals several favorable features of the algorithm, including robustness to noise. We complement our theoretical results with empirical studies on semi-synthetic datasets, demonstrating its advantage over several popular methods in recovering the ground truth.
    Comment: Accepted to the International Conference on Machine Learning (ICML), 2017
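
    The update rule itself is not spelled out in the abstract; as a rough illustration of alternating gradient descent with non-negativity projection for NMF, a minimal sketch might look like the following (the step size, iteration count, and random non-negative initialization are assumptions, not the paper's algorithm):

```python
import numpy as np

def alternating_gd_nmf(X, k, steps=500, lr=1e-3, seed=0):
    """Alternating projected gradient descent for X ~ W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = np.abs(rng.standard_normal((n, k)))    # mild non-negative initialization
    H = np.abs(rng.standard_normal((k, m)))
    for _ in range(steps):
        R = W @ H - X                           # residual of the current factorization
        W = np.maximum(W - lr * R @ H.T, 0.0)   # gradient step on W, project back to >= 0
        H = np.maximum(H - lr * W.T @ R, 0.0)   # gradient step on H, project back to >= 0
    return W, H
```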

    The adaptive Crouzeix-Raviart element method for convection-diffusion eigenvalue problems

    Convection-diffusion eigenvalue problems are an active research topic that has drawn attention from the computational mathematics and physics communities in recent years. In this paper, we consider the a posteriori error analysis and the adaptive algorithm of the Crouzeix-Raviart nonconforming element method for convection-diffusion eigenvalue problems. We derive the corresponding a posteriori error estimators and prove their reliability and efficiency. Finally, numerical results validate the theoretical analysis and show that the algorithm presented in this paper is efficient.
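
    The abstract does not state the estimator formulas; purely as an illustration of the adaptive cycle such estimators drive, here is a generic solve-estimate-mark-refine loop with Dörfler marking. The solve/estimate/refine callables and the marking parameter theta are hypothetical placeholders, not the paper's method:

```python
import numpy as np

def dorfler_mark(eta, theta=0.5):
    """Dörfler marking: choose elements whose squared indicators eta_K^2
    account for at least a theta fraction of the total estimated error."""
    order = np.argsort(eta)[::-1]                  # largest indicators first
    cum = np.cumsum(eta[order] ** 2)
    cutoff = int(np.searchsorted(cum, theta * cum[-1])) + 1
    return order[:cutoff]

def adaptive_eigenvalue_loop(solve, estimate, refine, mesh, tol=1e-6, max_iter=20):
    """Generic SOLVE -> ESTIMATE -> MARK -> REFINE cycle for an eigenvalue problem."""
    uh, lam = None, None
    for _ in range(max_iter):
        uh, lam = solve(mesh)                      # discrete eigenpair on the current mesh
        eta = estimate(mesh, uh, lam)              # element-wise a posteriori indicators
        if np.sqrt(np.sum(eta ** 2)) < tol:
            break
        mesh = refine(mesh, dorfler_mark(eta))     # refine only the marked elements
    return uh, lam, mesh
```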

    A Decomposition-Based Many-Objective Evolutionary Algorithm with Local Iterative Update

    Existing studies have shown that conventional multi-objective evolutionary algorithms (MOEAs) based on decomposition may lose population diversity when solving some many-objective optimization problems (MaOPs). In this paper, a simple decomposition-based MOEA with local iterative update (LIU) is proposed. The LIU strategy has two features that are expected to drive the population to approximate the Pareto front with good distribution. One is that only the worst solution in the current neighborhood is swapped out by the newly generated offspring, preventing the population from being occupied by copies of a few individuals. The other is that its iterative process helps to assign better solutions to subproblems, which makes full use of the similarity of solutions to neighboring subproblems and explores local areas of the search space. In addition, the time complexity of the proposed algorithm is the same as that of MOEA/D, and lower than that of other known MOEAs, since it considers only individuals within the current neighborhood at each update. The algorithm is compared with several of the best MOEAs on problems chosen from two well-known test suites, DTLZ and WFG. Experimental results demonstrate that only a handful of runs of the algorithm on DTLZ4 lose population diversity. Moreover, the algorithm wins on most of the test instances in terms of both running time and solution quality, indicating that it is very effective in solving MaOPs.
    Comment: arXiv admin note: text overlap with arXiv:1803.0628
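
    A minimal sketch of the first feature described above (replacing only the worst neighbor with the offspring) is given below; the Tchebycheff aggregation and the acceptance test are assumptions borrowed from standard decomposition-based MOEAs, not necessarily the exact LIU rule:

```python
import numpy as np

def tchebycheff(f, weight, ideal):
    """Tchebycheff aggregation of objective vector f for one subproblem."""
    return np.max(weight * np.abs(f - ideal))

def local_update(pop_f, offspring_f, neighborhood, weights, ideal):
    """Swap out only the worst solution in the current neighborhood, so copies of a
    single good individual cannot flood the population."""
    scores = np.array([tchebycheff(pop_f[j], weights[j], ideal) for j in neighborhood])
    worst = neighborhood[int(np.argmax(scores))]
    if tchebycheff(offspring_f, weights[worst], ideal) < scores.max():
        pop_f[worst] = offspring_f
    return pop_f
```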

    Learning Mixtures of Linear Regressions with Nearly Optimal Complexity

    Mixtures of Linear Regressions (MLR) is an important mixture model with many applications. In this model, each observation is generated from one of several unknown linear regression components, where the identity of the generating component is also unknown. Previous works either make strong assumptions on the data distribution or have high complexity. This paper proposes a fixed-parameter tractable algorithm for the problem under general conditions, which achieves global convergence and whose sample complexity scales nearly linearly in the dimension. In particular, unlike previous works that require the data to come from the standard Gaussian, the algorithm allows data drawn from Gaussians with different covariances. When the condition number of the covariances and the number of components are fixed, the algorithm has nearly optimal sample complexity $N = \tilde{O}(d)$ as well as nearly optimal computational complexity $\tilde{O}(Nd)$, where $d$ is the dimension of the data space. To the best of our knowledge, this approach provides the first such recovery guarantee for this general setting.
    Comment: Fix some typesetting issues in v
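
    As a small illustration of the model itself (not of the proposed recovery algorithm), one can generate MLR data as follows; the standard-Gaussian covariates, noise level, and uniform component prior here are assumptions of the sketch:

```python
import numpy as np

def sample_mlr(n, d, k, noise=0.1, seed=0):
    """Draw n observations from a mixture of k linear regressions in dimension d."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((k, d))           # unknown regression components
    labels = rng.integers(0, k, size=n)       # hidden identity of the generating component
    X = rng.standard_normal((n, d))           # covariates (standard Gaussian in this sketch)
    y = np.einsum("nd,nd->n", X, W[labels]) + noise * rng.standard_normal(n)
    return X, y, W, labels
```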

    An efficient quantum search engine on unsorted database

    We consider the problem of finding one or more desired items in an unsorted database. Patel has shown that if the database permits quantum queries, then mere digitization is sufficient for efficient search for one desired item. His algorithm, called the factorized quantum search algorithm, can locate the desired item in an unsorted database using $O(\log_4 N)$ queries to factorized oracles. However, the algorithm requires that all the property values be distinct from each other. In this paper, we discuss how to make a database satisfy these requirements, and present a quantum search engine based on the algorithm. Our goal is achieved by introducing auxiliary files for the property values that are not distinct, and converting every complex query request into a sequence of calls to the factorized quantum search algorithm. The query complexity of our algorithm is $O(P \cdot Q \cdot M \cdot \log_4 N)$, where $P$ is the number of potential simple query requests in the complex query request, $Q$ is the maximum number of calls to the factorized quantum search algorithm among the simple queries, and $M$ is the number of auxiliary files for the property on which our algorithm searches for desired items. This implies that managing an unsorted database on an actual quantum computer is possible and efficient.
    Comment: 7 pages, 1 figure
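
    As a back-of-the-envelope reading of the stated query complexity, the cost of a complex request can be estimated as below; the numbers in the usage line are purely hypothetical:

```python
import math

def factorized_search_cost(n_items, p, q, m):
    """Estimated oracle queries for a complex request: O(P * Q * M * log_4 N)."""
    return p * q * m * math.log(n_items, 4)

# e.g. P = 3 simple queries, at most Q = 2 calls each, M = 2 auxiliary files
print(factorized_search_cost(n_items=1_000_000, p=3, q=2, m=2))
```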

    Recovery guarantee of weighted low-rank approximation via alternating minimization

    Many applications require recovering a ground-truth low-rank matrix from noisy observations of its entries, which in practice is typically formulated as a weighted low-rank approximation problem and solved by non-convex optimization heuristics such as alternating minimization. In this paper, we provide a provable recovery guarantee for weighted low-rank approximation via a simple alternating minimization algorithm. In particular, for a natural class of matrices and weights and without any assumption on the noise, we bound the spectral norm of the difference between the recovered matrix and the ground truth by the spectral norm of the weighted noise plus an additive error that decreases exponentially with the number of rounds of alternating minimization, from either initialization by SVD or, more importantly, random initialization. These are the first theoretical results for weighted low-rank approximation via alternating minimization with non-binary deterministic weights, significantly generalizing those for matrix completion, the special case with binary weights, since our assumptions are similar to or weaker than those made in existing works. Furthermore, this is achieved by a very simple algorithm that improves the vanilla alternating minimization with a simple clipping step. The key technical challenge is that under non-binary deterministic weights, naïve alternating steps will destroy the incoherence and spectral properties of the intermediate solutions, which are needed for making progress towards the ground truth. We show that these properties only need to hold in an average sense and can be achieved by the clipping step. We further provide an alternating algorithm that uses a whitening step which preserves the properties via SDP and Rademacher rounding, and thus requires weaker assumptions. This technique can potentially be applied in other applications and is of independent interest.
    Comment: 40 pages. Updated with the ICML 2016 camera-ready version, together with an additional algorithm which requires weaker assumptions, in the appendix
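
    A minimal sketch in the spirit of the algorithm described above, i.e. weighted alternating least squares with a clipping step after each round; the clipping threshold, regularization, and initialization are assumptions rather than the paper's exact procedure:

```python
import numpy as np

def weighted_altmin(M, W, k, rounds=30, clip=1.0, seed=0):
    """Alternating minimization for min ||W * (M - U V^T)||_F^2 over rank-k factors,
    with entries of U, V clipped after each round to keep them well behaved."""
    rng = np.random.default_rng(seed)
    n, m = M.shape
    V = rng.standard_normal((m, k))
    U = np.zeros((n, k))
    for _ in range(rounds):
        for i in range(n):                      # weighted least squares for each row of U
            A = V * W[i][:, None]
            U[i] = np.linalg.solve(A.T @ V + 1e-9 * np.eye(k), A.T @ M[i])
        for j in range(m):                      # weighted least squares for each row of V
            B = U * W[:, j][:, None]
            V[j] = np.linalg.solve(B.T @ U + 1e-9 * np.eye(k), B.T @ M[:, j])
        U = np.clip(U, -clip, clip)             # clipping step: control entry magnitudes
        V = np.clip(V, -clip, clip)
    return U, V
```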

    Generalizing Word Embeddings using Bag of Subwords

    We approach the problem of generalizing pre-trained word embeddings beyond fixed-size vocabularies without using additional contextual information. We propose a subword-level word vector generation model that views words as bags of character $n$-grams. The model is simple, fast to train, and provides good vectors for rare or unseen words. Experiments show that our model achieves state-of-the-art performance on an English word similarity task and on joint prediction of part-of-speech tags and morphosyntactic attributes in 23 languages, suggesting our model's ability to capture the relationship between words' textual representations and their embeddings.
    Comment: Accepted to EMNLP 2018
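
    A minimal sketch of the bag-of-subwords idea, composing a vector for a rare or unseen word from character n-gram vectors; the n-gram range, boundary markers, and averaging are assumptions, not necessarily the paper's exact model:

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word padded with boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(w) - n + 1)]

def bag_of_subwords_vector(word, ngram_vecs, dim):
    """Average the vectors of the word's character n-grams that have embeddings."""
    grams = [g for g in char_ngrams(word) if g in ngram_vecs]
    if not grams:
        return np.zeros(dim)
    return np.mean([ngram_vecs[g] for g in grams], axis=0)
```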

    Why are deep nets reversible: A simple theory, with implications for training

    Generative models for deep learning are promising both for improving our understanding of the model and for yielding training methods that require fewer labeled samples. Recent works use generative model approaches to produce the deep net's input given the value of a hidden layer several levels above. However, there is no accompanying "proof of correctness" for the generative model, showing that the feedforward deep net is the correct inference method for recovering the hidden layer given the input. Furthermore, these models are complicated. The current paper takes a more theoretical tack. It presents a very simple generative model for ReLU deep nets, with the following characteristics: (i) The generative model is just the reverse of the feedforward net: if the forward transformation at a layer is $A$, then the reverse transformation is $A^T$. (This can be seen as an explanation of the old weight-tying idea for denoising autoencoders.) (ii) Its correctness can be proven under a clean theoretical assumption: the edge weights in real-life deep nets behave like random numbers. Under this assumption, which is experimentally tested on real-life nets like AlexNet, it is formally proved that the feedforward net is a correct inference method for recovering the hidden layer. The generative model suggests a simple modification for training: use the generative model to produce synthetic data with labels and include it in the training set. Experiments are shown to support this theory of random-like deep nets and show that the proposed modification helps training.
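
    A tiny numerical illustration of point (i) under the random-weights assumption: generate a layer from a sparse non-negative hidden code via the reverse map $A^T$, then check that the feedforward ReLU layer with $A$ roughly recovers the code. The dimensions, sparsity level, and rescaling factor are assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, d_x = 256, 512
A = rng.standard_normal((d_h, d_x)) / np.sqrt(d_x)    # "random-like" feedforward weights

# Sparse, non-negative hidden code.
h = np.maximum(rng.standard_normal(d_h), 0.0) * (rng.random(d_h) < 0.2)

x = np.maximum(A.T @ h, 0.0)              # generative direction: reverse transformation A^T
h_rec = np.maximum(2.0 * (A @ x), 0.0)    # feedforward inference (factor 2 is a heuristic rescale)

cos = h_rec @ h / (np.linalg.norm(h_rec) * np.linalg.norm(h) + 1e-12)
print(f"cosine(h, h_rec) = {cos:.2f}")    # clearly positive under the random-weights assumption
```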

    Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

    The fundamental learning theory behind neural networks remains largely open. What classes of functions can neural networks actually learn? Why doesn't the trained network overfit when it is overparameterized? In this work, we prove that overparameterized neural networks can learn some notable concept classes, including two- and three-layer networks with fewer parameters and smooth activations. Moreover, the learning can simply be done by SGD (stochastic gradient descent) or its variants in polynomial time using polynomially many samples. The sample complexity can also be almost independent of the number of parameters in the network. On the technical side, our analysis goes beyond the so-called NTK (neural tangent kernel) linearization of neural networks in prior works. We establish a new notion of quadratic approximation of the neural network (which can be viewed as a second-order variant of NTK), and connect it to the SGD theory of escaping saddle points.
    Comment: V1/V2/V3/V4 polish writing, V5 adds experiments, V6 reflects our camera-ready version
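
    As a small empirical companion to the claim that learning can be done by plain SGD, the sketch below trains an overparameterized two-layer ReLU network on data labeled by a much smaller smooth two-layer target; all sizes, the learning rate, and the fixed top layer are assumptions of this illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m_target, m_learner, n = 10, 5, 2000, 4096

# Ground-truth concept: a small two-layer network with smooth (tanh) activation.
W_t = rng.standard_normal((m_target, d)) / np.sqrt(d)
a_t = rng.standard_normal(m_target) / np.sqrt(m_target)
X = rng.standard_normal((n, d))
y = np.tanh(X @ W_t.T) @ a_t

# Overparameterized learner: a much wider two-layer ReLU net, hidden layer trained by SGD.
W = rng.standard_normal((m_learner, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m_learner) / np.sqrt(m_learner)   # fixed output layer
lr, batch = 0.05, 64
for _ in range(3000):
    idx = rng.integers(0, n, size=batch)
    Xb, yb = X[idx], y[idx]
    Z = Xb @ W.T                                 # pre-activations, shape (batch, m_learner)
    err = np.maximum(Z, 0.0) @ a - yb            # squared-loss residuals
    grad_W = ((err[:, None] * (Z > 0.0)) * a).T @ Xb / batch
    W -= lr * grad_W                             # SGD step on the hidden weights

mse = np.mean((np.maximum(X @ W.T, 0.0) @ a - y) ** 2)
print(f"training MSE after SGD: {mse:.4f}")      # should be far below the initial error
```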

    Distributed k-Means and k-Median Clustering on General Topologies

    This paper provides new algorithms for distributed clustering for two popular center-based objectives, k-median and k-means. These algorithms have provable guarantees and improve communication complexity over existing approaches. Following a classic approach in clustering by \cite{har2004coresets}, we reduce the problem of finding a clustering with low cost to the problem of finding a coreset of small size. We provide a distributed method for constructing a global coreset which improves over previous methods by reducing the communication complexity, and which works over general communication topologies. Experimental results on large-scale data sets show that this approach outperforms other coreset-based distributed clustering algorithms.
    Comment: Corrected Theorem 4 in the appendix
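
    A rough sketch of the coreset-based pattern the paper builds on: each node summarizes its local data into a small weighted set, and the union of these summaries is clustered centrally. The single-center importance sampling used here is a simplified stand-in, not the paper's coreset construction:

```python
import numpy as np

def local_coreset(points, size, seed=0):
    """Importance-sample points by squared distance to a crude local center,
    attaching weights that keep the summary unbiased."""
    rng = np.random.default_rng(seed)
    d2 = np.sum((points - points.mean(axis=0)) ** 2, axis=1) + 1e-12
    prob = d2 / d2.sum()
    idx = rng.choice(len(points), size=size, replace=True, p=prob)
    return points[idx], 1.0 / (size * prob[idx])

def distributed_coreset(partitions, size_per_node):
    """Union of weighted local coresets from all nodes forms the global coreset,
    on which weighted k-means / k-median is then run centrally."""
    parts = [local_coreset(P, size_per_node, seed=i) for i, P in enumerate(partitions)]
    pts = np.vstack([p for p, _ in parts])
    wts = np.concatenate([w for _, w in parts])
    return pts, wts
```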