
    A Geometric Approach of Gradient Descent Algorithms in Neural Networks

    In this paper, we present an original geometric framework to analyze the convergence properties of gradient descent trajectories in the context of linear neural networks. Built upon a key invariance property induced by the network structure, we propose a conjecture, called the \emph{overfitting conjecture}, stating that, for almost every training data, the corresponding gradient descent trajectory converges to a global minimum for almost every initial condition. This would imply that, for linear neural networks with an arbitrary number of hidden layers, the solution reached by the simple gradient descent algorithm is equivalent to that of least-squares estimation. Our first result establishes, in the case of linear networks of arbitrary depth, the convergence of gradient descent trajectories to critical points of the loss function. Our second result is a proof of the \emph{overfitting conjecture} in the case of single-hidden-layer linear networks, with an argument based on the notion of normal hyperbolicity and under a generic property of the training data (i.e., one holding for almost every training data). Comment: Preprint. Work in progress.
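    As a rough illustration of the setting described above (this is not code from the paper), the sketch below runs plain gradient descent on a single-hidden-layer linear network and compares the learned product W2 W1 with the ordinary least-squares solution; the data dimensions, initialization scale, step size, and iteration count are arbitrary choices made for the example.

```python
# Illustrative sketch only: gradient descent on a single-hidden-layer linear
# network f(x) = W2 @ W1 @ x versus the least-squares estimate. All sizes and
# hyperparameters below are assumptions for the demo, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 200, 5, 3                      # samples, input dim, hidden width
X = rng.normal(size=(n, d))
w_true = rng.normal(size=(d, 1))
y = X @ w_true + 0.1 * rng.normal(size=(n, 1))

# Least-squares solution: the benchmark the conjecture compares against.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient descent on the factored parameterization W = W2 @ W1.
W1 = 0.1 * rng.normal(size=(h, d))
W2 = 0.1 * rng.normal(size=(1, h))
lr = 0.05
for _ in range(20000):
    W = W2 @ W1                               # effective linear map, shape (1, d)
    grad_W = ((X @ W.T - y).T @ X) / n        # gradient of the mean squared error / 2
    gW2, gW1 = grad_W @ W1.T, W2.T @ grad_W   # chain rule through the two factors
    W2 -= lr * gW2
    W1 -= lr * gW1

# A small value indicates the trajectory reached a solution equivalent to
# the least-squares estimate.
print(np.linalg.norm((W2 @ W1).T - w_ls))
```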

    Counting Independent Sets and Colorings on Random Regular Bipartite Graphs

    We give a fully polynomial-time approximation scheme (FPTAS) to count the number of independent sets on almost every $\Delta$-regular bipartite graph if $\Delta \ge 53$. In the weighted case, for all sufficiently large integers $\Delta$ and weight parameters $\lambda = \tilde{\Omega}(1/\Delta)$, we also obtain an FPTAS on almost every $\Delta$-regular bipartite graph. Our technique is based on the recent work of Jenssen, Keevash and Perkins (SODA, 2019), and we also apply it to answer an open question raised there: for all $q \ge 3$ and sufficiently large integers $\Delta = \Delta(q)$, there is an FPTAS to count the number of $q$-colorings on almost every $\Delta$-regular bipartite graph.
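    The FPTAS itself rests on machinery that does not fit in a short snippet, but the quantity it approximates can be computed exactly by brute force on tiny instances. The sketch below (illustrative only, not the paper's algorithm) evaluates the weighted sum of lambda^|I| over independent sets I of K_{3,3}, a 3-regular bipartite graph; lambda = 1 recovers the plain count of independent sets.

```python
# Illustrative brute-force baseline, not the FPTAS from the paper: evaluate the
# independence polynomial, i.e. the sum over independent sets I of lambda^|I|.
from itertools import combinations

def independence_polynomial(vertices, edges, lam=1.0):
    edge_set = {frozenset(e) for e in edges}
    total = 0.0
    for k in range(len(vertices) + 1):
        for subset in combinations(vertices, k):
            # an independent set contains no edge of the graph
            if all(frozenset(p) not in edge_set for p in combinations(subset, 2)):
                total += lam ** k
    return total

# K_{3,3}: left vertices 0-2, right vertices 3-5, every left-right pair is an edge.
V = range(6)
E = [(u, v) for u in range(3) for v in range(3, 6)]
print(independence_polynomial(V, E, lam=1.0))   # 15 independent sets in K_{3,3}
```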

    Improved Bayesian Network Structure Learning in the Model Averaging Paradigm

    A Bayesian network (BN) is a probabilistic graphical model with applications in knowledge discovery and prediction. Its structure can be learned from data using the well-known score-and-search approach, where a scoring function is used to evaluate the fit of a proposed BN to the data in an unsupervised manner, and the space of directed acyclic graphs is searched for the best-scoring BNs. However, selecting a single model (i.e., the best-scoring BN) is often not the best choice. When one is learning a BN from limited data, selecting a single model may be misleading, as there may be many other BNs with scores close to optimal, and the posterior probability of even the best-scoring BN is often close to zero. A preferable alternative to committing to a single model is to perform some form of Bayesian or frequentist model averaging. A widely used data analysis methodology is to: (i) learn a set of plausible networks that fit the data, (ii) perform model averaging to obtain a confidence measure for each edge, and (iii) select a threshold and report all edges with confidence higher than the threshold. In this manner, a representative network can be constructed from the edges deemed significant, which can then be examined for probabilistic dependencies and possible cause-effect relations. This thesis presents several improvements to Bayesian network structure learning that benefit this data analysis methodology. We propose a novel approach to model averaging inspired by performance guarantees in approximation algorithms. Our approach has two primary advantages. First, it only considers credible models, in that they are optimal or near-optimal in score. Second, it is more efficient and scales to significantly larger Bayesian networks than existing approaches. We empirically study a selection of widely used and recently proposed scoring functions. We address design limitations of previous empirical studies by scaling our experiments to larger BNs, comparing on an extensive set of both ground-truth BNs and real-world datasets, considering alternative performance metrics, and comparing scoring functions on two model averaging frameworks: the bootstrap and the credible set. Contrary to previous recommendations based on finding a single structure, we find that for model averaging the BDeu scoring function is the preferred choice in most scenarios for the bootstrap framework, and the recently proposed qNML score is the preferred choice for the credible set framework. We identify an important shortcoming in a widely used threshold selection method. We then propose a simple transfer learning approach for maximizing target metrics and selecting a threshold that generalizes from proxy datasets to the target dataset, and show on an extensive set of benchmarks that it can perform significantly better than previous approaches. We demonstrate via ensemble methods that combining results from multiple scores significantly improves both the bootstrap and the credible set approaches on various metrics, and that combining all scores from both approaches yields better results still.
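    As a small illustration of steps (ii) and (iii) of the methodology described above (a sketch, not the thesis implementation), the code below takes a collection of plausible network structures, each represented simply as a set of directed edges (for example, one per bootstrap resample or per member of the credible set), computes a confidence measure for each edge as the fraction of networks containing it, and reports the edges whose confidence exceeds a chosen threshold.

```python
# Illustrative sketch of edge-confidence model averaging and thresholding.
from collections import Counter

def edge_confidences(networks):
    """networks: collection of edge sets, each edge a (parent, child) tuple."""
    networks = list(networks)
    counts = Counter(edge for net in networks for edge in net)
    return {edge: c / len(networks) for edge, c in counts.items()}

def thresholded_edges(confidences, threshold):
    return {edge for edge, conf in confidences.items() if conf >= threshold}

# Toy example: three plausible structures over variables A, B, C.
nets = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("A", "C")},
    {("A", "B"), ("B", "C")},
]
conf = edge_confidences(nets)
print(conf)                          # e.g. ('A', 'B') -> 1.0, ('B', 'C') -> 0.67
print(thresholded_edges(conf, 0.5))  # edge set of the representative network
```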

    A regularization perspective on spectral sparsification

    In this thesis, we study how to obtain faster algorithms for spectral graph sparsification by applying continuous optimization techniques. Spectral sparsification is the task of reducing the number of edges in a graph while maintaining a spectral approximation to the original graph. Our key conceptual contribution is the connection between spectral sparsification and regret minimization in online matrix games, i.e., online convex programming over the positive semidefinite cone. While this connection was previously noted [24, 47], we formally reduce graph sparsification to a matrix regret minimization problem, which we solve by applying mirror descent with a non-entropic regularizer. In this way, we not only obtain a new proof of the existence of linear-sized spectral sparsifiers, originally given by [19], but improve the running time from $\Omega(n^4)$ ([19, 54]) to almost quadratic. More generally, our framework can also be applied to the matrix multi-armed bandit online learning problem to reduce the regret bound to the optimal $O(\sqrt{nT})$, compared to the $O(\sqrt{nT \log n})$ given by the traditional matrix-entropy regularizer.
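    The mirror-descent construction itself is beyond a short snippet, but the property it guarantees, a reweighted subgraph whose Laplacian spectrally approximates the original one, is easy to check in code. The sketch below (illustrative, on a toy graph, not the thesis algorithm) tests whether a candidate Laplacian L_tilde is an eps-spectral approximation of L by examining the nonzero eigenvalues of L^{+1/2} L_tilde L^{+1/2}.

```python
# Illustrative check of the eps-spectral-approximation property, not the
# sparsifier construction from the thesis.
import numpy as np

def laplacian(n, weighted_edges):
    L = np.zeros((n, n))
    for u, v, w in weighted_edges:
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

def is_spectral_approx(L, L_tilde, eps, tol=1e-9):
    # Form L^{+1/2} from the eigendecomposition, dropping (near-)zero eigenvalues.
    vals, vecs = np.linalg.eigh(L)
    keep = vals > tol
    pinv_sqrt = vecs[:, keep] @ np.diag(vals[keep] ** -0.5) @ vecs[:, keep].T
    relative = np.linalg.eigvalsh(pinv_sqrt @ L_tilde @ pinv_sqrt)
    relative = relative[relative > tol]      # ignore the shared all-ones nullspace
    return relative.min() >= 1 - eps and relative.max() <= 1 + eps

# Toy example: a 4-cycle versus a slightly reweighted 4-cycle.
L = laplacian(4, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)])
L_tilde = laplacian(4, [(0, 1, 1.1), (1, 2, 0.9), (2, 3, 1.1), (3, 0, 0.9)])
print(is_spectral_approx(L, L_tilde, eps=0.2))   # True: weights differ by at most 10%
```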