Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provably optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed. Comment: 13 figures, 35 references
Learning Theory and Approximation
Learning theory studies data structures from samples and aims at understanding the unknown function relations behind them. This leads to interesting theoretical problems which can often be attacked with methods from Approximation Theory. This workshop - the second one of this type at the MFO - has concentrated on the following recent topics: learning of manifolds and the geometry of data; sparsity and dimension reduction; error analysis and algorithmic aspects, including kernel based methods for regression and classification; application of multiscale aspects and of refinement algorithms to learning.
Approximating Spectral Clustering via Sampling: a Review
Spectral clustering refers to a family of unsupervised learning algorithms
that compute a spectral embedding of the original data based on the
eigenvectors of a similarity graph. This non-linear transformation of the data
is both the key of these algorithms' success and their Achilles heel: forming a
graph and computing its dominant eigenvectors can indeed be computationally
prohibitive when dealing with more than a few tens of thousands of points. In
this paper, we review the principal research efforts aiming to reduce this
computational cost. We focus on methods that come with a theoretical control on
the clustering performance and incorporate some form of sampling in their
operation. Such methods abound in the machine learning, numerical linear
algebra, and graph signal processing literature and, amongst others, include
Nyström-approximation, landmarks, coarsening, coresets, and compressive
spectral clustering. We present the approximation guarantees available for each
and discuss practical merits and limitations. Surprisingly, despite the breadth
of the literature explored, we conclude that there is still a gap between
theory and practice: the most scalable methods are only intuitively motivated
or loosely controlled, whereas those that come with end-to-end guarantees rely
on strong assumptions or enable only a limited gain in computation time.
Approximating Spectral Clustering via Sampling: a Review
Spectral clustering refers to a family of well-known unsupervised learning algorithms. Rather than attempting to cluster points in their native domain, one constructs a (usually sparse) similarity graph and computes the principal eigenvectors of its Laplacian. The eigenvectors are then interpreted as transformed points and fed into a k-means clustering algorithm. As a result of this non-linear transformation, it becomes possible to use a simple centroid-based algorithm in order to identify non-convex clusters, something that was otherwise impossible. Unfortunately, what makes spectral clustering so successful is also its Achilles heel: forming a graph and computing its dominant eigenvectors can be computationally prohibitive when dealing with more than a few tens of thousands of points. In this chapter, we review the principal research efforts aiming to reduce this computational cost. We focus on methods that come with a theoretical control on the clustering performance and incorporate some form of sampling in their operation. Such methods abound in the machine learning, numerical linear algebra, and graph signal processing literature and, amongst others, include Nyström-approximation, landmarks, coarsening, coresets, and compressive spectral clustering. We present the approximation guarantees available for each and discuss practical merits and limitations. Surprisingly, despite the breadth of the literature explored, we conclude that there is still a gap between theory and practice: the most scalable methods are only intuitively motivated or loosely controlled, whereas those that come with end-to-end guarantees rely on strong assumptions or enable only a limited gain in computation time.
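The pipeline described in this abstract (similarity graph, Laplacian eigenvectors, k-means on the embedded points) can be sketched in a few lines. This is a minimal illustration, not the chapter's code: the dense Gaussian similarity, the bandwidth `sigma`, the symmetric normalized Laplacian, the row normalization, and the farthest-point k-means initialization are all assumptions chosen to keep the example short and deterministic. None of the sampling-based accelerations reviewed in the chapter is used here.

```python
import numpy as np

def spectral_embedding(points, k, sigma=0.5):
    """Embed points using the k lowest eigenvectors of a normalized graph Laplacian."""
    # Dense Gaussian similarity graph (assumption; in practice the graph is usually sparse).
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(points)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest.
    _, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :k]
    # Row-normalize so every embedded point lies on the unit sphere.
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave a center unchanged if its cluster is empty
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# Two concentric rings: non-convex clusters in the native domain.
angles = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])
points = np.vstack([1.0 * ring, 4.0 * ring])
labels = kmeans(spectral_embedding(points, k=2), k=2)
```

The dense similarity matrix and the full eigendecomposition are exactly the steps whose cost grows prohibitively with the number of points, which is the bottleneck the reviewed sampling methods target.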
Learning Theory and Approximation
The main goal of this workshop – the third one of this type at the MFO – has been to blend mathematical results from statistical learning theory and approximation theory to strengthen both disciplines and use synergistic effects to work on current research questions. Learning theory aims at modeling unknown function relations and data structures from samples in an automatic manner. Approximation theory is naturally used for the advancement of learning theory and is closely connected to its further development, in particular for the exploration of new useful algorithms and for the theoretical understanding of existing methods. Conversely, the study of learning theory also gives rise to interesting theoretical problems for approximation theory, such as the approximation and sparse representation of functions or the construction of rich reproducing kernel Hilbert spaces on general metric spaces. This workshop has concentrated on the following recent topics: pitchfork bifurcation of dynamical systems arising from mathematical foundations of cell development; regularized kernel based learning in the Big Data situation; deep learning; convergence rates of learning and online learning algorithms; numerical refinement algorithms to learning; statistical robustness of regularized kernel based learning.
A mathematical theory of making hard decisions: model selection and robustness of matrix factorization with binary constraints
One of the first and most fundamental tasks in machine learning is to group observations within a dataset. Given a notion of similarity, finding those instances which are outstandingly similar to each other has manifold applications. Recommender systems and topic analysis in text data are examples which are most intuitive to grasp. The interpretation of the groups, called clusters, is facilitated if the assignment of samples is definite. Especially in high-dimensional data, denoting a degree to which an observation belongs to a specified cluster requires a subsequent processing of the model to filter the most important information. We argue that a good summary of the data provides hard decisions on the following question: how many groups are there, and which observations belong to which clusters? In this work, we contribute to the theoretical and practical background of clustering tasks, addressing one or both aspects of this question. Our overview of state-of-the-art clustering approaches details the challenges of our ambition to provide hard decisions. Based on this overview, we develop new methodologies for two branches of clustering: one concerns the derivation of nonconvex clusters, known as spectral clustering; the other addresses the identification of biclusters, a set of samples together with similarity defining features, via Boolean matrix factorization. One of the main challenges in both considered settings is the robustness to noise. Assuming that the issue of robustness is controllable by means of theoretical insights, we have a closer look at those aspects of established clustering methods which lack a theoretical foundation. In the scope of Boolean matrix factorization, we propose a versatile framework for the optimization of matrix factorizations subject to binary constraints. Boolean factorizations in particular have so far been computed by intuitive methods that implement greedy heuristics without quality guarantees for the obtained solutions.
In contrast, we propose to build upon recent advances in nonconvex optimization theory. This enables us to provide convergence guarantees to local optima of a relaxed objective, requiring only approximately binary factor matrices. By means of this new optimization scheme PAL-Tiling, we propose two approaches to automatically determine the number of clusters. One is based on information theory, employing the minimum description length principle, and the other is a novel statistical approach, controlling the false discovery rate. The flexibility of our framework PAL-Tiling enables the optimization of novel factorization schemes. In a different context, where every data point belongs to a pre-defined class, a characterization of the classes may be obtained by Boolean factorizations. However, there are cases where this traditional factorization scheme is not sufficient. Therefore, we propose the integration of another factor matrix, reflecting class-specific differences within a cluster. Our theoretical considerations are complemented by empirical evaluations, showing how our methods combine theoretical soundness with practical advantages.
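To make the object being optimized concrete, the sketch below shows a Boolean matrix product and the number of mismatched entries between a binary data matrix and its Boolean reconstruction. It illustrates the general notion of Boolean matrix factorization only; it is not PAL-Tiling, and the names `boolean_product` and `reconstruction_error`, the toy matrices, and the convention of samples-by-clusters and features-by-clusters factors are assumptions made for illustration.

```python
import numpy as np

def boolean_product(Y, X):
    """Boolean product: entry (i, j) is 1 if sample i and feature j share at least one cluster."""
    return ((Y @ X.T) > 0).astype(int)

def reconstruction_error(D, Y, X):
    """Number of entries where the binary data D and its Boolean reconstruction disagree."""
    return int(np.sum(D != boolean_product(Y, X)))

# Hypothetical toy factors: 4 samples, 3 features, 2 overlapping biclusters.
Y = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])  # samples x clusters
X = np.array([[1, 0], [1, 1], [0, 1]])          # features x clusters

D_clean = boolean_product(Y, X)

# Flip one entry to mimic noise; the factorization then misses exactly that entry.
D_noisy = D_clean.copy()
D_noisy[3, 0] = 1
error = reconstruction_error(D_noisy, Y, X)
```

Where the two biclusters overlap (sample 1, feature 1), the ordinary product `Y @ X.T` gives 2 while the Boolean product gives 1, which is why overlapping clusters call for Boolean rather than standard algebra.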
From spline wavelet to sampling theory on circulant graphs and beyond – conceiving sparsity in graph signal processing
Graph Signal Processing (GSP), as the field concerned with the extension of classical signal processing concepts to the graph domain, is still at the beginning on the path toward providing a generalized theory of signal processing. As such, this thesis aspires to conceive the theory of sparse representations on graphs by traversing the cornerstones of wavelet and sampling theory on graphs.
Beginning with the novel topic of graph spline wavelet theory, we introduce families of spline and e-spline wavelets, and associated filterbanks on circulant graphs, which leverage an inherent vanishing moment property of circulant graph Laplacian matrices (and their parameterized generalizations), for the reproduction and annihilation of (exponential) polynomial signals. Further, these families are shown to provide a stepping stone to generalized graph wavelet designs with adaptive (annihilation) properties. Circulant graphs, which serve as building blocks, facilitate intuitively equivalent signal processing concepts and operations, such that insights can be leveraged for and extended to more complex scenarios, including arbitrary undirected graphs, time-varying graphs, as well as associated signals with space- and time-variant properties, all the while retaining the focus on inducing sparse representations.
Further, we shift from sparsity-inducing to sparsity-leveraging theory and present a novel sampling and graph coarsening framework for (wavelet-)sparse graph signals, inspired by Finite Rate of Innovation (FRI) theory and directly building upon (graph) spline wavelet theory. At its core, the introduced Graph-FRI-framework states that any K-sparse signal residing on the vertices of a circulant graph can be sampled and perfectly reconstructed from its dimensionality-reduced graph spectral representation of minimum size 2K, while the structure of an associated coarsened graph is simultaneously inferred. Extensions to arbitrary graphs can be enforced via suitable approximation schemes.
Finally, the insights gained are unified in a graph-based image approximation framework which further leverages graph partitioning and re-labelling techniques for a maximally sparse graph wavelet representation. Open Access
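The claim that a K-sparse signal can be recovered from 2K spectral coefficients can be illustrated with the classical annihilating-filter (Prony) method from Finite Rate of Innovation theory. The sketch below is not the thesis's Graph-FRI framework: it assumes the graph Fourier basis of the circulant graph is taken to be the ordinary DFT basis, it omits the spline wavelet filtering and the graph coarsening step, and the function name `recover_sparse_from_2k_dft` is hypothetical.

```python
import numpy as np

def recover_sparse_from_2k_dft(spectrum, n, k):
    """Recover a k-sparse, real, length-n signal from its first 2k DFT coefficients."""
    # Annihilation equations: sum_l h[l] * spectrum[m - l] = 0 for m = k, ..., 2k - 1.
    T = np.array([[spectrum[k + i - l] for l in range(k + 1)] for i in range(k)])
    # The annihilating filter h spans the (one-dimensional) null space of T.
    _, _, vh = np.linalg.svd(T)
    h = vh[-1].conj()
    # The roots of the filter are exp(-2j*pi*position/n), one per nonzero entry.
    roots = np.roots(h)
    positions = np.mod(np.rint(-np.angle(roots) * n / (2.0 * np.pi)).astype(int), n)
    # With the support known, the amplitudes solve a Vandermonde least-squares problem.
    V = np.exp(-2j * np.pi * np.outer(np.arange(2 * k), positions) / n)
    amplitudes, *_ = np.linalg.lstsq(V, spectrum[: 2 * k], rcond=None)
    x_hat = np.zeros(n)
    x_hat[positions] = amplitudes.real
    return x_hat

# Hypothetical test signal: 3 nonzero entries on a 32-vertex cycle.
n, k = 32, 3
x = np.zeros(n)
x[[3, 10, 21]] = [1.5, -2.0, 0.7]
spectrum = np.fft.fft(x)[: 2 * k]  # only 2K = 6 spectral coefficients are kept
x_hat = recover_sparse_from_2k_dft(spectrum, n, k)
```

The sketch works on noiseless data; with noisy coefficients the null-space step usually needs more than 2K samples or a denoising stage.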
Multiscale Spectral-Domain Parameterization for History Matching in Structured and Unstructured Grid Geometries
Reservoir model calibration to production data, also known as history matching, is an essential tool for the prediction of fluid displacement patterns and related decisions concerning reservoir management and field development. The history matching of high resolution geologic models is, however, known to define an ill-posed inverse problem such that the solution of geologic heterogeneity is always non-unique and potentially unstable. A common approach to improving ill-posedness is to parameterize the estimable geologic model components, imposing a type of regularization that exploits geologic continuity by explicitly or implicitly grouping similar properties while retaining at least the minimum heterogeneity resolution required to reproduce the data. This dissertation develops novel methods of model parameterization within the class of techniques based on a linear transformation.
Three principal research contributions are made in this dissertation. First is the development of an adaptive multiscale history matching formulation in the frequency domain using the discrete cosine parameterization. Geologic model calibration is performed by its sequential refinement to a spatial scale sufficient to match the data. The approach enables improvement in solution non-uniqueness and stability, and further balances model and data resolution as determined by a parameter identifiability metric. Second, a model-independent parameterization based on grid connectivity information is developed as a generalization of the cosine parameterization for applicability to generic grid geometries. The parameterization relates the spatial reservoir parameters to the modal shapes or harmonics of the grid on which they are defined, merging with a Fourier analysis in special cases (i.e., for rectangular grid cells of constant dimensions), and enabling a multiscale calibration of the reservoir model in the spectral domain. Third, a model-dependent parameterization is developed to combine grid connectivity with prior geologic information within a spectral domain representation. The resulting parameterization is capable of reducing geologic models while imposing prior heterogeneity on the calibrated model using the adaptive multiscale workflow.
In addition to methodological developments of the parameterization methods, an important consideration in this dissertation is their applicability to field scale reservoir models with varying levels of prior geologic complexity on par with current industry standards.
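The relation between the grid-connectivity parameterization and the cosine parameterization can be checked numerically in the simplest case. The sketch below assumes a 1D grid of equally sized cells, where the eigenvectors of the grid's graph Laplacian coincide with the DCT-II basis, and then represents a property profile by its lowest-frequency coefficients. The helper names, the synthetic `field`, and the choice of 8 retained modes are hypothetical; the dissertation's adaptive multiscale workflow and identifiability metric are not reproduced.

```python
import numpy as np

def path_laplacian(n):
    """Graph Laplacian of n grid cells connected in a line (a 1D grid)."""
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

def spectral_parameterize(field, basis, m):
    """Keep the m lowest-frequency coefficients and rebuild the field from them."""
    coeffs = basis[:, :m].T @ field
    return coeffs, basis[:, :m] @ coeffs

n = 64
cells = np.arange(n)
# Columns of `basis` are the grid harmonics, ordered from smooth to oscillatory.
eigvals, basis = np.linalg.eigh(path_laplacian(n))

# Special case: on a uniform 1D grid, harmonic number 3 matches the DCT-II mode 3 up to sign.
dct_mode = np.cos(np.pi * 3 * (cells + 0.5) / n)
dct_mode /= np.linalg.norm(dct_mode)
overlap = abs(dct_mode @ basis[:, 3])

# Hypothetical smooth property profile described by 8 parameters instead of 64.
field = 100.0 + 50.0 * np.tanh((cells - 32) / 8.0)
coeffs, approx = spectral_parameterize(field, basis, m=8)
rel_error = np.linalg.norm(field - approx) / np.linalg.norm(field)
```

Because the basis is built from connectivity alone, the same two functions apply unchanged to an unstructured grid once `path_laplacian` is replaced by the Laplacian of that grid's cell adjacency.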
Graph Inference with Applications to Low-Resource Audio Search and Indexing
The task of query-by-example search is to retrieve, from among a collection of data, the observations most similar to a given query. A common approach to this problem is based on viewing the data as vertices in a graph in which edge weights reflect similarities between observations. Errors arise in this graph-based framework both from errors in measuring these similarities and from approximations required for fast retrieval. In this thesis, we use tools from graph inference to analyze and control the sources of these errors. We establish novel theoretical results related to representation learning and to vertex nomination, and use these results to control the effects of model misspecification, noisy similarity measurement and approximation error on search accuracy. We present a state-of-the-art system for query-by-example audio search in the context of low-resource speech recognition, which also serves as an illustrative example and testbed for applying our theoretical results.