
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

Author: Korenblum Daniel
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2018

Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provably optimal recovery by the algorithm is shown analytically for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed.
Comment: 13 figures, 35 references

arXiv.org e-Print Archive

Directory of Open Access Journals
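The abstract ties Laplacian mixture models to PageRank and to overlapping regions of influence. As a rough illustration only (this is not the paper's Laplacian-eigenspace mixture algorithm), the Python sketch below assigns fuzzy memberships by normalizing personalized PageRank scores computed from hypothetical seed nodes. The function names, the restart probability `alpha = 0.15`, and the toy graph are all assumptions.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # undirected graph as adjacency lists; every node needs a neighbor


def personalized_pagerank(graph: Graph, seed: int, alpha: float = 0.15,
                          iters: int = 200) -> Dict[int, float]:
    """Power iteration for p = alpha * e_seed + (1 - alpha) * P^T p, with P = D^{-1} A."""
    p = {v: 0.0 for v in graph}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        nxt[seed] = alpha                      # restart mass returns to the seed
        for u, nbrs in graph.items():
            share = (1.0 - alpha) * p[u] / len(nbrs)
            for v in nbrs:                     # the rest spreads along edges
                nxt[v] += share
        p = nxt
    return p


def soft_memberships(graph: Graph, seeds: List[int]) -> Dict[int, List[float]]:
    """Fuzzy membership of each node in each seed's region (each row sums to 1)."""
    scores = [personalized_pagerank(graph, s) for s in seeds]
    memberships = {}
    for v in graph:
        row = [sc[v] for sc in scores]
        total = sum(row)
        memberships[v] = [r / total for r in row]
    return memberships


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge edge 2-3.
toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
m = soft_memberships(toy, seeds=[0, 5])
```

Every node receives a membership split between the two seeds, and the bridge nodes 2 and 3 mirror each other by symmetry. This is the sense in which the regions of influence overlap rather than partition the graph.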

A Likelihood-Free Inference Framework for Population Genetic Data using Exchangeable Neural Networks

Author: Chan Jeffrey
Jenkins Paul A.
Mathieson Sara
Perrone Valerio
Song Yun S.
Spence Jeffrey P.
Publication date: 01/01/2018

An explosion of high-throughput DNA sequencing in the past decade has led to a surge of interest in population-scale inference with whole-genome data. Recent work in population genetics has centered on designing inference methods for relatively simple model classes, and few scalable general-purpose inference techniques exist for more realistic, complex models. To build such techniques, two inferential challenges need to be addressed: (1) population data are exchangeable, calling for methods that efficiently exploit the symmetries of the data, and (2) computing likelihoods is intractable, as it requires integrating over a set of correlated, extremely high-dimensional latent variables. These challenges are traditionally tackled by likelihood-free methods that use scientific simulators to generate datasets and reduce them to hand-designed, permutation-invariant summary statistics, often leading to inaccurate inference. In this work, we develop an exchangeable neural network that performs summary statistic-free, likelihood-free inference. Our framework can be applied in a black-box fashion across a variety of simulation-based tasks, both within and outside biology. We demonstrate the power of our approach on the recombination hotspot testing problem, where it outperforms state-of-the-art methods.
Comment: 9 pages, 8 figures

arXiv.org e-Print Archive

Crossref

Warwick Research Archives Portal Repository

Haverford College: Haverford Scholarship
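The key property claimed above is exchangeability: permuting the individuals in a dataset must not change the output. The minimal Python sketch below shows one standard way to get that property. The same feature map is applied to every row, the rows are pooled with a symmetric mean, and a readout is applied to the pooled features. The toy ReLU layer, the random weights, and the layout (rows as individuals' haplotypes, columns as SNPs) are assumptions standing in for the paper's actual architecture.

```python
import math
import random
from typing import List, Sequence

Row = Sequence[float]  # one individual's haplotype: a 0/1 value per SNP (assumed layout)


def row_features(row: Row, weights: List[List[float]]) -> List[float]:
    """Per-row feature map phi: a tiny ReLU layer applied identically to every row."""
    return [max(0.0, sum(w * x for w, x in zip(ws, row))) for ws in weights]


def exchangeable_net(rows: List[Row], weights: List[List[float]],
                     readout: List[float]) -> float:
    """rho(mean_i phi(row_i)): invariant to any reordering of the rows."""
    feats = [row_features(r, weights) for r in rows]
    n = len(rows)
    # math.fsum is correctly rounded, so the pooled value does not depend on row order.
    pooled = [math.fsum(f[k] for f in feats) / n for k in range(len(weights))]
    return sum(a * p for a, p in zip(readout, pooled))


rng = random.Random(0)
snps, hidden = 8, 4
weights = [[rng.gauss(0.0, 1.0) for _ in range(snps)] for _ in range(hidden)]
readout = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
data = [[rng.randint(0, 1) for _ in range(snps)] for _ in range(10)]
shuffled = data[:]
rng.shuffle(shuffled)  # same individuals, different order
```

Mean pooling is used here because it also works when the number of individuals varies. Max pooling would be another symmetric choice.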

Approximate message passing for nonconvex sparse regularization with stability and asymptotic analysis

Author: Sakata Ayaka
Xu Yingying
Publication venue: IOP Publishing
Publication date: 18/02/2018

We analyse a linear regression problem with a nonconvex regularization called smoothly clipped absolute deviation (SCAD) under an overcomplete Gaussian basis for Gaussian random data. We propose an approximate message passing (AMP) algorithm that incorporates nonconvex regularization, namely SCAD-AMP, and analytically show that its stability condition corresponds to the de Almeida-Thouless condition in the spin glass literature. Through asymptotic analysis, we show the correspondence between the density evolution of SCAD-AMP and the replica symmetric solution. Numerical experiments confirm that, for a sufficiently large system size, SCAD-AMP achieves the optimal performance predicted by the replica method. Through replica analysis, a phase transition between the replica symmetric (RS) and replica symmetry breaking (RSB) regions is found in the parameter space of SCAD. The appearance of an RS region for a nonconvex penalty is a significant advantage, since it indicates where the landscape of the optimization problem is smooth. Furthermore, we analytically show that the statistical representation performance of the SCAD penalty is better than that of L1-based methods, and that the minimum representation error under the RS assumption is obtained at the edge of the RS/RSB phase. The correspondence between the convergence of the existing coordinate descent algorithm and the RS/RSB transition is also indicated.

arXiv.org e-Print Archive

Aaltodoc Publication Archive
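An AMP algorithm applies a componentwise denoiser at each iteration, and with SCAD regularization that denoiser is built from the scalar SCAD thresholding rule. The Python sketch below implements the standard SCAD penalty of Fan and Li and its closed-form minimizer of `0.5*(z - theta)**2 + penalty(theta)`. It assumes unit noise variance and the conventional default `a = 3.7`. It is not the authors' SCAD-AMP, whose effective threshold also depends on the noise level tracked by density evolution.

```python
import math


def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty; requires a > 2 and lam > 0."""
    t = abs(theta)
    if t <= lam:
        return lam * t                                           # L1-like near zero
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))  # quadratic blend
    return lam * lam * (a + 1) / 2                               # constant: no further shrinkage


def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    """Minimizer of 0.5*(z - theta)**2 + scad_penalty(theta): the scalar denoiser."""
    s = math.copysign(1.0, z)
    t = abs(z)
    if t <= 2 * lam:
        return s * max(t - lam, 0.0)                    # soft thresholding, as for L1
    if t <= a * lam:
        return ((a - 1) * z - s * a * lam) / (a - 2)    # interpolating region
    return z                                            # large inputs are left unbiased
```

Unlike L1 soft thresholding, the rule leaves large inputs untouched. This is the reduced bias that motivates a nonconvex penalty in the first place. The rule is continuous at both breakpoints `2*lam` and `a*lam`.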

Network Density of States

Author: Benson Austin R.
Bindel David
Dong Kun
Publication venue: Association for Computing Machinery (ACM)
Publication date: 23/05/2019

Spectral analysis connects graph structure to the eigenvalues and eigenvectors of associated matrices. Much of spectral graph theory descends directly from spectral geometry, the study of differentiable manifolds through the spectra of associated differential operators. But the translation from spectral geometry to spectral graph theory has largely focused on results involving only a few extreme eigenvalues and their associated eigenvectors. Unlike in geometry, the study of graphs through the overall distribution of eigenvalues (the spectral density) is largely limited to simple random graph models. The interior of the spectrum of real-world graphs remains largely unexplored, and it is difficult both to compute and to interpret. In this paper, we delve into the heart of spectral densities of real-world graphs. We borrow tools developed in condensed matter physics, and add novel adaptations to handle the spectral signatures of common graph motifs. The resulting methods are highly efficient, as we illustrate by computing spectral densities for graphs with over a billion edges on a single compute node. Beyond providing visually compelling fingerprints of graphs, we show how the estimation of spectral densities facilitates the computation of many common centrality measures, and use spectral densities to estimate meaningful information about graph structure that cannot be inferred from the extremal eigenpairs alone.
Comment: 10 pages, 7 figures

arXiv.org e-Print Archive

Crossref
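One condensed-matter tool for spectral densities is the kernel polynomial method. It summarizes the density by its Chebyshev moments `mu_m = tr(T_m(H)) / N` and estimates the traces with random probe vectors. The Python sketch below computes those moments for the normalized adjacency matrix using only matrix-vector products. The choice of matrix, the function names, and the probe count are assumptions. The authors' motif filtering, damping kernels, and density reconstruction are omitted.

```python
import math
import random
from typing import Dict, List

Graph = Dict[int, List[int]]  # undirected graph as adjacency lists, no isolated nodes
Vector = Dict[int, float]


def normalized_adjacency_matvec(graph: Graph, x: Vector) -> Vector:
    """y = D^{-1/2} A D^{-1/2} x; this matrix has all eigenvalues in [-1, 1]."""
    return {u: sum(x[v] / math.sqrt(len(nbrs) * len(graph[v])) for v in nbrs)
            for u, nbrs in graph.items()}


def rademacher_probes(graph: Graph, count: int, seed: int = 0) -> List[Vector]:
    """Random +/-1 vectors, for which E[z^T M z] = tr(M) (Hutchinson's estimator)."""
    rng = random.Random(seed)
    return [{v: rng.choice((-1.0, 1.0)) for v in graph} for _ in range(count)]


def chebyshev_moments(graph: Graph, num_moments: int, probes: List[Vector]) -> List[float]:
    """Estimate mu_m = tr(T_m(H)) / N for m = 0..num_moments-1 via the recurrence
    T_{m+1}(H) z = 2 H T_m(H) z - T_{m-1}(H) z, with T_0(H) z = z and T_1(H) z = H z."""
    n, mu = len(graph), [0.0] * num_moments
    for z in probes:
        t_prev, t_cur = None, dict(z)            # T_{m-1}(H) z and T_m(H) z, from m = 0
        for m in range(num_moments):
            mu[m] += sum(z[v] * t_cur[v] for v in graph) / (n * len(probes))
            hz = normalized_adjacency_matvec(graph, t_cur)
            t_next = hz if t_prev is None else {v: 2.0 * hz[v] - t_prev[v] for v in graph}
            t_prev, t_cur = t_cur, t_next
    return mu


# Usage: a 6-cycle with 20 random probes.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
estimate = chebyshev_moments(cycle, 8, rademacher_probes(cycle, 20))
```

With Rademacher probes, every moment after `mu_0` is only an estimate, and its variance shrinks as probes are added. Because the method needs nothing but matrix-vector products, its cost scales with the number of edges.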

A probabilistic numerical method for optimal multiple switching problem and application to investments in electricity generation

Author: Aïd René
Campi Luciano
Langrené Nicolas
Pham Huyên
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 30/10/2012

In this paper, we present a probabilistic numerical algorithm combining dynamic programming, Monte Carlo simulations and local basis regressions to solve non-stationary optimal multiple switching problems over an infinite horizon. We provide the rate of convergence of the method in terms of the time step used to discretize the problem, the size of the local hypercubes involved in the regressions, and the truncation time horizon. To make the method viable for problems in high dimension and over long time horizons, we extend a memory reduction method to the general Euler scheme, so that the Monte Carlo simulation paths need not be stored during the numerical resolution. Then, we apply this algorithm to a model of optimal investment in power plants. This model takes into account electricity demand, cointegrated fuel prices, carbon price and random outages of power plants. It computes the optimal level of investment in each generation technology, considered as a whole, with respect to the electricity spot price. This electricity price is itself built according to a new extended structural model. In particular, it is a function of several factors, among which are the installed capacities. The evolution of the optimal generation mix is illustrated on a realistic numerical problem in dimension eight, i.e. with two different technologies and six random factors.

arXiv.org e-Print Archive

HAL-Paris 13

Hal-Diderot

HAL-Polytechnique
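Two ingredients of the algorithm described above can be sketched in isolation. The first is a regression on local hypercubes of side `h` that approximates a conditional expectation from simulated samples. The second is a single dynamic-programming decision to keep or switch regime. The Python below is a minimal sketch under stated assumptions. It uses piecewise-constant (degree-0) local bases, a fixed switching cost, and hypothetical function names. It omits the paper's time stepping, horizon truncation, and memory reduction.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cube = Tuple[int, ...]


def cube_of(x: Sequence[float], h: float) -> Cube:
    """Index of the hypercube of side h that contains the point x."""
    return tuple(math.floor(xi / h) for xi in x)


def fit_local_means(xs: List[Sequence[float]], ys: List[float], h: float) -> Dict[Cube, float]:
    """Piecewise-constant regression: the average simulated response in each
    hypercube approximates the conditional expectation E[Y | X] there."""
    sums: Dict[Cube, float] = defaultdict(float)
    counts: Dict[Cube, int] = defaultdict(int)
    for x, y in zip(xs, ys):
        c = cube_of(x, h)
        sums[c] += y
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}


def predict(means: Dict[Cube, float], x: Sequence[float], h: float,
            default: float = 0.0) -> float:
    """Look up the fitted value; empty cubes fall back to a default (an assumption)."""
    return means.get(cube_of(x, h), default)


def best_regime(current: int, continuation: List[float], switch_cost: float) -> int:
    """One decision: pick the regime with the best continuation value net of the
    (assumed fixed) cost of switching away from the current regime."""
    net = [v - (0.0 if j == current else switch_cost) for j, v in enumerate(continuation)]
    return max(range(len(net)), key=net.__getitem__)


means = fit_local_means([(0.1,), (0.2,), (0.6,), (0.7,)], [1.0, 3.0, 5.0, 7.0], h=0.5)
```

Shrinking `h` reduces the bias of each local fit but leaves fewer samples per cube. That tradeoff is why the abstract states the convergence rate in terms of the hypercube size.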

Scalable Kernel Methods via Doubly Stochastic Gradients

Author: Balcan Maria-Florina
Dai Bo
He Niao
Liang Yingyu
Raj Anant
Song Le
Xie Bo
Publication date: 10/09/2015

The general perception is that kernel methods are not scalable, and neural nets are the methods of choice for nonlinear learning problems. Or have we simply not tried hard enough for kernel methods? Here we propose an approach that scales up kernel methods using a novel concept called "doubly stochastic functional gradients". Our approach relies on the fact that many kernel methods can be expressed as convex optimization problems. We solve these problems by making two unbiased stochastic approximations to the functional gradient, one using random training points and another using random functions associated with the kernel, and then descending along this noisy functional gradient. We show that a function produced by this procedure after $t$ iterations converges to the optimal function in the reproducing kernel Hilbert space at a rate of $O(1/t)$, and achieves a generalization performance of $O(1/\sqrt{t})$. This double stochasticity also allows us to avoid keeping the support vectors and to implement the algorithm with a small memory footprint, which is linear in the number of iterations and independent of data dimension. Our approach can readily scale kernel methods up to the regimes which are dominated by neural nets. We show that our method can achieve performance competitive with neural nets on datasets such as 8 million handwritten digits from MNIST, 2.3 million energy materials from MolecularSpace, and 1 million photos from ImageNet.
Comment: 32 pages, 22 figures

arXiv.org e-Print Archive

CiteSeerX
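The two sources of randomness described above can be seen in a small regression example. Each iteration samples one training point and one random Fourier feature of a Gaussian kernel. It then shrinks the old coefficients by the ridge term and appends one new coefficient, so memory grows linearly with the number of iterations. The Python sketch below works in one dimension under several assumptions: the squared loss, the step-size schedule `step / sqrt(t)`, the bandwidth, and the class name are illustrative choices, not the paper's settings.

```python
import math
import random
from typing import List, Tuple


class DoublyStochasticRegressor:
    """Kernel ridge regression via doubly stochastic functional gradients (1-D sketch).

    f(x) = sum_i alpha_i * phi_i(x), with random Fourier features
    phi_i(x) = sqrt(2) * cos(w_i * x + b_i), w_i ~ N(0, 1/sigma^2), b_i ~ U[0, 2*pi),
    so that E[phi_i(x) * phi_i(x')] = exp(-(x - x')^2 / (2 * sigma^2)).
    """

    def __init__(self, sigma: float = 1.0, nu: float = 1e-4, seed: int = 0):
        self.sigma, self.nu = sigma, nu
        self.rng = random.Random(seed)
        self.features: List[Tuple[float, float]] = []  # (w_i, b_i), one per iteration
        self.alphas: List[float] = []

    def predict(self, x: float) -> float:
        return sum(a * math.sqrt(2.0) * math.cos(w * x + b)
                   for a, (w, b) in zip(self.alphas, self.features))

    def fit(self, xs: List[float], ys: List[float], iters: int, step: float = 0.5) -> None:
        for t in range(1, iters + 1):
            i = self.rng.randrange(len(xs))              # randomness 1: a training point
            w = self.rng.gauss(0.0, 1.0 / self.sigma)    # randomness 2: a kernel feature
            b = self.rng.uniform(0.0, 2.0 * math.pi)
            gamma = step / math.sqrt(t)                  # assumed decaying step size
            residual = self.predict(xs[i]) - ys[i]       # derivative of 0.5 * squared loss
            # Ridge shrinkage of all old coefficients, then one new coefficient.
            self.alphas = [a * (1.0 - gamma * self.nu) for a in self.alphas]
            self.alphas.append(-gamma * residual * math.sqrt(2.0) * math.cos(w * xs[i] + b))
            self.features.append((w, b))


data_rng = random.Random(1)
xs = [data_rng.uniform(-3.0, 3.0) for _ in range(200)]
ys = [math.sin(x) for x in xs]
model = DoublyStochasticRegressor(sigma=1.0, seed=2)
model.fit(xs, ys, iters=1500)
```

This sketch stores each `(w_i, b_i)` pair directly. In higher dimensions one would typically store only a random seed per iteration and regenerate `w_i` from it, which keeps memory independent of the data dimension. No support vectors are kept at any point.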

Efficient Deformable Shape Correspondence via Kernel Matching

Author: Boyarski Amit
Bronstein Alex
Bronstein Michael
Cremers Daniel
Kimmel Ron
Litany Or
Lähner Zorah
Remez Tal
Rodolà Emanuele
Slossberg Ron
Vestner Matthias
Publication date: 15/09/2017

We present a method to match three-dimensional shapes under non-isometric deformations, topology changes and partiality. We formulate the problem as matching between a set of pairwise and pointwise descriptors, imposing a continuity prior on the mapping, and propose a projected descent optimization procedure inspired by difference of convex functions (DC) programming. Surprisingly, in spite of the highly non-convex nature of the resulting quadratic assignment problem, our method converges to a semantically meaningful and continuous mapping in most of our experiments, and scales well. We provide preliminary theoretical analysis and several interpretations of the method.
Comment: Accepted for oral presentation at 3DV 2017, including supplementary material

arXiv.org e-Print Archive

Crossref

Archivio della ricerca- Università di Roma La Sapienza
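The general idea of matching pairwise and pointwise descriptors can be illustrated on a toy scale. The objective below is a quadratic assignment score: the agreement between two kernel matrices under a permutation, plus a pointwise score. When both kernel matrices are positive semidefinite, the pairwise term is a convex quadratic in the assignment matrix. Linearizing it and projecting the gradient back onto permutations therefore never decreases the score, which is in the spirit of DC programming. This Python sketch is not the authors' procedure. The 1-D Gaussian-kernel "shapes", the pointwise matrix `F`, and the brute-force projection (usable only for tiny `n`) are all assumptions.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Perm = Tuple[int, ...]  # perm[i] = index of the point on shape Y matched to point i on X


def gaussian_kernel(points: Sequence[float], sigma: float = 1.0) -> Matrix:
    """Pairwise descriptor: Gaussian kernel matrix (positive semidefinite)."""
    return [[math.exp(-(p - q) ** 2 / (2 * sigma ** 2)) for q in points] for p in points]


def objective(kx: Matrix, ky: Matrix, f: Matrix, perm: Perm) -> float:
    """sum_ij KX[i][j] * KY[perm[i]][perm[j]] + sum_i F[i][perm[i]]."""
    n = len(perm)
    pair = sum(kx[i][j] * ky[perm[i]][perm[j]] for i in range(n) for j in range(n))
    return pair + sum(f[i][perm[i]] for i in range(n))


def match(kx: Matrix, ky: Matrix, f: Matrix, iters: int = 20) -> Tuple[Perm, List[float]]:
    """Linearize at the current permutation, then project onto permutations."""
    n = len(kx)
    perm: Perm = tuple(range(n))
    history = [objective(kx, ky, f, perm)]
    for _ in range(iters):
        # Gradient of the objective with respect to the assignment matrix P.
        grad = [[2 * sum(kx[i][j] * ky[a][perm[j]] for j in range(n)) + f[i][a]
                 for a in range(n)] for i in range(n)]
        # Brute-force linear assignment (n! candidates).
        new = max(itertools.permutations(range(n)),
                  key=lambda p: sum(grad[i][p[i]] for i in range(n)))
        if new == perm:
            break
        perm = new
        history.append(objective(kx, ky, f, perm))
    return perm, history


xs = [0.0, 0.7, 1.5, 2.6, 4.0]
true = (3, 0, 4, 1, 2)                       # hidden correspondence for the toy example
ys = [0.0] * len(xs)
for i, a in enumerate(true):
    ys[a] = xs[i]                            # shape Y is a relabeled copy of shape X
kx, ky = gaussian_kernel(xs), gaussian_kernel(ys)
pointwise = [[-100.0 * (x - y) ** 2 for y in ys] for x in xs]  # strong pointwise cue
perm, history = match(kx, ky, pointwise)
```

Without a pointwise term, the ascent still cannot decrease the objective, but it may stop at a local optimum. This mirrors the non-convexity the abstract describes.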