2,082 research outputs found

    From random walks to distances on unweighted graphs

    Full text link
    Large unweighted directed graphs are commonly used to capture relations between entities. A fundamental problem in the analysis of such networks is to properly define the similarity or dissimilarity between any two vertices. Despite the significance of this problem, statistical characterization of the proposed metrics has been limited. We introduce and develop a class of techniques for analyzing random walks on graphs using stochastic calculus. Using these techniques we generalize results on the degeneracy of hitting times and analyze a metric based on the Laplace transformed hitting time (LTHT). The metric serves as a natural, provably well-behaved alternative to the expected hitting time. We establish a general correspondence between hitting times of the Brownian motion and analogous hitting times on the graph. We show that the LTHT is consistent with respect to the underlying metric of a geometric graph, preserves clustering tendency, and remains robust against random addition of non-geometric edges. Tests on simulated and real-world data show that the LTHT matches theoretical predictions and outperforms alternatives. Comment: To appear in NIPS 201
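    To make the LTHT concrete, here is a minimal Monte Carlo sketch that estimates a Laplace transformed hitting time between two vertices of an unweighted graph, assuming the definition E[exp(-lam * T_uv)] for the hitting time T_uv of a simple random walk; the paper's actual estimator and truncation scheme may differ.

```python
import numpy as np

def lt_hitting_time(adj, u, v, lam=0.1, n_walks=2000, max_steps=10000, seed=0):
    """Monte Carlo estimate of E[exp(-lam * T_uv)] for the hitting time T_uv
    of a simple random walk on an unweighted graph (illustrative definition;
    the paper's estimator and truncation may differ)."""
    rng = np.random.default_rng(seed)
    neighbors = [np.flatnonzero(adj[i]) for i in range(adj.shape[0])]
    total = 0.0
    for _ in range(n_walks):
        node, steps = u, 0
        while node != v and steps < max_steps:
            node = rng.choice(neighbors[node])  # step to a uniformly random neighbor
            steps += 1
        if node == v:
            total += np.exp(-lam * steps)       # walks that never hit contribute 0
    return total / n_walks
```

    A dissimilarity between u and v can then be derived from this quantity, for example via its negative logarithm; the precise transformation and its guarantees are analyzed in the paper.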

    Far-Field Compression for Fast Kernel Summation Methods in High Dimensions

    Full text link
    We consider fast kernel summations in high dimensions: given a large set of points in d dimensions (with d ≫ 3) and a pair-potential function (the kernel function), we compute a weighted sum of all pairwise kernel interactions for each point in the set. Direct summation is equivalent to a (dense) matrix-vector multiplication and scales quadratically with the number of points. Fast kernel summation algorithms reduce this cost to log-linear or linear complexity. Treecodes and Fast Multipole Methods (FMMs) deliver tremendous speedups by constructing approximate representations of interactions of points that are far from each other. In algebraic terms, these representations correspond to low-rank approximations of blocks of the overall interaction matrix. Existing approaches require an excessive number of kernel evaluations as d and the number of points in the dataset grow. To address this issue, we use a randomized algebraic approach in which we first sample the rows of a block and then construct its approximate, low-rank interpolative decomposition. We examine the feasibility of this approach theoretically and experimentally. We provide a new theoretical result showing a tighter bound on the reconstruction error from uniformly sampling rows than the existing state-of-the-art. We demonstrate that our sampling approach is competitive with existing (but prohibitively expensive) methods from the literature. We also construct kernel matrices for the Laplacian, Gaussian, and polynomial kernels -- all commonly used in physics and data analysis. We explore the numerical properties of blocks of these matrices, and show that they are amenable to our approach. Depending on the data set, our randomized algorithm can successfully compute low-rank approximations in high dimensions. We report results for data sets with ambient dimensions from four to 1,000. Comment: 43 pages, 21 figure
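    As a rough illustration of the row-sampling idea, the sketch below builds a low-rank approximation of a far-field Gaussian-kernel block from uniformly sampled rows, choosing skeleton columns by pivoted QR on the sampled rows. The bandwidth h, rank k, and sample count are illustrative parameters; the paper's interpolative decomposition and error analysis are more refined.

```python
import numpy as np
from scipy.linalg import qr, pinv

def gauss_kernel(X, Y, h=1.0):
    # Pairwise Gaussian kernel matrix between point sets X (m x d) and Y (n x d).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * h * h))

def sampled_block_id(src, trg, k, n_samples, h=1.0, seed=0):
    """Approximate the far-field block K(trg, src) from uniformly sampled rows:
    pick skeleton columns by pivoted QR on the sampled rows, then interpolate.
    Returns C (m x k) and T (k x n) with block ~= C @ T."""
    rng = np.random.default_rng(seed)
    rows = rng.choice(len(trg), size=n_samples, replace=False)
    B_S = gauss_kernel(trg[rows], src, h)          # only sampled rows are evaluated
    _, _, piv = qr(B_S, pivoting=True, mode='economic')
    J = piv[:k]                                    # indices of skeleton source points
    T = pinv(B_S[:, J]) @ B_S                      # k x n interpolation matrix
    C = gauss_kernel(trg, src[J], h)               # skeleton columns of the full block
    return C, T, J
```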

    k-MLE: A fast algorithm for learning statistical mixture models

    Full text link
    We describe k-MLE, a fast and efficient local search algorithm for learning finite statistical mixtures of exponential families such as Gaussian mixture models. Mixture models are traditionally learned using the expectation-maximization (EM) soft clustering technique, which monotonically increases the incomplete (expected complete) likelihood. Given prescribed mixture weights, the hard clustering k-MLE algorithm iteratively assigns data to the most likely weighted component and updates the component models using Maximum Likelihood Estimators (MLEs). Using the duality between exponential families and Bregman divergences, we prove that the local convergence of the complete likelihood of k-MLE follows directly from the convergence of a dual additively weighted Bregman hard clustering. The inner loop of k-MLE can be implemented using any k-means heuristic, such as Lloyd's batched updates or Hartigan's greedy swap updates. We then show how to update the mixture weights by minimizing a cross-entropy criterion, which amounts to setting each weight to the relative proportion of points assigned to its cluster, and we repeat the mixture parameter and mixture weight updates until convergence. Hard EM is interpreted as a special case of k-MLE in which the component update and the weight update are performed successively in the inner loop. To initialize k-MLE, we propose k-MLE++, a careful initialization that probabilistically guarantees a global bound on the best possible complete likelihood. Comment: 31 pages, extends a preliminary paper presented at IEEE ICASSP 201
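    A minimal sketch of the hard-assignment loop for a spherical Gaussian mixture is given below; the paper treats general exponential families via Bregman divergences and proposes the k-MLE++ initialization, both of which are omitted here.

```python
import numpy as np

def k_mle_gmm(X, k, n_iter=50, seed=0):
    """Hard-assignment k-MLE sketch for a spherical Gaussian mixture
    (simplified; the paper covers general exponential families)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, k, replace=False)].copy()   # initial centers
    var = np.full(k, X.var())                        # per-component isotropic variance
    w = np.full(k, 1.0 / k)                          # mixture weights
    for _ in range(n_iter):
        # Assign each point to its most likely weighted component.
        logp = (np.log(w)
                - 0.5 * d * np.log(2 * np.pi * var)
                - ((X[:, None, :] - mu[None]) ** 2).sum(-1) / (2 * var))
        z = logp.argmax(1)
        # Per-cluster MLE update of the component parameters.
        for j in range(k):
            pts = X[z == j]
            if len(pts):
                mu[j] = pts.mean(0)
                var[j] = ((pts - mu[j]) ** 2).mean() + 1e-9
        # Weight update: relative proportion of points in each cluster.
        w = np.bincount(z, minlength=k) / n + 1e-12
        w /= w.sum()
    return mu, var, w, z
```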

    Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

    Full text link
    Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provably optimal recovery by the algorithm is shown analytically for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed. Comment: 13 figures, 35 reference
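    One simplified reading of the approach, sketched below, is to embed the graph with a few Laplacian eigenvectors and fit a finite Gaussian mixture in that eigenspace to obtain fuzzy memberships; the paper's model, provable-recovery analysis, and scalable heuristics go well beyond this.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def laplacian_mixture(adj, k, seed=0):
    """Fuzzy node memberships: embed with low Laplacian eigenvectors,
    then fit a k-component Gaussian mixture in that eigenspace."""
    deg = adj.sum(axis=1)
    L = np.diag(deg) - adj                      # combinatorial graph Laplacian
    _, vecs = np.linalg.eigh(L)                 # eigenvectors, ascending eigenvalues
    emb = vecs[:, 1:k + 1]                      # skip the trivial constant eigenvector
    gmm = GaussianMixture(n_components=k, random_state=seed).fit(emb)
    return gmm.predict_proba(emb)               # soft (fuzzy) memberships, n x k
```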

    Spatio-Temporal Cluster Detection and Local Moran Statistics of Point Processes

    Get PDF
    Moran's index is a statistic that measures spatial dependence, quantifying the degree of dispersion or clustering of point processes and events in some location/area. Because a single Moran's index may not give a sufficient summary of spatial autocorrelation, local indicators of spatial association (LISAs) have gained popularity. Accordingly, we propose extending LISAs to the temporal dimension by partitioning the area and computing a Moran-type statistic for each subarea over time. This unveils patterns between local neighbors that would not otherwise be apparent. We compute Moran statistics over time under a simulated multilevel Palm distribution, a generalized Poisson process in which the clusters and the dependence among subareas are captured by the rate of increase of the process over time. Event propagation is modeled through spatially nested sequences over time. The Palm parameters, Moran statistics, and convergence criteria are calculated with an explicit algorithm in a Markov chain Monte Carlo simulation setting and further analyzed on two real datasets.
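    For reference, the local Moran statistic that this construction extends is cheap to compute per subarea and per time slice; a minimal version for a vector of subarea values and a spatial weight matrix is sketched below (the Palm-distribution simulation and MCMC machinery are not shown).

```python
import numpy as np

def local_moran(x, W):
    """Local Moran's I (LISA) for subarea values x given spatial weights W.
    Returns one statistic per subarea; apply per time slice for a space-time view."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()                            # deviations from the overall mean
    m2 = (z ** 2).mean()                        # second moment of the deviations
    Wr = W / W.sum(axis=1, keepdims=True)       # row-standardize the weights
    return (z / m2) * (Wr @ z)                  # I_i = (z_i / m2) * sum_j w_ij z_j
```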

    Optical signatures of quantum delocalization over extended domains in photosynthetic membranes

    Full text link
    The prospect of coherent dynamics and excitonic delocalization across several light-harvesting structures in photosynthetic membranes is of considerable interest, but challenging to explore experimentally. Here we demonstrate theoretically that the excitonic delocalization across extended domains involving several light-harvesting complexes can lead to unambiguous signatures in the optical response, specifically, linear absorption spectra. We characterize, under experimentally established conditions of molecular assembly and protein-induced inhomogeneities, the optical absorption in these arrays from polarized and unpolarized excitation, and demonstrate that it can be used as a diagnostic tool to determine the coherent coupling among iso-energetic light-harvesting structures. The knowledge of these couplings would then provide further insight into the dynamical properties of transfer, such as facilitating the accurate determination of Förster rates. Comment: 4 figures and Supplementary information with 7 figures. To appear in Journal of Physical Chemistry A, 201
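    As a toy version of the forward calculation, the sketch below computes a broadened linear absorption spectrum from a Frenkel-exciton Hamiltonian of coupled pigments; the disorder averaging, lineshape theory, and polarization handling used in the paper are not reproduced here.

```python
import numpy as np

def absorption_spectrum(H, dipoles, omega, sigma=50.0):
    """Linear absorption of a Frenkel-exciton aggregate: diagonalize the
    site Hamiltonian H (e.g., in cm^-1), weight each exciton by its dipole
    strength, and broaden with Gaussians of width sigma on the grid omega."""
    E, C = np.linalg.eigh(H)                    # exciton energies and eigenvectors
    mu = C.T @ dipoles                          # exciton transition dipoles (n x 3)
    strength = (mu ** 2).sum(axis=1)            # unpolarized dipole strengths
    lines = np.exp(-(omega[:, None] - E[None, :]) ** 2 / (2.0 * sigma ** 2))
    return lines @ strength                     # spectrum evaluated on omega
```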

    Exact algorithms for L^1-TV regularization of real-valued or circle-valued signals

    Full text link
    We consider L^1-TV regularization of univariate signals with values on the real line or on the unit circle. While the real data space leads to a convex optimization problem, the problem is non-convex for circle-valued data. In this paper, we derive exact algorithms for both data spaces. A key ingredient is the reduction of the infinite search spaces to a finite set of configurations, which can be scanned by the Viterbi algorithm. To reduce the computational complexity of the involved tabulations, we extend the technique of distance transforms to non-uniform grids and to the circular data space. In total, the proposed algorithms have complexity O(KN), where N is the length of the signal and K is the number of different values in the data set. In particular, the complexity is O(N) for quantized data. Ours is the first exact algorithm for TV regularization with circle-valued data, and it is competitive with state-of-the-art methods for scalar data, assuming that the latter are quantized.
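    For real-valued quantized data, the finite search space makes the problem amenable to straightforward dynamic programming. The sketch below is a naive O(K^2 N) Viterbi solver for the L^1-TV objective over the observed values; the paper's distance-transform technique brings the cost down to O(KN) and also covers the circle-valued case.

```python
import numpy as np

def l1_tv_viterbi(y, lam):
    """Exact L^1-TV denoising, min sum|x_i - y_i| + lam * sum|x_{i+1} - x_i|,
    by Viterbi over the finite label set of observed values (naive O(K^2 N))."""
    levels = np.unique(y)                       # a minimizer takes data values
    K, N = len(levels), len(y)
    cost = np.abs(levels - y[0])                # cost[k]: best cost ending at level k
    back = np.zeros((N, K), dtype=int)          # backpointers to the previous level
    for i in range(1, N):
        trans = cost[:, None] + lam * np.abs(levels[:, None] - levels[None, :])
        back[i] = trans.argmin(0)
        cost = trans.min(0) + np.abs(levels - y[i])
    # Backtrack the optimal label sequence.
    x = np.empty(N)
    k = int(cost.argmin())
    for i in range(N - 1, -1, -1):
        x[i] = levels[k]
        k = back[i, k]
    return x
```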

    Inverse Density as an Inverse Problem: The Fredholm Equation Approach

    Full text link
    In this paper we address the problem of estimating the ratio q/p, where p is a density function and q is another density or, more generally, an arbitrary function. Knowing or approximating this ratio is needed in various problems of inference and integration, in particular, when one needs to average a function with respect to one probability distribution, given a sample from another. It is often referred to as importance sampling in statistical inference and is also closely related to the problem of covariate shift in transfer learning as well as to various MCMC methods. It may also be useful for separating the underlying geometry of a space, say a manifold, from the density function defined on it. Our approach is based on reformulating the problem of estimating q/p as an inverse problem in terms of an integral operator corresponding to a kernel, and thus reducing it to an integral equation, known as the Fredholm problem of the first kind. This formulation, combined with the techniques of regularization and kernel methods, leads to a principled kernel-based framework for constructing algorithms and for analyzing them theoretically. The resulting family of algorithms (FIRE, for Fredholm Inverse Regularized Estimator) is flexible, simple and easy to implement. We provide detailed theoretical analysis including concentration bounds and convergence rates for the Gaussian kernel in the case of densities defined on R^d, compact domains in R^d, and smooth d-dimensional submanifolds of the Euclidean space. We also show experimental results including applications to classification and semi-supervised learning within the covariate shift framework and demonstrate some encouraging experimental comparisons. We also show how the parameters of our algorithms can be chosen in a completely unsupervised manner. Comment: Fixing a few typos in last versio
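    A simplified discretization of the idea, shown below, solves the empirical Fredholm equation K_p f = K_q 1 for the values of q/p at the p-sample points using a Gaussian kernel and Tikhonov (ridge) regularization; the paper's FIRE estimators work in an RKHS and come with the stated concentration and rate guarantees, none of which this sketch captures.

```python
import numpy as np

def fire_ratio(Xp, Xq, h=1.0, lam=1e-3):
    """Estimate f = q/p at the p-sample points by discretizing the Fredholm
    equation (K_p f)(x) = (K_q 1)(x) and solving with ridge regularization.
    A simplified stand-in for the paper's RKHS-based FIRE estimators."""
    def gauss(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * h * h))
    n_p, n_q = len(Xp), len(Xq)
    Kpp = gauss(Xp, Xp) / n_p                   # empirical operator K_p
    b = gauss(Xp, Xq).sum(1) / n_q              # empirical K_q applied to 1
    f = np.linalg.solve(Kpp.T @ Kpp + lam * np.eye(n_p), Kpp.T @ b)
    return f                                    # estimated q/p at the points Xp
```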