
    Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

    Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provable optimal recovery using the algorithm is analytically shown for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed. Comment: 13 figures, 35 references
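
As a rough illustration of the "Laplacian eigenspace" idea mentioned in the abstract above (this is not the authors' algorithm, and every function name here is hypothetical), the sketch below builds the combinatorial Laplacian L = D - A for a small graph made of two disjoint cliques and checks that each cluster's indicator vector is mapped to zero by L. This is the standard fact that, for a graph with disconnected clusters, the cluster indicators span the zero-eigenvalue eigenspace of the Laplacian.

```python
def clique_union(sizes):
    """Adjacency matrix (list of lists) of disjoint cliques.

    Hypothetical helper: a toy 'cluster graph' with no edges between clusters.
    """
    n = sum(sizes)
    adjacency = [[0] * n for _ in range(n)]
    start = 0
    for size in sizes:
        block = range(start, start + size)
        for i in block:
            for j in block:
                if i != j:
                    adjacency[i][j] = 1
        start += size
    return adjacency


def graph_laplacian(adjacency):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    n = len(adjacency)
    return [
        [(sum(adjacency[i]) if i == j else 0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Two clusters: a 3-clique on nodes 0-2 and a 4-clique on nodes 3-6.
laplacian = graph_laplacian(clique_union([3, 4]))
indicator_first = [1, 1, 1, 0, 0, 0, 0]
indicator_second = [0, 0, 0, 1, 1, 1, 1]
single_node = [1, 0, 0, 0, 0, 0, 0]

# Cluster indicators lie in the null space of L; a single-node vector does not.
in_null_first = all(v == 0 for v in matvec(laplacian, indicator_first))
in_null_second = all(v == 0 for v in matvec(laplacian, indicator_second))
in_null_single = all(v == 0 for v in matvec(laplacian, single_node))
```

In this toy case the memberships are hard (0 or 1); the abstract describes probabilistic or fuzzy memberships for overlapping regions, which this sketch does not attempt to reproduce.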

    Reconstruction subgrid models for nonpremixed combustion

    Large-eddy simulation of combustion problems involves highly nonlinear terms that, when filtered, result in a contribution from subgrid fluctuations of scalars, Z, to the dynamics of the filtered value. This subgrid contribution requires modeling. Reconstruction models try to recover as much information as possible from the resolved field Z, based on a deconvolution procedure to obtain an intermediate field ZM. The approximate reconstruction using moments (ARM) method combines approximate reconstruction, a purely mathematical procedure, with additional physics-based information required to match specific scalar moments, in the simplest case, the Reynolds-averaged value of the subgrid variance. Here, results from the analysis of the ARM model in the case of a spatially evolving turbulent plane jet are presented. A priori and a posteriori evaluations using data from direct numerical simulation are carried out. The nonlinearities considered are representative of reacting flows: power functions, the dependence of the density on the mixture fraction (relevant for conserved scalar approaches) and the Arrhenius nonlinearity (very localized in Z space). Comparisons are made against the more popular beta probability density function (PDF) approach in the a priori analysis, trying to define ranges of validity for each approach. The results show that the ARM model is able to capture the subgrid part of the variance accurately over a wide range of filter sizes and performs well for the different nonlinearities, giving uniformly better predictions than the beta PDF for the polynomial case. In the case of the density and Arrhenius nonlinearities, the relative performance of the ARM and traditional PDF approaches depends on the size of the subgrid variance with respect to a characteristic scale of each function. Furthermore, the sources of error associated with the ARM method are considered and analytical bounds on that error are obtained.

    Revealing evolutionary constraints on proteins through sequence analysis

    Statistical analysis of alignments of large numbers of protein sequences has revealed "sectors" of collectively coevolving amino acids in several protein families. Here, we show that selection acting on any functional property of a protein, represented by an additive trait, can give rise to such a sector. As an illustration of a selected trait, we consider the elastic energy of an important conformational change within an elastic network model, and we show that selection acting on this energy leads to correlations among residues. For this concrete example and more generally, we demonstrate that the main signature of functional sectors lies in the small-eigenvalue modes of the covariance matrix of the selected sequences. However, secondary signatures of these functional sectors also exist in the extensively studied large-eigenvalue modes. Our simple, general model leads us to propose a principled method to identify functional sectors, along with the magnitudes of mutational effects, from sequence data. We further demonstrate the robustness of these functional sectors to various forms of selection, and the robustness of our approach to the identification of multiple selected traits. Comment: 37 pages, 28 figures

    A new method to measure evolution of the galaxy luminosity function

    We present a new efficient technique for measuring evolution of the galaxy luminosity function. The method reconstructs the evolution over the luminosity-redshift plane using any combination of three input dataset types: 1) number counts, 2) galaxy redshifts, 3) integrated background flux measurements. The evolution is reconstructed in adaptively sized regions of the plane according to the input data as determined by a Bayesian formalism. We demonstrate the performance of the method using a range of different synthetic input datasets. We also make predictions of the accuracy with which forthcoming surveys conducted with SCUBA2 and the Herschel Space Satellite will be able to measure evolution of the sub-millimetre luminosity function using the method. Comment: MNRAS in press. 14 pages, 7 figures

    An exploratory data analysis method to reveal modular latent structures in high-throughput data

    Background: Modular structures are ubiquitous across various types of biological networks. The study of network modularity can help reveal regulatory mechanisms in systems biology, evolutionary biology and developmental biology. Identifying putative modular latent structures from high-throughput data using exploratory analysis can help better interpret the data and generate new hypotheses. Unsupervised learning methods designed for global dimension reduction or clustering fall short of identifying modules with factors acting in linear combinations. Results: We present an exploratory data analysis method named MLSA (Modular Latent Structure Analysis) to estimate modular latent structures, which can find co-regulative modules that involve non-coexpressive genes. Conclusions: Through simulations and real-data analyses, we show that the method can recover modular latent structures effectively. In addition, the method also performed very well on data generated from sparse global latent factor models. The R code is available at http://userwww.service.emory.edu/~tyu8/MLSA/.