    Local ciliate communities associated with aquatic macrophytes

    This study, based within the catchment area of the River Frome, an important chalk stream in the south of England, compared ciliated protozoan communities associated with three species of aquatic macrophyte common to lotic habitats: Ranunculus penicillatus subsp. pseudofluitans, Nasturtium officinale and Sparganium emersum. A total of 77 ciliate species were recorded. No species-specific ciliate assemblage was found to be typical of any one plant species. Ciliate abundance differed significantly between plant species. The ciliate community of each plant species was distinct, with the number of ciliate species increasing with ciliate abundance. The community associated with R. penicillatus subsp. pseudofluitans showed the highest consistency and species richness, whereas the S. emersum ciliate communities were unstable. Most notably, N. officinale was associated with low ciliate abundances and an apparent reduction in biofilm formation, which is discussed in relation to the plant's production of the microbial toxin phenethyl isothiocyanate. We propose that the results reflect differences in the quantity and quality of biofilm on the plants, which could be determined by differences in plant morphology, patterns of plant decay and herbivore defense systems, all of which can suppress or promote the conditions for biofilm growth.

    Next Generation Cluster Editing

    This work aims to improve the quality of structural variant prediction from the mapped reads of a sequenced genome. We propose a new model based on cluster editing in weighted graphs and introduce a new heuristic algorithm that solves this problem quickly, and with a good approximation, on the huge graphs that arise from biological datasets.
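
    The abstract does not spell out the heuristic, so the following is not the authors' algorithm. It is a minimal sketch of weighted cluster editing under an assumed convention (a positive pair weight means the edge is present and deleting it costs that weight; a negative weight means the edge is absent and inserting it costs its magnitude), solved with a generic randomized pivot heuristic of the KwikCluster type. The function names and the toy instance are hypothetical.

```python
import itertools
import random


def editing_cost(weights, clusters):
    """Total cost of editing the graph into the given disjoint clusters.

    weights maps frozenset({u, v}) -> w. Assumed convention: w > 0 means the edge
    is present and deleting it costs w; w < 0 means it is absent and inserting it
    costs -w. Pairs missing from `weights` are treated as free.
    """
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    cost = 0.0
    for pair, w in weights.items():
        u, v = tuple(pair)
        same_cluster = label[u] == label[v]
        if same_cluster and w < 0:
            cost += -w  # insert a missing edge inside a cluster
        elif not same_cluster and w > 0:
            cost += w   # delete an edge running between clusters
    return cost


def pivot_clusters(nodes, weights, seed=0):
    """Randomized pivot heuristic: each pivot absorbs its remaining positive neighbours."""
    rng = random.Random(seed)
    remaining = list(nodes)
    rng.shuffle(remaining)
    clusters = []
    while remaining:
        pivot = remaining[0]
        cluster = {pivot} | {
            v for v in remaining[1:] if weights.get(frozenset((pivot, v)), 0) > 0
        }
        clusters.append(cluster)
        remaining = [v for v in remaining if v not in cluster]
    return clusters


def best_of(nodes, weights, trials=20):
    """Run the pivot heuristic with several seeds and keep the cheapest partition."""
    candidates = (pivot_clusters(nodes, weights, seed) for seed in range(trials))
    return min(candidates, key=lambda c: editing_cost(weights, c))


def example_weights():
    """Hypothetical toy instance: two triangles joined by one weak edge (2, 3)."""
    weights = {frozenset(p): -1.0 for p in itertools.combinations(range(6), 2)}
    for u, v in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
        weights[frozenset((u, v))] = 0.5 if (u, v) == (2, 3) else 2.0
    return weights
```

    Because the pivot order is random, `best_of` simply reruns the heuristic with several seeds and keeps the cheapest partition. On the toy instance, the split into the two triangles costs 0.5 (deleting the weak bridge).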

    Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

    Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Optimal recovery by the algorithm is proven analytically for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and tested empirically. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed.
    Comment: 13 figures, 35 references
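
    The abstract traces fuzzy spectral methods back to heat or diffusion equations. As an illustration of that starting point only, not of the Laplacian mixture algorithm itself, the sketch below diffuses heat from hand-picked seed nodes by explicit Euler steps of dx/dt = -Lx (with L = D - A) and normalizes the resulting fields per node into soft, overlapping memberships. The seeds, the step count, and all function names are assumptions.

```python
def laplacian_step(adj, x, tau):
    """One explicit Euler step of the graph heat equation dx/dt = -L x, with L = D - A.

    adj is an adjacency list; (L x)_i = sum over neighbours j of (x_i - x_j).
    """
    return [x[i] - tau * sum(x[i] - x[j] for j in adj[i]) for i in range(len(adj))]


def soft_memberships(adj, seeds, steps=10):
    """Diffuse heat from each seed node and normalize per node into soft memberships."""
    d_max = max(len(neighbours) for neighbours in adj)
    # lambda_max(L) <= 2 * d_max, so tau = 0.5 / d_max keeps the scheme stable and
    # keeps every value non-negative (1 - tau * d_i >= 0.5).
    tau = 0.5 / d_max
    fields = []
    for s in seeds:
        x = [0.0] * len(adj)
        x[s] = 1.0
        for _ in range(steps):
            x = laplacian_step(adj, x, tau)
        fields.append(x)
    memberships = []
    for i in range(len(adj)):
        heat = [field[i] for field in fields]
        total = sum(heat)
        if total > 0:
            memberships.append([h / total for h in heat])
        else:  # node not yet reached by any seed: fall back to uniform membership
            memberships.append([1.0 / len(seeds)] * len(seeds))
    return memberships
```

    The number of steps acts as a scale parameter: heat is conserved, so running the diffusion for too long spreads it evenly over each connected component and the memberships drift toward uniform.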

    Detection and localization of change points in temporal networks with the aid of stochastic block models

    A framework based on generalized hierarchical random graphs (GHRGs) for the detection of change points in the structure of temporal networks has recently been developed by Peel and Clauset [1]. We build on this methodology and extend it to include the versatile stochastic block models (SBMs) as a parametric family for reconstructing the empirical networks. We apply five different change point detection techniques to prototypical temporal networks, both empirical and synthetic. We find that none of the considered methods consistently outperforms the others in detecting and locating the expected change points in empirical temporal networks. In terms of the precision and recall of the detected change points, the method based on a degree-corrected SBM has better recall than the other dedicated methods, especially for sparse networks and smaller sliding-window widths.
    Comment: This is an author-created, un-copyedited version of an article accepted for publication/published in Journal of Statistical Mechanics: Theory and Experiment. IOP Publishing Ltd is not responsible for any errors or omissions in this version of the manuscript or any version derived from it. The Version of Record is available online at http://dx.doi.org/10.1088/1742-5468/2016/11/11330
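
    Peel and Clauset's approach compares, inside a sliding window, a single model fitted to all snapshots against a pair of models split at a candidate change point. A minimal sketch of that likelihood-ratio idea, assuming a plain Bernoulli SBM whose block assignment is known and fixed (the methods in the paper infer the model structure and assess significance, which is omitted here); the snapshot format and function names are hypothetical:

```python
import math
from itertools import combinations


def sbm_loglik(snapshots, n, blocks):
    """Maximum log-likelihood of one Bernoulli SBM fitted jointly to all snapshots.

    snapshots: list of edge sets, each edge a frozenset({u, v}) over nodes 0..n-1.
    blocks: list mapping node -> block label (assumed known and fixed).
    """
    edges, pairs = {}, {}
    for snap in snapshots:
        for u, v in combinations(range(n), 2):
            key = tuple(sorted((blocks[u], blocks[v])))  # unordered block pair
            pairs[key] = pairs.get(key, 0) + 1
            if frozenset((u, v)) in snap:
                edges[key] = edges.get(key, 0) + 1
    loglik = 0.0
    for key, m in pairs.items():
        e = edges.get(key, 0)
        p = e / m  # MLE of the edge probability for this block pair
        if 0 < p < 1:  # p in {0, 1} contributes log(1) = 0
            loglik += e * math.log(p) + (m - e) * math.log(1 - p)
    return loglik


def change_point_scores(snapshots, n, blocks):
    """Log-likelihood ratio of 'split before snapshot t' versus 'no split', per t."""
    base = sbm_loglik(snapshots, n, blocks)
    return {
        t: sbm_loglik(snapshots[:t], n, blocks)
        + sbm_loglik(snapshots[t:], n, blocks)
        - base
        for t in range(1, len(snapshots))
    }
```

    The split models always fit at least as well as the single model, so in practice a score has to be compared with a null distribution (for example, from bootstrap resampling) rather than simply checked for being positive.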

    A survey of statistical network models

    Networks are ubiquitous in science and have become a focal point for discussion in everyday life. Formal statistical models for the analysis of network data have emerged as a major topic of interest in diverse areas of study, and most of these involve a form of graphical representation. Probability models on graphs date back to 1959. Along with empirical studies in social psychology and sociology from the 1960s, these early works generated an active network community and a substantial literature in the 1970s. This effort moved into the statistical literature in the late 1970s and 1980s, and the past decade has seen a burgeoning network literature in statistical physics and computer science. The growth of the World Wide Web and the emergence of online networking communities such as Facebook, MySpace, and LinkedIn, along with a host of more specialized professional network communities, have intensified interest in the study of networks and network data. Our goal in this review is to provide the reader with an entry point to this burgeoning literature. We begin with an overview of the historical development of statistical network modeling and then introduce a number of examples that have been studied in the network literature. Our subsequent discussion focuses on a number of prominent static and dynamic network models and their interconnections. We emphasize formal model descriptions and pay special attention to the interpretation of parameters and their estimation. We end with a description of some open problems and challenges for machine learning and statistics.
    Comment: 96 pages, 14 figures, 333 references
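
    As a small example of the emphasis on parameter interpretation and estimation, consider the Bernoulli random graph G(n, p), one of the earliest probability models on graphs (chosen here for illustration; the abstract does not single out a model). Each of the n(n-1)/2 node pairs is linked independently with probability p, and the maximum-likelihood estimate of p is the observed number of edges divided by n(n-1)/2. The function names are hypothetical.

```python
import random
from itertools import combinations


def sample_gnp(n, p, seed=None):
    """Sample G(n, p): each of the n*(n-1)/2 node pairs is an edge with probability p."""
    rng = random.Random(seed)
    return {(u, v) for u, v in combinations(range(n), 2) if rng.random() < p}


def mle_p(n, edges):
    """Maximum-likelihood estimate of p: observed edges divided by possible node pairs."""
    return len(edges) / (n * (n - 1) / 2)
```

    Here p has a direct interpretation as the density of the graph, which is why its estimate is simply the observed density.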

    A Replica Inference Approach to Unsupervised Multi-Scale Image Segmentation

    We apply a replica-inference-based Potts model method to unsupervised image segmentation on multiple scales. This approach was inspired by the statistical mechanics problem of "community detection" and its phase diagram. Specifically, the problem is cast as identifying tightly bound clusters ("communities" or "solutes") against a background or "solvent". Within our multiresolution approach, we compute information-theory-based correlations among multiple solutions ("replicas") of the same graph over a range of resolutions. Significant multiresolution structures are identified by replica correlations, as manifested in information-theoretic overlaps. With the aid of these correlations, as well as thermodynamic measures, the phase diagram of the corresponding Potts model is analyzed at both zero and finite temperatures. The optimal parameters, which yield a sensible unsupervised segmentation, lie in the "easy phase" of the Potts model. Our algorithm is fast, is shown to be at least as accurate as the best algorithms to date, and is especially suited to the detection of camouflaged images.
    Comment: 26 pages, 22 figures
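
    In the replica approach, independently obtained solutions of the same graph are compared through information-theoretic overlaps, and high agreement at a given resolution marks significant structure. The sketch below computes one such overlap, normalized mutual information 2I(A;B)/(H(A)+H(B)), and averages it over all replica pairs. This normalization is an assumption (the paper may also use measures such as the variation of information), the function names are hypothetical, and the Potts-model minimization that produces the replicas is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy (nats) of a partition given as a list of cluster labels."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (nats) between two partitions of the same nodes."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        c / n * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """Normalized mutual information, 2 I(A;B) / (H(A) + H(B))."""
    h = entropy(a) + entropy(b)
    if h == 0:
        return 1.0  # both partitions are a single cluster, hence identical
    return 2 * mutual_information(a, b) / h


def mean_replica_overlap(replicas):
    """Average pairwise NMI over all replica partitions."""
    pairs = list(combinations(replicas, 2))
    return sum(nmi(a, b) for a, b in pairs) / len(pairs)
```

    NMI ignores label names, so two replicas that find the same clusters under permuted labels still score 1.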

    Properties of Healthcare Teaming Networks as a Function of Network Construction Algorithms

    Network models of healthcare systems can be used to examine how providers collaborate, communicate, and refer patients to each other. Most healthcare service network models have been constructed from patient claims data, using billing claims to link patients with providers. The data sets can be quite large, making standard methods for network construction computationally challenging and thus requiring alternative construction algorithms. While these alternative methods are increasingly used to generate healthcare networks, there is little to no literature comparing the structural properties of the networks they generate. To address this issue, we compared the properties of healthcare networks constructed from the 2013 Medicare Part B outpatient claims data using different algorithms. Three algorithms were compared: binning, sliding frame, and trace-route. Unipartite networks linking either providers or healthcare organizations by shared patients were built with each method. We found that each algorithm produced networks with substantially different topological properties. Provider networks followed a power law, and organization networks a power law with exponential cutoff. Censoring networks to exclude edges with fewer than 11 shared patients, a common de-identification practice for healthcare network data, markedly reduced the number of edges and greatly altered measures of vertex prominence such as betweenness centrality. We identified patterns in the distances patients travel between network providers, most strikingly between providers in the Northeast United States and Florida. We conclude that the choice of network construction algorithm is critical for healthcare network analysis, and we discuss the implications for selecting the algorithm best suited to the type of analysis to be performed.
    Comment: With links to comprehensive, high resolution figures and networks via figshare.co
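
    A minimal sketch of the shared-patient projection and the censoring rule described above, assuming claims are reduced to (patient, provider) pairs. It ignores visit dates, so it does not reproduce the binning, sliding-frame, or trace-route algorithms, whose exact definitions the abstract does not give; the function name and the `min_shared` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def shared_patient_network(claims, min_shared=1):
    """Project (patient, provider) claims onto a provider-provider network.

    Edge weight = number of distinct patients seen by both providers.
    Edges with fewer than `min_shared` shared patients are dropped (censoring).
    """
    providers_by_patient = defaultdict(set)
    for patient, provider in claims:
        providers_by_patient[patient].add(provider)  # repeat claims collapse here
    weights = defaultdict(int)
    for providers in providers_by_patient.values():
        for a, b in combinations(sorted(providers), 2):
            weights[(a, b)] += 1
    return {edge: w for edge, w in weights.items() if w >= min_shared}
```

    Setting `min_shared=11` applies the de-identification threshold from the abstract; comparing the result with `min_shared=1` shows directly which edges the censoring removes.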

    Local pre-processing for node classification in networks: application in protein-protein interaction

    Network modelling provides an increasingly popular conceptualisation in a wide range of domains, including the analysis of protein structure. Typical analyses model parameter values at individual nodes within the network. The spherical locality around a node provides a microenvironment that can be used to characterise an area of a network rather than a particular point within it. Microenvironments centred on the nodes in a protein chain can be used to quantify parameters related to protein functionality. They also allow particular patterns of such parameters in node-centred microenvironments to be used to locate sites of particular interest. This paper evaluates an approach to index generation that seeks to construct microenvironment data rapidly. The results show that index generation performs best when the radius of the microenvironments matches the granularity of the index. Further results show that such microenvironments improve the utility of protein chain parameters for classifying the structural characteristics of nodes, using both support vector machines and neural networks.
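
    The finding that index generation works best when the microenvironment radius matches the index granularity can be illustrated with a uniform grid index over node coordinates. In this sketch (an assumption: the abstract does not describe the paper's index), each node is assigned to a cubic cell of side `cell`, and a microenvironment query scans only the cells within `ceil(radius / cell)` steps of the query cell before checking exact distances. With `cell == radius` this is the 3 x 3 x 3 block around the query; smaller cells multiply the number of cells visited, while larger cells send more non-neighbours through the distance check. The coordinate format and function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(coords, cell):
    """Spatial hash: map each cubic cell (side `cell`) to the node indices inside it."""
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        grid[(math.floor(x / cell), math.floor(y / cell), math.floor(z / cell))].append(idx)
    return grid


def microenvironment(coords, grid, cell, i, radius):
    """Sorted indices of nodes within `radius` of node i (excluding i itself)."""
    cx, cy, cz = (math.floor(c / cell) for c in coords[i])
    reach = math.ceil(radius / cell)  # cells to scan in each direction
    found = []
    for dx, dy, dz in product(range(-reach, reach + 1), repeat=3):
        for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
            if j != i and math.dist(coords[i], coords[j]) <= radius:
                found.append(j)
    return sorted(found)
```

    The scan never misses a neighbour: if two coordinates differ by at most `radius`, their cell indices differ by at most `ceil(radius / cell)` along each axis.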

    Overlapping modularity at the critical point of k-clique percolation

    One of the most remarkable social phenomena is the formation of communities in social networks corresponding to families, friendship circles, work teams, etc. Since people usually belong to several communities at the same time, the induced overlaps result in an extremely complicated web of the communities themselves. Thus, uncovering the intricate community structure of social networks is a non-trivial task with great potential for practical applications, and it has attracted notable interest in recent years. The Clique Percolation Method (CPM) is one of the earliest overlapping community finding methods and has already been used to analyze several different social networks. In this approach, communities correspond to k-clique percolation clusters, and the general heuristic for setting the method's parameters is to tune the system just below the critical point of k-clique percolation. However, this rule is based on simple physical principles, and its validity has never been subjected to quantitative analysis. Here we examine the quality of the partitioning in the vicinity of the critical point using recently introduced overlapping modularity measures. According to our results on real social and other networks, the overlapping modularities show a maximum close to the critical point, justifying the original criteria for the optimal parameter settings.
    Comment: 20 pages, 6 figures
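
    A minimal sketch of the Clique Percolation Method itself: enumerate all k-cliques, join two k-cliques when they share k - 1 nodes, and report the union of nodes in each connected group of cliques as one (possibly overlapping) community. Choosing k near the critical point, which is the subject of the paper, is not shown, and the brute-force clique enumeration is only meant for small graphs; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def k_cliques(adj, k):
    """All k-cliques as increasing tuples; adj maps node -> set of neighbours."""
    cliques = []

    def extend(clique, candidates):
        if len(clique) == k:
            cliques.append(tuple(clique))
            return
        for v in sorted(candidates):
            # keep only common neighbours larger than v, so each clique appears once
            extend(clique + [v], {u for u in candidates & adj[v] if u > v})

    extend([], set(adj))
    return cliques


def clique_percolation(adj, k):
    """Communities = node unions of k-cliques connected through shared (k-1)-subsets."""
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))

    def find(i):  # union-find root with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    by_face = defaultdict(list)  # (k-1)-subset -> indices of cliques containing it
    for idx, clique in enumerate(cliques):
        for face in combinations(clique, k - 1):
            by_face[face].append(idx)
    for members in by_face.values():
        for other in members[1:]:
            parent[find(other)] = find(members[0])
    communities = defaultdict(set)
    for idx, clique in enumerate(cliques):
        communities[find(idx)].update(clique)
    return sorted((frozenset(c) for c in communities.values()), key=sorted)
```

    With k = 2 the method reduces to ordinary connected components, which is a quick sanity check; for two 4-cliques sharing a single node, k = 3 yields two communities that overlap in that node.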