8,442 research outputs found

    Compressing networks with super nodes

    Community detection is a commonly used technique for identifying groups in a network based on similarities in connectivity patterns. To facilitate community detection in large networks, we recast the network to be partitioned into a smaller network of 'super nodes', each super node comprising one or more nodes in the original network. To define the seeds of our super nodes, we apply the 'CoreHD' ranking from dismantling and decycling. We test our approach through the analysis of two common methods for community detection: modularity maximization with the Louvain algorithm and maximum likelihood optimization for fitting a stochastic block model. Our results highlight that applying community detection to the compressed network of super nodes is significantly faster while successfully producing partitions that are more aligned with the local network connectivity, more stable across multiple (stochastic) runs within and between community detection algorithms, and overlap well with the results obtained using the full network.
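
The compression step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how super nodes are grown from their seeds, so the multi-source breadth-first assignment below is an assumption, and the seeds are passed in by hand rather than computed with the CoreHD ranking. The function names `grow_super_nodes` and `compress` are hypothetical.

```python
from collections import Counter, deque


def grow_super_nodes(adj, seeds):
    """Assign every node to the seed that reaches it first in a multi-source BFS.

    Assumption: this growth rule is illustrative only; the abstract does not
    specify how nodes are attached to seeds.
    """
    owner = {seed: seed for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in owner:
                owner[neighbour] = owner[node]
                queue.append(neighbour)
    return owner


def compress(adj, owner):
    """Build the weighted super-node network from an undirected adjacency dict.

    The weight of a super edge is the number of original edges it replaces.
    Assumes every node is reachable from some seed and labels are comparable.
    """
    weights = Counter()
    for node, neighbours in adj.items():
        for neighbour in neighbours:
            a, b = owner[node], owner[neighbour]
            if a < b:  # counts each cross-group undirected edge exactly once
                weights[(a, b)] += 1
    return weights


# Two triangles joined by the single edge 2-3, with one seed in each triangle.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
owner = grow_super_nodes(adj, seeds=[0, 5])
super_edges = compress(adj, owner)
```

A community detection method would then be run on the much smaller weighted network in `super_edges` instead of on `adj`.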

    A perceptual hash function to store and retrieve large scale DNA sequences

    This paper proposes a novel approach for storing and retrieving massive DNA sequences. The method is based on a perceptual hash function, commonly used to determine the similarity between digital images, that we adapted for DNA sequences. The perceptual hash function presented here is based on a Discrete Cosine Transform Sign Only (DCT-SO). Each nucleotide is encoded as a fixed gray level intensity pixel and the hash is calculated from its significant frequency characteristics. This results in a drastic data reduction between the sequence and the perceptual hash. Unlike cryptographic hash functions, perceptual hashes are not affected by the "avalanche effect" and thus can be compared. The similarity distance between two hashes is estimated with the Hamming distance, which is used to retrieve DNA sequences. The experiments we conducted show that our approach is suitable for storing and retrieving massive DNA sequences.
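
The pipeline in this abstract (encode nucleotides as gray levels, keep only the signs of DCT coefficients, compare hashes by Hamming distance) can be sketched as follows. This is a simplified one-dimensional illustration, not the authors' implementation: the gray-level values in `GRAY`, the choice of the lowest 16 non-DC coefficients as the "significant" ones, and all function names are assumptions.

```python
import math

# Hypothetical gray-level encoding; the abstract does not give the actual values.
GRAY = {"A": 0.0, "C": 85.0, "G": 170.0, "T": 255.0}


def dct_ii(signal, count):
    """First `count` coefficients of the unnormalised DCT-II, from the definition."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, x in enumerate(signal))
        for k in range(min(count, n))
    ]


def perceptual_hash(seq, bits=16):
    """Sign-only hash: one bit per low-frequency coefficient, DC term skipped.

    Sequences shorter than bits + 1 bases yield a shorter hash.
    """
    signal = [GRAY[base] for base in seq.upper()]
    coeffs = dct_ii(signal, bits + 1)
    return [1 if c > 0 else 0 for c in coeffs[1:]]


def hamming(h1, h2):
    """Number of positions at which two equal-length hashes differ."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(h1, h2))


reference = perceptual_hash("ACGT" * 10)
mutated = perceptual_hash("ACGT" * 9 + "ACGA")  # one substituted base
distance = hamming(reference, mutated)
```

Because only coefficient signs are kept, a 40-base sequence is reduced to 16 bits here, and retrieval amounts to looking for stored hashes at a small Hamming distance from the query hash.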

    Clustering and Community Detection in Directed Networks: A Survey

    Networks (or graphs) appear as dominant structures in diverse domains, including sociology, biology, neuroscience and computer science. In most of the aforementioned cases graphs are directed, in the sense that there is directionality on the edges, making the semantics of the edges non-symmetric. An interesting feature that real networks present is the clustering or community structure property, under which the graph topology is organized into modules commonly called communities or clusters. The essence here is that nodes of the same community are highly similar while, on the contrary, nodes across communities present low similarity. Revealing the underlying community structure of directed complex networks has become a crucial and interdisciplinary topic with a plethora of applications. Therefore, there is naturally a recent wealth of research production in the area of mining directed graphs, with clustering being the primary method and tool for community detection and evaluation. The goal of this paper is to offer an in-depth review of the methods presented so far for clustering directed networks, along with the relevant methodological background and related applications. The survey commences by offering a concise review of the fundamental concepts and methodological base on which graph clustering algorithms capitalize. Then we present the relevant work along two orthogonal classifications. The first one is mostly concerned with the methodological principles of the clustering algorithms, while the second one approaches the methods from the viewpoint of the properties of a good cluster in a directed network. Further, we present methods and metrics for evaluating graph clustering results, demonstrate interesting application domains and provide promising future research directions. Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear)

    The Myth of Global Science Collaboration - Collaboration patterns in epistemic communities

    Scientific collaboration is often perceived as a joint global process that involves researchers worldwide, regardless of their place of work and residence. Globalization of science, in this respect, implies that collaboration among scientists takes place along the lines of common topics and irrespective of the spatial distances between the collaborators. The networks of collaborators, termed 'epistemic communities', should thus have a space-independent structure. This paper shows that such a notion of globalized scientific collaboration is not supported by empirical data. It introduces a novel approach to analyzing distance-dependent probabilities of collaboration. The results of the analysis of six distinct scientific fields reveal that intra-country collaboration is about 10-50 times more likely to occur than international collaboration. Moreover, strong dependencies exist between collaboration activity (measured in co-authorships) and spatial distance when confined to national borders. However, the fact that distance becomes irrelevant once collaboration is taken to the international scale suggests a globalized science system that is strongly influenced by the gravity of local science clusters. The similarity of the probability functions of the six science fields analyzed suggests a universal mode of spatial governance that is independent of the mode of knowledge creation in science. Comment: 13 pages, 3 figures, 1 table

    Comparing biological networks via graph compression

    Background: Comparison of various kinds of biological data is one of the main problems in bioinformatics and systems biology. Data compression methods have been applied to comparison of large sequence data and protein structure data. Since it is still difficult to compare global structures of large biological networks, it is reasonable to try to apply data compression methods to comparison of biological networks. In existing compression methods, the uniqueness of compression results is not guaranteed because there is some ambiguity in selection of overlapping edges. Results: This paper proposes novel efficient methods, CompressEdge and CompressVertices, for comparing large biological networks. In the proposed methods, an original network structure is compressed by iteratively contracting identical edges and sets of connected edges. Then, the similarity of two networks is measured by a compression ratio of the concatenated networks. The proposed methods are applied to comparison of metabolic networks of several organisms, H. sapiens, M. musculus, A. thaliana, D. melanogaster, C. elegans, E. coli, S. cerevisiae, and B. subtilis, and are compared with an existing method. These results suggest that our methods can efficiently measure the similarities between metabolic networks. Conclusions: Our proposed algorithms, which compress node-labeled networks, are useful for measuring the similarity of large biological networks.
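
The idea of scoring similarity by how well two concatenated networks compress can be illustrated with a generic stand-in. The sketch below does not implement CompressEdge or CompressVertices: it serialises each edge list to text, compresses it with zlib, and computes the normalized compression distance, which is a different and much cruder technique. The function names and the toy graphs are assumptions made for illustration.

```python
import zlib


def serialize(edges):
    """Canonical text form of an undirected, node-labelled edge list."""
    canonical = sorted(tuple(sorted(edge)) for edge in edges)
    return "\n".join(f"{u}\t{v}" for u, v in canonical).encode()


def compressed_size(data):
    """Size in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(edges_a, edges_b):
    """Normalized compression distance: smaller means more shared structure."""
    a, b = serialize(edges_a), serialize(edges_b)
    size_a, size_b = compressed_size(a), compressed_size(b)
    size_ab = compressed_size(a + b"\n" + b)
    return (size_ab - min(size_a, size_b)) / max(size_a, size_b)


# Toy networks: a labelled ring, and an unrelated graph with different labels.
ring = [(f"m{i}", f"m{(i + 1) % 200}") for i in range(200)]
other = [(f"x{(i * 7) % 200}", f"y{(i * 13) % 200}") for i in range(200)]

same = ncd(ring, ring)
different = ncd(ring, other)
```

Comparing a network with itself gives a distance near zero because the second copy adds almost nothing to the compressed size, while unrelated networks give a distance closer to one.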

    A Compression Based Distance Measure for Texture


    Multi-Target Prediction: A Unifying View on Problems and Methods

    Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse types. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research.

    How round is a protein? Exploring protein structures for globularity using conformal mapping.

    We present a new algorithm that automatically computes a measure of the geometric difference between the surface of a protein and a round sphere. The algorithm takes as input two triangulated genus zero surfaces representing the protein and the round sphere, respectively, and constructs a discrete conformal map f between these surfaces. The conformal map is chosen to minimize a symmetric elastic energy E_S(f) that measures the distance of f from an isometry. We illustrate our approach on a set of basic sample problems and then on a dataset of diverse protein structures. We show first that E_S(f) is able to quantify the roundness of the Platonic solids and that for these surfaces it replicates well traditional measures of roundness such as the sphericity. We then demonstrate that the symmetric elastic energy E_S(f) captures both global and local differences between two surfaces, showing that our method identifies the presence of protruding regions in protein structures and quantifies how these regions make the shape of a protein deviate from globularity. Based on these results, we show that E_S(f) serves as a probe of the limits of the application of conformal mapping to parametrize protein shapes. We identify limitations of the method and discuss its extension to achieving automatic registration of protein structures based on their surface geometry.
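
For reference, the traditional sphericity that this abstract uses as a baseline can be computed directly for the Platonic solids. The sketch below uses the standard definition, sphericity = pi^(1/3) * (6V)^(2/3) / A for a solid of volume V and surface area A, with the closed-form volumes and areas for unit edge length. It does not compute the elastic energy introduced in the paper, and the names `sphericity` and `PLATONIC` are chosen here for illustration.

```python
import math


def sphericity(volume, area):
    """Surface area of the equal-volume sphere divided by the solid's surface area."""
    return math.pi ** (1 / 3) * (6 * volume) ** (2 / 3) / area


# (volume, surface area) of each Platonic solid with edge length 1.
PLATONIC = {
    "tetrahedron": (1 / (6 * math.sqrt(2)), math.sqrt(3)),
    "cube": (1.0, 6.0),
    "octahedron": (math.sqrt(2) / 3, 2 * math.sqrt(3)),
    "dodecahedron": ((15 + 7 * math.sqrt(5)) / 4,
                     3 * math.sqrt(25 + 10 * math.sqrt(5))),
    "icosahedron": (5 * (3 + math.sqrt(5)) / 12, 5 * math.sqrt(3)),
}

scores = {name: sphericity(v, a) for name, (v, a) in PLATONIC.items()}
for name, value in scores.items():
    print(f"{name:13s} {value:.3f}")
```

A sphere scores exactly 1, and the values increase from the tetrahedron to the icosahedron as the solids become rounder.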

    Data Mining
