
    NetLSD: Hearing the Shape of a Graph

    Comparison among graphs is ubiquitous in graph analytics. However, it is a hard task in terms of the expressiveness of the employed similarity measure and the efficiency of its computation. Ideally, graph comparison should be invariant to the order of nodes and the sizes of compared graphs, adaptive to the scale of graph patterns, and scalable. Unfortunately, these properties have not been addressed together. Graph comparisons still rely on direct approaches, graph kernels, or representation-based methods, which are all inefficient and impractical for large graph collections. In this paper, we propose the Network Laplacian Spectral Descriptor (NetLSD): the first, to our knowledge, permutation- and size-invariant, scale-adaptive, and efficiently computable graph representation method that allows for straightforward comparisons of large graphs. NetLSD extracts a compact signature that inherits the formal properties of the Laplacian spectrum, specifically its heat or wave kernel; thus, it hears the shape of a graph. Our evaluation on a variety of real-world graphs demonstrates that it outperforms previous works in both expressiveness and efficiency. Comment: KDD '18: The 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 19-23, 2018, London, United Kingdom.
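The heat-kernel signature mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the combinatorial Laplacian L = D - A (the abstract does not say which Laplacian variant is used), a hypothetical choice of timescales, and a truncated Taylor series for the matrix exponential that is only suitable for small graphs and small t. The function names are illustrative and are not taken from the paper or its software.

```python
import math

def laplacian(adj):
    """Combinatorial Laplacian L = D - A for a dense, symmetric adjacency matrix."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain dense matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def heat_trace(adj, t, terms=60):
    """Heat trace h(t) = tr(exp(-t L)), via a truncated Taylor series.

    Assumption: t * ||L|| is small enough for the series to be accurate.
    """
    n = len(adj)
    m = [[-t * x for x in row] for row in laplacian(adj)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = float(n)  # trace of the identity (k = 0 term)
    for k in range(1, terms):
        # term_k = term_{k-1} * (-tL) / k, i.e. (-tL)^k / k!
        term = [[x / k for x in row] for row in matmul(term, m)]
        total += sum(term[i][i] for i in range(n))
    return total

def heat_signature(adj, timescales=(0.1, 0.5, 1.0)):
    """Vector of heat traces over several (hypothetical) timescales."""
    return [heat_trace(adj, t) for t in timescales]

# A triangle has Laplacian eigenvalues 0, 3, 3, so h(t) = 1 + 2*exp(-3t).
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
closed_form = 1 + 2 * math.exp(-3 * 0.5)
print(heat_trace(triangle, 0.5), closed_form)
```

Because the trace depends only on the Laplacian eigenvalues, relabeling the nodes leaves the signature unchanged, which is the permutation invariance the abstract refers to.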

    Protein-Protein Docking Using Long Range Nuclear Magnetic Resonance Constraints

    One of the main methods for experimentally determining protein structure is nuclear magnetic resonance (NMR) spectroscopy. The advantage of using NMR compared to other methods is that the molecule may be studied in its natural state and environment. However, NMR is limited in its ability to analyze multi-domain molecules because of the scarcity of inter-atomic NMR constraints between the domains. In those cases, it might be possible to dock the domains based on long range NMR constraints that are related to the molecule's overall structure. We present two computational methods for rigid docking based on long range NMR constraints. The first docking method is based on the overall alignment tensor of the complex. The docking algorithm is based on the minimization of the difference between the predicted and experimental alignment tensor. In order to efficiently dock the complex, we introduce a new, computationally efficient method called PATI for predicting the molecular alignment tensor based on the three-dimensional structure of the molecule. The increase in speed compared to the currently best-known method (PALES) is achieved by re-expressing the problem as one of numerical integration, rather than a simple uniform sampling (as in the PALES method), and by using a convex hull rather than a detailed representation of the surface of a molecule. Using PATI, we derive a method called PATIDOCK for efficiently docking a two-domain complex based solely on the novel idea of using the difference between the experimental alignment tensor and the predicted alignment tensor computed by PATI. We show that the alignment tensor fundamentally contains enough information to accurately dock a two-domain complex, and that we can very quickly dock the two domains by pre-computing the right set of data. A second new docking method is based on a similar concept but uses the rotational diffusion tensor. We derive a minimization algorithm for this docking method by separating the problem into two simpler minimization problems and approximating our energy function by a quadratic equation. These methods provide two new efficient procedures for protein docking computations.

    The Physics of Communicability in Complex Networks

    A fundamental problem in the study of complex networks is to provide quantitative measures of correlation and information flow between different parts of a system. To this end, several notions of communicability have been introduced and applied to a wide variety of real-world networks in recent years. Several such communicability functions are reviewed in this paper. It is emphasized that communication and correlation in networks can take place through many more routes than the shortest paths, a fact that may not have been sufficiently appreciated in previously proposed correlation measures. In contrast to these, the communicability measures reviewed in this paper are defined by taking into account all possible routes between two nodes, assigning smaller weights to longer ones. This point of view naturally leads to the definition of communicability in terms of matrix functions, such as the exponential, resolvent, and hyperbolic functions, in which the matrix argument is either the adjacency matrix or the graph Laplacian associated with the network. Considerable insight into communicability can be gained by modeling a network as a system of oscillators and deriving physical interpretations, both classical and quantum-mechanical, of various communicability functions. Applications of communicability measures to the analysis of complex systems are illustrated on a variety of biological, physical and social networks. The last part of the paper is devoted to a review of the notion of locality in complex networks and to computational aspects that, by exploiting sparsity, can greatly reduce the computational effort of calculating communicability functions for large networks. Comment: Review Article. 90 pages, 14 figures. Contents: Introduction; Communicability in Networks; Physical Analogies; Comparing Communicability Functions; Communicability and the Analysis of Networks; Communicability and Localization in Complex Networks; Computability of Communicability Functions; Conclusions and Perspectives.
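As a rough illustration of the exponential form of communicability described in the abstract above, the sketch below sums walks of every length k between two nodes, weighting each by 1/k!, which is the Taylor series of exp(A) for the adjacency matrix A. The dense-list representation, the function names, and the fixed truncation length are assumptions made for this example; the sparsity-exploiting methods the review discusses are not shown.

```python
import math

def matmul(a, b):
    """Plain dense matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def communicability(adj, terms=40):
    """Approximate exp(A) = sum_k A^k / k! for a small adjacency matrix.

    Entry (p, q) counts walks of all lengths from p to q, with a walk of
    length k down-weighted by 1/k!. Truncation at `terms` is an assumption
    that suits small graphs only.
    """
    n = len(adj)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in term]  # k = 0 term: the identity
    for k in range(1, terms):
        # term_k = term_{k-1} * A / k, i.e. A^k / k!
        term = [[x / k for x in row] for row in matmul(term, adj)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

# Path graph 0 - 1 - 2: nodes 0 and 2 share no edge but still communicate
# through node 1, only more weakly than the adjacent pair (0, 1).
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
g = communicability(path)
print(g[0][1], g[0][2])
```

For a single edge between two nodes, the result reduces to cosh(1) on the diagonal and sinh(1) off the diagonal, which gives a quick sanity check for the series.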

    Just SLaQ When You Approximate: Accurate Spectral Distances for Web-Scale Graphs

    Graph comparison is a fundamental operation in data mining and information retrieval. Due to the combinatorial nature of graphs, it is hard to balance the expressiveness of the similarity measure and its scalability. Spectral analysis provides quintessential tools for studying the multi-scale structure of graphs and is a well-suited foundation for reasoning about differences between graphs. However, computing the full spectrum of large graphs is computationally prohibitive; thus, spectral graph comparison methods often rely on rough approximation techniques with weak error guarantees. In this work, we propose SLaQ, an efficient and effective approximation technique for computing spectral distances between graphs with billions of nodes and edges. We derive the corresponding error bounds and demonstrate that accurate computation is possible in time linear in the number of graph edges. In a thorough experimental evaluation, we show that SLaQ outperforms existing methods, oftentimes by several orders of magnitude in approximation accuracy, and maintains comparable performance, allowing million-scale graphs to be compared in a matter of minutes on a single machine. Comment: To appear at TheWebConf (WWW) 2020.

    Some studies on protein structure alignment algorithms

    The alignment of two protein structures is a fundamental problem in structural bioinformatics. Their structural similarity carries with it the connotation of similar functional behavior that could be exploited in various applications. A plethora of algorithms, including one by us, is a testament to the importance of the problem. In this thesis, we propose a novel approach to measure the effectiveness of a sample of four such algorithms, DALI, TM-align, CE and EDAlignsse, for detecting structural similarities among proteins. The underlying premise is that structural proximity should translate into spatial proximity. To verify this, we carried out extensive experiments with five different datasets, each consisting of proteins from two to six different families. In addition, we have focused on computational methods for aligning multiple protein structures. This problem is known to be NP-complete. Therefore, there are many ways to come up with a solution that is better than the existing ones, or at least as good as them. Such a solution is presented in this thesis. We used a heuristic algorithm, the Progressive Multiple Alignment approach, to obtain the multiple sequence alignment. We used the root mean square deviation (RMSD) as a measure of alignment quality and reported this measure for a large and varied number of alignments. We also compared the execution times of our algorithm with the well-known algorithm MUSTANG for all the tested alignments.
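The RMSD measure named in the abstract above can be sketched as follows. This minimal version assumes the two structures are already superimposed and that atoms are paired by index; the abstract does not describe how the thesis obtains the superposition or the pairing, so neither step is shown. The function name and the coordinates are purely illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of 3D points.

    Assumption: the structures are already superimposed and point i in
    coords_a corresponds to point i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2
                  for pa, pb in zip(coords_a, coords_b)
                  for xa, xb in zip(pa, pb))
    return math.sqrt(squared / len(coords_a))

# Illustrative coordinates: the second set is the first shifted by (3, 4, 0),
# so every paired distance is the same.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
b = [(x + 3.0, y + 4.0, z) for x, y, z in a]
print(rmsd(a, a), rmsd(a, b))
```

A lower value indicates a closer match between the paired positions, which is why it serves as an alignment-quality score.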

    Multidimensional Scaling Reveals the Main Evolutionary Pathways of Class A G-Protein-Coupled Receptors

    Class A G-protein-coupled receptors (GPCRs) constitute the largest family of transmembrane receptors in the human genome. Understanding the mechanisms which drove the evolution of such a large family would help understand the specificity of each GPCR sub-family with applications to drug design. To gain evolutionary information on class A GPCRs, we explored their sequence space by metric multidimensional scaling analysis (MDS). Three-dimensional mapping of human sequences shows a non-uniform distribution of GPCRs, organized in clusters that lie along four privileged directions. To interpret these directions, we projected supplementary sequences from different species onto the human space used as a reference. With this technique, we can easily monitor the evolutionary drift of several GPCR sub-families from cnidarians to humans. Results support a model of radiative evolution of class A GPCRs from a central node formed by peptide receptors. The privileged directions obtained from the MDS analysis are interpretable in terms of three main evolutionary pathways related to specific sequence determinants. The first pathway was initiated by a deletion in transmembrane helix 2 (TM2) and led to three sub-families by divergent evolution. The second pathway corresponds to the differentiation of the amine receptors. The third pathway corresponds to parallel evolution of several sub-families in relation to a covarion process involving proline residues in TM2 and TM5. As exemplified with GPCRs, the MDS projection technique is an important tool to compare orthologous sequence sets and to help decipher the mutational events that drove the evolution of protein families.