7 research outputs found

    Finding Near-Optimal Independent Sets at Scale

    The independent set problem is NP-hard and particularly difficult to solve in large sparse graphs. In this work, we develop an advanced evolutionary algorithm that incorporates kernelization techniques to compute large independent sets in huge sparse networks. A recent exact algorithm has shown that large networks can be solved exactly by employing a branch-and-reduce technique that recursively kernelizes the graph and performs branching. However, one major drawback of that algorithm is that, for huge graphs, branching can still take exponential time. To avoid this problem, we recursively choose vertices that are likely to be in a large independent set (using an evolutionary approach), then further kernelize the graph. We show that identifying and removing vertices likely to be in large independent sets opens up the reduction space, which not only speeds up the computation of large independent sets drastically, but also enables us to compute high-quality independent sets on much larger instances than previously reported in the literature.
    Comment: 17 pages, 1 figure, 8 tables. arXiv admin note: text overlap with arXiv:1502.0168
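    The reduce-then-choose loop described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the only reduction used is the standard degree-0/degree-1 rule, and a simple minimum-degree pick stands in for the evolutionary vertex selection. The function name `reduce_and_pick` and its parameters are hypothetical.

```python
def reduce_and_pick(nodes, edges):
    """Return an independent set built from degree-0/1 reductions plus a greedy fallback.

    Assumption: the min-degree fallback replaces the paper's evolutionary choice.
    """
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    independent = set()

    def take(v):
        # Put v in the set, then delete v and all its neighbours from the graph.
        independent.add(v)
        removed = list(adj[v]) + [v]
        for w in removed:
            for x in adj.pop(w, set()):
                if x in adj:
                    adj[x].discard(w)

    while adj:
        # Reduction: a vertex of degree 0 or 1 belongs to some maximum independent set.
        v = next((u for u in adj if len(adj[u]) <= 1), None)
        if v is None:
            # No reduction applies: heuristic choice (stand-in for the evolutionary step).
            v = min(adj, key=lambda u: len(adj[u]))
        take(v)
    return independent
```

    Removing a chosen vertex together with its neighbours lowers the degrees of the remaining vertices, which is what lets further reductions apply in the next round.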

    Scalable kernelization for the maximum independent set problem

    High Performance Large Graph Analytics by Enhancing Locality

    Graphs are widely used in a variety of domains for representing entities and their relationships to each other. Graph analytics helps to understand, detect, extract and visualize insightful relationships between different entities. Graph analytics has a wide range of applications in various domains including computational biology, commerce, intelligence, health care and transportation. The breadth of problems that require large graph analytics is growing rapidly, resulting in a need for fast and efficient graph processing. One of the major challenges in graph processing is poor locality of reference. Locality of reference refers to the phenomenon of frequently accessing the same memory location or adjacent memory locations. Applications with poor data locality reduce the effectiveness of the cache memory. They result in a large number of cache misses, requiring access to high-latency main memory. Therefore, it is essential to have good locality for good performance. Most graph processing applications have highly random memory access patterns. Coupled with the current large sizes of the graphs, they result in poor cache utilization. Additionally, the computation-to-data-access ratio in many graph processing applications is very low, making it difficult to cover the memory latency using computation. It is also challenging to efficiently parallelize most graph applications. Many real-world graphs have unbalanced degree distributions, which makes it difficult to achieve a balanced workload. The parallelism in graph applications is generally fine-grained in nature. This calls for efficient synchronization and communication between the processing units. Techniques for enhancing locality have been well studied in the context of regular applications like linear algebra. Those techniques are in most cases not applicable to graph problems.
In this dissertation, we propose two techniques for enhancing locality in graph algorithms: access transformation and task-set reduction. Access transformation can be applied to algorithms to improve spatial locality by changing the random access pattern to sequential access. It is applicable to iterative algorithms that process random vertices/edges in each iteration. The task-set reduction technique can be applied to enhance temporal locality. It is applicable to algorithms that repeatedly access the same data to perform a certain task. Using the two techniques, we propose novel algorithms for three graph problems: k-core decomposition, maximal clique enumeration and triangle listing. We have implemented the algorithms. The results show that these algorithms provide significant improvement in performance and also scale well.
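    For readers unfamiliar with the first of the three problems, the following is a minimal sketch of the textbook peeling procedure for k-core decomposition. It is not the dissertation's locality-enhanced algorithm; the function name `core_numbers` and the adjacency-dict input format are assumptions made for illustration.

```python
def core_numbers(adj):
    """Compute the core number of every vertex by repeatedly peeling a min-degree vertex.

    adj: dict mapping each vertex to the set of its neighbours (undirected graph).
    """
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the vertex with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        # The core number never decreases as peeling proceeds.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1
    return core
```

    Each peel touches the degree entries of arbitrary neighbours, which is the kind of random access pattern the abstract identifies as a locality problem.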

    Theoretical-experimental study on protein-ligand interactions based on thermodynamics methods, molecular docking and perturbation models

    This doctoral thesis focuses on understanding the thermodynamic events of protein-ligand interactions, which have been of paramount importance from traditional Medicinal Chemistry to Nanobiotechnology. Particular attention has been paid to the application of state-of-the-art methodologies to thermodynamic studies of protein-ligand interactions by integrating structure-based molecular docking techniques, classical fractal approaches to solve protein-ligand complementarity problems, perturbation models to study allosteric signal propagation, and predictive nano-quantitative structure-toxicity relationship models, coupled with powerful experimental validation techniques. The contributions provided by this work could open new horizons in the fields of Drug Discovery, Materials Sciences, Molecular Diagnosis, and Environmental Health Sciences.

    Analysis of shape, properties and "druggability" of protein binding pockets

    Knowledge of the three-dimensional structure of therapeutically relevant target proteins provides valuable information for rational drug design. The steadily growing number of solved protein crystal structures enables qualitative and quantitative analysis of specific protein-ligand interactions in silico. In this work, novel algorithms for the identification and similarity comparison of protein binding sites and their properties were developed and combined in the program PocketomePicker. The software comprises the routines PocketPicker, PocketShapelets and PocketGraph. Furthermore, the method ReverseLIQUID was re-implemented in this work and applied to structure-based virtual screening with a cooperation partner. The programs and their scientific applications are summarized here: The method PocketPicker is designed for the prediction of potential binding sites on protein surfaces. The technique implements a geometric approach based on the concept of "artificial grids" for the identification of contiguous buried regions of the protein surface that might act as ligand binding sites. The method correctly predicts the actual binding site for 73 % of the entries in a representative data set of protein structures. For 90 % of the proteins, the actual binding site is found among the top three predicted binding pockets. PocketPicker exceeds the predictive quality of other established algorithms and enables binding site identification on apo structures without a significant drop in prediction success. Other methods perform markedly worse when applied to apo structures. PocketPicker enables alignment-free comparison of binding site shapes by encoding the computed binding volumes as correlation descriptors.
This approach was used for successful predictions of binding site function for homology models of APOBEC3C and glutamate dehydrogenase of the malaria pathogen Plasmodium falciparum. Both projects were carried out with collaboration partners. Furthermore, PocketPicker correlation descriptors were used for automated conformational analysis of the enzymatic pocket of aldose reductase. The method PocketShapelets was developed in this work for detailed analysis of the shape and physicochemical properties of protein binding sites. This approach enables structural alignments of extracted binding volumes by decomposing the surface of protein binding sites. The superposition is achieved by identifying structurally similar surface curvatures of two pockets. PocketShapelets was successfully used for the analysis of functional similarity of binding sites based on physicochemical properties. PocketGraph was developed for the analysis of the topological diversity of binding site geometries. This approach uses the "Growing Neural Gas" concept from machine learning for automated extraction of the structural organization of binding sites. Furthermore, the method enables the decomposition of a binding site into its subpockets. The pocket volumes characterized by PocketPicker are the foundation of the method ReverseLIQUID. This program was refined in this work and used, with a collaboration partner, for the identification of an inhibitor of the serine protease HtrA of the pathogen Helicobacter pylori. A structure-based pharmacophore model for virtual screening was derived using ReverseLIQUID. This approach led to the identification of a substance with low micromolar affinity towards the target protein.

    Structural Diversity of Biological Ligands and their Binding Sites in Proteins

    The phenomenon of molecular recognition, which underpins almost all biological processes, is dynamic, complex and subtle. Establishing an interaction between a pair of molecules involves mutual structural rearrangements guided by a highly convoluted energy landscape, the accurate mapping of which continues to elude us. The analysis of interactions between proteins and small molecules has been a focus of intense interest for many years, offering as it does the promise of increased insight into many areas of biology, and the potential for greatly improved drug design methodologies. Computational methods for predicting which types of ligand a given protein may bind, and what conformation two molecules will adopt once paired, are particularly sought after. The work presented in this thesis aims to quantify the amount of structural variability observed in the ways in which proteins interact with ligands. This diversity is considered from two perspectives: to what extent ligands bind to different proteins in distinct conformations, and the degree to which binding sites specific for the same ligand have different atomic structures. The first study could be of value to approaches which aim to predict the bound pose of a ligand, since by cataloguing the range of conformations previously observed, it may be possible to better judge the biological likelihood of a newly predicted molecular arrangement. The findings show that several common biological ligands exhibit considerable conformational diversity when bound to proteins. Although binding in predominantly extended conformations, the analysis presented here highlights several cases in which the biological requirements of a given protein force its ligand to adopt a highly compact form. 
Comparing the conformational diversity observed within several protein families, the hypothesis that homologous proteins tend to bind ligands in a similar arrangement is generally upheld, but several families are identified in which this is demonstrably not the case. Consideration of diversity in the binding site itself, on the other hand, may be useful in guiding methods which search for binding sites in uncharacterised protein structures: identifying those regions of known sites which are less variable could help to focus the search only on the most important features. Analysis of the diversity of a non-redundant dataset of adenine binding sites shows that a small number of key interactions are conserved, with the majority of the fragment environment being highly variable. Just as ligand conformation varies between protein families, so the degree of binding site diversity is observed to be significantly higher in some families than others. Taken together, the results of this work suggest that the repertoire of strategies produced by nature for the purposes of molecular recognition is extremely extensive. Moreover, the importance of a given ligand conformation or pattern of interaction appears to vary greatly depending on the function of the particular group of proteins studied. As such, it is proposed that diversity analysis may form a significant part of future large-scale studies of ligand-protein interactions.