62 research outputs found

    Maximum Common Subgraph Isomorphism Algorithms

    Maximum common subgraph (MCS) isomorphism algorithms play an important role in chemoinformatics by providing an effective mechanism for the alignment of pairs of chemical structures. This article discusses the various types of MCS that can be identified when two graphs are compared and reviews some of the algorithms that are available for this purpose, focusing on those that are, or may be, applicable to the matching of chemical graphs.

    Applications and Variations of the Maximum Common Subgraph for the Determination of Chemical Similarity

    The Maximum Common Substructure (MCS), along with numerous graph theory techniques, has been used widely in chemoinformatics. A topic which has been studied at Sheffield is the hyperstructure concept, a chemical definition of a superstructure that represents the graph-theoretic union of several molecules. This technique, however, has been poorly studied in the context of similarity-based virtual screening. Most hyperstructure literature to date has focused on either construction methodology or property prediction on small datasets of compounds. The work in this thesis is divided into two parts. The first part describes a method for constructing hyperstructures, and then describes the application of a hyperstructure in similarity searching in large compound datasets, comparing it with extended connectivity fingerprint and MCS similarity. Since hyperstructures performed significantly worse than fingerprints, additional work is described concerning various weighting schemes of hyperstructures. Due to the poor performance of hyperstructure and MCS screening compared to fingerprints, it was questioned whether the choice of MCS algorithm and MCS type had an influence. A series of MCS algorithms and types were compared for speed, MCS size, and virtual screening ability. A topologically-constrained variant of the MCS was found to be competitive with fingerprints, and fusion of the two techniques improved overall active compound recall.

    Comparison of maximum common subgraph isomorphism algorithms for the alignment of 2D chemical structures

    The identification of the largest substructure in common when two (or more) molecules are overlaid is important for several applications in chemoinformatics, and can be implemented using a maximum common subgraph (MCS) algorithm. Many such algorithms have been reported, and it is important to know which are likely to be useful in operation. A detailed comparison was hence conducted of the efficiency (in terms of CPU time) and the effectiveness (in terms of the size of the MCS identified) of eleven MCS algorithms, some of which were exact and some of which were approximate in character. The algorithms were used to identify both connected and disconnected MCSs on a range of pairs of molecules. The fastest exact algorithms for the connected and disconnected problems were found to be the fMCS and MaxCliqueSeq algorithms, respectively, while the ChemAxon_MCS algorithm was the fastest approximate algorithm for both types of problem.

    Merging and scoring molecular interactions utilising existing community standards: tools, use-cases and a case study.

    The evidence that two molecules interact in a living cell is often inferred from multiple different experiments. Experimental data is captured in multiple repositories, but there is no simple way to assess the evidence of an interaction occurring in a cellular environment. Merging and scoring of data are commonly required operations after querying for the details of specific molecular interactions, to remove redundancy and assess the strength of accompanying experimental evidence. We have developed both a merging algorithm and a scoring system for molecular interactions based on the Proteomics Standards Initiative molecular interaction standards. In this manuscript, we introduce these two algorithms, provide community access to the tool suite, describe examples of how these tools are useful to selectively present molecular interaction data, and demonstrate a case where the algorithms were successfully used to identify a systematic error in an existing dataset.

    Towards population-based structural health monitoring, Part III: graphs, networks and communities

    Population-based structural health monitoring opens up the possibility of using information from a population of structures to provide extra information for each individual structure. For example, population-based structural health monitoring could provide improved damage detection within a homogeneous population of structures by defining a normal condition across the population that is robust to environmental variation. Furthermore, in cases where structures are sufficiently similar, damage location, assessment, and classification labels could be transferred, increasing the damage labels available for each structure. Determining whether two structures are sufficiently similar requires the comparison of some representation of the structures. In fields such as bioinformatics and computer science, attributed graphs are often used to determine structural similarity. This paper will describe methods for comparing the topology attributes of two such graphs. The algorithm described is suited to population-based structural health monitoring as it provides matches between two graphs which have physical significance. This paper will also describe the process of comparing hierarchical attributes to determine the level of knowledge transfer possible between two structures.

    WormBase: a comprehensive resource for nematode research

    WormBase (http://www.wormbase.org) is a central data repository for nematode biology. Initially created as a service to the Caenorhabditis elegans research field, WormBase has evolved into a powerful research tool in its own right. In the past 2 years, we expanded WormBase to include the complete genomic sequence, gene predictions and orthology assignments from a range of related nematodes. These comparative data enrich the C. elegans data with improved gene predictions and a better understanding of gene function. In turn, they bring the wealth of experimental knowledge of C. elegans to other systems of medical and agricultural importance. Here, we describe new species and data types now available at WormBase. In addition, we detail enhancements to our curatorial pipeline and website infrastructure to accommodate new genomes and an extensive user base.

    The IMEx coronavirus interactome: an evolving map of Coronaviridae-host molecular interactions

    The current coronavirus disease of 2019 (COVID-19) pandemic, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spurred a wave of research of nearly unprecedented scale. Among the different strategies that are being used to understand the disease and develop effective treatments, the study of physical molecular interactions can provide fine-grained resolution of the mechanisms behind the virus biology and the human organism response. We present a curated dataset of physical molecular interactions focused on proteins from SARS-CoV-2, SARS-CoV-1 and other members of the Coronaviridae family that has been manually extracted by International Molecular Exchange (IMEx) Consortium curators. Currently, the dataset comprises over 4400 binarized interactions extracted from 151 publications. The dataset can be accessed in the standard formats recommended by the Proteomics Standards Initiative (HUPO-PSI) at the IntAct database website (https://www.ebi.ac.uk/intact) and will be continuously updated as research on COVID-19 progresses.

    The IntAct molecular interaction database in 2012

    IntAct is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. Two levels of curation are now available within the database, with both IMEx-level annotation and less detailed MIMIx-compatible entries currently supported. As of September 2011, IntAct contains approximately 275 000 curated binary interaction evidences from over 5000 publications. The IntAct website has been improved to enhance the search process and in particular the graphical display of the results. New data download formats are also available, which will facilitate the inclusion of IntAct's data in the Semantic Web. IntAct is an active contributor to the IMEx consortium (http://www.imexconsortium.org). IntAct source code and data are freely available at http://www.ebi.ac.uk/intact.

    The IntAct database: Efficient access to fine-grained molecular interaction data

    The IntAct molecular interaction database (https://www.ebi.ac.uk/intact) is a curated resource of molecular interactions, derived from the scientific literature and from direct data depositions. As of August 2021, IntAct provides more than one million binary interactions, curated by twelve global partners of the International Molecular Exchange consortium, for which the IntAct database provides a shared curation and dissemination platform. The IMEx curation policy has always emphasised a fine-grained data and curation model, aiming to capture the relevant experimental detail essential for the interpretation of the provided molecular interaction data. Here, we present recent curation focus and progress, as well as a completely redeveloped website which presents IntAct data in a much more user-friendly and detailed way.