536 research outputs found

    Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis

    This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, covering the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals and involved 910 different citing organizations from 54 different countries, demonstrating the widespread impact of the Group's work.

    Effectiveness of graph-based and fingerprint-based similarity measures for virtual screening of 2D chemical structure databases

    This paper reports an evaluation of both graph-based and fingerprint-based measures of structural similarity when used for virtual screening of sets of 2D molecules drawn from the MDDR and ID Alert databases. The graph-based measures employ a new maximum common edge subgraph isomorphism algorithm, called RASCAL, together with several previously described similarity coefficients for quantifying the similarity between pairs of graphs. The effectiveness of these graph-based searches is compared with that of similarity searches using BCI, Daylight and Unity 2D fingerprints. Our results suggest that graph-based approaches provide an effective complement to existing fingerprint-based approaches to virtual screening.
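
    As an illustration of fingerprint-based similarity searching, the sketch below ranks a small database against a query. The abstract does not say which coefficient was used with the fingerprints; the Tanimoto coefficient is assumed here as a common choice, and the fingerprints, bit positions and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    if not fp_a and not fp_b:
        return 0.0  # convention assumed here for two empty fingerprints
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)


def rank_by_similarity(query: Set[int],
                       database: Dict[str, Set[int]]) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted by decreasing similarity to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy fingerprints, not taken from MDDR or ID Alert.
toy_database = {"mol_a": {1, 2, 3}, "mol_b": {7}, "mol_c": {2, 3, 4}}
ranking = rank_by_similarity({1, 2, 3}, toy_database)
```

    In a virtual screening setting, the top of such a ranking would be the molecules selected for testing; a graph-based measure would replace `tanimoto` with a score derived from the common subgraph of the two molecules.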

    Graph indexing and retrieval based on graph prototypes

    Retrieving the answer to a query from a large volume of stored data as quickly as possible has been a recurrent problem in computer science practically since its origins. In addition, databases now store data of the most diverse and often unexpected kinds; they are no longer the early databases that contained only numbers or character strings. (...) The aim of this work, which I believe has been achieved as far as possible, is to develop and present a methodology for carrying out this process. Metric Trees of prototypes build on a well-known strategy: grouping the data stored in a database as intelligently as possible, so that a search does not have to explore every stored instance. To this we add the concept of a graph prototype, a structure that summarises the information of a set of instances represented as graphs and that has so far been used for classification and recognition. In this thesis, graphs represent the elements to be queried in databases. Graphs can represent complex objects, which is why the number of graph databases is increasing. Because the literature describes several ways of building a prototype, the work presented here includes a comparative study of the main methods. Combining these two concepts, the Metric Tree and the graph prototype, we propose building metric trees in which graph prototypes act as routing nodes that help decide which path to explore during a search. We have used these Metric Trees both for classification and for finding all instances that lie within a maximum distance of the query. Because the data are graphs, the methodology can be applied to any kind of information that can be represented in this way. (...)
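
    The role of routing nodes in a range query over a metric tree can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: plain numbers and absolute difference stand in for graphs and a graph distance, and the `Node` class, its fields and the hand-built tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    prototype: float             # routing object summarising everything below
    radius: float                # largest distance from prototype to any item below
    children: List["Node"] = field(default_factory=list)
    items: List[float] = field(default_factory=list)  # filled only in leaves


def range_search(node: Node, query: float, max_dist: float,
                 dist: Callable[[float, float], float],
                 out: Optional[List[float]] = None) -> List[float]:
    """Collect every stored item whose distance to the query is at most max_dist."""
    if out is None:
        out = []
    # Triangle inequality: any item x below this node satisfies
    # dist(query, x) >= dist(query, prototype) - radius, so the whole
    # subtree can be skipped when that lower bound exceeds max_dist.
    if dist(query, node.prototype) - node.radius > max_dist:
        return out
    for item in node.items:
        if dist(query, item) <= max_dist:
            out.append(item)
    for child in node.children:
        range_search(child, query, max_dist, dist, out)
    return out


# Hand-built toy tree; a real index would be built from the data.
leaf_low = Node(prototype=2, radius=2, items=[0, 1, 2, 4])
leaf_high = Node(prototype=20, radius=3, items=[17, 20, 23])
root = Node(prototype=10, radius=13, children=[leaf_low, leaf_high])

found = range_search(root, 3, 1.5, lambda a, b: abs(a - b))  # found == [2, 4]
```

    In this toy query the subtree under `leaf_high` is discarded after a single distance computation against its prototype, which is the saving the grouping strategy aims for when each distance computation is expensive, as it is for graphs.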

    Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases

    Many studies have sought efficient solutions for subgraph similarity search over certain (deterministic) graphs, because of its wide application in fields including bioinformatics, social network analysis, and Resource Description Framework (RDF) data management. All these works assume that the underlying data are certain. In reality, however, graphs are often noisy and uncertain because of factors such as errors in data extraction, inconsistencies in data integration, and privacy-preserving measures. In this paper, therefore, we study subgraph similarity search on large probabilistic graph databases. Unlike previous works, which assume that the edges of an uncertain graph are independent of each other, we study uncertain graphs in which edge occurrences are correlated. We formally prove that subgraph similarity search over probabilistic graphs is #P-complete, and we therefore employ a filter-and-verify framework to speed up the search. In the filtering phase, we develop tight lower and upper bounds of subgraph similarity probability based on a probabilistic matrix index, PMI. PMI is composed of discriminative subgraph features associated with tight lower and upper bounds of subgraph isomorphism probability. Based on PMI, we can filter out a large number of probabilistic graphs and maximize the pruning capability. During the verification phase, we develop an efficient sampling algorithm to validate the remaining candidates. The efficiency of our proposed solutions has been verified through extensive experiments. Comment: VLDB201
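
    The filter-and-verify control flow can be sketched as below. This is a simplified illustration, not the paper's algorithm: the bounds are supplied by hand instead of being derived from PMI, the sampler assumes independent edges (the paper explicitly treats correlated edges), and all names and the probability threshold parameter are hypothetical.

```python
import random
from typing import Callable, Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def estimate_probability(edge_probs: Dict[Edge, float],
                         matches: Callable[[Set[Edge]], bool],
                         samples: int = 2000,
                         rng: Optional[random.Random] = None) -> float:
    """Monte Carlo estimate of the probability that a sampled world matches.

    Simplifying assumption: each edge is present independently.
    """
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(samples):
        # Draw one possible world by keeping each edge with its probability.
        world = {edge for edge, p in edge_probs.items() if rng.random() < p}
        if matches(world):
            hits += 1
    return hits / samples


def filter_and_verify(bounds: Dict[str, Tuple[float, float]],
                      threshold: float,
                      verify: Callable[[str], float]) -> List[str]:
    """Return the graphs whose match probability reaches the threshold."""
    answers = []
    for name, (lower, upper) in bounds.items():
        if upper < threshold:
            continue                      # filtered out: cannot qualify
        if lower >= threshold:
            answers.append(name)          # accepted without verification
        elif verify(name) >= threshold:
            answers.append(name)          # undecided by bounds: verified
    return answers
```

    Only graphs whose bounds straddle the threshold reach the expensive `verify` step, so tighter bounds translate directly into fewer sampling runs.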

    Structural Analysis Algorithms for Nanomaterials
