536 research outputs found
Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis
This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, these involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals, and involved 910 different citing organizations from 54 different countries, thus demonstrating the widespread impact of the Group's work
Effectiveness of graph-based and fingerprint-based similarity measures for virtual screening of 2D chemical structure databases
This paper reports an evaluation of both graph-based and fingerprint-based measures of structural similarity, when used for virtual screening of sets of 2D molecules drawn from the MDDR and ID Alert databases. The graph-based measures employ a new maximum common edge subgraph isomorphism algorithm, called RASCAL, with several similarity coefficients described previously for quantifying the similarity between pairs of graphs. The effectiveness of these graph-based searches is compared with that resulting from similarity searches using BCI, Daylight and Unity 2D fingerprints. Our results suggest that graph-based approaches provide an effective complement to existing fingerprint-based approaches to virtual screening
Graph indexing and retrieval based on graph prototypes
[ANGLÈS] Taking a query from a high number of data stored into a database, as fast as possible, is a recurrent problem in the field of computer sciences practically since its origins. At the existence of this problem, it’s necessary to add, moreover, the fact that actually databases contains data types of more diverse and unexpected character possible. Now we are not talking about originating databases which only contained sets of numbers or characters strings. (...) All that I want to make into the present work and I think that was achieved as far as possible, has been to develop and to present a methodology to carry out this process. The Metric Trees of prototypes are based on a well-known strategy, which is based on grouping the data stored in database at the smartest possible way. But also we has added the concept of a graph prototype. A structure that contains information of a set of instances represented by graphs, used until now for classification and recognition.
In this thesis we have used graphs as representatives of elements that have to be queried in databases. Note that graphs have the capacity to represent complex objects, for this reason the number of graph databases is increasing. Due to in the literature appears different ways to build a prototype, the work presented here shows a comparative study between the main methods.
Combining these two concepts, the Metric Tree and the graph prototype, we propose the construction of metric trees where the graph prototypes are routing nodes to help to decide the way to explore when we make a search in the tree. We have used Metric Trees to make classification and to find all instances that are lower than a maximum distance. (...)[CATALÀ] El trobar-nos davant una gran quantitat de dades i tenir que fer cerques d’aquestes el més rà pid possible és un problema recurrent en el camp de les ciències de la computació prà cticament des dels seus orÃgens. A l'existència d'aquest problema, se li ha d’afegir, a més a més, el fet de que actualment les bases de dades emmagatzemen tipus de dades de la naturalesa més diversa i molts cops inesperada possible. Ja no parlem de les bases de dades originaries que únicament contenien números o cadenes carà cters. (...) El que he volgut en aquest treball i penso que en la mesura del que era possible s'ha aconseguit, és desenvolupar i presentar una metodologia per portar a terme aquest procés. Els Metric Trees de prototips, que es basen en la ja coneguda estratègia d'agrupar les dades que anem guardant a una base de dades de la forma més intel·ligent possible per no haver d’explorar totes les instà ncies que tenim quan volem fer una cerca, però a més a més s'ha afegit el concepte de prototip. Una estructura, que agrupa la informació d'un conjunt d'instà ncies, utilitzada fins ara per a fer classificació i reconeixement.
Conjugant aquests dos conceptes, el de Metric Tree i el de prototip, plantejem la construcció d'arbres de cerca on els prototips siguin els nodes intermedis, que ens ajudin a decidir quin camà explorar quan volem fer una cerca sobre l'arbre. I utilitzant, aquests tant per a fer classificació com per a buscar totes les instà ncies que estiguin una distà ncia més petita d’una distà ncia máxima. Tot això tenint present, que les dades amb que treballem són grafs, és a dir que la metodologia presentada, té la versatilitat de poder-se aplicar, a qualsevol tipus d'informació que es pugui representar d'aquesta manera. (...
Efficient Subgraph Similarity Search on Large Probabilistic Graph Databases
Many studies have been conducted on seeking the efficient solution for
subgraph similarity search over certain (deterministic) graphs due to its wide
application in many fields, including bioinformatics, social network analysis,
and Resource Description Framework (RDF) data management. All these works
assume that the underlying data are certain. However, in reality, graphs are
often noisy and uncertain due to various factors, such as errors in data
extraction, inconsistencies in data integration, and privacy preserving
purposes. Therefore, in this paper, we study subgraph similarity search on
large probabilistic graph databases. Different from previous works assuming
that edges in an uncertain graph are independent of each other, we study the
uncertain graphs where edges' occurrences are correlated. We formally prove
that subgraph similarity search over probabilistic graphs is #P-complete, thus,
we employ a filter-and-verify framework to speed up the search. In the
filtering phase,we develop tight lower and upper bounds of subgraph similarity
probability based on a probabilistic matrix index, PMI. PMI is composed of
discriminative subgraph features associated with tight lower and upper bounds
of subgraph isomorphism probability. Based on PMI, we can sort out a large
number of probabilistic graphs and maximize the pruning capability. During the
verification phase, we develop an efficient sampling algorithm to validate the
remaining candidates. The efficiency of our proposed solutions has been
verified through extensive experiments.Comment: VLDB201
- …