5,132 research outputs found
Chemoinformatics Research at the University of Sheffield: A History and Citation Analysis
This paper reviews the work of the Chemoinformatics Research Group in the Department of Information Studies at the University of Sheffield, focusing particularly on the work carried out in the period 1985-2002. Four major research areas are discussed, these involving the development of methods for: substructure searching in databases of three-dimensional structures, including both rigid and flexible molecules; the representation and searching of the Markush structures that occur in chemical patents; similarity searching in databases of both two-dimensional and three-dimensional structures; and compound selection and the design of combinatorial libraries. An analysis of citations to 321 publications from the Group shows that it attracted a total of 3725 residual citations during the period 1980-2002. These citations appeared in 411 different journals, and involved 910 different citing organizations from 54 different countries, thus demonstrating the widespread impact of the Group's work
Generic Strategies for Chemical Space Exploration
Computational approaches to exploring "chemical universes", i.e., very large
sets, potentially infinite sets of compounds that can be constructed by a
prescribed collection of reaction mechanisms, in practice suffer from a
combinatorial explosion. It quickly becomes impossible to test, for all pairs
of compounds in a rapidly growing network, whether they can react with each
other. More sophisticated and efficient strategies are therefore required to
construct very large chemical reaction networks.
Undirected labeled graphs and graph rewriting are natural models of chemical
compounds and chemical reactions. Borrowing the idea of partial evaluation from
functional programming, we introduce partial applications of rewrite rules.
Binding substrate to rules increases the number of rules but drastically prunes
the substrate sets to which it might match, resulting in dramatically reduced
resource requirements. At the same time, exploration strategies can be guided,
e.g. based on restrictions on the product molecules to avoid the explicit
enumeration of very unlikely compounds. To this end we introduce here a generic
framework for the specification of exploration strategies in graph-rewriting
systems. Using key examples of complex chemical networks from sugar chemistry
and the realm of metabolic networks we demonstrate the feasibility of a
high-level strategy framework.
The ideas presented here can not only be used for a strategy-based chemical
space exploration that has close correspondence of experimental results, but
are much more general. In particular, the framework can be used to emulate
higher-level transformation models such as illustrated in a small puzzle game
Shared Memory Parallel Subgraph Enumeration
The subgraph enumeration problem asks us to find all subgraphs of a target
graph that are isomorphic to a given pattern graph. Determining whether even
one such isomorphic subgraph exists is NP-complete---and therefore finding all
such subgraphs (if they exist) is a time-consuming task. Subgraph enumeration
has applications in many fields, including biochemistry and social networks,
and interestingly the fastest algorithms for solving the problem for
biochemical inputs are sequential. Since they depend on depth-first tree
traversal, an efficient parallelization is far from trivial. Nevertheless,
since important applications produce data sets with increasing difficulty,
parallelism seems beneficial.
We thus present here a shared-memory parallelization of the state-of-the-art
subgraph enumeration algorithms RI and RI-DS (a variant of RI for dense graphs)
by Bonnici et al. [BMC Bioinformatics, 2013]. Our strategy uses work stealing
and our implementation demonstrates a significant speedup on real-world
biochemical data---despite a highly irregular data access pattern. We also
improve RI-DS by pruning the search space better; this further improves the
empirical running times compared to the already highly tuned RI-DS.Comment: 18 pages, 12 figures, To appear at the 7th IEEE Workshop on Parallel
/ Distributed Computing and Optimization (PDCO 2017
A Generic Framework for Engineering Graph Canonization Algorithms
The state-of-the-art tools for practical graph canonization are all based on
the individualization-refinement paradigm, and their difference is primarily in
the choice of heuristics they include and in the actual tool implementation. It
is thus not possible to make a direct comparison of how individual algorithmic
ideas affect the performance on different graph classes.
We present an algorithmic software framework that facilitates implementation
of heuristics as independent extensions to a common core algorithm. It
therefore becomes easy to perform a detailed comparison of the performance and
behaviour of different algorithmic ideas. Implementations are provided of a
range of algorithms for tree traversal, target cell selection, and node
invariant, including choices from the literature and new variations. The
framework readily supports extraction and visualization of detailed data from
separate algorithm executions for subsequent analysis and development of new
heuristics.
Using collections of different graph classes we investigate the effect of
varying the selections of heuristics, often revealing exactly which individual
algorithmic choice is responsible for particularly good or bad performance. On
several benchmark collections, including a newly proposed class of difficult
instances, we additionally find that our implementation performs better than
the current state-of-the-art tools
The Vadalog System: Datalog-based Reasoning for Knowledge Graphs
Over the past years, there has been a resurgence of Datalog-based systems in
the database community as well as in industry. In this context, it has been
recognized that to handle the complex knowl\-edge-based scenarios encountered
today, such as reasoning over large knowledge graphs, Datalog has to be
extended with features such as existential quantification. Yet, Datalog-based
reasoning in the presence of existential quantification is in general
undecidable. Many efforts have been made to define decidable fragments. Warded
Datalog+/- is a very promising one, as it captures PTIME complexity while
allowing ontological reasoning. Yet so far, no implementation of Warded
Datalog+/- was available. In this paper we present the Vadalog system, a
Datalog-based system for performing complex logic reasoning tasks, such as
those required in advanced knowledge graphs. The Vadalog system is Oxford's
contribution to the VADA research programme, a joint effort of the universities
of Oxford, Manchester and Edinburgh and around 20 industrial partners. As the
main contribution of this paper, we illustrate the first implementation of
Warded Datalog+/-, a high-performance Datalog+/- system utilizing an aggressive
termination control strategy. We also provide a comprehensive experimental
evaluation.Comment: Extended version of VLDB paper
<https://doi.org/10.14778/3213880.3213888
- …