67 research outputs found

Maximum common subgraph isomorphism algorithms for the matching of chemical structures

Author: Raymond J.W.
Willett P.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2002

The maximum common subgraph (MCS) problem has become increasingly important in those aspects of chemoinformatics that involve the matching of 2D or 3D chemical structures. This paper provides a classification and a review of the many MCS algorithms, both exact and approximate, that have been described in the literature, and makes recommendations regarding their applicability to typical chemoinformatics tasks

CiteSeerX

White Rose Research Online

TeachOpenCADD: a teaching platform for computer-aided drug design using open source packages and data

Author: Driller Maximilian
Morger Andrea
Sydow Dominique
Volkamer Andrea
Publication date: 01/01/2019

Owing to the increase in freely available software and data for cheminformatics and structural bioinformatics, research for computer-aided drug design (CADD) is more and more built on modular, reproducible, and easy-to-share pipelines. While documentation for such tools is available, there are only a few freely accessible examples that teach the underlying concepts focused on CADD, especially addressing users new to the field. Here, we present TeachOpenCADD, a teaching platform developed by students for students, using open source compound and protein data as well as basic and CADD-related Python packages. We provide interactive Jupyter notebooks for central CADD topics, integrating theoretical background and practical code. TeachOpenCADD is freely available on GitHub: https://github.com/volkamerlab/TeachOpenCAD

Institutional Repository of the Freie Universität Berlin

Directory of Open Access Journals

EDULISS: a small-molecule database with data-mining and pharmacophore searching capabilities

Author: Anderson
Andrew C. Hinton
Ballester
Butina
Chen
Deanda
EMBL-EBI
Fang
Geer
Ghosh
Hann
Hartshorn
Hugh P. Morgan
Irwin
Ivanciuc
Kastenholz
Kun-Yi Hsin
Lipinski
Liu
Lyne
Malcolm D. Walkinshaw
McGregor
Mesecar
Mihalic
Miller
Morgan
Patel
Paul Taylor
Raymond
Seiler
Steven R. Shave
Todeschini
Wang
Wishart
Wolber
Publication venue: Oxford University Press
Publication date: 01/01/2011

We present the relational database EDULISS (EDinburgh University Ligand Selection System), which stores structural, physicochemical and pharmacophoric properties of small molecules. The database comprises a collection of over 4 million commercially available compounds from 28 different suppliers. A user-friendly web-based interface for EDULISS (available at http://eduliss.bch.ed.ac.uk/) has been established providing a number of data-mining possibilities. For each compound a single 3D conformer is stored along with over 1600 calculated descriptor values (molecular properties). A very efficient method for unique compound recognition, especially for a large scale database, is demonstrated by making use of small subgroups of the descriptors. Many of the shape and distance descriptors are held as pre-calculated bit strings permitting fast and efficient similarity and pharmacophore searches which can be used to identify families of related compounds for biological testing. Two ligand searching applications are given to demonstrate how EDULISS can be used to extract families of molecules with selected structural and biophysical features

Crossref

PubMed Central

Edinburgh Research Explorer
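
The EDULISS abstract above mentions descriptors held as pre-calculated bit strings for fast similarity searching. A minimal sketch of that idea follows. It assumes the Tanimoto coefficient as the similarity measure and uses made-up fingerprints stored as Python integers; neither the measure nor the names `tanimoto`, `similarity_search`, and `LIBRARY` come from the abstract.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two bit strings stored as non-negative ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # convention chosen here: two empty bit strings count as identical
    return bin(a & b).count("1") / union


def similarity_search(query, library, threshold=0.7):
    """Return (name, score) pairs scoring at least `threshold`, best first."""
    scored = [(name, tanimoto(query, bits)) for name, bits in library.items()]
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical 4-bit fingerprints, for illustration only.
LIBRARY = {"cpd_a": 0b1111, "cpd_b": 0b1100, "cpd_c": 0b0001}

if __name__ == "__main__":
    print(similarity_search(0b1110, LIBRARY))
```

Because the comparison reduces to bitwise AND/OR and a bit count, each query costs a few machine operations per stored compound, which is why precomputing the bit strings pays off on large collections.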

Malware Classification based on Call Graph Clustering

Author: Kinable Joris
Kostakis Orestis
Publication date: 25/08/2010

Each day, anti-virus companies receive tens of thousands of samples of potentially harmful executables. Many of the malicious samples are variations of previously encountered malware, created by their authors to evade pattern-based detection. Dealing with these large amounts of data requires robust, automatic detection approaches. This paper studies malware classification based on call graph clustering. By representing malware samples as call graphs, it is possible to abstract certain variations away, and enable the detection of structural similarities between samples. The ability to cluster similar samples together will make more generic detection techniques possible, thereby targeting the commonalities of the samples within a cluster. To compare call graphs mutually, we compute pairwise graph similarity scores via graph matchings which approximately minimize the graph edit distance. Next, to facilitate the discovery of similar malware samples, we employ several clustering algorithms, including k-medoids and DBSCAN. Clustering experiments are conducted on a collection of real malware samples, and the results are evaluated against manual classifications provided by human malware analysts. Experiments show that it is indeed possible to accurately detect malware families via call graph clustering. We anticipate that in the future, call graphs can be used to analyse the emergence of new malware families, and ultimately to automate implementation of generic detection schemes. Comment: This research has been supported by TEKES - the Finnish Funding Agency for Technology and Innovation as part of its ICT SHOK Future Internet research programme, grant 40212/0

arXiv.org e-Print Archive

CiteSeerX

Repository TU/e
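
The call graph clustering entry above combines pairwise graph distances with k-medoids. The sketch below illustrates only the clustering step, assuming the pairwise distances have already been computed and stored in a symmetric matrix. The matrix values, the farthest-first initialisation, and the names `assign`, `k_medoids`, and `DIST` are illustrative assumptions, not the authors' implementation.

```python
def assign(dist, medoids):
    """Attach every sample index to its nearest medoid."""
    clusters = {m: [] for m in medoids}
    for i in range(len(dist)):
        clusters[min(medoids, key=lambda m: dist[i][m])].append(i)
    return clusters


def k_medoids(dist, k, max_iter=100):
    """Alternating k-medoids on a precomputed distance matrix."""
    n = len(dist)
    # Farthest-first initialisation (an assumption; keeps the run deterministic).
    medoids = [0]
    while len(medoids) < k:
        rest = [i for i in range(n) if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        clusters = assign(dist, medoids)
        # New medoid of a cluster: the member with the smallest total distance
        # to the other members; an empty cluster keeps its old medoid.
        updated = [
            min(members, key=lambda c: sum(dist[c][j] for j in members)) if members else m
            for m, members in clusters.items()
        ]
        if set(updated) == set(medoids):
            break
        medoids = updated
    return assign(dist, medoids)


# Hypothetical distances between four samples: 0 and 1 are similar, 2 and 3 are similar.
DIST = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.1],
    [0.8, 0.9, 0.1, 0.0],
]

if __name__ == "__main__":
    print(k_medoids(DIST, 2))
```

k-medoids suits this setting because it needs only pairwise distances and picks actual samples as cluster centres, whereas k-means would require averaging graphs.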

Linguistic measures of chemical diversity and the "keywords" of molecular collections

Author: A Cadeddu
A Kilgarriff
A Roy
B Kowalczyk
B Zhang
C Bian
C Lipinski
D Conte
D Hoover
EJ Martin
F Font-Clos
F Tweedie
FW Goldberg
G Skoraczyński
GM Maggiora
GM Rishton
JW Raymond
K Kettunen
M Krallinger
M Kubát
M Suggitt
MA Covington
ME Welsch
MM Cone
NG Olinghouse
S Soh
WP Walters
Y Cao
Publication venue: Springer Science and Business Media LLC
Publication date: 01/05/2018

Computerized linguistic analyses have proven of immense value in comparing and searching through large text collections ("corpora"), including those deposited on the Internet; indeed, it would nowadays be hard to imagine browsing the Web without, for instance, search algorithms extracting most appropriate keywords from documents. This paper describes how such corpus-linguistic concepts can be extended to chemistry based on characteristic "chemical words" that span more than traditional functional groups and, instead, look at common structural fragments molecules share. Using these words, it is possible to quantify the diversity of chemical collections/databases in new ways and to define molecular "keywords" by which such collections are best characterized and annotated

Crossref

ScholarWorks@UNIST

On parameterized complexity of the Multi-MCS problem

Author: Chen Wenbin
Samatova Nagiza F.
Schmidt Matthew C.
Publication date: 17/05/2009

We introduce the maximum common subgraph problem for multiple graphs (Multi-MCS) inspired by various biological applications such as multiple alignments of gene sequences, protein structures, metabolic pathways, or protein–protein interaction networks. Multi-MCS is a generalization of the two-graph Maximum Common Subgraph problem (MCS). On the basis of the framework of parameterized complexity theory, we derive the parameterized complexity of Multi-MCS for various parameters for different classes of graphs. For example, for directed graphs with labeled vertices, we prove that the parameterized m-Multi-MCS problem is W[2]-hard, while the parameterized k-Multi-MCS problem is W[t]-hard (∀t≥1), where m and k are the size of the maximum common subgraph and the number of multiple graphs, respectively. We show similar results for other parameterized versions of the Multi-MCS problem for directed graphs with vertex labels and undirected graphs with vertex and edge labels by giving linear FPT reductions of the problems from parameterized versions of the longest common subsequence problem. Likewise, for unlabeled undirected graphs, we show that a parameterized version of the Multi-MCS problem with a fixed number of input graphs is W[1]-complete by showing a linear FPT reduction to and from a parameterized version of the maximum clique problem

Elsevier - Publisher Connector
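
The Multi-MCS abstract above relates the problem to maximum clique for unlabeled undirected graphs. As background, the sketch below shows the classical two-graph construction: a maximum clique in the modular product of two graphs corresponds to a maximum common induced subgraph. This is the textbook correspondence, not the paper's FPT reduction, and the function names and exhaustive clique search are illustrative assumptions suitable only for tiny inputs.

```python
from itertools import combinations


def modular_product(n1, edges1, n2, edges2):
    """Adjacency sets of the modular product of two undirected graphs.

    Vertices are pairs (u, v). Pairs (u, v) and (x, y) are adjacent when
    u != x, v != y, and u-x is an edge of the first graph exactly when
    v-y is an edge of the second.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    nodes = [(u, v) for u in range(n1) for v in range(n2)]
    adj = {p: set() for p in nodes}
    for (u, v), (x, y) in combinations(nodes, 2):
        if u != x and v != y and ((frozenset((u, x)) in e1) == (frozenset((v, y)) in e2)):
            adj[(u, v)].add((x, y))
            adj[(x, y)].add((u, v))
    return adj


def max_clique(adj):
    """Exhaustive maximum clique search (exponential time)."""
    best = []

    def extend(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = clique
        for p in list(candidates):
            candidates = candidates - {p}  # never revisit p on this branch
            extend(clique + [p], candidates & adj[p])

    extend([], set(adj))
    return best


def max_common_induced_subgraph(n1, edges1, n2, edges2):
    """Vertex mapping (u in graph 1, v in graph 2) of one maximum common induced subgraph."""
    return max_clique(modular_product(n1, edges1, n2, edges2))


if __name__ == "__main__":
    triangle = [(0, 1), (1, 2), (0, 2)]
    path = [(0, 1), (1, 2)]
    print(max_common_induced_subgraph(3, triangle, 3, path))
```

Each clique member pairs one vertex of the first graph with one vertex of the second, so the clique size equals the number of vertices in the common subgraph.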

Functional Group and Substructure Searching as a Tool in Metabolomics

Author: A Dalby
AG McDonald
Andrew G. McDonald
B Chen
BL Bush
C Chang
CA James
CA Lipinski
CA Nicolaou
D Weininger
D Wild
DM Bayada
DR Flower
F Oellien
FH Allen
G Klopman
GJ Leigh
I Muegge
J Antal
J Polanski
Ji Zhu
JL Wisniewski
JM Barnard
JW Raymond
K Hult
Keith F. Tipton
LB Ellis
M Hattori
M Kanehisa
M Kotera
Masaaki Kotera
MG Poolman
NJ Richmond
O Hofmann
R Arimoto
RD Brown
S Ash
SA Khedkar
Sinéad Boyce
SJ Coles
VV Poroikov
W Schwab
WD Ihlenfeldt
WJ Wiswesser
Publication venue: Public Library of Science
Publication date: 06/02/2008

BACKGROUND: A direct link between the names and structures of compounds and the functional groups contained within them is important, not only because biochemists frequently rely on literature that uses a free-text format to describe functional groups, but also because metabolic models depend upon the connections between enzymes and substrates being known and appropriately stored in databases. METHODOLOGY: We have developed a database named "Biochemical Substructure Search Catalogue" (BiSSCat), which contains 489 functional groups, >200,000 compounds and >1,000,000 different computationally constructed substructures, to allow identification of chemical compounds of biological interest. CONCLUSIONS: This database and its associated web-based search program (http://bisscat.org/) can be used to find compounds containing selected combinations of substructures and functional groups. It can be used to determine possible additional substrates for known enzymes and for putative enzymes found in genome projects. Its applications to enzyme inhibitor design are also discussed

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central