18 research outputs found

EDULISS: a small-molecule database with data-mining and pharmacophore searching capabilities

Author: Anderson
Andrew C. Hinton
Ballester
Butina
Chen
Deanda
EMBL-EBI
Fang
Geer
Ghosh
Hann
Hartshorn
Hugh P. Morgan
Irwin
Ivanciuc
Kastenholz
Kun-Yi Hsin
Lipinski
Liu
Lyne
Malcolm D. Walkinshaw
McGregor
Mesecar
Mihalic
Miller
Morgan
Patel
Paul Taylor
Raymond
Seiler
Steven R. Shave
Todeschini
Wang
Wishart
Wolber
Publication venue: Oxford University Press
Publication date: 01/01/2011

We present the relational database EDULISS (EDinburgh University Ligand Selection System), which stores structural, physicochemical and pharmacophoric properties of small molecules. The database comprises a collection of over 4 million commercially available compounds from 28 different suppliers. A user-friendly web-based interface for EDULISS (available at http://eduliss.bch.ed.ac.uk/) has been established providing a number of data-mining possibilities. For each compound a single 3D conformer is stored along with over 1600 calculated descriptor values (molecular properties). A very efficient method for unique compound recognition, especially for a large scale database, is demonstrated by making use of small subgroups of the descriptors. Many of the shape and distance descriptors are held as pre-calculated bit strings permitting fast and efficient similarity and pharmacophore searches which can be used to identify families of related compounds for biological testing. Two ligand searching applications are given to demonstrate how EDULISS can be used to extract families of molecules with selected structural and biophysical features
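The abstract above does not say which similarity measure EDULISS applies to its pre-calculated bit strings. As a minimal sketch, assuming the widely used Tanimoto coefficient and fingerprints held as Python integers, a bit-string similarity search could look like this (the function names, the 0.7 threshold, and the toy library are hypothetical, not taken from EDULISS):

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints stored as integer bit strings."""
    union = bin(a | b).count("1")
    if union == 0:
        # Assumed convention: two all-zero fingerprints count as identical.
        return 1.0
    return bin(a & b).count("1") / union


def similarity_search(query: int, library: dict, threshold: float = 0.7) -> list:
    """Return (name, score) pairs scoring at or above threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy library of hypothetical compounds with 4-bit fingerprints.
toy_library = {"cmpd_a": 0b1111, "cmpd_b": 0b0111, "cmpd_c": 0b1000}
hits = similarity_search(0b1111, toy_library)
```

Because the comparison reduces to bitwise AND/OR plus a population count, stored fingerprints can be scanned quickly, which is one plausible reason for holding descriptors as bit strings.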

Crossref

PubMed Central

Edinburgh Research Explorer

STITCH: interaction networks of chemicals and proteins

Author: Brooksbank
C. von Mering
Caspi
Joshi-Tope
Kanehisa
L. J. Jensen
Letunic
M. Campillos
M. Kuhn
P. Bork
von Mering
Weinstein
Wishart
Publication venue: Oxford University Press
Publication date: 01/01/2008

The knowledge about interactions between proteins and small molecules is essential for the understanding of molecular and cellular functions. However, information on such interactions is widely dispersed across numerous databases and the literature. To facilitate access to this data, STITCH (‘search tool for interactions of chemicals’) integrates information about interactions from metabolic pathways, crystal structures, binding experiments and drug–target relationships. Inferred information from phenotypic effects, text mining and chemical structure similarity is used to predict relations between chemicals. STITCH further allows exploring the network of chemical relations, also in the context of associated binding proteins. Each proposed interaction can be traced back to the original data sources. Our database contains interaction information for over 68 000 different chemicals, including 2200 drugs, and connects them to 1.5 million genes across 373 genomes and their interactions contained in the STRING database. STITCH is available at http://stitch.embl.de

MMsINC: a large-scale chemoinformatics database

Author: Anderson
Berman
Chen
Fabian Cedrati
Gianfranco Frau
Irwin
Joel Masciocchi
Johnson
Kaiser
Lipinski
Luca Pireddu
Marco Fanton
Matteo Floris
Mattia Sturlese
McNaught
Oprea
Patricia Rodriguez-Tomé
Piergiorgio Palla
Schreiber
Stefano Moro
Steinbeck
Swamidass
Wheeler
Wishart
Publication venue: Oxford University Press
Publication date: 01/01/2008

MMsINC (http://mms.dsfarm.unipd.it/MMsINC/search) is a database of non-redundant, richly annotated and biomedically relevant chemical structures. A primary goal of MMsINC is to guarantee the highest quality and the uniqueness of each entry. MMsINC then adds value to these entries by including the analysis of crucial chemical properties, such as ionization and tautomerization processes, and the in silico prediction of 24 important molecular properties in the biochemical profile of each structure. MMsINC is consequently a natural input for different chemoinformatics and virtual screening applications. In addition, MMsINC supports various types of queries, including substructure queries and the novel ‘molecular scissoring’ query. MMsINC is interfaced with other primary data collectors, such as PubChem, Protein Data Bank (PDB), the Food and Drug Administration database of approved drugs and ZINC

CiteSeerX

Crossref

PubMed Central

Archivio istituzionale della ricerca - Università di Padova

Inhibition of Antiapoptotic BCL-XL, BCL-2, and MCL-1 Proteins by Small Molecule Mimetics

Author: Dalafave D.S.
Prisco G.
Publication venue: Libertas Academica
Publication date: 01/08/2010

Informatics and computational design methods were used to create new molecules that could potentially bind antiapoptotic proteins, thus promoting death of cancer cells. Apoptosis is a cellular process that leads to the death of damaged cells. Its malfunction can cause cancer and poor response to conventional chemotherapy. After being activated by cellular stress signals, proapoptotic proteins bind antiapoptotic proteins, thus allowing apoptosis to go forward. An excess of antiapoptotic proteins can prevent apoptosis. Designed molecules that mimic the roles of proapoptotic proteins can promote the death of cancer cells. The goal of our study was to create new putative mimetics that could simultaneously bind several antiapoptotic proteins. Five new small molecules were designed that formed stable complexes with BCL-2, BCL-XL, and MCL-1 antiapoptotic proteins. These results are novel because, to our knowledge, there are not many, if any, small molecules known to bind all three proteins. Drug-likeness studies performed on the designed molecules, as well as previous experimental and preclinical studies on similar agents, strongly suggest that the designed molecules may indeed be promising drug candidates. All five molecules showed “drug-like” properties and had overall drug-likeness scores between 81% and 96%. A single drug based on these mimetics should cost less and cause fewer side effects than a combination of drugs each aimed at a single protein. Computer-based molecular design promises to accelerate drug research by predicting potential effectiveness of designed molecules prior to laborious experiments and costly preclinical trials

Directory of Open Access Journals

PubMed Central

Accelerated similarity searching and clustering of large compound sets by geometric embedding and locality sensitive hashing

Author: Agrafiotis
Austin
Baldi
Bentley
Bohm
Brinkhoff
Cao
Chang
Chen
Cheng
Datar
Downs
Faloutsos
Fu
Gionis
Girke
Haggarty
Ihlenfeldt
Irwin
Katayama
Kruskal
Lv
NIH Chemical Genomics Center
Oprea
Raymond
Roweis
Savchuk
Seiler
Sheridan
Smellie
Strausberg
Swamidass
Tao Jiang
Tenenbaum
Thomas Girke
Vaidya
Weber
Willett
Xu
Yiqun Cao
Zhu
Publication venue: Oxford University Press
Publication date: 01/04/2010

Motivation: Similarity searching and clustering of chemical compounds by structural similarities are important computational approaches for identifying drug-like small molecules. Most algorithms available for these tasks are limited by their speed and scalability, and cannot handle today's large compound databases with several million entries

Crossref

PubMed Central

eScholarship - University of California

Evaluating Modules in Graph Contrastive Learning

Author: Cui Ganqu
Du Yufeng
Liu Zhiyuan
Wang Lifeng
Xu Liang
Yang Cheng
Zhou Jie
Publication date: 15/06/2021

The recent emergence of contrastive learning approaches facilitates the research on graph representation learning (GRL), introducing graph contrastive learning (GCL) into the literature. These methods contrast semantically similar and dissimilar sample pairs to encode the semantics into node or graph embeddings. However, most existing works only performed model-level evaluation, and did not explore the combination space of modules for more comprehensive and systematic studies. For effective module-level evaluation, we propose a framework that decomposes GCL models into four modules: (1) a sampler to generate anchor, positive and negative data samples (nodes or graphs); (2) an encoder and a readout function to get sample embeddings; (3) a discriminator to score each sample pair (anchor-positive and anchor-negative); and (4) an estimator to define the loss function. Based on this framework, we conduct controlled experiments over a wide range of architectural designs and hyperparameter settings on node and graph classification tasks. Specifically, we manage to quantify the impact of a single module, investigate the interaction between modules, and compare the overall performance with current model architectures. Our key findings include a set of module-level guidelines for GCL, e.g., simple samplers from LINE and DeepWalk are strong and robust; an MLP encoder associated with Sum readout could achieve competitive performance on graph classification. Finally, we release our implementations and results as OpenGCL, a modularized toolkit that allows convenient reproduction, standard model and module evaluation, and easy extension
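The four-module decomposition described above can be sketched in a few lines. This is an illustrative toy, not the OpenGCL API: the class name `GCLPipeline`, the dot-product discriminator, and the InfoNCE-style estimator are assumptions chosen for the example, and the encoder is assumed to already include the readout step.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class GCLPipeline:
    """Hypothetical container for the four modules named in the abstract."""
    sampler: Callable[[], Tuple[object, object, List[object]]]  # anchor, positive, negatives
    encoder: Callable[[object], Vector]                         # encoder plus readout
    discriminator: Callable[[Vector, Vector], float]            # scores one sample pair
    estimator: Callable[[float, List[float]], float]            # turns scores into a loss

    def loss(self) -> float:
        anchor, positive, negatives = self.sampler()
        z_anchor = self.encoder(anchor)
        pos_score = self.discriminator(z_anchor, self.encoder(positive))
        neg_scores = [self.discriminator(z_anchor, self.encoder(n)) for n in negatives]
        return self.estimator(pos_score, neg_scores)


def dot(u: Vector, v: Vector) -> float:
    """Dot-product discriminator (an assumed choice)."""
    return sum(x * y for x, y in zip(u, v))


def info_nce(pos: float, negs: List[float]) -> float:
    """InfoNCE-style estimator: -log(exp(pos) / (exp(pos) + sum(exp(neg))))."""
    logits = [pos] + list(negs)
    top = max(logits)  # subtract the max for numerical stability
    return top + math.log(sum(math.exp(s - top) for s in logits)) - pos


# Toy usage: fixed samples and an identity encoder.
pipeline = GCLPipeline(
    sampler=lambda: ((1.0, 0.0), (1.0, 0.0), [(0.0, 1.0)]),
    encoder=lambda sample: sample,
    discriminator=dot,
    estimator=info_nce,
)
toy_loss = pipeline.loss()
```

Keeping each module behind a plain callable is what makes module-level comparison straightforward: one component can be swapped while the other three stay fixed.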

arXiv.org e-Print Archive

BLASTing small molecules—statistics and extreme statistics of chemical similarity scores

Author: Bohacek
Hassan
Hert
HINKLEY
Holliday
P. Baldi
R. W. Benz
Xue
Publication venue: Oxford University Press

Motivation: Small organic molecules, from nucleotides and amino acids to metabolites and drugs, play a fundamental role in chemistry, biology and medicine. As databases of small molecules continue to grow and become more open, it is important to develop the tools to search them efficiently. In order to develop a BLAST-like tool for small molecules, one must first understand the statistical behavior of molecular similarity scores

Crossref

PubMed Central

Accurate and efficient target prediction using a potency-sensitive influence-relevance voter

Author: Baldi Pierre
Browning Michael
Fooshee David
Lusci Alessandro
Swamidass Joshua
Publication venue: Digital Commons@Becker
Publication date: 01/01/2015

Background: A number of algorithms have been proposed to predict the biological targets of diverse molecules. Some are structure-based, but the most common are ligand-based and use chemical fingerprints and the notion of chemical similarity. These methods tend to be computationally faster than others, making them particularly attractive tools as the amount of available data grows. Results: Using a ChEMBL-derived database covering 490,760 molecule-protein interactions and 3236 protein targets, we conduct a large-scale assessment of the performance of several target-prediction algorithms at predicting drug-target activity. We assess algorithm performance using three validation procedures: standard tenfold cross-validation, tenfold cross-validation in a simulated screen that includes random inactive molecules, and validation on an external test set composed of molecules not present in our database. Conclusions: We present two improvements over current practice. First, using a modified version of the influence-relevance voter (IRV), we show that using molecule potency data can improve target prediction. Second, we demonstrate that random inactive molecules added during training can boost the accuracy of several algorithms in realistic target-prediction experiments. Our potency-sensitive version of the IRV (PS-IRV) obtains the best results on large test sets in most of the experiments. Models and software are publicly accessible through the chemoinformatics portal at http://chemdb.ics.uci.edu/

Crossref

Springer - Publisher Connector

Digital Commons@Becker

PubMed Central

eScholarship - University of California

Open Babel: An open chemical toolbox

Author: A Amini
A Andronico
A Bender
A Gakh
A Karwath
A Maunz
A Poater
A Rappe
AA Gakh
AD Hill
B-b Yan
BD McKay
C Helma
C Reynès
Chris Morley
CR Jacob
Craig A James
CW Bullock
D Filimonov
D Lagorce
D Weininger
DC Bas
DC Lonie
DR Koes
F Fontaine
Geoffrey R Hutchison
GL Holliday
HL Morgan
I Wallach
IV Filippov
IV Tetko
J Ahmed
J Kazius
J Myers
J Wang
JH Chen
JJ Langham
JL Melville
JL Sharman
K Fogel
K Martin
L Fabian
L Liu
L Schietgat
M Brüstle
M Buehler
M Dehmer
M Konyk
M Krier
M Kuhn
MA Meineke
MA Miteva
Michael Banck
MJ Gómez
N O'Boyle
N Zonta
NM O'Boyle
Noel M O'Boyle
O Sperandio
P Lind
P Murray-Rust
P Rydberg
P Tosco
R Esposito
RA Bauer
RS Armen
S Arbor
S Ingsriswang
SV Trepalin
T Cheng
T Halgren
T Kogej
T Pencheva
Tim Vandermeersch
TWH Backman
U Schmidt
VV Mihaleva
William H Green
X Jiang
X Wang
YD Paila
Z Huang
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats. Results: We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection, to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses both in terms of software products and scientific research, including applications far beyond simple format interconversion. Conclusions: Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities from conformer searching and 2D depiction, to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely available under an open-source license fro

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Irish Universities

PubMed Central

Cork Open Research Archive

Artificial intelligence, machine learning, and drug repurposing in cancer

Author: Aittokallio Tero
Tanoli Ziaurrehman
Vähä-Koskela Markus
Publication date: 01/01/2021

Introduction: Drug repurposing provides a cost-effective strategy to re-use approved drugs for new medical indications. Several machine learning (ML) and artificial intelligence (AI) approaches have been developed for systematic identification of drug repurposing leads based on big data resources, hence further accelerating and de-risking the drug development process by computational means. Areas covered: The authors focus on supervised ML and AI methods that make use of publicly available databases and information resources. While most of the example applications are in the field of anticancer drug therapies, the methods and resources reviewed are widely applicable also to other indications including COVID-19 treatment. A particular emphasis is placed on the use of comprehensive target activity profiles that enable a systematic repurposing process by extending the target profile of drugs to include potent off-targets with therapeutic potential for a new indication. Expert opinion: The scarcity of clinical patient data and the current focus on genetic aberrations as primary drug targets may limit the performance of anticancer drug repurposing approaches that rely solely on genomics-based information. Functional testing of cancer patient cells exposed to a large number of targeted therapies and their combinations provides an additional source of repurposing information for tissue-aware AI approaches.

Helsingin yliopiston digitaalinen arkisto

NORA - Norwegian Open Research Archives