Search CORE

7 research outputs found

Modular Chemical Descriptor Language (MCDL): Stereochemical modules

Authors: A Dalby
AA Gakh
AA Gakh
Alexander V Yarkov
Andrei A Gakh
C Liang
D Weininger
D Weininger
E Fischer
E Fischer
G Moreau
GP Moss
H Iwamura
JC Chambron
KK Agarwal
Michael N Burnett
RS Cahn
SE Stein
Sergei V Trepalin
SV Trepalin
SV Trepalin
SV Trepalin
T Cieplak
V Prelog
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: In our previous papers we introduced the Modular Chemical Descriptor Language (MCDL) for providing a linear representation of chemical information. A subsequent development was the MCDL Java Chemical Structure Editor, which is capable of drawing chemical structures from linear representations and generating MCDL descriptors from structures. Results: In this paper we present MCDL modules and accompanying software that incorporate a unique representation of molecular stereochemistry based on Cahn-Ingold-Prelog and Fischer ideas in constructing stereoisomer descriptors. The paper also contains additional discussion of the canonical representation of stereochemical isomers and brief algorithm descriptions of the open-source LINDES, Java applet, and Open Babel MCDL processing module software packages. Conclusions: Testing of the upgraded MCDL Java Chemical Structure Editor on compounds taken from several large and diverse chemical databases demonstrated satisfactory performance for storage and processing of stereochemical information in MCDL format.

Sources: Crossref, Directory of Open Access Journals, PubMed Central

Tautomerism in large databases

Authors: AR Leach
D He
D Rogers
ED Raczynska
F Milletti
F Oellien
JC Shelley
K Noack
LC Blum
LD Joseph
M Hassan
M Sitzmann
M Smith
Marc C. Nicklaus
Markus Sitzmann
O Bokareva
P Pospisil
SV Trepalin
W Ihlenfeldt
WD Ihlenfeldt
WD Ihlenfeldt
Wolf-Dietrich Ihlenfeldt
Publication venue: Springer Netherlands
Publication date: 01/01/2010

We have used the Chemical Structure DataBase (CSDB) of the NCI CADD Group, an aggregated collection of over 150 small-molecule databases totaling 103.5 million structure records, to conduct tautomerism analyses on one of the largest currently existing sets of real (i.e., not computer-generated) compounds. This analysis was carried out using calculable chemical structure identifiers developed by the NCI CADD Group, based on hash codes available in the chemoinformatics toolkit CACTVS, and a newly developed scoring scheme to define a canonical tautomer for any encountered structure. We used CACTVS's tautomerism definition, a set of 21 transform rules expressed in SMIRKS line notation, which takes a comprehensive stance on the types of tautomeric interconversion included. Tautomerism was found to be possible for more than 2/3 of the unique structures in the CSDB. A total of 680 million tautomers were calculated from, and including, the original structure records. Tautomerism overlap within the same individual database (i.e., at least one other entry was present that was really only a different tautomeric representation of the same compound) was found at an average rate of 0.3% of the original structure records, with values as high as nearly 2% for some of the databases in CSDB. Projected onto the set of unique structures (by FICuS identifier), this still occurred in about 1.5% of the cases. Tautomeric overlap across all constituent databases in CSDB was found for nearly 10% of the records in the collection.
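The within-database overlap rate counts a record when at least one other record in the same database is only a different tautomeric representation of the same compound. A minimal Python sketch of that count follows, assuming each record already carries two hypothetical precomputed keys, one for the exact structure and one for its canonical tautomer. The study used CACTVS-based identifiers and its own canonical-tautomer scoring, which this sketch does not reproduce.

```python
from collections import defaultdict


def within_db_overlap_rate(records):
    """Fraction of records in ONE database that have a tautomeric duplicate.

    records: list of (record_id, structure_key, tautomer_key) tuples, where
    structure_key and tautomer_key are hypothetical precomputed identifiers
    (exact structure vs. canonical tautomer).
    """
    if not records:
        return 0.0
    # Collect the distinct exact structures seen for each canonical tautomer.
    structures_per_tautomer = defaultdict(set)
    for _, structure_key, tautomer_key in records:
        structures_per_tautomer[tautomer_key].add(structure_key)
    # A record overlaps if its tautomer group contains at least one structure
    # different from its own, i.e. the group holds two or more structures.
    overlapping = sum(
        1
        for _, _, tautomer_key in records
        if len(structures_per_tautomer[tautomer_key]) > 1
    )
    return overlapping / len(records)


records = [
    ("r1", "keto_form_X", "T1"),
    ("r2", "enol_form_X", "T1"),
    ("r3", "structure_Y", "T2"),
    ("r4", "structure_Y", "T2"),
]
print(within_db_overlap_rate(records))
```

Here r1 and r2 differ only as keto and enol forms, so both count; r3 and r4 are exact duplicates rather than tautomeric variants, so they do not, giving a rate of 0.5.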

Sources: Crossref, Springer, Springer - Publisher Connector, PubMed Central

Open Babel: An open chemical toolbox

Authors: A Amini
A Andronico
A Bender
A Gakh
A Karwath
A Maunz
A Maunz
A Poater
A Rappe
AA Gakh
AD Hill
B-b Yan
BD McKay
C Helma
C Reynès
Chris Morley
CR Jacob
Craig A James
CW Bullock
D Filimonov
D Lagorce
D Lagorce
D Weininger
DC Bas
DC Lonie
DR Koes
F Fontaine
Geoffrey R Hutchison
GL Holliday
HL Morgan
I Wallach
I Wallach
IV Filippov
IV Tetko
J Ahmed
J Ahmed
J Kazius
J Myers
J Wang
J Wang
JH Chen
JJ Langham
JL Melville
JL Sharman
K Fogel
K Martin
L Fabian
L Liu
L Schietgat
M Brüstle
M Buehler
M Dehmer
M Konyk
M Krier
M Kuhn
MA Meineke
MA Miteva
Michael Banck
MJ Gómez
N O'Boyle
N Zonta
NM O'Boyle
NM O'Boyle
Noel M O'Boyle
O Sperandio
P Lind
P Murray-Rust
P Murray-Rust
P Murray-Rust
P Murray-Rust
P Rydberg
P Tosco
P Tosco
R Esposito
RA Bauer
RA Bauer
RS Armen
S Arbor
S Ingsriswang
SV Trepalin
T Cheng
T Halgren
T Halgren
T Halgren
T Halgren
T Halgren
T Kogej
T Pencheva
Tim Vandermeersch
TWH Backman
U Schmidt
VV Mihaleva
William H Green
X Jiang
X Wang
YD Paila
Z Huang
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen (for example, SMILES format), the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats. Results: We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses both in terms of software products and scientific research, including applications far beyond simple format interconversion. Conclusions: Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities, from conformer searching and 2D depiction to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely available under an open-source license from …

Sources: CiteSeerX, Crossref, Springer - Publisher Connector, Directory of Open Access Journals, Irish Universities, PubMed Central, Cork Open Research Archive

Combining Machine Learning Systems and Multiple Docking Simulation Packages to Improve Docking Prediction Reliability for Network Pharmacology

Authors: A Dalby
A Oda
AN Jain
C Venkatachalam
D Böhning
D Plewczynski
D Plewczynski
D Weininger
E Caron
Franca Fraternali
GM Morris
H Kitano
Hiroaki Kitano
I Manousaridis
J Ferguson
J Sadowski
JP Bai
K Oda
K Oda
Kun-Yi Hsin
L Breiman
L Xie
M Cases
M Rarey
ML Verdonk
MW Karaman
NM O'Boyle
O Trott
P Csermely
P Englebienne
PJ Ballester
PW Rose
R Kerkela
R Teramoto
R Wang
R Wang
RA Friesner
S Ghosh
S Wilhelm
Samik Ghosh
SV Trepalin
T Cheng
T Force
Y Chen
Z Zsoldos
Publication venue: 'Public Library of Science (PLoS)'

Source: Crossref

Assessment of tautomer distribution using the condensed reaction graph approach

Authors: A Dalby
A Nicholls
A Varnek
A Varnek
A. Varnek
AD Becke
AI Lin
AV Marenich
C Muller
C-C Chang
CJ Cramer
D Horvath
D Horvath
DN Laikov
DN Laikov
F Bonachéra
F Hoonakker
F Milletti
F Oellien
F Ruggiu
F Ruggiu
G Kjellin
G Kjellin
G Rastelli
I Soteras
I. I. Baskin
I. S. Antipin
J Catalán
J Catalán
J Stewart
J Tomasi
J Tomasi
J-L Stigliani
JB Brown
JP Perdew
JR Greenwood
JR Pliego
L Guasch
L Guasch
M Garcia-Viloca
M Harańczyk
M Mathea
M Sitzmann
MJ Kamlet
MJ Kamlet
NP Todorov
NT Kochev
P Pospisil
R. I. Nugmanov
RA Sayle
RF Ribeiro
RI Nugmanov
RW Taft
S-O Chua
SV Trepalin
T Clark
T. I. Madzhidov
T. R. Gimadiev
TI Madzhidov
TI Madzhidov
TI Madzhidov
TII Madzhidov
VA Palm
W Warr
Y Martin
Publication venue: 'Springer Science and Business Media LLC'

Source: Crossref

PubChem chemical structure standardization

Authors: Volker D. Hähnke
Sunghwan Kim
Evan E. Bolton
FK Brown
M Hann
J Gasteiger
T Engel
A Varnek
M Vogt
J Brecher
D Weininger
D Weininger
A McNaught
SR Heller
S Ash
RW Homer
AA Gakh
AA Gakh
R Panico
HA Favre
GJ Leigh
A Dalby
WA Warr
S Urbaczek
SA Akhondi
EC Meng
JC Baber
M Hendlich
S Urbaczek
D Young
RA Sayle
AR Katritzky
E Ferrari
RM Balabin
J Elguero
T Scior
M Sitzmann
P Pospisil
F Oellien
NP Todorov
T Kalliokoski
SW Muchmore
HA Duarte
YH Jang
J Hastings
C Bobach
SV Trepalin
YC Martin
F Milletti
JR Greenwood
S Urbaczek
A Gobbi
WA Warr
PV Schleyer
D Lloyd
MK Cyranski
M Randic
A Stanger
E Hückel
E Hückel
A Kekulé
A Kekulé
WC Herndon
M Randic
BDJ Blazic
I Gutman
F Cai
Z Rashid
SK Kearsley
P Hansen
B Blessington
E Martin
D Fourches
RD Clark
KS Egorova
T Oprea
P Tiikkainen
S Kim
S Kim
YL Wang
J McEntyre
EE Bolton
EE Bolton
S Kim
WA Warr
M Fanton
FB Rogers
G Audi
HC Ehrlich
NM O’Boyle
AM Clark
J Brecher
M Razinger
M Perdih
T Cieplak
DJ Wild
G Schneider
RS Cahn
P Ertl
HL Morgan
J Figueras
WD Ihlenfeldt
WD Ihlenfeldt
S Kim
Publication venue: BMC
Publication date: 01/01/2011

Background: PubChem is a chemical information repository consisting of three primary databases: Substance, Compound, and BioAssay. When individual data contributors submit chemical substance descriptions to Substance, the unique chemical structures are extracted and stored in Compound through an automated process called structure standardization. The present study describes the PubChem standardization approaches and analyzes their success rates, the reasons structures are rejected, and the modifications applied to structures during standardization. Furthermore, PubChem standardization is compared with the structure normalization of the IUPAC International Chemical Identifier (InChI) software, as manifested by conversion of the InChI back into a chemical structure. Results: The observed rejection rate for substances processed by PubChem standardization was 0.36%, predominantly attributable to structures with invalid atom valences that cannot be readily corrected without additional information from contributors. Of all structures that pass standardization, 44% are modified in the process, reducing the count of unique structures from 53,574,724 in Substance to 45,808,881 in Compound, as identified by de-aromatized canonical isomeric SMILES. Even though the processing time is very low on average (only 0.4% of structures have an individual standardization time above 0.1 s), total standardization time is completely dominated by edge cases: 90% of the time to standardize all structures in PubChem Substance is spent on the 2.05% of structures with the highest individual standardization time. Notably, 60% of the structures obtained from PubChem structure standardization are not identical to the chemical structure resulting from the InChI, primarily because of preferences for a different tautomeric form.
Conclusions: Standardization of chemical structures is complicated by the diversity of chemical information and of the approaches used to represent it. PubChem standardization is an effective and efficient tool to account for molecular diversity and to eliminate invalid or incomplete structures. Further development will concentrate on improved tautomer consideration and an expanded stereocenter definition. Modifications are difficult to validate thoroughly, with slight changes often affecting many thousands of structures and various edge cases. The PubChem structure standardization service is accessible as a public resource (https://pubchem.ncbi.nlm.nih.gov/standardize) and via programmatic interfaces.
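The statement that 90% of the total standardization time is spent on the 2.05% slowest structures describes how concentrated the processing cost is. Given a list of per-structure timings, a figure of that kind could be computed as in the Python sketch below. The helper name and its `share` parameter are hypothetical illustrations, not part of PubChem's software.

```python
def smallest_fraction_for_share(times, share=0.9):
    """Smallest fraction of items whose combined time reaches `share` of the total.

    Items are taken slowest first, so the result answers: "what fraction of
    the slowest structures accounts for `share` of all processing time?"
    (hypothetical helper, for illustration only)
    """
    if not times:
        return 0.0
    target = share * sum(times)
    running = 0.0
    # Accumulate the slowest items until the target share of time is reached.
    for count, t in enumerate(sorted(times, reverse=True), start=1):
        running += t
        if running >= target:
            return count / len(times)
    return 1.0


# One very slow structure dominates the total time of four structures.
print(smallest_fraction_for_share([95, 3, 1, 1]))
```

Sorting slowest first makes the returned fraction the minimum possible: any other selection of the same number of items would cover less of the total time.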

Sources: Crossref, ucs.sulsellib.net, Directory of Open Access Journals

Many InChIs and quite some feat

Authors: A Barth
A Dalby
A Drefahl
A Gakh
A Gaulton
A Gobbi
A Kazakov
A Kos
A McNaught
A Monge
A Simon
A Toropov
A Tropsha
A Williams
A Yerin
AA Toropov
AA Toropov
AA Toropov
AA Toropov
AE Day
AJ Carroll
AJ Carroll
AJ Lawson
AJ Pawson
AJ Williams
AJ Williams
AJ Williams
AJ Williams
AJ Williams
AJ Williams
AJ Williams
AL Teixeira
AM Richard
AM Richard
AM Wassermann
AP Toropova
AR Kinjo
AT Valko
AV Zakharov
B Chen
B Hardy
B Plainchont
B Zhou
B Zhou
BD McKay
C Bertinetto
C Bertinetto
C Bobach
C Hill
C Laurence
C Ludwig
C Southan
C Southan
C Southan
C Southan
C Steinbeck
C Steinbeck
C Steinbeck
C Zhang
D Goldmann
D Jessop
D Jessop
D Weininger
D Weininger
DR Burgess
DR Burgess
DS Wishart
DS Wishart
DS Wishart
DS Wishart
DS Wishart
DS Wishart
E Fahy
E Gregori-Puigjané
E Martin
E Willighagen
E Zass
E Zass
EE Bolton
EL Schymanski
EL Willighagen
EO Cannon
F Mu
G Grethe
G Ivan
G Ivan
G Iván
G Wohlgemuth
GDJ Davis
GR Magoon
H Haraldsdottir
H Jenkins
H Kalchhauser
H Kraut
H Redestig
HL Morgan
I Pletnev
I Schomburg
ID Brown
IS Yadav
IV Filippov
J Barthelmes
J Chambers
J Chambers
J Choi
J Downing
J Frey
J Frey
J Galgonek
J Gu
J Hastings
J Hastings
J Hummel
J Masciocchi
J Nielsen
J Park
J Peironcely
J Rhodes
J Thibault
J Townsend
JD Westbrook
JG Frey
JG Frey
JJ Langham
JL Sharman
JM Fostel
JN Currano
JR McDaniel
JW May
K Degtyarenko
K Degtyarenko
K Haug
K Henrick
K Hettne
K Nöh
K Tallapragada
K Tanaka
KB Arvidson
KM Hettne
KP Seiler
KR Taylor
L Ahmed
L Chepelev
L Chepelev
L Fabian
L Sumner
LG Nashev
M Annies
M Borkum
M Brown
M Fanton
M Hilbig
M Kuhn
M Kuhn
M Kuhn
M Kuhn
M Lang
M Nowotka
M Rojas-Chertó
M Samwald
M Sitzmann
M Zimmermann
M Zimmermann
MD Prasanna
MD Prasanna
MD Stobbe
MD Stobbe
ME Cass
MH Maeda
MJ Herrgard
MK Gilson
N Jeliazkova
N O’Boyle
N O’Boyle
NM O’Boyle
NM O’Boyle
NT Kochev
O Casher
O Casher
O Fiehn
O Spjuth
O Spjuth
O Spjuth
P Carbonell
P Matos de
P Murray-Rust
P Murray-Rust
P Murray-Rust
P Murray-Rust
P Murray-Rust
P Tiikkainen
PW Rose
R Dunkel
R Gledhill
R Huang
R Kiss
R Klinger
R Ordog
R Ramakrishnan
R Shirley
R Smith
RC Murphy
RD Benz
RD Finn
RJ Schenck
RJM Weber
RW Homer
S Ash
S Bachrach
S Chavan
S Heller
S Kuhn
S Moco
S Muresan
S Muresan
S Orchard
SA Akhondi
SG Spanton
SJ Coles
SJ Coles
SM Bachrach
SP Kelley
SR Heller
SR Johnson
SV Trepalin
T Altman
T Bernard
T Ginex
T Kind
T Liu
T Thalheim
T Thalheim
T Velden
T Will
TJ Bruno
TS Totton
U Rossler
U Schmidt
V Guilloux
V Law
V Ruusmann
V Wakelam
W Bremser
W Ihlenfeldt
W Phadungsukanan
W-D Ihlenfeldt
WA Warr
WA Warr
Wendy A. Warr
X Qu
Y Liu
Y Qiao
Y Sushko
YA Ba
YS Cho
Z Szabadka
Z Szabadka
Publication venue: 'Springer Science and Business Media LLC'

Source: Crossref