Search CORE

28 research outputs found

Kernel-based estimation of the applicability domain of QSAR models

Author: A Jahn
A Zell
Georg Hinselmann
Nikolas Fechner
Publication venue: Springer Nature
Publication date: 04/05/2010
Field of study

Springer - Publisher Connector

PubMed Central

Estimation of the applicability domain of kernel-based machine learning models for virtual screening

Author: Fechner Nikolas
Hinselmann Georg
Jahn Andreas
Zell Andreas
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Springer - Publisher Connector

PubMed Central

jCompoundMapper: An open source Java library and command-line tool for chemical fingerprints

Author: Fechner Nikolas
Hinselmann Georg
Jahn Andreas
Rosenbaum Lars
Zell Andreas
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Abstract Background The decomposition of a chemical graph is a convenient approach to encode information of the corresponding organic compound. While several commercial toolkits exist to encode molecules as so-called fingerprints, only a few open source implementations are available. The aim of this work is to introduce a library for exactly defined molecular decompositions, with a strong focus on the application of these features in machine learning and data mining. It provides several options such as search depth, distance cut-offs, atom- and pharmacophore typing. Furthermore, it provides the functionality to combine, to compare, or to export the fingerprints into several formats. Results We provide a Java 1.6 library for the decomposition of chemical graphs based on the open source Chemistry Development Kit toolkit. We reimplemented popular fingerprinting algorithms such as depth-first search fingerprints, extended connectivity fingerprints, autocorrelation fingerprints (e.g. CATS2D), radial fingerprints (e.g. Molprint2D), geometrical Molprint, atom pairs, and pharmacophore fingerprints. We also implemented custom fingerprints such as the all-shortest path fingerprint that only includes the subset of shortest paths from the full set of paths of the depth-first search fingerprint. As an application of jCompoundMapper, we provide a command-line executable binary. We measured the conversion speed and number of features for each encoding and described the composition of the features in detail. The quality of the encodings was tested using the default parametrizations in combination with a support vector machine on the Sutherland QSAR data sets. Additionally, we benchmarked the fingerprint encodings on the large-scale Ames toxicity benchmark using a large-scale linear support vector machine. The results were promising and could often compete with literature results. On the large Ames benchmark, for example, we obtained an AUC ROC performance of 0.87 with a reimplementation of the extended connectivity fingerprint. This result is comparable to the performance achieved by a non-linear support vector machine using state-of-the-art descriptors. On the Sutherland QSAR data set, the best fingerprint encodings showed a comparable or better performance on 5 of the 8 benchmarks when compared against the results of the best descriptors published in the paper of Sutherland et al. Conclusions jCompoundMapper is a library for chemical graph fingerprints with several tweaking possibilities and exporting options for open source data mining toolkits. The quality of the data mining results, the conversion speed, the LPGL software license, the command-line interface, and the exporters should be useful for many applications in cheminformatics like benchmarks against literature methods, comparison of data mining algorithms, similarity searching, and similarity-based data mining.</p

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Efficient extraction of canonical spatial relationships using a recursive enumeration of k-subsets

Author: A Jahn
Andreas Zell
Georg Hinselmann
Nikolas Fechner
P Mahé
T Rolfe
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

The spatial arrangement of a chemical compound plays an important role regarding the related properties or activities. A straightforward approach to encode the geometry is to enumerate pairwise spatial relationships between k substructures, like functional groups or subgraphs. This leads to a combinatorial explosion with th

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Automatic pharmacophore model generation using weighted substructure assignments

Author: A Jahn
A Zell
Andreas Jahn
Georg Hinselmann
H Planatscher
JF Truchon
N Huang
Nikolas Fechner
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Interpreting linear support vector machine models with heat map molecule coloring

Author: A Bender
Andreas Jahn
Andreas Zell
B Schölkopf
C Steinbeck
D Bossemeyer
D Fourches
D Rogers
D Weininger
G Hinselmann
Georg Hinselmann
H Kubinyi
I Guyon
J Bajorath
J Kazius
J Mohr
J Orts
K Hasegawa
KD Freeman-Cook
KH Bleicher
L Han
L Prade
L Ralaivola
Lars Rosenbaum
MS Buchanan
N Fechner
P Jonathan
RE Fan
SG Rohrer
SJ Swamidass
SM Free
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Abstract Background Model-based virtual screening plays an important role in the early drug discovery stage. The outcomes of high-throughput screenings are a valuable source for machine learning algorithms to infer such models. Besides a strong performance, the interpretability of a machine learning model is a desired property to guide the optimization of a compound in later drug discovery stages. Linear support vector machines showed to have a convincing performance on large-scale data sets. The goal of this study is to present a heat map molecule coloring technique to interpret linear support vector machine models. Based on the weights of a linear model, the visualization approach colors each atom and bond of a compound according to its importance for activity. Results We evaluated our approach on a toxicity data set, a chromosome aberration data set, and the maximum unbiased validation data sets. The experiments show that our method sensibly visualizes structure-property and structure-activity relationships of a linear support vector machine model. The coloring of ligands in the binding pocket of several crystal structures of a maximum unbiased validation data set target indicates that our approach assists to determine the correct ligand orientation in the binding pocket. Additionally, the heat map coloring enables the identification of substructures important for the binding of an inhibitor. Conclusions In combination with heat map coloring, linear support vector machine models can help to guide the modification of a compound in later stages of drug discovery. Particularly substructures identified as important by our method might be a starting point for optimization of a lead compound. The heat map coloring should be considered as complementary to structure based modeling approaches. As such, it helps to get a better understanding of the binding mode of an inhibitor.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Optimal assignment methods for ligand-based virtual screening

Abstract Background Ligand-based virtual screening experiments are an important task in the early drug discovery stage. An ambitious aim in each experiment is to disclose active structures based on new scaffolds. To perform these "scaffold-hoppings" for individual problems and targets, a plethora of different similarity methods based on diverse techniques were published in the last years. The optimal assignment approach on molecular graphs, a successful method in the field of quantitative structure-activity relationships, has not been tested as a ligand-based virtual screening method so far. Results We evaluated two already published and two new optimal assignment methods on various data sets. To emphasize the "scaffold-hopping" ability, we used the information of chemotype clustering analyses in our evaluation metrics. Comparisons with literature results show an improved early recognition performance and comparable results over the complete data set. A new method based on two different assignment steps shows an increased "scaffold-hopping" behavior together with a good early recognition performance. Conclusion The presented methods show a good combination of chemotype discovery and enrichment of active structures. Additionally, the optimal assignment on molecular graphs has the advantage to investigate and interpret the mappings, allowing precise modifications of internal parameters of the similarity measure for specific targets. All methods have low computation times which make them applicable to screen large data sets.</p

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Linking the Epigenome to the Genome: Correlation of Different Features to DNA Methylation of CpG Islands

Author: A Barski
A Bird
A Henckel
A Jeltsch
A Meissner
A Siepel
AH Ting
Andreas Zell
AP Bird
B Rhead
BE Bernstein
BE Bernstein
Brock C. Christensen
C Bock
C Bock
C Bock
C Previti
C Wrzodek
CC Chang
CD Bustos
Clemens Wrzodek
D Jia
D Takai
D Zilberman
DE Schones
E Schilling
EJ Gardiner
ES Lander
F Antequera
F Antequera
F Eckhardt
F Fang
F Fuks
F Mohn
FA Feltus
Finja Büchel
Florian Mittag
GD Stormo
Georg Hinselmann
H Cedar
H Vikas
JF Costello
JG Cleary
Johannes Eichner
JT Bell
KL Thu
M Burset
M Esteller
M Esteller
M Gardiner-Garden
M Hall
M Oka
P Baldi
P Dehan
P Hajkova
PA Jones
R Das
R Fan
R Lister
RA Rollins
RM Brena
RM Brena
S Aerts
S Fan
S Kim
S Kochanek
SE Celniker
SKT Ooi
W Reik
WJ Kent
Y Wang
Y Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2012
Field of study

DNA methylation of CpG islands plays a crucial role in the regulation of gene expression. More than half of all human promoters contain CpG islands with a tissue-specific methylation pattern in differentiated cells. Still today, the whole process of how DNA methyltransferases determine which region should be methylated is not completely revealed. There are many hypotheses of which genomic features are correlated to the epigenome that have not yet been evaluated. Furthermore, many explorative approaches of measuring DNA methylation are limited to a subset of the genome and thus, cannot be employed, e.g., for genome-wide biomarker prediction methods. In this study, we evaluated the correlation of genetic, epigenetic and hypothesis-driven features to DNA methylation of CpG islands. To this end, various binary classifiers were trained and evaluated by cross-validation on a dataset comprising DNA methylation data for 190 CpG islands in HEPG2, HEK293, fibroblasts and leukocytes. We achieved an accuracy of up to 91% with an MCC of 0.8 using ten-fold cross-validation and ten repetitions. With these models, we extended the existing dataset to the whole genome and thus, predicted the methylation landscape for the given cell types. The method used for these predictions is also validated on another external whole-genome dataset. Our results reveal features correlated to DNA methylation and confirm or disprove various hypotheses of DNA methylation related features. This study confirms correlations between DNA methylation and histone modifications, DNA structure, DNA sequence, genomic attributes and CpG island properties. Furthermore, the method has been validated on a genome-wide dataset from the ENCODE consortium. The developed software, as well as the predicted datasets and a web-service to compare methylation states of CpG islands are available at http://www.cogsys.cs.uni-tuebingen.de/software/dna-methylation/

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Publikationsserver der Universität Tübingen