Virtual Screening of Bioassay Data

Author: Amanda C Schierz
AR Leach
B Chen
C Drummond
C Elkan
CA Lipinski
D Bradley
EE Bolton
HL Lo
IH Witten
J Hollmen
JA DiMasi
K Liu
P Domingos
T Eitrich
TM Ehrman
VS Sheng
YW Seo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Background: There are three main problems associated with the virtual screening of bioassay data. The first is access to freely available curated data, the second is the number of false positives that occur in the physical primary screening process, and the third is that the data are highly imbalanced, with a low ratio of Active to Inactive compounds. This paper first discusses these three problems and then applies a selection of Weka cost-sensitive classifiers (Naive Bayes, SVM, C4.5 and Random Forest) to a variety of bioassay datasets.
Results: Pharmaceutical bioassay data are not readily available to the academic community. The data held at PubChem are not curated, and there is a lack of detailed cross-referencing between Primary and Confirmatory screening assays. Because of this lack of cross-referencing, the analysis of the false positives that occur in the primary screening process has been shallow. In the six cases found, the average percentage of false positives from the High-Throughput Primary screen is quite high, at 64%. For the cost-sensitive classification, Weka's implementations of the Support Vector Machine and the C4.5 decision tree learner performed relatively well. It was also found that the setting of the Weka cost matrix depends on the base classifier used and not solely on the ratio of class imbalance.
Conclusions: Understandably, pharmaceutical data are hard to obtain. However, it would benefit both the pharmaceutical industry and academics if curated primary screening data and the corresponding confirmatory data were provided. Two benefits could be gained by applying virtual screening techniques to bioassay data: first, the search space of compounds to be screened could be reduced, and second, analysing the false positives that occur in the primary screening process may help to improve the technology. The number of false positives arising from primary screening raises the question of whether this type of data should be used for virtual screening. Care is needed when using Weka's cost-sensitive classifiers: across-the-board misclassification costs based on class ratios should not be used when comparing different classifiers on the same dataset.
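The role of the cost matrix mentioned in the abstract can be illustrated with a minimal sketch of minimum-expected-cost prediction. This is not the paper's code or Weka's implementation: the function name `min_expected_cost_class`, the probabilities, and the cost values are hypothetical, and the row/column convention (rows are the true class, columns the predicted class) is an assumption made for this example.

```python
from typing import Sequence


def min_expected_cost_class(
    probs: Sequence[float], cost: Sequence[Sequence[float]]
) -> int:
    """Return the index of the prediction with the lowest expected cost.

    probs[i]   : estimated probability that the true class is i
    cost[i][j] : cost of predicting class j when the true class is i
                 (assumed convention for this sketch)
    """
    n = len(probs)
    # Expected cost of predicting j = sum over true classes i of P(i) * cost[i][j]
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=expected.__getitem__)


# Class 0 = Inactive, class 1 = Active (hypothetical encoding).
uniform_cost = [[0, 1], [1, 0]]    # all errors cost the same
weighted_cost = [[0, 1], [20, 0]]  # missing an Active costs 20x (made-up value)

# A compound the base classifier considers 10% likely to be Active.
probs = [0.9, 0.1]

print(min_expected_cost_class(probs, uniform_cost))   # predicted class under uniform costs
print(min_expected_cost_class(probs, weighted_cost))  # predicted class under weighted costs
```

Because the decision depends on the probability estimates as well as on the costs, two base classifiers that produce differently calibrated probabilities can need different cost settings for the same dataset, which is consistent with the abstract's caution against using a single class-ratio-based cost across classifiers.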

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Bournemouth University Research Online

Parallel Cost-Sensitive Support Vector Machine Software for Classification

Author: Eitrich T.
Lang B.
Publication venue: John von Neumann Institute for Computing
Publication date: 01/01/2006

Juelich Shared Electronic Resources

Maschinelles Lernen - Künstliche Intelligenz praktisch umgesetzt [English: Machine Learning: Artificial Intelligence Put into Practice]

Author: Eitrich T.
Publication date: 01/01/2003

Juelich Shared Electronic Resources

Support-Vektor-Maschinen und ihre Anwendung auf Datensätze aus der pharmazeutischen Forschung [English: Support Vector Machines and Their Application to Data Sets from Pharmaceutical Research]

Author: Eitrich T.
Publication date: 01/01/2003

Juelich Shared Electronic Resources

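Dreistufig parallele Software zur Parameteroptimierung von Support-Vektor-Maschinen mit kostensensitiven Gütemaßen [English: Three-Level Parallel Software for Parameter Optimization of Support Vector Machines with Cost-Sensitive Quality Measures]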

Author: Eitrich T.
Publication venue: John von Neumann-Institut für Computing
Publication date: 01/01/2007

Juelich Shared Electronic Resources

Analysis of Support Vector Machine Training Costs for Large and Unbalanced Data from Pharmaceutical Industry

Author: Eitrich T.
Publication date: 01/01/2005

Juelich Shared Electronic Resources

On the Advantages of Weighted L1-Norm Support Vector Learning for Unbalanced Binary Classification Problems

Author: Eitrich T.
Publication venue: Forschungszentrum, Zentralinstitut für Angewandte Mathematik
Publication date: 01/01/2005

Juelich Shared Electronic Resources

Support-Vektor-Maschinen: Statistik und Künstliche Intelligenz im Bunde [English: Support Vector Machines: Statistics and Artificial Intelligence in Alliance]

Author: Eitrich T.
Publication date: 01/01/2003

Juelich Shared Electronic Resources

Klassifikations-Algorithmen für Daten aus der Pharmaindustrie [English: Classification Algorithms for Data from the Pharmaceutical Industry]

Author: Eitrich T.
Publication date: 01/01/2005

Juelich Shared Electronic Resources

Parallel Tuning of Support Vector Machine Learning Parameters for Large and Unbalanced Data Sets

Author: Eitrich T.
Publication date: 01/01/2005

Juelich Shared Electronic Resources