Éclair—a web service for unravelling species origin of sequences sampled from mixed host interfaces

Author: Rudd Stephen
Tetko Igor V.
Publication venue: Oxford University Press
Publication date: 01/01/2005

The identification of the genes that participate at the biological interface of two species remains critical to our understanding of the mechanisms of disease resistance, disease susceptibility and symbiosis. The sequencing of complementary DNA (cDNA) libraries prepared from the biological interface between two organisms provides an inexpensive way to identify the novel genes that may be expressed as a cause or consequence of compatible or incompatible interactions. Sequence classification and annotation of species origin typically use an orthology-based approach and require access to large portions of either genome, or that of a close relative. Novel species- or clade-specific sequences may have no counterpart within existing databases and remain ambiguous features. Here we present a web service, Éclair, which utilizes support vector machines for the classification of the origin of expressed sequence tags stemming from mixed host cDNA libraries. In addition to providing an interface for the classification of sequences, users are presented with the opportunity to train a model to suit their preferred species pair. Éclair is freely available at

Crossref

The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS

Author: Antony J. Williams
Daniel M. Lowe
Igor V. Tetko
Publication venue: Springer Nature
Publication date: 01/01/2016

BACKGROUND: Melting point (MP) is an important property with regard to the solubility of chemical compounds. Its prediction from chemical structure remains a highly challenging task for quantitative structure-activity relationship studies. Success in this area of research critically depends on the availability of high-quality MP data as well as accurate chemical structure representations in order to develop models. Currently available datasets for MP predictions have been limited to around 50k molecules, while far more data are routinely generated following the synthesis of novel materials. Significant amounts of MP data are freely available within the patent literature and, if they were available in the appropriate form, could potentially be used to develop predictive models. RESULTS: We have developed a pipeline for the automated extraction and annotation of chemical data from published PATENTS. Almost 300,000 data points have been collected and used to develop models to predict melting and pyrolysis (decomposition) points using tools available on the OCHEM modeling platform (http://ochem.eu). A number of technical challenges were simultaneously solved to develop models based on these data. These included the handling of sparse data matrices with >200,000,000,000 entries and parallel calculations using 32 × 6 cores per task using 13 descriptor sets totaling more than 700,000 descriptors. We showed that models developed using data collected from PATENTS had similar or better prediction accuracy compared to the highly curated data used in previous publications. The separation of data for chemicals that decomposed rather than melted from compounds that did undergo a normal melting transition was performed, and models for both pyrolysis and MPs were developed. The accuracy of the consensus MP models for molecules from the drug-like region of chemical space was similar to their estimated experimental accuracy, 32 °C. Last but not least, important structural features related to the pyrolysis of chemicals were identified, and a model to predict whether a compound will decompose instead of melting was developed. CONCLUSIONS: We have shown that automated tools for the analysis of chemical information have reached a mature stage allowing for the extraction and collection of high-quality data to enable the development of structure-activity relationship models. The developed models and data are publicly available at http://ochem.eu/article/99826
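
The matrix size quoted in the abstract can be roughly checked with a back-of-envelope calculation. The sketch below assumes one row per data point and one column per descriptor, using the rounded figures from the abstract (about 300,000 data points, at least 700,000 descriptors). The 4-byte dense value, the 8 bytes per stored non-zero, and the figure of 700 non-zero descriptors per compound are illustrative assumptions, not values reported by the authors.

```python
# Back-of-envelope size of the descriptor matrix described in the abstract.
# Assumption: one row per data point and one column per descriptor.
n_rows = 300_000  # "almost 300,000 data points" (rounded)
n_cols = 700_000  # "more than 700,000 descriptors" (lower bound)

n_entries = n_rows * n_cols
dense_bytes = n_entries * 4  # assumption: 4-byte floats if stored densely


def sparse_bytes(nonzeros_per_row, bytes_per_nonzero=8):
    """Approximate sparse storage: one value plus one column index per non-zero."""
    return n_rows * nonzeros_per_row * bytes_per_nonzero


print(f"entries: {n_entries:,}")
print(f"dense storage: {dense_bytes / 1e9:.0f} GB")
# Hypothetical sparsity: 700 non-zero descriptors per compound.
print(f"sparse storage: {sparse_bytes(700) / 1e9:.2f} GB")
```

Under these assumptions the product of rows and columns exceeds 200,000,000,000, consistent with the figure in the abstract, and it illustrates why a sparse representation is needed at this scale.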

FigShare

Classification of CYP450 1A2 inhibitors using PubChem data

Author: Körner R
Novotarskyi Sergii
Pandey AK
Sushko Iurii
Tetko Igor
Publication venue: BioMed Central
Publication date: 01/01/2010

Directory of Open Access Journals

Applicability domain for classification problems

Author: Körner R
Novotarskyi S
Pandey AK
Sushko Iurii
Tetko Igor
Publication venue: BioMed Central
Publication date: 01/01/2010

Directory of Open Access Journals

QSAR modeling for In vitro assays: linking ToxCast™ database to the integrated modeling framework “OCHEM”

Author: Abdelaziz Ahmed
Körner Robert
Novotarskyi Sergii
Sushko Iurii
Teetz Wolfram
Tetko Igor V
Publication venue: BioMed Central
Publication date: 01/01/2012

The ToxCast™ project, phases I and II, is testing a combined total of 960 unique chemicals with more than 650 high-throughput assays. The aim of this database is to use advanced science tools to help understand how human body processes are impacted by exposures to chemicals and to help determine which exposures are most likely to lead to adverse health effects. To better serve this goal and to allow in silico analysis of in vitro assays, we linked the database with an integrated QSAR modeling framework. The Online Chemical Modeling Environment (OCHEM) is a web-based platform that aims to automate and simplify the typical steps required for QSAR modeling. The platform consists of two major subsystems: the database of experimental measurements and the modeling framework. The user-contributed database contains a set of tools for easy input, search and modification of thousands of records. The OCHEM database is based on the wiki principle and focuses primarily on the quality and verifiability of the data. The database is tightly integrated with the modeling framework, which supports all the steps required to create a predictive model: data search, calculation and selection of a vast variety of molecular descriptors, application of machine learning methods, validation, analysis of the model and assessment of the applicability domain. Our intention is to make OCHEM a widely used platform to perform QSPR/QSAR studies online and share them with other users on the Web. Through this integration, scientists can model in vitro assays using in silico descriptor packages while benefiting from multi-learning features and automatic applicability domain estimation. http://ochem.eu

Crossref