Search CORE

6,396 research outputs found

A tree-based method for the rapid screening of chemical fingerprints

Author: Kristensen Thomas G
Nielsen Jesper
Pedersen Christian NS
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The fingerprint of a molecule is a bitstring based on its structure, constructed such that structurally similar molecules will have similar fingerprints. Molecular fingerprints can be used in an initial phase of drug development for identifying novel drug candidates by screening large databases for molecules with fingerprints similar to a query fingerprint. Results: In this paper, we present a method which efficiently finds all fingerprints in a database with Tanimoto coefficient to the query fingerprint above a user-defined threshold. The method is based on two novel data structures for rapid screening of large databases: the kD grid and the Multibit tree. The kD grid is based on splitting the fingerprints into k shorter bitstrings and utilising these to compute bounds on the similarity of the complete bitstrings. The Multibit tree uses hierarchical clustering and similarity within each cluster to compute similar bounds. We have implemented our method and tested it on a large real-world data set. Our experiments show that our method yields approximately a three-fold speed-up over previous methods. Conclusions: Using the novel kD grid and Multibit tree significantly reduces the time needed for searching databases of fingerprints. This will allow researchers to (1) perform more searches than previously possible and (2) easily search large databases.
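To make the idea of bound-based screening concrete, here is a minimal sketch of a Tanimoto threshold search that skips candidates using a simple popcount bound. It is not the kD grid or the Multibit tree described in the abstract; it only illustrates the general pattern of computing a cheap upper bound before the full comparison. Fingerprints are assumed to be stored as Python integers, and the names `tanimoto` and `threshold_search` are hypothetical.

```python
from typing import List, Tuple


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two bitstring fingerprints."""
    union = popcount(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return popcount(a & b) / union


def threshold_search(query: int, database: List[int],
                     threshold: float) -> List[Tuple[int, float]]:
    """Return (index, similarity) for fingerprints at or above the threshold."""
    q_bits = popcount(query)
    hits = []
    for idx, fp in enumerate(database):
        f_bits = popcount(fp)
        larger = max(q_bits, f_bits)
        # |A & B| <= min(|A|, |B|) and |A | B| >= max(|A|, |B|),
        # so min/max is an upper bound on the Tanimoto coefficient.
        bound = min(q_bits, f_bits) / larger if larger else 0.0
        if bound < threshold:
            continue  # cannot reach the threshold; skip the full comparison
        sim = tanimoto(query, fp)
        if sim >= threshold:
            hits.append((idx, sim))
    return hits


if __name__ == "__main__":
    db = [0b1111, 0b0111, 0b0001, 0b11110000]
    print(threshold_search(0b1111, db, 0.7))
```

In this toy run the third fingerprint is rejected by the bound alone, while the fourth passes the bound and is rejected only after the full comparison, which is why tighter bounds such as those in the abstract can pay off.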

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Chemical fingerprinting of wood sampled along a pith-to-bark gradient for individual comparison and provenance identification

Author: Beeckman Hans
Deklerck Victor
Espinoza Edgard
Lancaster Cady
Van Acker Joris
Van den Bulcke Jan
Publication venue: MDPI AG
Publication date: 01/01/2020

Background and Objectives: The origin of traded timber is one of the main questions in the enforcement of regulations to combat the illegal timber trade. Substantial efforts are still needed to develop techniques that can determine the exact geographical provenance of timber and this is vital to counteract the destructive effects of illegal logging, ranging from economic loss to habitat destruction. The potential of chemical fingerprints from pith-to-bark growth rings for individual comparison and geographical provenance determination is explored. Materials and Methods: A wood sliver was sampled per growth ring from four stem disks from four individuals of Pericopsis elata (Democratic Republic of the Congo) and from 14 stem disks from 14 individuals of Terminalia superba (Cote d'Ivoire and Democratic Republic of the Congo). Chemical fingerprints were obtained by analyzing these wood slivers with Direct Analysis in Real Time Time-Of-Flight Mass Spectrometry (DART TOFMS). Results: Individual distinction for both species was achieved but the accuracy was dependent on the dataset size and number of individuals included. As this is still experimental, we can only speak of individual comparison and not individual distinction at this point. The prediction accuracy for the country of origin increases with increasing sample number and a random sample can be placed in the correct country. When a complete disk is removed from the training dataset, its rings (samples) are correctly attributed to the country with an accuracy ranging from 43% to 100%. Relative abundances of ions appear to contribute more to differentiation compared to frequency differences. Conclusions: DART TOFMS shows potential for geographical provenancing but is still experimental for individual distinction; more research is needed to make this an established method. Sampling campaigns should focus on sampling tree cores from pith-to-bark, paving the way towards a chemical fingerprint database for species provenance.

Multidisciplinary Digital Publishing Institute

Ghent University Academic Bibliography

Inductive queries for a drug designing robot scientist

Author: A. Lingas
C. Hansch
C.A. Lipinski
D.R. Jones
H. Blockeel
J. Matousek
L. De Raedt
R.D. King
T. Gärtner
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments.

Lirias

Crossref

Bournemouth University Research Online

The University of Manchester - Institutional Repository

DIAL UCLouvain

Software for supporting large scale data processing for High Throughput Screening

Author: Tai David
Publication venue: Paleontological Institute at The University of Kansas
Publication date: 01/01/2011

High Throughput Screening is a valuable data generation technique for data-driven knowledge discovery. Because the rate of data generation is so great, it is a challenge to cope with the demands of post-experiment data analysis. This thesis presents three software solutions that I implemented in an attempt to alleviate this problem. The first is K-Screen, a Laboratory Information Management System designed to handle and visualize large High Throughput Screening datasets. K-Screen is being successfully used by the University of Kansas High Throughput Screening Laboratory to better organize and visualize their data. The next two algorithms are designed to accelerate chemical similarity searches using 1-dimensional fingerprints. The first algorithm balances information content in bit strings to attempt to find more optimal ordering and segmentation patterns for chemical fingerprints. The second algorithm eliminates redundant pruning calculations for large batch chemical similarity searches and shows a 250% improvement over the fastest current fingerprint search algorithm for large batch queries.

KU ScholarWorks

Application of Infrared and Raman Spectroscopy for the Identification of Disease Resistant Trees

Author: Bonello Pierluigi
Conrad Anna O.
Publication venue: UKnowledge
Publication date: 01/01/2016

New approaches for identifying disease resistant trees are needed as the incidence of diseases caused by non-native and invasive pathogens increases. These approaches must be rapid, reliable, cost-effective, and should have the potential to be adapted for high-throughput screening or phenotyping. Within the context of trees and tree diseases, we summarize vibrational spectroscopic and chemometric methods that have been used to distinguish between groups of trees which vary in disease susceptibility or other important characteristics based on chemical fingerprint data. We also provide specific examples from the literature of where these approaches have been used successfully. Finally, we discuss future application of these approaches for wide-scale screening and phenotyping efforts aimed at identifying disease resistant trees and managing forest diseases.

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

University of Kentucky

LightGBM: An Effective and Scalable Algorithm for Prediction of Chemical Toxicity – Application to the Tox21 and Mutagenicity Datasets

Author: Mucs D
Norinder U
Svensson F
Zhang J
Publication venue: American Chemical Society (ACS)
Publication date: 28/10/2019

Machine learning algorithms have attained widespread use in assessing the potential toxicities of pharmaceuticals and industrial chemicals because of their higher speed and lower cost compared to experimental bioassays. Gradient boosting is an effective algorithm that often achieves high predictivity, but historically its relatively long computational time limited its applications in predicting large compound libraries or developing in silico predictive models that require frequent retraining. LightGBM, a recent improvement of the gradient boosting algorithm, inherited its high predictivity but resolved its scalability and long computational time by adopting a leaf-wise tree growth strategy and introducing novel techniques. In this study, we compared the predictive performance and the computational time of LightGBM to deep neural networks, random forests, support vector machines, and XGBoost. All algorithms were rigorously evaluated on publicly available Tox21 and mutagenicity datasets using a Bayesian optimization integrated nested 10-fold cross-validation scheme that performs hyperparameter optimization while examining model generalizability and transferability to new data. The evaluation results demonstrated that LightGBM is an effective and highly scalable algorithm offering the best predictive performance while consuming significantly shorter computational time than the other investigated algorithms across all Tox21 and mutagenicity datasets. We recommend LightGBM for applications in in silico safety assessment and also in other areas of cheminformatics to fulfill the ever-growing demand for accurate and rapid prediction of various toxicity or activity related endpoints of large compound libraries present in the pharmaceutical and chemical industry.
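The nested cross-validation scheme mentioned in this abstract can be pictured with a small index-splitting sketch. It shows only the fold structure: an outer loop that holds out test data, and an inner loop over the remaining data where hyperparameters would be tuned. The Bayesian optimization step, the models, and the datasets are not reproduced, and the function names `k_fold` and `nested_splits` are hypothetical.

```python
from typing import Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def k_fold(indices: List[int], k: int) -> Iterator[Split]:
    """Yield (train, held_out) index lists for k folds over the given indices."""
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment to folds
    for i in range(k):
        held_out = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, held_out


def nested_splits(n_samples: int, outer_k: int,
                  inner_k: int) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    The inner splits are drawn only from outer_train, so the outer test fold
    never influences hyperparameter selection.
    """
    for outer_train, outer_test in k_fold(list(range(n_samples)), outer_k):
        inner = list(k_fold(outer_train, inner_k))
        yield outer_train, outer_test, inner


if __name__ == "__main__":
    for train, test, inner in nested_splits(20, outer_k=5, inner_k=4):
        print(len(train), len(test), [len(val) for _, val in inner])
```

In a real study the indices would usually be shuffled or stratified before splitting; that step is left out here to keep the sketch deterministic.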

UCL Discovery

Scalable Similarity Search for Molecular Descriptors

Author: A Leach
AM Bender
B Chen
D Vida
J Chen
M Keiser
M Kotera
R Nasr
R Sawada
R Todeschini
TG Kristensen
Publication date: 09/08/2017

Similarity search over chemical compound databases is a fundamental task in the discovery and design of novel drug-like molecules. Such databases often encode molecules as non-negative integer vectors, called molecular descriptors, which represent rich information on various molecular properties. While there exist efficient indexing structures for searching databases of binary vectors, solutions for more general integer vectors are in their infancy. In this paper we present a time- and space-efficient index for the problem that we call the succinct intervals-splitting tree algorithm for molecular descriptors (SITAd). Our approach extends efficient methods for binary-vector databases, and uses ideas from succinct data structures. Our experiments, on a large database of over 40 million compounds, show SITAd significantly outperforms alternative approaches in practice. Comment: To appear in the Proceedings of SISAP'1
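As an illustration of how a binary-vector similarity can carry over to non-negative integer vectors, the sketch below computes the min/max (generalized Jaccard) similarity. The abstract does not name the similarity measure it indexes, so the choice of this measure is an assumption made for illustration, and the function name `minmax_similarity` is hypothetical. On 0/1 vectors the measure reduces to the ordinary Tanimoto coefficient.

```python
from typing import Sequence


def minmax_similarity(x: Sequence[int], y: Sequence[int]) -> float:
    """Generalized Jaccard similarity: sum of minima over sum of maxima.

    Assumes both vectors have the same length and non-negative entries.
    """
    if len(x) != len(y):
        raise ValueError("descriptor vectors must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("descriptor entries must be non-negative")
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    if denominator == 0:
        return 0.0  # convention chosen here for two all-zero vectors
    return numerator / denominator


if __name__ == "__main__":
    print(minmax_similarity([2, 0, 1], [1, 1, 1]))  # integer descriptors
    print(minmax_similarity([1, 1, 0], [1, 0, 0]))  # binary case
```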

arXiv.org e-Print Archive

Crossref

Substrate-specific clades of active marine methylotrophs associated with a phytoplankton bloom in a temperate coastal environment

Author: Boden Rich
Moussard Hélène
Murrell J. C. (J. Colin)
Neufeld Josh D.
Schäfer Hendrik
Publication venue: American Society for Microbiology
Publication date: 10/10/2008

Marine microorganisms that consume one-carbon (C1) compounds are poorly described, despite their impact on global climate via an influence on aquatic and atmospheric chemistry. This study investigated marine bacterial communities involved in the metabolism of C1 compounds. These communities were of relevance to surface seawater and atmospheric chemistry in the context of a bloom that was dominated by phytoplankton known to produce dimethylsulfoniopropionate. In addition to using 16S rRNA gene fingerprinting and clone libraries to characterize samples taken from a bloom transect in July 2006, seawater samples from the phytoplankton bloom were incubated with 13C-labeled methanol, monomethylamine, dimethylamine, methyl bromide, and dimethyl sulfide to identify microbial populations involved in the turnover of C1 compounds, using DNA stable isotope probing. The [13C]DNA samples from a single time point were characterized and compared using denaturing gradient gel electrophoresis (DGGE), fingerprint cluster analysis, and 16S rRNA gene clone library analysis. Bacterial community DGGE fingerprints from 13C-labeled DNA were distinct from those obtained with the DNA of the nonlabeled community and suggested some overlap in substrate utilization between active methylotroph populations growing on different C1 substrates. Active methylotrophs were affiliated with Methylophaga spp. and several clades of undescribed Gammaproteobacteria that utilized methanol, methylamines (both monomethylamine and dimethylamine), and dimethyl sulfide. rRNA gene sequences corresponding to populations assimilating 13C-labeled methyl bromide and other substrates were associated with members of the Alphaproteobacteria (e.g., the family Rhodobacteraceae), the Cytophaga-Flexibacter-Bacteroides group, and unknown taxa. This study expands the known diversity of marine methylotrophs in surface seawater and provides a comprehensive data set for focused cultivation and metagenomic analyses in the future.

Crossref

PubMed Central

Warwick Research Archives Portal Repository

University of East Anglia digital repository