Automated Identification and Classification of Stereochemistry: Chirality and Double Bond Stereoisomerism
Stereoisomers have the same molecular formula and the same atom connectivity
and their existence can be related to the presence of different
three-dimensional arrangements. Stereoisomerism is of great importance in many
different fields since the molecular properties and biological effects of the
stereoisomers are often significantly different. Most drugs, for example, are
composed of a single stereoisomer of a compound: one stereoisomer may have
therapeutic effects on the body, while another may be toxic. A challenging
task is the automatic detection of stereoisomers using line input
specifications such as SMILES or InChI since it requires information about
group theory (to distinguish stereoisomers using mathematical information about
their symmetry), and about the topology and geometry of the molecule. There are several
software packages that include modules to handle stereochemistry, especially
the ones to name a chemical structure and/or view, edit and generate chemical
structure diagrams. However, there is a lack of software capable of
automatically analyzing a molecule represented as a graph and generating a
classification of the type of isomerism present at a given atom or bond.
Considering the importance of stereoisomerism when comparing chemical
structures, this report describes a computer program for analyzing and
processing steric information contained in a chemical structure represented as
a molecular graph and providing as output a binary classification of the isomer
type based on the recommended conventions. Due to the complexity of the
underlying issue, specification of stereochemical information is currently
limited to explicit stereochemistry and to the two most common types of
stereochemistry caused by asymmetry around carbon atoms: chiral atom and double
bond. A Webtool to automatically identify and classify stereochemistry is
available at http://nams.lasige.di.fc.ul.pt/tools.ph
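The restriction to explicit stereochemistry can be illustrated with a minimal sketch. It is not the program described in the abstract: it only scans a SMILES string for explicit stereo markers, assuming the usual SMILES conventions ('@' or '@@' inside a bracket atom for a tetrahedral center, '/' and '\' for double-bond geometry). The helper name `explicit_stereo_markers` is hypothetical, and no check is made that a marked atom is a genuine stereocenter.

```python
import re

# Assumption: stereo is written explicitly in the SMILES string, with '@'/'@@'
# inside bracket atoms and '/' or '\' directional bonds around double bonds.
BRACKET_ATOM = re.compile(r"\[[^\]]*\]")


def explicit_stereo_markers(smiles: str) -> dict:
    """Report explicit stereo annotations found in a SMILES string (hypothetical helper)."""
    # A bracket atom carrying '@' or '@@' is tagged as a tetrahedral center.
    chiral_atoms = sum(1 for atom in BRACKET_ATOM.findall(smiles) if "@" in atom)
    # Directional bonds are written outside bracket atoms.
    outside = BRACKET_ATOM.sub("", smiles)
    directional_bonds = outside.count("/") + outside.count("\\")
    return {
        "chiral_atoms": chiral_atoms,
        # A double bond needs a directional bond on each side to be specified.
        "has_double_bond_stereo": directional_bonds >= 2,
    }


if __name__ == "__main__":
    print(explicit_stereo_markers("N[C@@H](C)C(=O)O"))  # alanine with one tagged center
    print(explicit_stereo_markers("F/C=C/F"))           # difluoroethene with E/Z bonds
```

A graph-based classifier would go further and verify that the marked atom or bond really has distinguishable substituents; this sketch only shows where the explicit information lives in the line notation.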
Rationale, study design, and analysis plan of the Alveolar Recruitment for ARDS Trial (ART): Study protocol for a randomized controlled trial
Background: Acute respiratory distress syndrome (ARDS) is associated with high in-hospital mortality. Alveolar recruitment followed by ventilation at optimal titrated PEEP may reduce ventilator-induced lung injury and improve oxygenation in patients with ARDS, but the effects on mortality and other clinical outcomes remain unknown. This article reports the rationale, study design, and analysis plan of the Alveolar Recruitment for ARDS Trial (ART).
Methods/Design: ART is a pragmatic, multicenter, randomized (concealed), controlled trial, which aims to determine if maximum stepwise alveolar recruitment associated with PEEP titration is able to increase 28-day survival in patients with ARDS compared to conventional treatment (ARDSNet strategy). We will enroll adult patients with ARDS of less than 72 h duration. The intervention group will receive an alveolar recruitment maneuver, with stepwise increases of PEEP achieving 45 cmH2O and peak pressure of 60 cmH2O, followed by ventilation with optimal PEEP titrated according to the static compliance of the respiratory system. In the control group, mechanical ventilation will follow a conventional protocol (ARDSNet). In both groups, we will use controlled volume mode with low tidal volumes (4 to 6 mL/kg of predicted body weight) and targeting plateau pressure <= 30 cmH2O. The primary outcome is 28-day survival, and the secondary outcomes are: length of ICU stay; length of hospital stay; pneumothorax requiring chest tube during first 7 days; barotrauma during first 7 days; mechanical ventilation-free days from days 1 to 28; ICU, in-hospital, and 6-month survival. ART is an event-guided trial planned to last until 520 events (deaths within 28 days) are observed. These events allow detection of a hazard ratio of 0.75, with 90% power and two-tailed type I error of 5%. All analyses will follow the intention-to-treat principle.
Discussion: If the ART strategy with maximum recruitment and PEEP titration improves 28-day survival, this will represent a notable advance to the care of ARDS patients. Conversely, if the ART strategy is similar or inferior to the current evidence-based strategy (ARDSNet), this should also change current practice as many institutions routinely employ recruitment maneuvers and set PEEP levels according to some titration method.
Hospital do Coracao (HCor) as part of the Program 'Hospitais de Excelencia a Servico do SUS (PROADI-SUS)'; Brazilian Ministry of Health
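The event-driven design can be checked with a short calculation. The sketch below uses Schoenfeld's approximation for the number of events in a two-arm survival comparison with 1:1 allocation; the abstract does not say which formula the protocol used, so the choice of formula and the function names are assumptions made for illustration only.

```python
from math import ceil, log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def required_events(hazard_ratio: float, power: float, alpha: float = 0.05) -> int:
    """Events needed under Schoenfeld's approximation (assumed 1:1 allocation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-tailed type I error
    z_beta = _STD_NORMAL.inv_cdf(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


def power_for_events(events: int, hazard_ratio: float, alpha: float = 0.05) -> float:
    """Approximate power reached once a given number of events has been observed."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(sqrt(events) / 2 * abs(log(hazard_ratio)) - z_alpha)


if __name__ == "__main__":
    print(required_events(0.75, 0.90))   # events for HR 0.75, 90% power, alpha 5%
    print(power_for_events(520, 0.75))   # power at the planned 520 events
```

Under these assumptions the approximation asks for about 508 events, so the planned 520 events correspond to a power slightly above 90%, which agrees with the figures quoted in the abstract.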
Crowdsourced assessment of common genetic contribution to predicting anti-TNF treatment response in rheumatoid arthritis
Correction: vol 7, 13205, 2016, doi:10.1038/ncomms13205
Rheumatoid arthritis (RA) affects millions world-wide. While anti-TNF treatment is widely used to reduce disease progression, treatment fails in about one-third of patients. No biomarker currently exists that identifies non-responders before treatment. A rigorous community-based assessment of the utility of SNP data for predicting anti-TNF treatment efficacy in RA patients was performed in the context of a DREAM Challenge (http://www.synapse.org/RA_Challenge). An open challenge framework enabled the comparative evaluation of predictions developed by 73 research groups using the most comprehensive available data and covering a wide range of state-of-the-art modelling methodologies. Despite a significant genetic heritability estimate of the treatment non-response trait (h² = 0.18, P value = 0.02), no significant genetic contribution to prediction accuracy is observed. Results formally confirm the expectations of the rheumatology community that SNP information does not significantly improve predictive performance relative to standard clinical traits, thereby justifying a refocusing of future efforts on collection of other data.
Peer reviewed
Analysis and Comparison of Vector Space and Metric Space Representations in QSAR Modeling
The performance of quantitative structure-activity relationship (QSAR) models largely depends on the relevance of the selected molecular representation used as input data matrices. This work presents a thorough comparative analysis of two main categories of molecular representations (vector space and metric space) for fitting robust machine learning models in QSAR problems. For the assessment of these methods, seven different molecular representations that included RDKit descriptors, five different fingerprint types (MACCS, PubChem, FP2-based, Atom Pair, and ECFP4), and a graph matching approach (non-contiguous atom matching structure similarity; NAMS) in both vector space and metric space, were subjected to state-of-the-art machine learning methods that included different dimensionality reduction methods (feature selection and linear dimensionality reduction). Five distinct QSAR data sets were used for direct assessment and analysis. Results show that, in general, metric-space and vector-space representations are able to produce equivalent models, but there are significant differences between individual approaches. The NAMS-based similarity approach consistently outperformed most fingerprint representations in model quality, closely followed by Atom Pair fingerprints. To further verify these findings, the metric space-based models were fitted to the same data sets with the closest neighbors removed. These latter results further strengthened the above conclusions. The metric space graph-based approach appeared significantly superior to the other representations, albeit at a significant computational cost.
Noncontiguous Atom Matching Structural Similarity Function
Measuring similarity between molecules
is a fundamental problem
in cheminformatics. Given that similar molecules tend to have similar
physical, chemical, and biological properties, the notion of molecular
similarity plays an important role in the exploration of molecular
data sets, query-retrieval in molecular databases, and in structure–property/activity
modeling. Various methods to define structural similarity between
molecules are available in the literature, but so far none has been
used with consistent and reliable results for all situations. We propose
a new similarity method based on atom alignment for the analysis of
structural similarity between molecules. This method is based on the
comparison of the bonding profiles of atoms on comparable molecules,
including features that are seldom found in other structural or graph
matching approaches like chirality or double bond stereoisomerism.
The similarity measure is then defined on the annotated molecular
graph, based on an iterative directed graph similarity procedure and
optimal atom alignment between atoms using a pairwise matching algorithm.
With the proposed approach the similarities detected are more intuitively
understood because similar atoms in the molecules are explicitly shown.
This noncontiguous atom matching structural similarity method (NAMS)
was tested and compared with one of the most widely used similarity
methods (fingerprint-based similarity) using three difficult data
sets with different characteristics. Despite having a higher computational
cost, the method performed well, being able to distinguish either different
or very similar hydrocarbons that were indistinguishable using a fingerprint-based
approach. NAMS also verified the similarity principle using a data
set of structurally similar steroids with differences in the binding
affinity to the corticosteroid binding globulin receptor by showing
that pairs of steroids with a high degree of similarity (>80%) tend
to have smaller differences in the absolute value of binding activity.
Using a highly diverse set of compounds with information about the
monoamine oxidase inhibition level, the method was also able to recover
a significantly higher average fraction of active compounds when the
seed is active for different cutoff threshold values of similarity.
Particularly, for the cutoff threshold values of 86%, 93%, and 96.5%,
NAMS was able to recover a fraction of actives of 0.57, 0.63, and
0.83, respectively, while the fingerprint-based approach was able
to recover a fraction of actives of 0.41, 0.40, and 0.39, respectively.
NAMS is made freely available to the whole community as a simple
Web-based tool, together with the Python source code, at http://nams.lasige.di.fc.ul.pt/
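The retrieval experiment on the monoamine oxidase data set can be sketched as follows. This is an assumed reading of the evaluation, not the published NAMS code: each active compound is taken in turn as the seed, all other compounds whose similarity to the seed reaches the cutoff are retrieved, and the fraction of actives among them is averaged over seeds. The function name and the treatment of seeds with no neighbours above the cutoff (they are skipped) are illustrative choices.

```python
def mean_fraction_of_actives(similarity, is_active, cutoff):
    """Average fraction of actives retrieved around active seeds at a similarity cutoff.

    similarity: square matrix (list of lists) of pairwise similarities in [0, 1]
    is_active:  list of booleans, one per compound
    cutoff:     similarity threshold, e.g. 0.86 for 86%
    """
    n = len(is_active)
    fractions = []
    for seed in range(n):
        if not is_active[seed]:
            continue  # only active compounds are used as seeds
        neighbours = [j for j in range(n)
                      if j != seed and similarity[seed][j] >= cutoff]
        if not neighbours:
            continue  # assumption: seeds that retrieve nothing are skipped
        actives = sum(1 for j in neighbours if is_active[j])
        fractions.append(actives / len(neighbours))
    return sum(fractions) / len(fractions) if fractions else None


if __name__ == "__main__":
    # Hypothetical 4-compound example; the values are made up for illustration.
    sim = [[1.0, 0.9, 0.9, 0.1],
           [0.9, 1.0, 0.2, 0.1],
           [0.9, 0.2, 1.0, 0.95],
           [0.1, 0.1, 0.95, 1.0]]
    print(mean_fraction_of_actives(sim, [True, True, False, False], 0.85))
```

The same function can be applied to a NAMS similarity matrix and to a fingerprint-based one, which is how the two columns of recovered fractions in the abstract would be compared.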
Structural Similarity Based Kriging for Quantitative Structure Activity and Property Relationship Modeling
Structurally similar molecules tend
to have similar properties,
i.e. closer molecules in the molecular space are more likely to yield
similar property values while distant molecules are more likely to
yield different values. Based on this principle, we propose the use
of a new method that takes into account the high dimensionality of
the molecular space, predicting chemical, physical, or biological
properties based on the most similar compounds with measured properties.
This methodology uses ordinary kriging coupled with three different
molecular similarity approaches (based on molecular descriptors, fingerprints,
and atom matching) which creates an interpolation map over the molecular
space that is capable of predicting properties/activities for diverse
chemical data sets. The proposed method was tested in two data sets
of diverse chemical compounds collected from the literature and preprocessed.
One of the data sets contained dihydrofolate reductase inhibition
activity data, and the second contained molecules for which aqueous solubility
was known. The overall predictive results using kriging for both data
sets are consistent with the results obtained in the literature using typical
QSPR/QSAR approaches. However, the procedure did not involve any type
of descriptor selection or even minimal information about each problem,
suggesting that this approach is directly applicable to a large spectrum
of problems in QSAR/QSPR. Furthermore, the predictive results improve
significantly with the similarity threshold between the training and
testing compounds, allowing the definition of a confidence threshold
of similarity and error estimation for each case inferred. The use
of kriging for interpolation over the molecular metric space is independent
of the training data set size, and no reparametrizations are necessary
when compounds are added to or removed from the set; increasing
the size of the database will consequently improve the quality
of the estimations. Finally, it is shown that this model can be used
for checking the consistency of measured data and for guiding an extension
of the training set by determining the regions of the molecular space
for which new experimental measurements could be used to maximize
the model’s predictive performance.
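The interpolation step can be illustrated with a minimal ordinary kriging sketch. It assumes that a covariance between two molecules is already available (for instance, derived from a similarity score) and that the resulting training matrix is positive definite; how the paper maps similarities to covariances is not stated in the abstract, so that mapping and the function names below are assumptions.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular kriging system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ordinary_kriging_predict(train_cov, query_cov, values):
    """Predict a property as a weighted mean of measured values.

    train_cov: covariance matrix between training molecules (assumed positive definite)
    query_cov: covariances between the query molecule and each training molecule
    values:    measured property values of the training molecules
    """
    n = len(values)
    # Ordinary kriging system: covariances plus a Lagrange multiplier row and
    # column that force the weights to sum to one.
    system = [list(train_cov[i]) + [1.0] for i in range(n)]
    system.append([1.0] * n + [0.0])
    rhs = list(query_cov) + [1.0]
    weights = solve_linear_system(system, rhs)[:n]
    return sum(w * y for w, y in zip(weights, values))


if __name__ == "__main__":
    cov = [[1.0, 0.5, 0.2],
           [0.5, 1.0, 0.3],
           [0.2, 0.3, 1.0]]  # hypothetical covariances
    print(ordinary_kriging_predict(cov, [0.6, 0.7, 0.25], [1.0, 2.0, 4.0]))
```

Because the weights are recomputed from the covariances for every query, adding or removing training compounds only changes the size of the linear system, which is consistent with the abstract's remark that no reparametrization is needed.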
A novel algorithm for feature selection using Harmony Search and its application for non-technical losses detection
Finding an optimal subset of features that maximizes classification accuracy is still an open problem. In this paper, we exploit the speed of the Harmony Search algorithm and the Optimum-Path Forest classifier in order to propose a new fast and accurate approach for feature selection. Comparisons to some other pattern recognition and feature selection techniques showed that the proposed hybrid algorithm for feature selection outperformed them. The experiments were carried out in the context of identifying non-technical losses in power distribution systems. (C) 2011 Elsevier Ltd. All rights reserved.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES); Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
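A generic binary Harmony Search for feature selection can be sketched as below. This is not the algorithm of the paper: the Optimum-Path Forest classifier is replaced by an arbitrary `fitness` callable, pitch adjustment is implemented as a bit flip, and the parameter names (`hmcr`, `par`, `memory_size`) and the toy fitness are assumptions chosen for illustration.

```python
import random


def harmony_search_feature_selection(n_features, fitness, memory_size=10,
                                     hmcr=0.9, par=0.3, iterations=200, seed=0):
    """Search for a boolean feature mask that maximizes fitness(mask)."""
    rng = random.Random(seed)
    # Harmony memory: random feature masks and their scores.
    memory = [[rng.random() < 0.5 for _ in range(n_features)]
              for _ in range(memory_size)]
    scores = [fitness(mask) for mask in memory]
    for _ in range(iterations):
        new_mask = []
        for i in range(n_features):
            if rng.random() < hmcr:
                bit = rng.choice(memory)[i]   # memory consideration
                if rng.random() < par:
                    bit = not bit             # pitch adjustment as a bit flip
            else:
                bit = rng.random() < 0.5      # random selection
            new_mask.append(bit)
        new_score = fitness(new_mask)
        worst = min(range(memory_size), key=scores.__getitem__)
        if new_score > scores[worst]:
            # The new harmony replaces the worst one in memory.
            memory[worst], scores[worst] = new_mask, new_score
    best = max(range(memory_size), key=scores.__getitem__)
    return memory[best], scores[best]


def toy_fitness(mask):
    """Hypothetical stand-in for classifier accuracy: features 0 and 2 help, others hurt."""
    relevant = {0, 2}
    return sum(1 if i in relevant else -1 for i, keep in enumerate(mask) if keep)


if __name__ == "__main__":
    print(harmony_search_feature_selection(6, toy_fitness))
```

In a setting like the one in the abstract, `fitness` would train a classifier on the selected features and return its validation accuracy, which makes the cost of one fitness call the dominant factor in the running time.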
A Bayesian Approach to in Silico Blood-Brain Barrier Penetration Modeling
The human blood-brain barrier (BBB) is a membrane that protects
the central nervous system (CNS) by restricting the passage of solutes.
The development of any new drug must take into account its existence
whether for designing new molecules that target components of the
CNS or, on the other hand, to find new substances that should not
penetrate the barrier. Several studies in the literature have attempted
to predict BBB penetration, so far with limited success and few, if
any, applications to real-world drug discovery and development programs.
Part of the reason is that only about 2% of small
molecules can cross the BBB, and the available data sets are not representative
of that reality, being generally biased with an over-representation
of molecules that show an ability to permeate the BBB (BBB positives).
To circumvent this limitation, the current study aims to devise and
use a new approach based on Bayesian statistics, coupled with state-of-the-art
machine learning methods to produce a robust model capable of being
applied in real-world drug research scenarios. The data set used,
gathered from the literature, totals 1970 curated molecules, one of
the largest for similar studies. Random Forests and Support Vector
Machines were tested in various configurations against several chemical
descriptor set combinations. Models were tested in a 5-fold cross-validation
process, and the best one tested over an independent validation set.
The best fitted model produced an overall accuracy of 95%, with a
mean square contingency coefficient (ϕ) of 0.74, and showed
an overall capacity of 83% for predicting BBB positives and 96% for
determining BBB negatives. This model was adapted into a Web-based
tool made available for the whole community at http://b3pp.lasige.di.fc.ul.pt
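The reported figures can be related to one another with a short sketch. The first function computes accuracy, the rates for positives and negatives, and the ϕ coefficient from a confusion matrix; the second applies Bayes' rule to show how a low base rate of BBB-permeable molecules affects the reliability of a positive prediction. Treating the 83% and 96% figures as sensitivity and specificity is an interpretation, the confusion-matrix counts in the example are hypothetical, and this is not the Bayesian procedure used in the paper.

```python
from math import sqrt


def classification_summary(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and phi from confusion-matrix counts."""
    total = tp + fn + tn + fp
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # fraction of positives predicted correctly
        "specificity": tn / (tn + fp),   # fraction of negatives predicted correctly
        "phi": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a predicted positive is truly positive (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(classification_summary(tp=40, fn=10, tn=45, fp=5))
    # Rates from the abstract combined with the 2% base rate it quotes.
    print(positive_predictive_value(0.83, 0.96, 0.02))
```

With the rates and base rate quoted in the abstract, Bayes' rule gives a positive predictive value of roughly 0.30, which illustrates why a data set that over-represents BBB positives can make a model look more useful than it would be in a real screening setting.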