Data Quality in Predictive Toxicology: Identification of Chemical Structures and Calculation of Chemical Descriptors
Every technique for toxicity prediction and for the detection of structure–activity relationships relies on the accurate estimation and representation of chemical and toxicologic properties. In this paper we discuss the potential sources of errors associated with the identification of compounds, the representation of their structures, and the calculation of chemical descriptors. The discussion is based on a case study where machine learning techniques were applied to data from noncongeneric compounds and a complex toxicologic end point (carcinogenicity). We propose methods applicable to the routine quality control of large chemical datasets, but our main intention is to raise awareness about this topic and to open a discussion about quality assurance in predictive toxicology. The accuracy and reproducibility of toxicity data will be reported in another paper.
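As an illustration of the kind of routine identifier check such quality control could include, the sketch below validates the check digit of CAS Registry Numbers. This is an assumption chosen for illustration, not a method described in the abstract; the function names `is_valid_cas` and `find_suspect_records` are hypothetical.

```python
import re

# CAS Registry Numbers have the form NNNNNNN-NN-C, where C is a check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well-formed and its check digit matches."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one,
    # then compare the weighted sum modulo 10 with the check digit.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit


def find_suspect_records(records):
    """Return the (name, cas) pairs whose CAS number fails validation."""
    return [(name, cas) for name, cas in records if not is_valid_cas(cas)]


if __name__ == "__main__":
    dataset = [
        ("water", "7732-18-5"),
        ("formaldehyde", "50-00-0"),
        ("benzene", "71-43-3"),  # deliberately wrong check digit
    ]
    for name, cas in find_suspect_records(dataset):
        print(f"check identifier for {name}: {cas}")
```

A check like this only catches transcription errors in the identifier itself; it cannot confirm that the identifier points to the intended structure.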
Multitask Learning Deep Neural Networks to Combine Revealed and Stated Preference Data
It is an enduring question how to combine revealed preference (RP) and stated
preference (SP) data to analyze travel behavior. This study presents a
framework of multitask learning deep neural networks (MTLDNNs) for this
question, and demonstrates that MTLDNNs are more generic than the traditional
nested logit (NL) method, owing to their capacity for automatic feature learning and
soft constraints. About 1,500 MTLDNN models are designed and applied to the
survey data that was collected in Singapore and focused on the RP of four
current travel modes and the SP with autonomous vehicles (AV) as the one new
travel mode in addition to those in RP. We found that MTLDNNs consistently
outperform six benchmark models and particularly the classical NL models by
about 5% prediction accuracy in both RP and SP datasets. This performance
improvement can be mainly attributed to the soft constraints specific to
MTLDNNs, including their innovative architectural design and regularization
methods, but not much to the generic capacity of automatic feature learning
endowed by a standard feedforward DNN architecture. Besides prediction, MTLDNNs
are also interpretable. The empirical results show that AVs are mainly a
substitute for driving and that AV alternative-specific variables are more important
than the socio-economic variables in determining AV adoption. Overall, this
study introduces a new MTLDNN framework to combine RP and SP, and demonstrates
its theoretical flexibility and empirical power for prediction and
interpretation. Future studies can design new MTLDNN architectures to reflect
the specific characteristics of RP and SP and extend this work to other behavioral analyses.
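To make the idea of a soft constraint between the RP and SP tasks concrete, the sketch below combines two cross-entropy losses with a penalty on the distance between task-specific weights. It is a minimal illustration under stated assumptions, not the architecture or regularizer from the study: the L1 penalty, the weight vectors `w_rp` and `w_sp`, and the strength `lam` are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-likelihood of the chosen alternative `label`."""
    return -math.log(softmax(logits)[label])


def mean_loss(batch):
    """Average cross-entropy over a batch of (logits, label) pairs."""
    return sum(cross_entropy(logits, label) for logits, label in batch) / len(batch)


def soft_constraint_penalty(w_rp, w_sp):
    """L1 distance between task-specific weights (assumed penalty form)."""
    return sum(abs(a - b) for a, b in zip(w_rp, w_sp))


def joint_loss(rp_batch, sp_batch, w_rp, w_sp, lam):
    """RP loss + SP loss + lam * penalty tying the two tasks together.

    lam = 0 trains the tasks independently; a large lam pushes the
    task-specific weights toward each other.
    """
    return (
        mean_loss(rp_batch)
        + mean_loss(sp_batch)
        + lam * soft_constraint_penalty(w_rp, w_sp)
    )


if __name__ == "__main__":
    rp_batch = [([0.2, 1.1, -0.3, 0.0], 1)]       # four current travel modes
    sp_batch = [([0.1, 0.9, -0.2, 0.0, 0.5], 4)]  # same four modes plus AV
    print(joint_loss(rp_batch, sp_batch, [1.0, 2.0], [1.0, 2.5], lam=0.1))
```

Varying `lam` moves the model between two fully separate networks and one with nearly shared task-specific layers, which is one way to read "soft" as opposed to hard parameter sharing.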
Fréchet ChemNet Distance: A metric for generative models for molecules in drug discovery
The new wave of successful generative models in machine learning has
increased the interest in deep learning driven de novo drug design. However,
assessing the performance of such generative models is notoriously difficult.
Metrics that are typically used to assess the performance of such generative
models are the percentage of chemically valid molecules or the similarity to
real molecules in terms of particular descriptors, such as the partition
coefficient (logP) or druglikeness. However, method comparison is difficult
because of the inconsistent use of evaluation metrics, the necessity for
multiple metrics, and the fact that some of these measures can easily be
tricked by simple rule-based systems. We propose a novel distance measure
between two sets of molecules, called Fréchet ChemNet distance (FCD), that
can be used as an evaluation metric for generative models. The FCD is similar
to a recently established performance metric for comparing image generation
methods, the Fréchet Inception Distance (FID). Whereas the FID uses one of
the hidden layers of InceptionNet, the FCD utilizes the penultimate layer of a
deep neural network called ChemNet, which was trained to predict drug
activities. Thus, the FCD metric takes into account chemically and biologically
relevant information about molecules, and also measures the diversity of the
set via the distribution of generated molecules. The FCD's advantage over
previous metrics is that it can detect whether generated molecules are a) diverse
and have b) chemical and c) biological properties similar to those of real molecules. We
further provide an easy-to-use implementation that only requires the SMILES
representation of the generated molecules as input to calculate the FCD.
Implementations are available at: https://www.github.com/bioinf-jku/FCD
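For orientation, the Fréchet distance between two Gaussians, which is the form used by the FID, can be written as below. That the FCD applies the same expression to ChemNet penultimate-layer activations is an assumption based on the stated similarity to the FID. The symbols are notation introduced here: $\mu_g, \Sigma_g$ and $\mu_r, \Sigma_r$ are the mean and covariance of the activations for generated and real molecules.

```latex
d^2\big((\mu_g, \Sigma_g), (\mu_r, \Sigma_r)\big)
  = \lVert \mu_g - \mu_r \rVert_2^2
  + \operatorname{Tr}\!\Big( \Sigma_g + \Sigma_r
      - 2 \big( \Sigma_g \Sigma_r \big)^{1/2} \Big)
```

The first term compares the average activations of the two sets, and the trace term compares their spread, which is how a distance of this form can reflect diversity as well as property similarity.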
Recurrent Latent Variable Networks for Session-Based Recommendation
In this work, we attempt to ameliorate the impact of data sparsity in the
context of session-based recommendation. Specifically, we seek to devise a
machine learning mechanism capable of extracting subtle and complex underlying
temporal dynamics in the observed session data, so as to inform the
recommendation algorithm. To this end, we improve upon systems that utilize
deep learning techniques with recurrently connected units; we do so by adopting
concepts from the field of Bayesian statistics, namely variational inference.
Our proposed approach consists in treating the network recurrent units as
stochastic latent variables with a prior distribution imposed over them. On
this basis, we proceed to infer corresponding posteriors; these can be used for
prediction and recommendation generation, in a way that accounts for the
uncertainty in the available sparse training data. To allow for our approach to
easily scale to large real-world datasets, we perform inference under an
approximate amortized variational inference (AVI) setup, whereby the learned
posteriors are parameterized via (conventional) neural networks. We perform an
extensive experimental evaluation of our approach using challenging benchmark
datasets, and illustrate its superiority over existing state-of-the-art
techniques.
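As background for the variational inference setup mentioned above, the generic evidence lower bound (ELBO) maximized in amortized variational inference is shown below. This is the standard textbook form, not the sequential objective of the proposed model. The symbols are notation introduced here: $x$ is the observed data, $z$ the latent variables, $p(z)$ the prior, and $q_\phi(z \mid x)$ the posterior approximation parameterized by a neural network.

```latex
\log p_\theta(x) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\big[ \log p_\theta(x \mid z) \big]
  - \mathrm{KL}\big( q_\phi(z \mid x) \,\Vert\, p(z) \big)
```

The expectation rewards latent variables that explain the observed sessions, while the KL term keeps the learned posterior close to the prior, which acts as a regularizer when training data are sparse.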
Predicting Skin Permeability by means of Computational Approaches: Reliability and Caveats in Pharmaceutical Studies
© 2019 American Chemical Society. The skin is the main barrier between the internal body environment and the external one. The characteristics and properties of this barrier can modify and affect drug delivery and chemical toxicity parameters. Therefore, it is not surprising that the permeability of many different compounds has been measured through several in vitro and in vivo techniques. Moreover, many different in silico approaches have been used to identify the correlation between the structure of the permeants and their permeability, to reproduce the skin behavior, and to predict the ability of specific chemicals to permeate this barrier. A significant number of issues, such as interlaboratory variability, experimental conditions, data set building rationales, and skin site of origin and hydration, still prevent us from obtaining a definitive predictive skin permeability model. This review aims to show the main advances and the principal approaches in computational methods used to predict this property, to highlight the main issues that have arisen, and to address the challenges for future research. Peer reviewed. Final accepted version.