Mechanism Deduction from Noisy Chemical Reaction Networks
We introduce KiNetX, a fully automated meta-algorithm for the kinetic
analysis of complex chemical reaction networks derived from semi-accurate but
efficient electronic structure calculations. It is designed to (i) accelerate
the automated exploration of such networks, and (ii) cope with model-inherent
errors in electronic structure calculations on elementary reaction steps. We
developed and implemented KiNetX to possess three features. First, KiNetX
evaluates the kinetic relevance of every species in a (yet incomplete) reaction
network to confine the search for new elementary reaction steps only to those
species that are considered possibly relevant. Second, KiNetX identifies and
eliminates all kinetically irrelevant species and elementary reactions to
reduce a complex network graph to a comprehensible mechanism. Third, KiNetX
estimates the sensitivity of species concentrations toward changes in
individual rate constants (derived from relative free energies), which allows
us to systematically select the most efficient electronic structure model for
each elementary reaction given a predefined accuracy. The novelty of KiNetX
consists in the rigorous propagation of correlated free-energy uncertainty
through all steps of our kinetic analysis. To examine the performance of KiNetX,
we developed AutoNetGen. It semirandomly generates chemistry-mimicking reaction
networks by encoding chemical logic into their underlying graph structure.
AutoNetGen allows us to consider a vast number of distinct chemistry-like
scenarios and, hence, to assess the importance of rigorous uncertainty
propagation in a statistical context. Our results reveal that KiNetX reliably
supports the deduction of product ratios, dominant reaction pathways, and
possibly other network properties from semi-accurate electronic structure data.
Comment: 36 pages, 4 figures, 2 tables
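The central idea of propagating correlated free-energy uncertainty into kinetic observables can be illustrated with a minimal Monte Carlo sketch. This is not the KiNetX code; the barrier heights, uncertainty, and correlation below are illustrative assumptions, and the Eyring equation converts sampled activation free energies into rate constants for two competing first-order channels.

```python
import math
import random

R = 8.314462618e-3     # gas constant in kJ/(mol K)
T = 298.15             # temperature in K
KB_OVER_H = 2.083661912e10  # k_B/h in 1/(s K)

def eyring(dg_kjmol):
    """Eyring rate constant from an activation free energy (kJ/mol)."""
    return KB_OVER_H * T * math.exp(-dg_kjmol / (R * T))

def sample_branching(dg1, dg2, sigma, rho, n=20000, seed=1):
    """Monte Carlo propagation of correlated barrier errors (correlation
    coefficient rho) into the branching ratio k1/(k1+k2) of two parallel
    first-order channels; returns mean and standard deviation."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        e1 = sigma * z1
        e2 = sigma * (rho * z1 + math.sqrt(1 - rho**2) * z2)  # correlated draw
        k1, k2 = eyring(dg1 + e1), eyring(dg2 + e2)
        ratios.append(k1 / (k1 + k2))
    mean = sum(ratios) / n
    var = sum((r - mean) ** 2 for r in ratios) / (n - 1)
    return mean, math.sqrt(var)

# Correlated errors largely cancel in the ratio; independent errors do not.
m_corr, s_corr = sample_branching(70.0, 72.0, sigma=4.0, rho=0.9)
m_ind,  s_ind  = sample_branching(70.0, 72.0, sigma=4.0, rho=0.0)
```

The comparison shows why treating free-energy errors as correlated (as they often are when all barriers come from the same electronic structure model) yields much tighter product-ratio uncertainties than assuming independence.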
Heuristics-Guided Exploration of Reaction Mechanisms
For the investigation of chemical reaction networks, the efficient and
accurate determination of all relevant intermediates and elementary reactions
is mandatory. The complexity of such a network may grow rapidly, in particular
if reactive species are involved that might cause a myriad of side reactions.
Without automation, a complete investigation of complex reaction mechanisms is
tedious and possibly infeasible. Therefore, only the expected dominant reaction
paths of a chemical reaction network (e.g., a catalytic cycle or an enzymatic
cascade) are usually explored in practice. Here, we present a computational
protocol that constructs such networks in a parallelized and automated manner.
Molecular structures of reactive complexes are generated based on heuristic
rules derived from conceptual electronic-structure theory and subsequently
optimized by quantum chemical methods to produce stable intermediates of an
emerging reaction network. Pairs of intermediates in this network that might be
related by an elementary reaction according to some structural similarity
measure are then automatically detected and subjected to an automated search
for the connecting transition state. The results are visualized as an
automatically generated network graph, from which a comprehensive picture of
the mechanism of a complex chemical process can be obtained that greatly
facilitates the analysis of the whole network. We apply our protocol to the
Schrock dinitrogen-fixation catalyst to study alternative pathways of catalytic
ammonia production.
Comment: 27 pages, 9 figures
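The pair-detection step can be sketched in a few lines. This is an illustrative toy, not the published protocol: here the "structural similarity measure" is simply the difference in atomic composition, whereas a real implementation would also compare connectivity and 3D structure. Species names and the distance threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

def composition(atoms):
    """Atom-count multiset, e.g. ['N', 'H', 'H', 'H'] for an NH3 fragment."""
    return Counter(atoms)

def composition_distance(a, b):
    """Number of atoms that must be added or removed to turn a into b."""
    diff = Counter(a)
    diff.subtract(b)
    return sum(abs(v) for v in diff.values())

def candidate_pairs(intermediates, max_dist=2):
    """All pairs close enough in composition to warrant an automated
    transition-state search between them."""
    pairs = []
    for (name_a, comp_a), (name_b, comp_b) in combinations(
            intermediates.items(), 2):
        if composition_distance(comp_a, comp_b) <= max_dist:
            pairs.append((name_a, name_b))
    return pairs

# Hypothetical intermediates from a dinitrogen-fixation network.
species = {
    "Mo-N2":  composition(["Mo", "N", "N"]),
    "Mo-NNH": composition(["Mo", "N", "N", "H"]),
    "Mo-NH3": composition(["Mo", "N", "H", "H", "H"]),
}
pairs = candidate_pairs(species, max_dist=1)
```

With a threshold of one atom, only the protonation pair (Mo-N2, Mo-NNH) survives the screen, so only that pair would be passed on to the expensive transition-state search.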
Electrosynthetic screening and modern optimization strategies for electrosynthesis of highly value-added products
Unlike common analytical techniques such as cyclic voltammetry, statistics-based optimization tools are not yet a standard part of the preparative organic electrochemist's toolbox. In general, experimental effort is not used optimally because experimental conditions are selected on the one-variable-at-a-time principle. We summarize statistically motivated optimization approaches that have already been used in the context of electroorganic synthesis, and we discuss the central ideas of these optimization methods, which originate from other fields of chemistry, in relation to electrosynthetic applications.
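The weakness of the one-variable-at-a-time (OVAT) principle can be demonstrated with a toy response surface. The yield model below is an assumption for illustration only (no real electrosynthesis data): when two factors, say electrode potential and current density in coded units, interact strongly, OVAT can terminate at a suboptimal corner that a simple two-level full factorial design would not miss.

```python
from itertools import product

def yield_model(potential, current):
    """Hypothetical yield response with a strong interaction term
    (coded factor levels -1/+1); purely illustrative."""
    return 50 + 2 * potential + 3 * current + 15 * potential * current

levels = [-1, +1]  # coded low/high settings of each factor

# Full 2x2 factorial: evaluate every combination of factor levels.
factorial = {(p, c): yield_model(p, c) for p, c in product(levels, levels)}
best_factorial = max(factorial, key=factorial.get)

# OVAT: optimize potential at fixed current (-1), then current at that potential.
p_best = max(levels, key=lambda p: yield_model(p, -1))
c_best = max(levels, key=lambda c: yield_model(p_best, c))
ovat_best = (p_best, c_best)
```

Here the factorial design finds the high/high optimum, while the OVAT path settles on the low/low corner with a distinctly lower yield, because each one-dimensional scan never sees the interaction.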
A multi-label, dual-output deep neural network for automated bug triaging
Bug tracking enables the monitoring and resolution of issues and bugs within
organizations. Bug triaging, or assigning bugs to the owner(s) who will resolve
them, is a critical component of this process because there are many incorrect
assignments that waste developer time and reduce bug resolution throughput. In
this work, we explore the use of a novel two-output deep neural network
architecture (Dual DNN) for triaging a bug to both an individual team and
developer, simultaneously. Dual DNN leverages this simultaneous prediction by
exploiting its own guess of the team classes to aid in developer assignment. A
multi-label classification approach is used for each of the two outputs to
learn from all interim owners, not just the last one who closed the bug. We
make use of a heuristic combination of the interim owners
(owner-importance-weighted labeling) which is converted into a probability mass
function (pmf). We employ a two-stage learning scheme, whereby the team portion
of the model is trained first and then held static to train the team--developer
and bug--developer relationships. The scheme employed to encode the
team--developer relationships is based on an organizational chart (org chart),
which renders the model robust to organizational changes as it can adapt to
role changes within an organization. There is an observed average lift (with
respect to both team and developer assignment) of 13%-points in 11-fold
incremental-learning cross-validation (IL-CV) accuracy for Dual DNN utilizing
owner-weighted labels compared with the traditional multi-class classification
approach. Furthermore, Dual DNN with owner-weighted labels achieves average
11-fold IL-CV accuracies of 76% (team assignment) and 55% (developer
assignment), outperforming reference models by 14%- and 25%-points,
respectively, on a proprietary dataset with 236,865 entries.
Comment: 8 pages, 2 figures, 9 tables
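The owner-importance-weighted labeling idea, converting the sequence of interim owners into a probability mass function, can be sketched as follows. The weighting heuristic here (count each touch once, give the closing developer extra weight) is an assumption for illustration, not necessarily the paper's exact scheme.

```python
def owner_pmf(interim_owners, closer_weight=2.0):
    """Combine all interim owners of a bug into a probability mass function
    usable as a soft multi-label training target. Each owner is weighted by
    how often they touched the bug; the developer who finally closed it
    (last in the list) receives additional weight."""
    weights = {}
    for owner in interim_owners:
        weights[owner] = weights.get(owner, 0.0) + 1.0
    weights[interim_owners[-1]] += closer_weight  # emphasize the closer
    total = sum(weights.values())
    return {owner: w / total for owner, w in weights.items()}

# A hypothetical bug passed through three developers; 'carol' closed it.
pmf = owner_pmf(["alice", "bob", "alice", "carol"])
```

Unlike a hard multi-class label on the closing developer alone, this target lets the network learn from everyone who contributed to the resolution while still putting the most mass on the closer.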
Reliable estimation of prediction uncertainty for physico-chemical property models
The predictions of parametric property models and their uncertainties are
sensitive to systematic errors such as inconsistent reference data, parametric
model assumptions, or inadequate computational methods. Here, we discuss the
calibration of property models in the light of bootstrapping, a sampling method
akin to Bayesian inference that can be employed for identifying systematic
errors and for reliable estimation of the prediction uncertainty. We apply
bootstrapping to assess a linear property model linking the 57Fe Mössbauer
isomer shift to the contact electron density at the iron nucleus for a diverse
set of 44 molecular iron compounds. The contact electron density is calculated
with twelve density functionals across Jacob's ladder (PWLDA, BP86, BLYP, PW91,
PBE, M06-L, TPSS, B3LYP, B3PW91, PBE0, M06, TPSSh). We provide systematic-error
diagnostics and reliable, locally resolved uncertainties for isomer-shift
predictions. Pure and hybrid density functionals yield average prediction
uncertainties of 0.06-0.08 mm/s and 0.04-0.05 mm/s, respectively, the latter
being close to the average experimental uncertainty of 0.02 mm/s. Furthermore,
we show that both model parameters and prediction uncertainty depend
significantly on the composition and number of reference data points.
Accordingly, we suggest that rankings of density functionals based on
performance measures (e.g., the coefficient of correlation, r2, or the
root-mean-square error, RMSE) should not be inferred from a single data set.
This study presents the first statistically rigorous calibration analysis for
theoretical Mössbauer spectroscopy, which is of general applicability for
physico-chemical property models and not restricted to isomer-shift
predictions. We provide the statistically meaningful reference data set MIS39
and a new calibration of the isomer shift based on the PBE0 functional.
Comment: 49 pages, 9 figures, 7 tables
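The bootstrapping procedure behind these prediction uncertainties can be sketched compactly. The data below are synthetic (standing in for density/isomer-shift pairs, not the MIS39 set): a linear model y = a + b*x is refit on many resampled data sets, and the spread of the refitted predictions gives a locally resolved uncertainty estimate.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def bootstrap_prediction(xs, ys, x_new, n_boot=2000, seed=0):
    """Refit on n_boot data sets resampled with replacement; return mean and
    standard deviation of the prediction at x_new."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    preds = []
    for _ in range(n_boot):
        sample = [rng.choice(indices) for _ in indices]
        a, b = fit_line([xs[i] for i in sample], [ys[i] for i in sample])
        preds.append(a + b * x_new)
    mean = sum(preds) / n_boot
    var = sum((p - mean) ** 2 for p in preds) / (n_boot - 1)
    return mean, var ** 0.5

# Synthetic calibration data: y = 0.5 + 0.3*x plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(20)]
ys = [0.5 + 0.3 * x + rng.gauss(0, 0.05) for x in xs]
mean, std = bootstrap_prediction(xs, ys, x_new=1.0)
```

Because each bootstrap replicate sees a different composition of reference points, the resulting spread also reveals how sensitive the fitted parameters are to the reference set, which is the basis for the abstract's warning against functional rankings from a single data set.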
Computational Systems Chemistry with Rigorous Uncertainty Quantification
The success of in silico design approaches for molecules and materials that attempt to solve major technological issues of our society depends crucially on knowing the uncertainty of property predictions. Calibration is an essential model-building approach in this respect as it renders the inference of uncertainty-equipped predictions based on computer simulations possible. However, there exist various pitfalls that may affect the transferability of a property model to new data. By resorting to Bayesian inference and resampling methods (bootstrapping and cross-validation), we discuss issues such as the proper selection of reference data and property models, the identification and elimination of systematic errors, and the rigorous quantification of prediction uncertainty. We apply this statistical calibration approach to the prediction of 57Fe Mössbauer isomer shifts from electron densities obtained with density functional theory. Our findings reveal that the specific selection of reference iron complexes can have a significant effect on the ranking of density functionals with respect to model transferability. Furthermore, we show that bootstrapping can be harnessed to determine the sensitivity of such model rankings to changes in the reference data set, which is inevitable to guide future computational studies. Such a statistically rigorous approach to calibration is almost unknown to chemistry. Our study is one of the very few addressing this issue and its results can be applied by all chemists to arbitrary property models with our open-source software reBoot. In this thesis, we define a new standard for the calibration of computational results due to the rigor, transparency, and generality of our statistical approach, which is completely automatable. 
Black-box uncertainty quantification can also be applied to macroscopic systems by propagating the uncertainties inferred for single-molecule properties, which will ultimately allow modeling in chemistry to accelerate the discovery of important drugs, organic materials for solar cells, electrolytes for flow batteries, etc. A rather fundamental application area of this systems-focused uncertainty quantification approach is the understanding of complex chemical reaction mechanisms, which is therefore another focus of this thesis. For an approach that accounts for all elementary processes within a reactive mixture, it is essential to know all relevant intermediates and transition states, to determine relative (free) energies, to quantify their uncertainties, and to model the system's kinetics based on uncertainty propagation. The advantage of a holistic in silico approach to chemistry is that the origin of all data can be rigorously controlled, which allows for reliable uncertainty quantification and propagation. In this thesis, we present the first automated exploration of parts of chemical reaction space based on quantum mechanical descriptors, using the example of synthetic nitrogen fixation. Moreover, an extension to the exploration strategy considering uncertainty propagation through all stages of in silico modeling is presented in detail, using the example of the formose reaction. It is generally hard to model the kinetics of such complex reactive systems as they usually constitute processes spanning multiple time scales. Here, we present a simple and efficient strategy based on computational singular perturbation, which allows us to model the kinetics of complex chemical systems at arbitrary time scales. To study arbitrary reaction networks of dilute chemical systems (low-pressure gas or low-concentration solution phase), we implemented a generalized scheme of our kinetic modeling approach referred to as KiNetX. 
Main features of the completely automated KiNetX meta-algorithm are hierarchical network reduction, uncertainty propagation, and global sensitivity analysis, the latter of which detects critical (uncertainty-amplifying) regions of a network such that more complex electronic structure models are only employed if necessary. We also developed an automatic generator of abstract reaction networks encoding chemical logic, named AutoNetGen, which is coupled to KiNetX and allows us to examine a multitude of different chemical scenarios in a short time. In a final case study, we apply the insights gained from computational systems chemistry with rigorous uncertainty quantification to model the thermochemistry, kinetics, and spectroscopic properties of iron porphyrin compounds, which constitute a crucial type of active center in metalloenzyme research.
Uncertainty Quantification of Reactivity Scales
According to Mayr, polar organic synthesis can be rationalized by a simple empirical relationship linking bimolecular rate constants to as few as three reactivity parameters. Here, we propose an extension to Mayr's reactivity method that is rooted in uncertainty quantification and transforms the reactivity parameters into probability distributions. Through uncertainty propagation, these distributions can be transformed into uncertainty estimates for bimolecular rate constants. Chemists can exploit these virtual error bars to enhance synthesis planning and to decrease the ambiguity of conclusions drawn from experimental data. We demonstrate the above using the example of the reference data set released by Mayr and co-workers [J. Am. Chem. Soc. 2001, 123, 9500; J. Am. Chem. Soc. 2012, 134, 13902]. As a by-product of the new approach, we obtain revised reactivity parameters for 36 π-nucleophiles and 32 benzhydrylium ions.
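The proposed extension can be sketched with Mayr's linear free-energy relationship, log10 k(20 °C) = s_N (N + E), where the parameters are treated as probability distributions rather than point values. The parameter means and standard deviations below are illustrative assumptions, not the revised values from the paper, and independence of the three distributions is assumed for simplicity.

```python
import math
import random

def sample_log_k(N, sN, E, sigma_N, sigma_sN, sigma_E, n=50000, seed=7):
    """Treat the nucleophile parameters (N, s_N) and electrophile parameter E
    as independent Gaussians, propagate them through log10 k = s_N*(N + E),
    and return the mean and standard deviation of log10 k."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        n_i = rng.gauss(N, sigma_N)
        s_i = rng.gauss(sN, sigma_sN)
        e_i = rng.gauss(E, sigma_E)
        values.append(s_i * (n_i + e_i))
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var)

# Illustrative nucleophile (N = 5.0 +/- 0.2, s_N = 0.9 +/- 0.05) reacting
# with an illustrative electrophile (E = -2.0 +/- 0.3).
mean_logk, std_logk = sample_log_k(5.0, 0.9, -2.0, 0.2, 0.05, 0.3)
```

The resulting standard deviation of log10 k is the "virtual error bar" a chemist could consult before deciding whether a predicted rate difference between two candidate reactions is actually significant.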
Error Assessment of Computational Models in Chemistry
Computational models in chemistry rely on a number of approximations. The effect of such approximations on observables derived from them is often unpredictable. Therefore, it is challenging to quantify the uncertainty of a computational result, which, however, is necessary to assess
the suitability of a computational model. Common performance statistics such as the mean absolute error are prone to failure as they do not distinguish the explainable (systematic) part of the errors from their unexplainable (random) part. In this paper, we discuss problems and solutions for
performance assessment of computational models based on several examples from the quantum chemistry literature. For this purpose, we elucidate the different sources of uncertainty, the elimination of systematic errors, and the combination of individual uncertainty components into the uncertainty of a prediction.
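The point that aggregate scores such as the mean absolute error conflate systematic and random errors can be made with a small sketch. The data below are made up for illustration: two models share the same MAE, yet one is purely biased (its error is removable by recalibration) while the other is purely scattered.

```python
import math

def error_decomposition(predicted, reference):
    """Split model errors into a systematic component (the mean error, i.e.
    the bias) and a random component (the standard deviation of the residual
    scatter); also return the MAE for comparison."""
    errors = [p - r for p, r in zip(predicted, reference)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    systematic = sum(errors) / n  # bias, explainable and correctable
    var = sum((e - systematic) ** 2 for e in errors) / (n - 1)
    return mae, systematic, math.sqrt(var)  # MAE, systematic, random

# Two hypothetical models with identical MAE but very different error structure.
ref = [1.0, 2.0, 3.0, 4.0]
biased    = [r + 0.5 for r in ref]   # constant offset, no scatter
scattered = [1.5, 1.5, 3.5, 3.5]     # zero bias, pure scatter

mae_b, sys_b, rand_b = error_decomposition(biased, ref)
mae_s, sys_s, rand_s = error_decomposition(scattered, ref)
```

Ranking the two models by MAE alone would call them equivalent, although the first becomes error-free after subtracting its bias while the second retains its full random uncertainty, which is exactly the distinction the abstract argues a proper performance assessment must make.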