18 research outputs found
Artificial Intelligence for Science in Quantum, Atomistic, and Continuum Systems
Advances in artificial intelligence (AI) are fueling a new paradigm of
discoveries in natural sciences. Today, AI has started to advance natural
sciences by improving, accelerating, and enabling our understanding of natural
phenomena at a wide range of spatial and temporal scales, giving rise to a new
area of research known as AI for science (AI4Science). Being an emerging
research paradigm, AI4Science is unique in that it is an enormous and highly
interdisciplinary area. Thus, a unified and technical treatment of this field
is needed yet challenging. This work aims to provide a technically thorough
account of a subarea of AI4Science; namely, AI for quantum, atomistic, and
continuum systems. These areas aim at understanding the physical world from the
subatomic (wavefunctions and electron density), atomic (molecules, proteins,
materials, and interactions), to macro (fluids, climate, and subsurface) scales
and form an important subarea of AI4Science. A unique advantage of focusing on
these areas is that they largely share a common set of challenges, thereby
allowing a unified and foundational treatment. A key common challenge is how to
capture physics first principles, especially symmetries, in natural systems by
deep learning methods. We provide an in-depth yet intuitive account of
techniques to achieve equivariance to symmetry transformations. We also discuss
other common technical challenges, including explainability,
out-of-distribution generalization, knowledge transfer with foundation and
large language models, and uncertainty quantification. To facilitate learning
and education, we provide categorized lists of resources that we found to be
useful. We strive to be thorough and unified and hope this initial effort may
trigger more community interests and efforts to further advance AI4Science
Femtosecond real-time probing of reactions. VIII. The bimolecular reaction Br+I2
In this paper, we discuss the experimental technique for real-time measurement of the lifetimes of the collision complex of bimolecular reactions. An application to the atom–molecule Br+I_2 reaction at two collision energies is made. Building on our earlier Communication [J. Chem. Phys. 95, 7763 (1991)], we report on the observed transients and lifetimes for the collision complex, the nature of the transition state, and the dynamics near threshold. Classical trajectory calculations provide a framework for deriving the global nature of the reactive potential energy surface, and for discussing the real-time, scattering, and asymptotic (product-state distribution) aspects of the dynamics. These experimental and theoretical results are compared with the extensive array of kinetic, crossed beam, and theoretical studies found in the literature for halogen radical–halogen molecule exchange reactions
Computer modeling of dapsone-mediated heteroactivation of flurbiprofen metabolism by CYP2C9
The occurrence of atypical kinetics in cytochrome P450 reactions can confound in vitro determinations of a drug\u27s kinetic parameters. During drug development, inaccurate kinetic parameter estimates can lead to incorrect decisions about a lead compound\u27s potential for success. It has become widely accepted that in certain CYP subfamilies more than one molecule can occupy the active site simultaneously, in some cases resulting in enhanced substrate turnover (heteroactivation). However, the specific mechanism(s) by which dual-compound binding results in heteroactivation remain unclear. It is known that orientation of the substrate in the active site, as dictated by interactions with active site residues, plays a large role in metabolic outcome. Effector compounds have been shown in vitro to alter substrate position in the active site. Here, data obtained via in silico methods including docking, molecular dynamics, semi-empirical and ab initio quantum mechanics indicate that direct interaction between effector and substrate can play a role in stabilizing the substrate in an alternative conformation conducive to oxidation. In this study a high-throughput screening computer model of heteroactivation of flurbiprofen metabolism by CYP2C9 has been developed for the purpose of elucidating key interactions between substrate, effector, and enzyme responsible for heteroactivation in this system, as well as to predict as yet unknown activators
Structural Studies on Flexible Small Molecules Based on NMR in Oriented Media. Methodology and Application to Natural Products
This thesis describes the development and application of structural elucidation
methodologies based on NMR in aligned media. Nuclear magnetic resonance is arguably
the most important technique for the structural analysis of organic molecules
in solution. In the last decade, Residual Dipolar Coupling (RDC) analysis emerged
as a powerful tool for the determination of the three-dimensional structure of organic
molecules in solution, complementing and even outperforming the approach based
on the classical NMR observables such as NOE or 3J couplings. While application of
RDCs to the structural analysis of proteins developed rapidly, their use with “small”
molecules (typically organic compounds and natural products with MW < 1000 Da)
is still scarce. From the spectroscopic point of view, two features of small molecules
pose the main obstacles to the application of RDC to their analysis: the scarcity of
observable couplings and the complexity stemming from conformational flexibility in
solution. Besides, sample preparation with the optimal degree of alignment is still an
issue for most classes of compounds.
In this thesis, all these topics are addressed and new experimental and computational
advancements are presented.
i) Sample preparation. Weak alignment in water and aligning properties of polyacrylamide
gels.
ii) New observables. Long-range proton–carbon RDCs.
iii) Analysis of flexible organic molecules
Development of quantitative structure property relationships to support non-target LC-HRMS screening
Κατά την τελευταία δεκαετία, ένας μεγάλος αριθμός αναδυόμενων ρύπων έχουν ανιχνευθεί και ταυτοποιηθεί σε επιφανειακά ύδατα και λύματα, προκαλώντας ανησυχία για το υδάτινο οικοσύστημα, λόγω της πιθανής χημικής τους σταθερότητας. Η τεχνική της υγροχρωματογραφίας - φασματομετρίας μάζας υψηλής διακριτικής ικανότητας (LC-HRMS) αποτελεί μια αποτελεσματική τεχνική για την ανίχνευση αναδυόμενων ρύπων στο περιβάλλον. Η ταυτόχρονη δε ανάλυση των δειγμάτων με τις συμπληρωματικές τεχνικές της υγροχρωματογραφίας αντίστροφης φάσης (RPLC) και της υγροχρωματογραφίας υδρόφιλων αλληλεπιδράσεων (HILIC), συντελεί στην ταυτοποίηση «ύποπτων» ή και άγνωστων ρύπων με ποικίλες φυσικοχημικές ιδιότητες. Για την ταυτοποίηση τους, απαιτείται να πληρούνται συγκεκριμένα κριτήρια, τα οποία αξιολογούνται με βάση τη χρήση διαγνωστικών εργαλείων, όπως η ακριβής πρόβλεψη του χρόνου ανάσχεσης, η in silico θραυσματοποίηση και η πρόβλεψη της συμπεριφορά τους στον ιοντισμό.
Στο 3ο κεφάλαιο της παρούσας διδακτορικής διατριβής περιγράφεται η ανάπτυξη μιας ολοκληρωμένης πορείας εργασίας (workflow) για τη διερεύνηση των παραμέτρων που επηρεάζουν τον χρόνο έκλουσης μεγάλου αριθμού ενώσεων που συγκαταλέγονται στους αναδυόμενους ρύπους. Για τον σκοπό αυτό, πάνω από 2.500 αναδυόμενοι ρύποι χρησιμοποιήθηκαν για την ανάπτυξη του μοντέλου πρόβλεψης χρόνου ανάσχεσης για τις 2 υγροχρωματογραφικές τεχνικές (RP- και HILIC-LC-HRMS) και για ηλεκτροψεκασμό τόσο σε θετικό όσο και σε αρνητικό ιοντισμό (+/-ESI). Στη συνέχεια, πραγματοποιήθηκε εφαρμογή του μοντέλου για την υπολογιστική πρόβλεψη του χρόνου ανάσχεσης, για την ταυτοποίηση 10 νέων προϊόντων μετασχματισμού των φαρμακευτικών ενώσεων (tramadol, furosemide και niflumic acid) ύστερα από επεξεργασία με όζον.
Στο 4ο κεφάλαιο παρουσιάζεται η ανάπτυξη ενός καινοτόμου γενικευμένου χημειομετρικού μοντέλου το οποίο είναι ικανό να προβλέπει τον χρόνο έκλουσης κάθε πιθανού ρύπου, ανεξαρτήτου υγροχρωματογραφικής μεθόδου που χρησιμοποιείται, συμβάλλοντας σημαντικά στην σύγκριση αποτελεσμάτων από διαφορετικές LC-HRMS μεθόδους. Το συγκεκριμένο μοντέλο χρησιμοποιήθηκε για την ταυτοποίηση «ύποπτων» και άγνωστων ενώσεων σε διεργαστηριακές δοκιμές.
Το Κεφάλαιο 5, περιέχει την περιγραφή της ανάπτυξης ενός υπολογιστικού μοντέλου πρόβλεψης τοξικότητας αναδυόμενων ρύπων που ανιχνεύονται στο υδάτινο οικοσύστημα. Το συγκεκριμένο μοντέλο αποσκοπεί στην εκτίμηση του πιθανού περιβαλλοντικού κινδύνου για νέες ενώσεις που ταυτοποιήθηκαν μέσω σάρωσης «ύποπτων» ενώσεων και μη-στοχευμένης σάρωσης, για τις οποίες δεν είναι ακόμα διαθέσιμα πειραματικά δεδομένα τοξικότητας.
Τέλος, στο κεφάλαιο 6 παρουσιάζεται ένας αυτοματοποιημένος και συστηματικός τρόπος σάρωσης «ύποπτων» ενώσεων και μη-στοχευμένης σάρωσης σε δεδομένα από LC-HRMS. Η νέα αυτή αυτοματοποιημένη πορεία εργασίας, αποσκοπεί στην λιγότερο χρονοβόρα επεξεργασία των HRMS δεδομένων, και στην εφαρμογή της μη-στοχευμένης σάρωσης ώστε να είναι δυνατή η εφαρμογή τους σε καθημερινούς ελέγχους ρουτίνας ή/και για χρήση από τις κανονιστικές αρχές.Over the last decade, a high number of emerging contaminants were detected and identified in surface and waste waters that could threaten the aquatic environment due to their pseudo-persistence. As it is described in chapters 1 and 2, liquid chromatography high resolution mass spectroscopy (LC-HRMS) can be used as an efficient tool for their screening. Simultaneously screening of these samples by hydrophilic interaction liquid chromatography (HILIC) and reversed phase (RP) would help with full identification of suspects and unknown compounds. However, to confirm the identity of the most relevant suspect or unknown compounds, their chemical properties such as retention time behavior, MSn fragmentation and ionization modes should be investigated.
Chapter 3 of this thesis discusses the development of a comprehensive workflow to study the retention time behavior of large groups of compounds belonging to emerging contaminants. A dataset consisted of more than 2500 compounds was used for RP/HILIC-LC-HRMS, and their retention times were derived in both Electrospray Ionization mode (+/-ESI). These in silico approaches were then applied on the identification of 10 new transformation products of tramadol, furosemide and niflumic acid (under ozonation treatment).
Chapter 4 discusses about the development of a first retention time index system for LC-HRMS. Some practical applications of this RTI system in suspect and non-target screening in collaborative trials have been presented as well.
Chapter 5 describes the development of in silico based toxicity models to estimate the acute toxicity of emerging pollutants in the aquatic environment. This would help link the suspect/non-target screening results to the tentative environmental risk by predicting the toxicity of newly tentatively identified compounds.
Chapter 6 introduces an automatic and systematic way to perform suspect and non-target screening in LC-HRMS data. This would save time and the data analysis loads and enable the routine application of non-target screening for regulatory or monitoring purpose
A robust machine learning approach for the prediction of allosteric binding sites
Previously held under moratorium from 28 March 2017 until 28 March 2022Allosteric regulatory sites are highly prized targets in drug discovery. They remain difficult to detect by conventional methods, with the vast majority of known examples being found serendipitously. Herein, a rigorous, wholly-computational protocol is presented for the prediction of allosteric sites. Previous attempts to predict the location of allosteric sites by computational means drew on only a small amount of data. Moreover, no attempt was made to modify the initial crystal structure beyond the in silico deletion of the allosteric ligand. This behaviour can leave behind a conformation with a significant structural deformation, often betraying the location of the allosteric binding site. Despite this artificial advantage, modest success rates are observed at best. This work addresses both of these issues. A set of 60 protein crystal structures with known allosteric modulators was collected. To remove the imprint on protein structure caused by the presence of bound modulators, molecular dynamics was performed on each protein prior to analysis. A wide variety of analytical techniques were then employed to extract meaningful data from the trajectories. Upon fusing them into a single, coherent dataset, random forest - a machine learning algorithm - was applied to train a high performance classification model. After successive rounds of optimisation, the final model presented in this work correctly identified the allosteric site for 72% of the proteins tested. This is not only an improvement over alternative strategies in the literature; crucially, this method is unique among site prediction tools in that is does not abuse crystal structures containing imprints of bound ligands - of key importance when making live predictions, where no allosteric regulatory sites are known.Allosteric regulatory sites are highly prized targets in drug discovery. They remain difficult to detect by conventional methods, with the vast majority of known examples being found serendipitously. Herein, a rigorous, wholly-computational protocol is presented for the prediction of allosteric sites. Previous attempts to predict the location of allosteric sites by computational means drew on only a small amount of data. Moreover, no attempt was made to modify the initial crystal structure beyond the in silico deletion of the allosteric ligand. This behaviour can leave behind a conformation with a significant structural deformation, often betraying the location of the allosteric binding site. Despite this artificial advantage, modest success rates are observed at best. This work addresses both of these issues. A set of 60 protein crystal structures with known allosteric modulators was collected. To remove the imprint on protein structure caused by the presence of bound modulators, molecular dynamics was performed on each protein prior to analysis. A wide variety of analytical techniques were then employed to extract meaningful data from the trajectories. Upon fusing them into a single, coherent dataset, random forest - a machine learning algorithm - was applied to train a high performance classification model. After successive rounds of optimisation, the final model presented in this work correctly identified the allosteric site for 72% of the proteins tested. This is not only an improvement over alternative strategies in the literature; crucially, this method is unique among site prediction tools in that is does not abuse crystal structures containing imprints of bound ligands - of key importance when making live predictions, where no allosteric regulatory sites are known
Density-functional theory study of small Fe, Co, Ni and Pt clusters on graphene
Master'sMASTER OF SCIENC