1,220 research outputs found
Closed-loop optimization of fast-charging protocols for batteries with machine learning.
Simultaneously optimizing many design parameters in time-consuming experiments causes bottlenecks in a broad range of scientific and engineering disciplines [1,2]. One such example is process and control optimization for lithium-ion batteries during materials selection, cell manufacturing and operation. A typical objective is to maximize battery lifetime; however, conducting even a single experiment to evaluate lifetime can take months to years [3-5]. Furthermore, both large parameter spaces and high sampling variability [3,6,7] necessitate a large number of experiments. Hence, the key challenge is to reduce both the number and the duration of the experiments required. Here we develop and demonstrate a machine learning methodology to efficiently optimize a parameter space specifying the current and voltage profiles of six-step, ten-minute fast-charging protocols for maximizing battery cycle life, which can alleviate range anxiety for electric-vehicle users [8,9]. We combine two key elements to reduce the optimization cost: an early-prediction model [5], which reduces the time per experiment by predicting the final cycle life using data from the first few cycles, and a Bayesian optimization algorithm [10,11], which reduces the number of experiments by balancing exploration and exploitation to efficiently probe the parameter space of charging protocols. Using this methodology, we rapidly identify high-cycle-life charging protocols among 224 candidates in 16 days (compared with over 500 days using exhaustive search without early prediction), and subsequently validate the accuracy and efficiency of our optimization approach. Our closed-loop methodology automatically incorporates feedback from past experiments to inform future decisions and can be generalized to other applications in battery design and, more broadly, other scientific domains that involve time-intensive experiments and multi-dimensional design spaces.
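The closed loop described in this abstract (test a batch, get a cheap early prediction, update beliefs, choose the next batch) can be sketched roughly as follows. This is a simplified stand-in and not the authors' algorithm: it assumes independent Gaussian beliefs per protocol, an upper-confidence-bound selection rule, and a hypothetical early predictor modelled as the true cycle life plus Gaussian noise. The function name `closed_loop_search` and all default values are illustrative assumptions.

```python
import numpy as np

def closed_loop_search(true_life, n_rounds=4, batch=48, noise_sd=100.0,
                       prior_mean=900.0, prior_sd=300.0, beta=2.0, seed=0):
    """Pick a protocol from noisy early predictions using a UCB rule.

    true_life: array of (hidden) cycle lives, one per candidate protocol.
    All defaults are arbitrary illustrative values, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n = len(true_life)
    mean = np.full(n, prior_mean, dtype=float)    # posterior mean per protocol
    var = np.full(n, prior_sd ** 2, dtype=float)  # posterior variance per protocol
    for _ in range(n_rounds):
        # Exploration/exploitation trade-off: favour high mean or high uncertainty.
        ucb = mean + beta * np.sqrt(var)
        chosen = np.argsort(ucb)[-batch:]
        # Hypothetical early predictor: true cycle life plus Gaussian noise.
        y = true_life[chosen] + rng.normal(0.0, noise_sd, size=len(chosen))
        # Conjugate Gaussian update of the tested protocols.
        post_var = 1.0 / (1.0 / var[chosen] + 1.0 / noise_sd ** 2)
        mean[chosen] = post_var * (mean[chosen] / var[chosen] + y / noise_sd ** 2)
        var[chosen] = post_var
    return int(np.argmax(mean)), mean
```

Calling `closed_loop_search(lives)` on an array of 224 synthetic cycle lives returns the index of the protocol with the highest posterior mean together with the full vector of posterior means. A correlated surrogate over the protocol parameters would share information between similar protocols, which this independent-arm sketch does not attempt.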
Machine learning approaches for computer aided drug discovery
Pharmaceutical drug discovery is expensive, time-consuming and scientifically challenging. To increase the efficiency of the pre-clinical drug discovery pathway, computational drug discovery methods and, most recently, machine learning-based methods are increasingly used as powerful tools to aid early-stage drug discovery.
In this thesis, I present three complementary computer-aided drug discovery methods, with a focus on aiding hit discovery and hit-to-lead optimization. In addition, this thesis particularly focuses on exploring different molecular representations used to featurise machine learning models, in order to explore how best to capture valuable information about proteins, ligands and 3D protein-ligand complexes to build more robust, more interpretable and more accurate machine learning models.
First, I developed ligand-based models using a Gaussian Process (GP) as an easy-to-implement tool to guide exploration of chemical space for the optimization of protein-ligand binding affinity. I explored different topological fingerprint and autoencoder representations for Bayesian optimisation (BO) and showed that BO is a powerful tool to help medicinal chemists to prioritise which new compounds to make for single-target as well as multi-target optimisation. The algorithm achieved high enrichment of top compounds for both single target and multiobjective optimisation when tested on a well known benchmark dataset of the drug target matrix metalloproteinase-12 and a real, ongoing drug optimisation dataset targeting four bacterial metallo-β-lactamases.
Next, I present the development of a knowledge-based approach to drug design, combining new protein-ligand interaction fingerprints with a fragment-based drug discovery approach to understand SARS-CoV-2 Mpro-substrate specificity and to design novel small molecule inhibitors in silico. In combination with a fragment-based drug discovery approach, I show how this knowledge-based interaction fingerprint-driven approach can reveal fruitful fragment-growth design strategies.
Lastly, I expand on the knowledge-based contact fingerprints to create a ligand-shaped molecular graph representation (Protein Ligand Interaction Graphs, PLIGs) to develop novel graph-based deep learning protein-ligand binding affinity scoring functions. PLIGs encode all intermolecular interactions in a protein-ligand complex within the node features of the graph and are therefore simple and fully interpretable. I explore a variety of Graph Neural Network architectures in combination with PLIGs and find that Graph Attention Networks perform slightly better than other GNN architectures and rank amongst the best known protein-ligand binding affinity scoring functions.
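The first part of this thesis abstract describes a Gaussian Process over fingerprint representations used to prioritise compounds. A minimal sketch of that idea is given below, assuming binary fingerprints, a Tanimoto kernel, and an upper-confidence-bound ranking rule. The abstract does not state which kernel or acquisition function was used, so these choices, along with the names `tanimoto_kernel`, `gp_posterior` and `rank_candidates`, are assumptions made for illustration.

```python
import numpy as np

def tanimoto_kernel(A, B):
    """Tanimoto similarity between the rows of two binary fingerprint matrices."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    inter = A @ B.T                                   # shared on-bits
    union = A.sum(axis=1)[:, None] + B.sum(axis=1)[None, :] - inter
    # Convention (assumption): two all-zero fingerprints have similarity 1.
    return np.where(union > 0, inter / np.where(union > 0, union, 1.0), 1.0)

def gp_posterior(X_train, y_train, X_cand, noise=1e-2):
    """GP posterior mean and standard deviation at the candidate fingerprints."""
    y_train = np.asarray(y_train, dtype=float)
    K = tanimoto_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = tanimoto_kernel(X_cand, X_train)
    y_mean = y_train.mean()                           # constant mean function
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    mu = y_mean + Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    # The prior variance is 1 because the Tanimoto kernel gives k(x, x) = 1.
    var = np.clip(1.0 - (v ** 2).sum(axis=0), 0.0, None)
    return mu, np.sqrt(var)

def rank_candidates(X_train, y_train, X_cand, beta=1.0):
    """Order candidates by upper confidence bound, best first."""
    mu, sd = gp_posterior(X_train, y_train, X_cand)
    return np.argsort(-(mu + beta * sd))
```

With `beta=0` the ranking is purely exploitative (highest predicted affinity first); larger values of `beta` push dissimilar, poorly characterised compounds up the list.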
Biomedical knowledge discovery through continuous representations of multi-relational graphs
Knowledge graphs are multi-relational graph structures that organize
data in a way that is not only queryable but also allows
the inference of implicit knowledge by both humans and, particularly,
machines. In recent years, new methods have been developed to
maximize the knowledge that can be extracted from these structures,
especially in the machine learning field. Knowledge graph embedding
(KGE) strategies map the data of these graphs to a lower-dimensional
space to facilitate downstream tasks such
as link prediction or node classification. In this work, the capabilities
and limitations of using these techniques to derive new knowledge from
pre-existing biomedical networks were explored, since this is a field that
not only has seen efforts towards converting its large knowledge bases
into knowledge graphs, but can also make use of the predictive
capabilities of these models to accelerate research.
To do so, several KGE models were studied and a pipeline was
created to obtain and train such models on different biomedical
datasets. The results show that these models can make accurate predictions
on some datasets, but that their performance can be hampered
by some inherent characteristics of the networks.
Additionally, with the knowledge acquired during this research, a notebook
was created that aims to be an entry point for other researchers
interested in exploring this field.
Knowledge graphs are multi-relational graphs that organize
information so that it can not only be queried but also
supports the logical inference of new
information by humans and, especially, computational systems.
Recently, several methods have been created to
maximize the information that can be extracted from these structures,
with machine learning being one of the main drivers.
Knowledge graph embeddings (KGE) map
the components of these graphs to a latent space in order
to facilitate tasks such as predicting new
links in the graph or classifying nodes.
In this work, the capabilities and limitations of
applying models based on knowledge graph embeddings
to existing biomedical networks were explored, given that biomedicine is a field in
which efforts have been made to organize its vast knowledge base
into knowledge graphs, and where this predictive capability
can be used to drive advances in its various
domains. To this end, several models were
studied and a pipeline was created to train them on some
biomedical networks. The results show that these models can
indeed be accurate on the link prediction task
for some datasets; however, this accuracy appears
to be affected by characteristics inherent to the graph structure.
Additionally, with the knowledge acquired during
this work, a notebook was created that aims to serve
as an introduction to knowledge graph embeddings for
researchers interested in exploring the field.
Master's in Computer and Telematics Engineering
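To make the link prediction task mentioned in this abstract concrete, the sketch below scores candidate tail entities with TransE, a widely used KGE model in which a triple (head, relation, tail) is plausible when head + relation lies close to tail in the embedding space. The abstract does not say which KGE models were studied, so TransE is an assumption chosen for illustration, and the tiny hand-written embeddings stand in for trained ones.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility: negative Euclidean distance between h + r and t."""
    return -np.linalg.norm(h + r - t, axis=-1)

def rank_tails(entity_emb, relation_emb, head, rel):
    """Rank every entity as a candidate tail for (head, rel, ?), best first."""
    # Broadcasting scores the query against all entity embeddings at once.
    scores = transe_score(entity_emb[head], relation_emb[rel], entity_emb)
    return np.argsort(-scores)

# Toy embeddings (illustrative values, not trained): 3 entities, 1 relation.
entity_emb = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
relation_emb = np.array([[1.0, 0.0]])
```

Here `rank_tails(entity_emb, relation_emb, head=0, rel=0)` places entity 1 first, because the embedding of entity 0 translated by the relation vector lands exactly on entity 1.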
Learning with multiple pairwise kernels for drug bioactivity prediction
Motivation: Many inference problems in bioinformatics, including drug bioactivity prediction, can be formulated as pairwise learning problems, in which one is interested in making predictions for pairs of objects, e.g. drugs and their targets. Kernel-based approaches have emerged as powerful tools for solving problems of that kind, and especially multiple kernel learning (MKL) offers promising benefits as it enables integrating various types of complex biomedical information sources in the form of kernels, along with learning their importance for the prediction task. However, the immense size of pairwise kernel spaces remains a major bottleneck, making the existing MKL algorithms computationally infeasible even for a small number of input pairs. Results: We introduce pairwiseMKL, the first method for time- and memory-efficient learning with multiple pairwise kernels. pairwiseMKL first determines the mixture weights of the input pairwise kernels, and then learns the pairwise prediction function. Both steps are performed efficiently without explicit computation of the massive pairwise matrices, therefore making the method applicable to solving large pairwise learning problems. We demonstrate the performance of pairwiseMKL in two related tasks of quantitative drug bioactivity prediction using up to 167 995 bioactivity measurements and 3120 pairwise kernels: (i) prediction of anticancer efficacy of drug compounds across a large panel of cancer cell lines; and (ii) prediction of target profiles of anticancer compounds across their kinome-wide target spaces. We show that pairwiseMKL provides accurate predictions using sparse solutions in terms of selected kernels, and therefore it also automatically identifies data sources relevant for the prediction problem. Peer reviewed
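The abstract states that the massive pairwise matrices are never computed explicitly. One standard way to get that effect, shown here as a sketch and not as the pairwiseMKL algorithm itself, is the Kronecker identity: if the pairwise kernel is the Kronecker product of a drug kernel and a target kernel, its product with a vector can be computed from the two small matrices alone. The assumptions are a Kronecker product pairwise kernel and a complete grid of drug-target pairs; the name `kron_matvec` is hypothetical.

```python
import numpy as np

def kron_matvec(Kd, Kt, a):
    """Compute np.kron(Kd, Kt) @ a without forming the Kronecker product.

    Kd: (n_d, n_d) drug kernel, Kt: (n_t, n_t) target kernel,
    a:  vector of length n_d * n_t, ordered drug-major (row-major pairs).
    """
    A = a.reshape(Kd.shape[1], Kt.shape[1])   # one row per drug, one column per target
    return (Kd @ A @ Kt.T).ravel()            # row-major vec of Kd A Kt^T
```

For n_d drugs and n_t targets, the explicit pairwise matrix has (n_d * n_t)^2 entries, whereas this product only ever stores the two small kernels and an n_d by n_t array.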
Multi-Fidelity Methods for Optimization: A Survey
Real-world black-box optimization often involves time-consuming or costly
experiments and simulations. Multi-fidelity optimization (MFO) stands out as a
cost-effective strategy that balances high-fidelity accuracy with computational
efficiency through a hierarchical fidelity approach. This survey presents a
systematic exploration of MFO, underpinned by a novel text mining framework
based on a pre-trained language model. We delve deep into the foundational
principles and methodologies of MFO, focusing on three core components --
multi-fidelity surrogate models, fidelity management strategies, and
optimization techniques. Additionally, this survey highlights the diverse
applications of MFO across several key domains, including machine learning,
engineering design optimization, and scientific discovery, showcasing the
adaptability and effectiveness of MFO in tackling complex computational
challenges. Furthermore, we also envision several emerging challenges and
prospects in the MFO landscape, spanning scalability, the composition of lower
fidelities, and the integration of human-in-the-loop approaches at the
algorithmic level. We also address critical issues related to benchmarking and
the advancement of open science within the MFO community. Overall, this survey
aims to catalyze further research and foster collaborations in MFO, setting the
stage for future innovations and breakthroughs in the field. Comment: 47 pages, 9 figures
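As one concrete instance of the fidelity management strategies this survey abstract refers to, the sketch below implements successive halving, a well-known scheme that evaluates many candidates at a cheap fidelity and promotes only the best to more expensive ones. Whether and how the survey covers this particular method is not stated in the abstract; the function name and the `evaluate(config, budget)` interface are assumptions made for illustration.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Return the surviving config after repeated cheap-to-expensive screening.

    evaluate(config, budget) must return a score where higher is better;
    budget is the fidelity level (for example epochs or simulation steps).
    """
    if eta < 2:
        raise ValueError("eta must be at least 2 so the pool shrinks each round")
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score every survivor at the current (cheap) fidelity.
        scores = [evaluate(c, budget) for c in survivors]
        order = sorted(range(len(survivors)), key=lambda i: scores[i], reverse=True)
        keep = max(1, len(survivors) // eta)
        survivors = [survivors[i] for i in order[:keep]]
        budget *= eta  # spend more per candidate on the smaller pool
    return survivors[0]
```

With eight candidates and `eta=2`, the pool shrinks 8, 4, 2, 1 while the per-candidate budget grows 1, 2, 4, so most of the cost is spent on the few candidates that looked promising at low fidelity.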
Toward real-world automated antibody design with combinatorial Bayesian optimization
Antibodies are multimeric proteins capable of highly specific molecular recognition. The complementarity determining region 3 of the antibody variable heavy chain (CDRH3) often dominates antigen-binding specificity. Hence, it is a priority to design optimal antigen-specific CDRH3 to develop therapeutic antibodies. The combinatorial structure of CDRH3 sequences makes it impossible to query binding-affinity oracles exhaustively. Moreover, antibodies are expected to have high target specificity and developability. Here, we present AntBO, a combinatorial Bayesian optimization framework utilizing a CDRH3 trust region for an in silico design of antibodies with favorable developability scores. The in silico experiments on 159 antigens demonstrate that AntBO is a step toward practically viable in vitro antibody design. In under 200 calls to the oracle, AntBO suggests antibodies outperforming the best binding sequence from 6.9 million experimentally obtained CDRH3s. Additionally, AntBO finds very-high-affinity CDRH3 in only 38 protein designs while requiring no domain knowledge.