45 research outputs found
Designing algorithms to aid discovery by chemical robots
Automated robotic systems have recently become highly efficient, thanks to improved coupling between sensor systems and algorithms; the latter have gained significance with the increase in computing power over the past few decades. However, intelligent automated chemistry platforms for discovery-oriented tasks must be able to cope with the unknown, which is a profoundly hard problem. In this Outlook, we describe how recent advances in the design and application of algorithms, together with the growing amount of available chemical data and with automation and control systems, may enable more productive chemical research and the development of chemical robots capable of targeting discovery. We illustrate this through examples of workflow and data processing with automation and control, and through both well-established and cutting-edge algorithms drawn from recent studies in chemistry. Finally, several algorithms are discussed in relation to chemical robots and chemical intelligence for knowledge discovery.
Deeper Connections between Neural Networks and Gaussian Processes Speed-up Active Learning
Active learning methods for neural networks are usually based on greedy criteria that ultimately yield a single new design point per evaluation. Such an approach requires either heuristics to sample a batch of design points in one active learning iteration, or retraining the neural network after adding each data point, which is computationally inefficient. Moreover, uncertainty estimates for neural networks are sometimes overconfident for points lying far from the training sample. In this work we propose to approximate Bayesian neural networks (BNNs) by Gaussian processes, which allows us to update the uncertainty estimates of predictions efficiently without retraining the neural network, while avoiding overconfident uncertainty predictions for out-of-sample points. In a series of experiments on real-world data, including large-scale problems of chemical and physical modeling, we show the superiority of the proposed approach over state-of-the-art methods.
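The convenience this abstract relies on — updating uncertainty estimates without retraining — follows from the fact that Gaussian-process predictive variance depends only on input locations, never on labels. Below is a minimal NumPy sketch of greedy variance-based batch selection; it is not the paper's BNN-to-GP approximation, and the kernel, length scale, and noise level are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential kernel between the row vectors of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def posterior_variance(X_train, X_pool, noise=1e-3, length_scale=1.0):
    # GP predictive variance at pool points; note that no labels appear.
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_pool, X_train, length_scale)
    Kss = rbf_kernel(X_pool, X_pool, length_scale)
    solve = np.linalg.solve(K, Ks.T)
    return np.diag(Kss) - np.einsum("ij,ji->i", Ks, solve)

def select_batch(X_train, X_pool, batch_size=3):
    # Greedy batch selection: repeatedly pick the pool point with the
    # largest predictive variance, then fold it into the conditioning set.
    # Because the variance is label-free, a whole batch can be chosen
    # in one iteration without retraining any model.
    train = X_train.copy()
    chosen = []
    pool_idx = list(range(len(X_pool)))
    for _ in range(batch_size):
        var = posterior_variance(train, X_pool[pool_idx])
        best = pool_idx[int(np.argmax(var))]
        chosen.append(best)
        train = np.vstack([train, X_pool[best]])
        pool_idx.remove(best)
    return chosen
```

The design point is the conditioning step: after each pick, only the kernel matrix grows, so the uncertainty estimates are refreshed at the cost of a linear solve rather than a retraining run.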
Convolutional architectures for virtual screening
Background: A virtual screening algorithm has to adapt to the different stages of the screening process. Early screening needs to ensure that all bioactive compounds are ranked in the first positions regardless of the number of false positives, while a second screening round aims to increase prediction accuracy. Results: A novel CNN architecture is presented to this aim. It predicts the bioactivity of candidate compounds against CDK1, using a combination of molecular fingerprints as their vector representation, and has been trained to achieve good results in both enrichment factor and accuracy across the different screening modes (98.55% accuracy in active-only selection, and 98.88% in high-precision discrimination). Conclusion: The proposed architecture outperforms state-of-the-art ML approaches, and some interesting insights on molecular fingerprints are derived.
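The enrichment factor cited above measures how much richer in actives the top-ranked fraction of a screened library is than the library as a whole. A minimal, dependency-free sketch (the function name and signature are illustrative, not from the paper):

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Ratio of the active rate in the top-scoring `fraction` of the
    library to the active rate in the whole library. `labels` holds
    1 for actives and 0 for inactives."""
    n = len(scores)
    n_top = max(1, int(round(n * fraction)))
    # Rank compounds by predicted score, best first.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top = order[:n_top]
    hit_rate_top = sum(labels[i] for i in top) / n_top
    hit_rate_all = sum(labels) / n
    return hit_rate_top / hit_rate_all
```

An EF of 1.0 means the model ranks no better than random selection; the theoretical maximum is 1 / (active rate), reached when every top-ranked compound is active.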
Effect of missing data on multitask prediction methods
There has been growing interest in multitask prediction in chemoinformatics, helped by the increasing use of deep neural networks in this field. The technique is applied to multitarget data sets, in which compounds have been tested against different targets, with the aim of developing models that predict a profile of biological activities for a given compound. However, multitarget data sets tend to be sparse; i.e., not all compound-target combinations have experimental values. There has been little research on the effect of missing data on the performance of multitask methods. We used two complete data sets to simulate sparseness by removing data from the training set, comparing different schemes for removing the data. These sparse sets were used to train two different multitask methods: deep neural networks and Macau, a Bayesian probabilistic matrix factorization technique. Results from both methods were remarkably similar and showed that the performance decrease caused by missing data is at first small, but accelerates once large amounts of data have been removed. This work provides a first approximation for assessing how much data is required to produce good performance in multitask prediction exercises.
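The simulation protocol described — start from a complete compound × target matrix, delete training entries, and evaluate only where labels exist — can be sketched in a few lines of NumPy. Uniform random removal shown here is only one of several removal schemes a study like this would compare, and the function names are illustrative:

```python
import numpy as np

def sparsify(Y, fraction_missing, seed=0):
    # Simulate a sparse multitarget matrix by masking a random fraction
    # of the (compound, target) entries with NaN.
    rng = np.random.default_rng(seed)
    Y_sparse = Y.astype(float).copy()
    mask = rng.random(Y.shape) < fraction_missing
    Y_sparse[mask] = np.nan
    return Y_sparse

def masked_rmse(Y_true, Y_pred):
    # Evaluate only where labels are present; NaN marks a missing entry.
    observed = ~np.isnan(Y_true)
    diff = Y_pred[observed] - Y_true[observed]
    return float(np.sqrt(np.mean(diff**2)))
```

Sweeping `fraction_missing` from 0 toward 1 and retraining at each level is what lets such a study trace the degradation curve — small losses at first, accelerating as the matrix empties out.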
Computational Experimentation
Experimentation conjures images of laboratories and equipment in biotechnology, chemistry, materials science, and pharmaceuticals. Yet modern-day experimentation is not limited to chemical synthesis; it is increasingly computational. Researchers in the unpredictable arts can experiment on the functions, properties, reactions, and structures of chemical compounds with highly accurate computational techniques. These computational capabilities challenge the enablement and utility patentability requirements. The patent statute requires that the inventor explain how to make and use the invention without undue experimentation, and that the invention have at least substantial and specific utility. These patentability requirements do not align with computational research capabilities, which allow inventors to file earlier patent applications, develop prophetic examples, and provide supporting disclosure in the patent specification without necessarily conducting traditional, laboratory-based experiments. This Article explores the contours and applications of computational capabilities on patentability, proposes reforms to the utility doctrine and to patent examination, responds to potential critiques of the proposed reforms, and analyzes innovation policy in the unpredictable arts. In light of increasing computational experimentation, this Article recommends strengthening the utility requirement in order to prevent a state of patent law in which enablement is subsumed into utility.
Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets
Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated and hence typically small, the lack of datasets with labeled features, and of codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundation models based on our proposed datasets, we present the Graphium graph machine learning library, which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point for multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets improves when models are also trained on large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and fine-tuning it to resource-constrained downstream tasks.
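With over 3000 sparsely defined tasks, most (molecule, task) pairs carry no label, so a multi-task training objective must skip missing entries rather than impute them. A common pattern is a per-task masked loss averaged over the tasks that have at least one label, so densely labeled tasks do not drown out sparse ones. The NumPy sketch below is an illustrative version of that pattern, not Graphium's actual implementation:

```python
import numpy as np

def multitask_masked_loss(Y_true, Y_pred):
    # Mean over tasks of the per-task MSE, skipping (molecule, task)
    # entries marked missing with NaN, and skipping tasks that have no
    # labels at all in this batch.
    per_task = []
    for t in range(Y_true.shape[1]):
        observed = ~np.isnan(Y_true[:, t])
        if observed.any():
            err = Y_pred[observed, t] - Y_true[observed, t]
            per_task.append(np.mean(err**2))
    return float(np.mean(per_task))
```

Averaging per task rather than per label is a deliberate choice: with label counts varying by orders of magnitude across tasks, a flat per-label average would let the largest tasks dominate the gradient.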
Deep Learning-Based Molecular Property Prediction
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Natural Sciences, Interdisciplinary Program in Bioinformatics, August 2021.
Deep learning (DL) has advanced various fields, such as vision tasks, language processing, and the natural sciences. Recently, several remarkable results in computational chemistry have been achieved by DL-based methods. However, chemical systems consist of diverse elements and their interactions. As a result, it is not trivial to predict chemical properties, which are determined by intrinsically complicated factors. Consequently, conventional approaches usually depend on tremendous amounts of calculation for chemical simulations or predictions, which is cost-intensive and time-consuming.
To address these issues, we studied deep learning for computational chemistry, focusing on chemical property prediction from molecular structure representations. A molecular structure is a complex of atoms and their arrangements, and a molecular property is determined by the interactions among all of these components. Therefore, molecular structural representations are the key factor in chemical property prediction tasks. In particular, we explored public property prediction tasks in pharmacology, organic chemistry, and quantum chemistry. Molecular structures can be described as categorical sequences or as geometric graphs; we utilized both representational formats for prediction tasks and achieved competitive model performance. Our studies verified that the molecular representation is essential for various tasks in chemistry, and that using the appropriate type of neural network for each representation type is significant for model predictability.
1 Introduction 1
1.1 Motivation 1
1.2 Contents of dissertation 3
2 Background 8
2.1 Deep learning in Chemistry 8
2.2 Deep Learning for molecular property prediction 9
2.3 Approaches for molecular property prediction 12
2.3.1 Sequential modeling for molecular string 12
2.3.2 Structural modeling for molecular graph 15
2.4 Tasks on molecular properties 20
2.4.1 Pharmacological tasks 20
2.4.2 Biophysical and physiological tasks 21
2.4.3 Quantum-mechanical tasks 21
3 Application I. Drug class classification 23
3.1 Introduction 23
3.2 Proposed method 26
3.2.1 Preprocessing 27
3.2.2 Model architecture 27
3.2.3 Training and evaluation 30
3.3 Experimental results 31
3.4 Discussion 37
4 Application II. Biophysical property prediction 39
4.1 Introduction 39
4.2 Proposed method 41
4.2.1 Preprocessing 41
4.2.2 Model architecture 42
4.2.3 Training and evaluation 45
4.3 Experimental results 47
4.4 Discussion 53
5 Application III. Quantum-mechanical property prediction 55
5.1 Introduction 55
5.2 Proposed method 57
5.2.1 Preprocessing 59
5.2.2 Model architecture 62
5.2.3 Training and evaluation 67
5.3 Experimental results 69
5.4 Discussion 70
6 Conclusion 74
Bibliography 76
Abstract (in Korean) 93
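The dissertation abstract above notes that molecular structures can be described either as categorical sequences or as geometric graphs. For the sequence view, the usual starting point is tokenizing a SMILES string into its categorical symbols. A minimal sketch follows; the token patterns are a simplified assumption, not a complete SMILES grammar:

```python
import re

# Simplified SMILES token patterns, tried in order: bracket atoms,
# common two-letter elements, single-letter organic-subset atoms
# (upper case) and aromatic atoms (lower case), bond/branch symbols,
# and ring-closure digits (two-digit form %NN, then single digits).
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|Si|[BCNOPSFIbcnops]|[=#/\\+\-()]|%\d{2}|\d"
)

def tokenize_smiles(smiles):
    tokens = SMILES_TOKEN.findall(smiles)
    # Guard against silently dropping characters the pattern misses.
    assert "".join(tokens) == smiles, "unrecognized characters in input"
    return tokens
```

The resulting token sequence is what a sequential model (e.g. an RNN or Transformer encoder) would embed and consume, whereas the graph view would instead parse the same string into atoms and bonds.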