1,382 research outputs found

MR-GNN: Multi-Resolution and Dual Graph Neural Network for Predicting Structured Entity Interactions

Author: Chen Long
Tao Jing
Wang Pinghui
Xu Nuo
Zhao Junzhou
Publication venue: 'International Joint Conferences on Artificial Intelligence'
Publication date: 23/05/2019

Predicting interactions between structured entities lies at the core of numerous tasks, such as drug regimen and new material design. In recent years, graph neural networks have become attractive for these tasks. They represent structured entities as graphs and then extract features from each individual graph using graph convolution operations. However, these methods have two limitations: i) their networks extract features only from a fixed-size subgraph structure (i.e., a fixed-size receptive field) of each node and ignore features in substructures of different sizes, and ii) features are extracted by considering each entity independently, which may not effectively reflect the interaction between two entities. To resolve these problems, we present MR-GNN, an end-to-end graph neural network with the following features: i) it uses a multi-resolution architecture to extract node features from different neighborhoods of each node, and ii) it uses dual graph-state long short-term memory networks (LSTMs) to summarize the local features of each graph and extract the interaction features between pairwise graphs. Experiments conducted on real-world datasets show that MR-GNN improves on the predictions of state-of-the-art methods.

Comment: Accepted by IJCAI 201
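The multi-resolution idea in the abstract above (node features drawn from neighborhoods of different sizes) can be illustrated with a minimal sketch. This is not the MR-GNN implementation: it swaps the learned graph convolutions for plain mean aggregation, and the function name `multi_resolution_features`, the adjacency-dict input format, and the `hops` parameter are all assumptions made for illustration.

```python
def multi_resolution_features(adj, feats, hops):
    """Concatenate each node's features at receptive fields of 0..hops hops.

    adj   : dict mapping node -> list of neighbouring nodes (hypothetical format)
    feats : dict mapping node -> list of floats (initial node features)
    hops  : number of aggregation rounds, i.e. the largest receptive field

    Assumption: unweighted mean over a node and its neighbours stands in
    for a learned graph convolution layer.
    """
    n_dims = len(next(iter(feats.values())))
    current = {v: list(f) for v, f in feats.items()}
    # Resolution 0 is the node's own feature vector.
    per_node = {v: list(f) for v, f in feats.items()}
    for _ in range(hops):
        nxt = {}
        for v in adj:
            group = [v] + list(adj[v])
            nxt[v] = [
                sum(current[u][d] for u in group) / len(group)
                for d in range(n_dims)
            ]
        current = nxt
        # Keep every resolution instead of only the last one.
        for v in adj:
            per_node[v].extend(current[v])
    return per_node


# Path graph 0 - 1 - 2 with a signal placed only on node 0.
adj = {0: [1], 1: [0, 2], 2: [1]}
feats = {0: [1.0], 1: [0.0], 2: [0.0]}
out = multi_resolution_features(adj, feats, hops=2)
```

In this toy graph, node 2 sees nothing of node 0's signal at the 1-hop resolution and only picks it up at the 2-hop resolution, which is the kind of information a single fixed-size receptive field would either miss or blur.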

arXiv.org e-Print Archive

Crossref

TeachOpenCADD: a teaching platform for computer-aided drug design using open source packages and data

Author: Driller Maximilian
Morger Andrea
Sydow Dominique
Volkamer Andrea
Publication date: 01/01/2019

Owing to the increase in freely available software and data for cheminformatics and structural bioinformatics, research in computer-aided drug design (CADD) is increasingly built on modular, reproducible, and easy-to-share pipelines. While documentation for such tools is available, there are only a few freely accessible examples that teach the underlying concepts with a focus on CADD, especially for users new to the field. Here, we present TeachOpenCADD, a teaching platform developed by students for students, using open source compound and protein data as well as basic and CADD-related Python packages. We provide interactive Jupyter notebooks for central CADD topics, integrating theoretical background and practical code. TeachOpenCADD is freely available on GitHub: https://github.com/volkamerlab/TeachOpenCAD

Institutional Repository of the Freie Universität Berlin

Directory of Open Access Journals

A customizable multi-agent system for distributed data mining

Author: Di Fatta Giuseppe
Fortino Giancarlo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2007

We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design separates the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation was carried out on a parallel frequent subgraph mining algorithm, which showed good scalability.

Central Archive at the University of Reading

CiteSeerX

Crossref

Prediction of Properties of Liquid Systems Using Deep Learning

Author: 임현태
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2020

Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Chemistry, College of Natural Sciences, 2020. 2. 정연준.

Recent advances in machine learning technologies and their application to chemistry have accelerated the development of prediction models, based on quantitative structure-property relationships, for various chemical properties. The free energy of solvation is one such property and plays a dominant role as a fundamental measure in solvation chemistry. Here, we introduce a novel machine learning-based solvation model that calculates the target solvation free energy from pairwise atomistic interactions. The proposed model has a rather simple architecture: two encoding functions extract vector representations of the atomic and molecular features from the given chemical structures of the solvent and the solute, and the inner product between two atomistic feature vectors gives their interaction, instead of a black-box perceptron network. Cross-validation on 6,493 experimental measurements for 952 organic solutes and 147 organic solvents gives a mean unsigned error (MUE) of 0.2 kcal/mol. The scaffold-based split gives 0.6 kcal/mol, which shows that the proposed model retains reasonable accuracy even for extrapolated cases, that is, for relatively new molecular structures. Moreover, because its architecture is not specific to any particular solvent, the proposed model is highly transferable and makes it easy to enlarge the training data. Analysis of the atomistic interaction map shows that the proposed model can reproduce group contributions to the solvation free energy, which suggests that it provides not only the predicted target property but also more detailed physicochemical insight.

Contents:
1. Introduction
2. Delfos: Deep Learning Model for Prediction of Solvation Free Energies in Generic Organic Solvents
  2.1. Methods
    2.1.1. Embedding of Chemical Contexts
    2.1.2. Encoder-Predictor Network
  2.2. Results and Discussions
    2.2.1. Computational Setup and Results
    2.2.2. Transferability of the Model for New Compounds
    2.2.3. Visualization of Attention Mechanism
3. Group Contribution Method for the Solvation Energy Estimation with Vector Representations of Atom
  3.1. Model Description
    3.1.1. Word Embedding
    3.1.2. Network Architecture
  3.2. Results and Discussions
    3.2.1. Computational Details
    3.2.2. Prediction Accuracy
    3.2.3. Model Transferability
    3.2.4. Group Contributions of Solvation Energy
4. Empirical Structure-Property Relationship Model for Liquid Transport Properties
5. Concluding Remarks
A. Analyzing Kinetic Trapping as a First-Order Dynamical Phase Transition in the Ensemble of Stochastic Trajectories
  A1. Introduction
  A2. Theory
  A3. Lattice Gas Model
  A4. Mathematical Model
  A5. Dynamical Phase Transitions
  A6. Conclusion
B. Reaction-Path Thermodynamics of the Michaelis-Menten Kinetics
  B1. Introduction
  B2. Reaction Path Thermodynamics
  B3. Fixed Observation Time
  B4. Conclusions
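The inner-product interaction described in the thesis abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the thesis model: the atom vectors are hand-written rather than produced by trained encoders, the function names `interaction_map` and `solvation_energy` are hypothetical, and taking the plain sum over all solute-solvent atom pairs as the predicted free energy is an assumption (the abstract says only that the energy is calculated from pairwise atomistic interactions).

```python
def interaction_map(solute_vecs, solvent_vecs):
    """Pairwise interactions as inner products of atom feature vectors.

    Returns a matrix whose entry [i][j] is the inner product between
    solute atom i and solvent atom j.
    """
    return [
        [sum(a * b for a, b in zip(u, v)) for v in solvent_vecs]
        for u in solute_vecs
    ]


def solvation_energy(solute_vecs, solvent_vecs):
    """Assumption: the prediction is the sum of all pairwise interactions."""
    return sum(sum(row) for row in interaction_map(solute_vecs, solvent_vecs))


# Toy vectors standing in for encoder outputs (two atoms per molecule).
solute = [[1.0, 0.0], [0.0, 2.0]]
solvent = [[1.0, 1.0], [-1.0, 0.5]]
pairs = interaction_map(solute, solvent)
energy = solvation_energy(solute, solvent)
```

Because each entry of `pairs` belongs to one specific solute-solvent atom pair, summing a row gives one solute atom's share of the total, which is the kind of per-atom or per-group reading the abstract refers to as the atomistic interaction map.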

SNU Open Repository and Archive

What is hidden in the darkness? Deep-learning assisted large-scale protein family curation uncovers novel protein families and folds

Author: Abdullah Minhal
Akdel Mehmet
Andreeva Antonina
Bateman Alex
Brodiazhenko Tetiana
Durairaj Janani
Hauryliuk Vasili
Mets Toomas
Pereira Joana
Schwede Torsten
Studer Gabriel
Tenson Tanel
Waterhouse Andrew M.
Publication venue: Cold Spring Harbor Laboratory
Publication date: 19/03/2023

Driven by the development and upscaling of fast genome sequencing and assembly pipelines, the number of protein-coding sequences deposited in public protein sequence databases is increasing exponentially. Recently, the dramatic success of deep learning-based approaches applied to protein structure prediction has done the same for protein structures. We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database. These models cover most of the catalogued natural proteins, including those difficult to annotate for function or putative biological role based on standard, homology-based approaches. In this work, we quantified how much of such "dark matter" of the natural protein universe was structurally illuminated by AlphaFold2 and modelled this diversity as an interactive sequence similarity network that can be navigated at https://uniprot3d.org/atlas/AFDB90v4. In the process, we discovered multiple novel protein families by searching for novelties from sequence, structure, and semantic perspectives. We added a number of them to Pfam and experimentally demonstrated that one of these belongs to a novel superfamily of toxin-antitoxin systems, TumE-TumA. This work highlights the role of large-scale, evolution-driven protein comparison efforts in combination with structural similarities, genomic context conservation, and deep-learning based function prediction tools for the identification of novel protein families, aiding not only annotation and classification efforts but also the curation and prioritisation of target proteins for experimental characterisation.

edoc