1,382 research outputs found
MR-GNN: Multi-Resolution and Dual Graph Neural Network for Predicting Structured Entity Interactions
Predicting interactions between structured entities lies at the core of
numerous tasks such as drug regimen and new material design. In recent years,
graph neural networks have become attractive. They represent structured
entities as graphs and then extract features from each individual graph using
graph convolution operations. However, these methods have some limitations: i)
their networks only extract features from a fixed-size subgraph structure (i.e.,
a fixed-size receptive field) of each node, and ignore features in substructures
of different sizes, and ii) features are extracted by considering each entity
independently, which may not effectively reflect the interaction between two
entities. To resolve these problems, we present MR-GNN, an end-to-end graph
neural network with the following features: i) it uses a multi-resolution based
architecture to extract node features from different neighborhoods of each
node, and, ii) it uses dual graph-state long short-term memory networks
(LSTMs) to summarize local features of each graph and extract the interaction
features between pairwise graphs. Experiments conducted on real-world datasets
show that MR-GNN improves on the prediction performance of state-of-the-art methods.
Comment: Accepted by IJCAI 201
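The multi-resolution idea in the abstract above (node features drawn from neighborhoods of several sizes rather than one fixed receptive field) can be illustrated with a minimal sketch. This is not the MR-GNN architecture: it uses plain mean aggregation with no learned weights and no LSTM summarization, and the names `aggregate_once` and `multi_resolution_features` are hypothetical. It only shows how concatenating the output of every aggregation round gives each node a view of its 1-hop, 2-hop, and larger neighborhoods.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]          # adjacency list: node -> neighbours
Features = Dict[int, List[float]]     # node -> feature vector


def aggregate_once(graph: Graph, feats: Features) -> Features:
    """One round of mean aggregation over each node and its neighbours."""
    out: Features = {}
    for node, neighbours in graph.items():
        group = [feats[node]] + [feats[n] for n in neighbours]
        dim = len(feats[node])
        out[node] = [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
    return out


def multi_resolution_features(graph: Graph, feats: Features, num_layers: int) -> Features:
    """Concatenate the outputs of every aggregation round for each node.

    Round k mixes in information from nodes up to k hops away, so the
    concatenation holds one feature slice per neighbourhood size.
    """
    per_node: Features = {node: [] for node in graph}
    current = feats
    for _ in range(num_layers):
        current = aggregate_once(graph, current)
        for node in graph:
            per_node[node].extend(current[node])
    return per_node


# Toy path graph 0 - 1 - 2 with a one-dimensional feature per node.
path = {0: [1], 1: [0, 2], 2: [1]}
x = {0: [1.0], 1: [0.0], 2: [0.0]}
multi = multi_resolution_features(path, x, num_layers=2)
```

In the toy graph, node 2 is two hops from node 0, so its first feature slice is unaffected by node 0 and only its second slice picks up node 0's signal. A single fixed-size receptive field would keep just one of those slices.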
TeachOpenCADD: a teaching platform for computer-aided drug design using open source packages and data
Owing to the increase in freely available software and data for cheminformatics and structural bioinformatics, research in computer-aided drug design (CADD) is increasingly built on modular, reproducible, and easy-to-share pipelines. While documentation for such tools is available, there are only a few freely accessible examples that teach the underlying CADD concepts, especially for users new to the field. Here, we present TeachOpenCADD, a teaching platform developed by students for students, using open source compound and protein data as well as basic and CADD-related Python packages. We provide interactive Jupyter notebooks for central CADD topics, integrating theoretical background and practical code. TeachOpenCADD is freely available on GitHub: https://github.com/volkamerlab/TeachOpenCAD
A customizable multi-agent system for distributed data mining
We present a general Multi-Agent System framework for
distributed data mining based on a Peer-to-Peer model. Agent
protocols are implemented through message-based asynchronous
communication. The framework adopts a dynamic load balancing
policy that is particularly suitable for irregular search algorithms. A modular design separates the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation was carried out on a parallel frequent subgraph mining algorithm, which showed good scalability.
Prediction of Liquid-System Properties Using Deep Learning
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Natural Sciences, Department of Chemistry, 2020. 2. 정연준.
Recent advances in machine learning and their application to chemistry have led to the development of diverse structure-property-relationship-based prediction models for various chemical properties. The free energy of solvation is one such property and plays a dominant role as a fundamental measure of solvation chemistry. Here, we introduce a novel machine-learning-based solvation model that calculates the target solvation free energy from pairwise atomistic interactions. The novelty of the proposed model lies in its rather simple architecture: two encoding functions extract vector representations of the atomic and molecular features from the given chemical structures, while the inner product between two atomistic feature vectors gives their interaction, in place of a black-box perceptron network. Cross-validation on 6,493 experimental measurements for 952 organic solutes and 147 organic solvents yields a mean unsigned error (MUE) of 0.2 kcal/mol. With a scaffold-based split the error is 0.6 kcal/mol, which indicates that the model retains reasonable accuracy even for extrapolated cases. Moreover, because the model is not specific to any solvent, it transfers well and its training data are easy to enlarge. Analysis of the atomistic interaction map suggests that the model can reproduce group contributions to the solvation energy, so it may provide not only the predicted target property but also more detailed physicochemical insight.
1. Introduction
2. Delfos: Deep Learning Model for Prediction of Solvation Free Energies in Generic Organic Solvents
2.1. Methods
2.1.1. Embedding of Chemical Contexts
2.1.2. Encoder-Predictor Network
2.2. Results and Discussions
2.2.1. Computational Setup and Results
2.2.2. Transferability of the Model for New Compounds
2.2.3. Visualization of Attention Mechanism
3. Group Contribution Method for the Solvation Energy Estimation with Vector Representations of Atom
3.1. Model Description
3.1.1. Word Embedding
3.1.2. Network Architecture
3.2. Results and Discussions
3.2.1. Computational Details
3.2.2. Prediction Accuracy
3.2.3. Model Transferability
3.2.4. Group Contributions of Solvation Energy
4. Empirical Structure-Property Relationship Model for Liquid Transport Properties
5. Concluding Remarks
A. Analyzing Kinetic Trapping as a First-Order Dynamical Phase Transition in the Ensemble of Stochastic Trajectories
A1. Introduction
A2. Theory
A3. Lattice Gas Model
A4. Mathematical Model
A5. Dynamical Phase Transitions
A6. Conclusion
B. Reaction-Path Thermodynamics of the Michaelis-Menten Kinetics
B1. Introduction
B2. Reaction Path Thermodynamics
B3. Fixed Observation Time
B4. Conclusions
Docto
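The thesis abstract above describes interactions computed as inner products between atomistic feature vectors of the solute and the solvent. A minimal sketch of that step follows. Several parts are assumptions rather than the author's method: the encoders are replaced by hand-written toy vectors, the solvation free energy is taken as the plain sum of all pairwise interactions, and the per-atom contribution is taken as a row sum of the interaction map. All function names are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def interaction_map(solute: List[Vector], solvent: List[Vector]) -> List[List[float]]:
    """Entry [i][j] is the inner product of solute atom i and solvent atom j."""
    return [[dot(u, v) for v in solvent] for u in solute]


def solvation_energy(imap: List[List[float]]) -> float:
    """Assumed readout: sum every pairwise interaction in the map."""
    return sum(sum(row) for row in imap)


def solute_atom_contributions(imap: List[List[float]]) -> List[float]:
    """Assumed per-atom contribution: each solute atom's row sum."""
    return [sum(row) for row in imap]


# Toy stand-ins for encoder outputs (two solute atoms, three solvent atoms).
solute_atoms = [[1.0, 0.0], [0.5, -1.0]]
solvent_atoms = [[-0.2, 0.1], [0.4, 0.3], [0.0, -0.5]]

imap = interaction_map(solute_atoms, solvent_atoms)
energy = solvation_energy(imap)
contributions = solute_atom_contributions(imap)
```

Because the readout in this sketch is a plain sum, the per-atom contributions add up exactly to the predicted energy. That additivity is what would make a group-contribution reading of the interaction map possible.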
What is hidden in the darkness? Deep-learning assisted large-scale protein family curation uncovers novel protein families and folds
Driven by the development and upscaling of fast genome sequencing and assembly pipelines, the number of protein-coding sequences deposited in public protein sequence databases is increasing exponentially. Recently, the dramatic success of deep learning-based approaches applied to protein structure prediction has done the same for protein structures. We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database. These models cover most of the catalogued natural proteins, including those difficult to annotate for function or putative biological role based on standard, homology-based approaches. In this work, we quantified how much of such "dark matter" of the natural protein universe was structurally illuminated by AlphaFold2 and modelled this diversity as an interactive sequence similarity network that can be navigated at https://uniprot3d.org/atlas/AFDB90v4 . In the process, we discovered multiple novel protein families by searching for novelties from sequence, structure, and semantic perspectives. We added a number of them to Pfam, and experimentally demonstrated that one of these belongs to a novel superfamily of toxin-antitoxin systems, TumE-TumA. This work highlights the role of large-scale, evolution-driven protein comparison efforts in combination with structural similarities, genomic context conservation, and deep-learning based function prediction tools for the identification of novel protein families, aiding not only annotation and classification efforts but also the curation and prioritisation of target proteins for experimental characterisation.