1,382 research outputs found

    MR-GNN: Multi-Resolution and Dual Graph Neural Network for Predicting Structured Entity Interactions

    Predicting interactions between structured entities lies at the core of numerous tasks such as drug regimen and new material design. In recent years, graph neural networks have become an attractive approach. They represent structured entities as graphs and then extract features from each individual graph using graph convolution operations. However, these methods have two limitations: i) their networks only extract features from a fixed-size subgraph structure (i.e., a fixed-size receptive field) of each node, and ignore features in substructures of different sizes, and ii) features are extracted by considering each entity independently, which may not effectively reflect the interaction between two entities. To resolve these problems, we present MR-GNN, an end-to-end graph neural network with the following features: i) it uses a multi-resolution based architecture to extract node features from different neighborhoods of each node, and ii) it uses dual graph-state long short-term memory networks (LSTMs) to summarize local features of each graph and extract the interaction features between pairwise graphs. Experiments conducted on real-world datasets show that MR-GNN improves on the predictions of state-of-the-art methods. Comment: Accepted by IJCAI 201

    TeachOpenCADD: a teaching platform for computer-aided drug design using open source packages and data

    Owing to the increase in freely available software and data for cheminformatics and structural bioinformatics, research in computer-aided drug design (CADD) is increasingly built on modular, reproducible, and easy-to-share pipelines. While documentation for such tools is available, there are only a few freely accessible examples that teach the underlying concepts of CADD, especially for users new to the field. Here, we present TeachOpenCADD, a teaching platform developed by students for students, using open source compound and protein data as well as basic and CADD-related Python packages. We provide interactive Jupyter notebooks for central CADD topics, integrating theoretical background and practical code. TeachOpenCADD is freely available on GitHub: https://github.com/volkamerlab/TeachOpenCAD

    A customizable multi-agent system for distributed data mining

    We present a general Multi-Agent System framework for distributed data mining based on a Peer-to-Peer model. Agent protocols are implemented through message-based asynchronous communication. The framework adopts a dynamic load balancing policy that is particularly suitable for irregular search algorithms. A modular design separates the general-purpose system protocols and software components from the specific data mining algorithm. The experimental evaluation was carried out on a parallel frequent subgraph mining algorithm, which showed good scalability.

    Prediction of Liquid System Properties Using Deep Learning

    Thesis (Ph.D.), Seoul National University Graduate School, College of Natural Sciences, Department of Chemistry, February 2020. 정연준.
    Recent advances in machine learning technologies and their application to chemistry have led to the development of diverse prediction models for various chemical properties based on structure-property relationships. The free energy of solvation is one such property and plays a dominant role as a fundamental measure of solvation chemistry. Here, we introduce a novel machine learning-based solvation model, which calculates the target solvation free energy from pairwise atomistic interactions. The novelty of the proposed model lies in its rather simple architecture: two encoding functions extract vector representations of the atomic and molecular features from the given chemical structures, while the inner product between two atomistic feature vectors gives their interaction, instead of black-box perceptron networks. Cross-validation on 6,493 experimental measurements for 952 organic solutes and 147 organic solvents yields a mean unsigned error (MUE) of 0.2 kcal/mol. A scaffold-based split yields 0.6 kcal/mol, which shows that the proposed model retains reasonable accuracy even for extrapolated cases. Moreover, the proposed model shows excellent transferability for enlarging the training data because it is not specific to any solvent. Analysis of the atomistic interaction map suggests that the proposed model can reproduce group contributions to the solvation energy, so it may provide not only the predicted target property but also more detailed physicochemical insights.

    Contents:
    1. Introduction
    2. Delfos: Deep Learning Model for Prediction of Solvation Free Energies in Generic Organic Solvents
       2.1. Methods
            2.1.1. Embedding of Chemical Contexts
            2.1.2. Encoder-Predictor Network
       2.2. Results and Discussions
            2.2.1. Computational Setup and Results
            2.2.2. Transferability of the Model for New Compounds
            2.2.3. Visualization of Attention Mechanism
    3. Group Contribution Method for the Solvation Energy Estimation with Vector Representations of Atom
       3.1. Model Description
            3.1.1. Word Embedding
            3.1.2. Network Architecture
       3.2. Results and Discussions
            3.2.1. Computational Details
            3.2.2. Prediction Accuracy
            3.2.3. Model Transferability
            3.2.4. Group Contributions of Solvation Energy
    4. Empirical Structure-Property Relationship Model for Liquid Transport Properties
    5. Concluding Remarks
    A. Analyzing Kinetic Trapping as a First-Order Dynamical Phase Transition in the Ensemble of Stochastic Trajectories
       A1. Introduction
       A2. Theory
       A3. Lattice Gas Model
       A4. Mathematical Model
       A5. Dynamical Phase Transitions
       A6. Conclusion
    B. Reaction-Path Thermodynamics of the Michaelis-Menten Kinetics
       B1. Introduction
       B2. Reaction Path Thermodynamics
       B3. Fixed Observation Time
       B4. Conclusions
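
The abstract of this entry describes computing atom-atom interactions as inner products between solute and solvent feature vectors. The following is a minimal sketch of that idea only, not the thesis's implementation: the function names, the 2-dimensional toy features, and the readout that simply sums every pairwise term are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def interaction_map(solute: Sequence[Vector],
                    solvent: Sequence[Vector]) -> List[List[float]]:
    """Inner product between every solute-atom and solvent-atom feature vector.

    Entry [i][j] is the interaction between solute atom i and solvent atom j.
    """
    return [[sum(a * b for a, b in zip(u, v)) for v in solvent] for u in solute]


def solvation_energy(solute: Sequence[Vector],
                     solvent: Sequence[Vector]) -> float:
    """Assumed readout: add up all pairwise interaction terms."""
    return sum(sum(row) for row in interaction_map(solute, solvent))


# Hypothetical atomic features; in a real model these would come from
# learned encoders, which are not reproduced here.
solute_features = [[1.0, 0.0], [0.5, 0.5]]
solvent_features = [[0.0, -1.0], [-2.0, 0.0], [1.0, 1.0]]

pairwise = interaction_map(solute_features, solvent_features)
energy = solvation_energy(solute_features, solvent_features)
```

Because each entry of `pairwise` belongs to one solute atom and one solvent atom, summing a row gives that solute atom's share of the total, which is the kind of per-atom or per-group breakdown the abstract refers to.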

    What is hidden in the darkness? Deep-learning assisted large-scale protein family curation uncovers novel protein families and folds

    Driven by the development and upscaling of fast genome sequencing and assembly pipelines, the number of protein-coding sequences deposited in public protein sequence databases is increasing exponentially. Recently, the dramatic success of deep learning-based approaches applied to protein structure prediction has done the same for protein structures. We are now entering a new era in protein sequence and structure annotation, with hundreds of millions of predicted protein structures made available through the AlphaFold database. These models cover most of the catalogued natural proteins, including those difficult to annotate for function or putative biological role based on standard, homology-based approaches. In this work, we quantified how much of such "dark matter" of the natural protein universe was structurally illuminated by AlphaFold2 and modelled this diversity as an interactive sequence similarity network that can be navigated at https://uniprot3d.org/atlas/AFDB90v4 . In the process, we discovered multiple novel protein families by searching for novelties from sequence, structure, and semantic perspectives. We added a number of them to Pfam, and experimentally demonstrated that one of these belongs to a novel superfamily of toxin-antitoxin systems, TumE-TumA. This work highlights the role of large-scale, evolution-driven protein comparison efforts in combination with structural similarities, genomic context conservation, and deep learning-based function prediction tools for the identification of novel protein families, aiding not only annotation and classification efforts but also the curation and prioritisation of target proteins for experimental characterisation.