20 research outputs found

    Geometry-Complete Perceptron Networks for 3D Molecular Graphs

    Full text link
    The field of geometric deep learning has had a profound impact on the development of innovative and powerful graph neural network architectures. Disciplines such as computer vision and computational biology have benefited significantly from such methodological advances, which has led to breakthroughs in scientific domains such as protein structure prediction and design. In this work, we introduce GCPNet, a new geometry-complete, SE(3)-equivariant graph neural network designed for 3D graph representation learning. We demonstrate the state-of-the-art utility and expressiveness of our method on six independent datasets designed for three distinct geometric tasks: protein-ligand binding affinity prediction, protein structure ranking, and Newtonian many-body systems modeling. Our results suggest that GCPNet is a powerful, general method for capturing complex geometric and physical interactions within 3D graphs for downstream prediction tasks. The source code, data, and instructions to train new models or reproduce our results are freely available on GitHub.Comment: 7 pages, 1 figure, 3 tables. Under review. Code available at https://github.com/BioinfoMachineLearning/GCPNe

    Geometry-Complete Diffusion for 3D Molecule Generation

    Full text link
    Denoising diffusion probabilistic models (DDPMs) have recently taken the field of generative modeling by storm, pioneering new state-of-the-art results in disciplines such as computer vision and computational biology for diverse tasks ranging from text-guided image generation to structure-guided protein design. Along this latter line of research, methods such as those of Hoogeboom et al. 2022 have been proposed for unconditionally generating 3D molecules using equivariant graph neural networks (GNNs) within a DDPM framework. Toward this end, we propose GCDM, a geometry-complete diffusion model that achieves new state-of-the-art results for 3D molecule diffusion generation by leveraging the representation learning strengths offered by GNNs that perform geometry-complete message-passing. Our results with GCDM also offer preliminary insights into how physical inductive biases impact the generative dynamics of molecular DDPMs. The source code, data, and instructions to train new models or reproduce our results are freely available at https://github.com/BioinfoMachineLearning/bio-diffusion.Comment: 13 pages, 1 figure, 3 tables. Under review. Code available at https://github.com/BioinfoMachineLearning/bio-diffusio

    Towards Joint Sequence-Structure Generation of Nucleic Acid and Protein Complexes with SE(3)-Discrete Diffusion

    Full text link
    Generative models of macromolecules carry abundant and impactful implications for industrial and biomedical efforts in protein engineering. However, existing methods are currently limited to modeling protein structures or sequences, independently or jointly, without regard to the interactions that commonly occur between proteins and other macromolecules. In this work, we introduce MMDiff, a generative model that jointly designs sequences and structures of nucleic acid and protein complexes, independently or in complex, using joint SE(3)-discrete diffusion noise. Such a model has important implications for emerging areas of macromolecular design including structure-based transcription factor design and design of noncoding RNA sequences. We demonstrate the utility of MMDiff through a rigorous new design benchmark for macromolecular complex generation that we introduce in this work. Our results demonstrate that MMDiff is able to successfully generate micro-RNA and single-stranded DNA molecules while being modestly capable of joint modeling DNA and RNA molecules in interaction with multi-chain protein complexes. Source code: https://github.com/Profluent-Internships/MMDiff.Comment: 15 pages, 11 figures, presented at the NeurIPS 2023 Machine Learning in Structural Biology (MLSB) workshop. Code available at https://github.com/Profluent-Internships/MMDif

    Low Cost Gunshot Detection using Deep Learning on the Raspberry Pi

    Get PDF
    Many cities using gunshot detection technology depend on expensive systems that ultimately rely on humans differentiating between gunshots and non-gunshots, such as ShotSpotter. Thus, a scalable gunshot detection system that is low in cost and high in accuracy would be advantageous for a variety of cities across the globe, in that it would favorably promote the delegation of tasks typically worked by humans to machines. A repository of audio data was created from sound clips collected from online audio databases as well as from clips recorded using a USB microphone in residential areas and at a gun range. One-dimensional as well as two-dimensional convolutional neural networks were then trained on this sound data, and spectrograms created from this sound data, to recognize gunshots. These models were deployed to a Raspberry Pi 3 Model B+ with a short message service modem and a USB microphone attached, using a software pipeline to continuously analyze discrete two-second chunks of audio and alert a set of phone numbers if a gunshot is detected in that chunk. Testing found that a majority-rules ensemble of our one-dimensional and two-dimensional models fared best, with an accuracy above 99% on validation data as well as when distinguishing gunshots from fireworks. Besides increasing the safety standards for a city's residents, the findings generated by this research project expand the current state of knowledge regarding sound-based applications of convolutional neural networks

    Three principles for the progress of immersive technologies in healthcare training and education

    Get PDF

    Impact of AlphaFold on Structure Prediction of Protein Complexes: The CASP15-CAPRI Experiment

    Get PDF
    We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homo-dimers, 3 homo-trimers, 13 hetero-dimers including 3 antibody-antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatics servers, submitted models for each target. A total of 21941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their 5 best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% for the targets compared to 8% two years earlier, a remarkable improvement resulting from the wide use of the AlphaFold2 and AlphaFold-Multimer software. Creative use was made of the deep learning inference engines affording the sampling of a much larger number of models and enriching the multiple sequence alignments with sequences from various sources. Wide use was also made of the AlphaFold confidence metrics to rank models, permitting top performing groups to exceed the results of the public AlphaFold-Multimer version used as a yard stick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem

    Impact of AlphaFold on structure prediction of protein complexes: The CASP15-CAPRI experiment

    Get PDF
    We present the results for CAPRI Round 54, the 5th joint CASP-CAPRI protein assembly prediction challenge. The Round offered 37 targets, including 14 homodimers, 3 homo-trimers, 13 heterodimers including 3 antibody-antigen complexes, and 7 large assemblies. On average ~70 CASP and CAPRI predictor groups, including more than 20 automatics servers, submitted models for each target. A total of 21 941 models submitted by these groups and by 15 CAPRI scorer groups were evaluated using the CAPRI model quality measures and the DockQ score consolidating these measures. The prediction performance was quantified by a weighted score based on the number of models of acceptable quality or higher submitted by each group among their five best models. Results show substantial progress achieved across a significant fraction of the 60+ participating groups. High-quality models were produced for about 40% of the targets compared to 8% two years earlier. This remarkable improvement is due to the wide use of the AlphaFold2 and AlphaFold2-Multimer software and the confidence metrics they provide. Notably, expanded sampling of candidate solutions by manipulating these deep learning inference engines, enriching multiple sequence alignments, or integration of advanced modeling tools, enabled top performing groups to exceed the performance of a standard AlphaFold2-Multimer version used as a yard stick. This notwithstanding, performance remained poor for complexes with antibodies and nanobodies, where evolutionary relationships between the binding partners are lacking, and for complexes featuring conformational flexibility, clearly indicating that the prediction of protein complexes remains a challenging problem

    1974-11-22 Kentucky Composers and Performers: Alex Holland

    No full text
    Kentucky Composers and Performers, with host Frederick Mueller and this production featuring Alex Holland recorded on November 22, 1974 and aired on December 4, 1974

    DIPS-Plus: The enhanced database of interacting protein structures for interface prediction

    No full text
    Abstract In this work, we expand on a dataset recently introduced for protein interface prediction (PIP), the Database of Interacting Protein Structures (DIPS), to present DIPS-Plus, an enhanced, feature-rich dataset of 42,112 complexes for machine learning of protein interfaces. While the original DIPS dataset contains only the Cartesian coordinates for atoms contained in the protein complex along with their types, DIPS-Plus contains multiple residue-level features including surface proximities, half-sphere amino acid compositions, and new profile hidden Markov model (HMM)-based sequence features for each amino acid, providing researchers a curated feature bank for training protein interface prediction methods. We demonstrate through rigorous benchmarks that training an existing state-of-the-art (SOTA) model for PIP on DIPS-Plus yields new SOTA results, surpassing the performance of some of the latest models trained on residue-level and atom-level encodings of protein complexes to date
    corecore