8 research outputs found

    EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation

    Over the past decade, growing computational power and ever-rising data availability have made deep learning techniques increasingly popular, owing to their excellent performance on computer vision problems. The size of the Protein Data Bank has increased more than 15-fold since 1999, which has enabled the expansion of models that aim to predict enzymatic function from amino acid composition. Amino acid sequence, however, is less conserved in nature than protein structure and is therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D convolutional neural network classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The 2-layer architecture was investigated on a large dataset of 63,558 enzymes from the Protein Data Bank and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet. Comment: 11 pages, 6 figures
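As a rough illustration of the "binary representation of the protein shape" mentioned in the abstract above, the sketch below maps a list of 3D atom coordinates onto a binary occupancy grid. It is not taken from the EnzyNet repository: the function name `voxelize`, the default `grid_size`, and the centring and scaling choices are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def voxelize(points: Sequence[Point], grid_size: int = 32) -> List[List[List[int]]]:
    """Map 3D points to a binary occupancy grid with grid_size cells per axis.

    Hypothetical helper: centring on the centroid and uniform scaling are
    assumptions of this sketch, not the published preprocessing.
    """
    if not points:
        raise ValueError("need at least one point")
    n = len(points)
    # Centre the structure on its centroid.
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    centred = [tuple(p[i] - centroid[i] for i in range(3)) for p in points]
    # Scale uniformly so the farthest coordinate stays inside the grid.
    max_extent = max(max(abs(c) for c in p) for p in centred) or 1.0
    half = grid_size / 2
    scale = (half - 1) / max_extent
    grid = [[[0] * grid_size for _ in range(grid_size)] for _ in range(grid_size)]
    for p in centred:
        i, j, k = (int(c * scale + half) for c in p)
        grid[i][j][k] = 1  # a voxel is 1 if any atom falls inside it
    return grid


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
    grid = voxelize(atoms, grid_size=8)
    print(sum(v for plane in grid for row in plane for v in row), "occupied voxels")
```

A grid of this kind is the sort of input a 3D convolutional network can consume directly; per-voxel biochemical properties, which the abstract mentions as complementary information, would be stored as additional channels.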

    Exploration of methods for classifying repeat proteins based on their structural information using machine learning

    Currently, there are complex methods for classifying and identifying repeat proteins from their structure, which involve intensive and costly use of computational resources. For this reason, this research work seeks to explore alternative solutions, complementary to other systems, for the classification stage of repeat proteins using techniques from the field of machine learning. These techniques are known to be effective and fast at systematizing various classification, segmentation, and data transformation procedures, provided that a considerable amount of data is available. Thus, given the amount of structural data generated in recent years on proteins and repeat proteins, it is possible to use machine learning techniques to classify them. Therefore, based on an analysis of the data currently available and a systematic review of the literature, this work proposes possible machine learning solutions for the fast, automated classification of repeat proteins from their structure. From these possible solutions, it is concluded that a multi-input classifier using information on the torsion angles and inter-amino-acid distances of a protein is feasible; it will be implemented and evaluated in future work. Research work

    Predicting environmentally responsive transgenerational differential DNA methylated regions (epimutations) in the genome using a hybrid deep-machine learning approach

    Background: Deep learning is an active bioinformatics artificial intelligence field that is useful in solving many biological problems, including predicting altered epigenetics such as DNA methylation regions. Deep learning (DL) can learn an informative representation that addresses the need for defining relevant features. However, deep learning models are computationally expensive, and they require large training datasets to achieve good classification performance. Results: One approach to addressing these challenges is to use a less complex deep learning network for feature selection and machine learning (ML) for classification. In the current study, we introduce a hybrid DL-ML approach that uses a deep neural network for extracting molecular features and a non-DL classifier to predict environmentally responsive transgenerational differential DNA methylated regions (DMRs), termed epimutations, based on the extracted DL-based features. Various environmental toxicant-induced epigenetic transgenerational inheritance sperm epimutations were used to train the model on the rat genome DNA sequence, and the model was then used to predict transgenerational DMRs (epimutations) across the entire genome. Conclusion: The approach was also used to predict potential DMRs in the human genome. Experimental results show that the hybrid DL-ML approach outperforms deep learning and traditional machine learning methods.

    Prediction of protein function using a deep convolutional neural network ensemble

    Background. The availability of large databases containing high-resolution three-dimensional (3D) models of proteins in conjunction with functional annotation allows the exploitation of advanced supervised machine learning techniques for automatic protein function prediction. Methods. In this work, novel shape features are extracted representing protein structure in the form of local (per amino acid) distribution of angles and amino acid distances, respectively. Each of the multi-channel feature maps is introduced into a deep convolutional neural network (CNN) for function prediction and the outputs are fused through support vector machines or a correlation-based k-nearest neighbor classifier. Two different architectures are investigated employing either one CNN per multi-channel feature set, or one CNN per image channel. Results. Cross-validation experiments on single-functional enzymes (n = 44,661) from the PDB database achieved 90.1% correct classification, demonstrating an improvement over previous results on the same dataset when sequence similarity was not considered. Discussion. The automatic prediction of protein function can provide quick annotations on extensive datasets, opening the path for relevant applications, such as pharmacological target identification. The proposed method shows promise for structure-based protein function prediction, but sufficient data may not yet be available to properly assess the method's performance on non-homologous proteins and thus reduce the confounding factor of evolutionary relationships.
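The "local (per amino acid) distribution of ... amino acid distances" described in the abstract above can be pictured as one distance histogram per residue. The sketch below is only an illustration under stated assumptions: the function name `distance_histograms`, the bin count, the distance cutoff, and the use of one representative coordinate per residue are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_histograms(
    coords: Sequence[Point], n_bins: int = 8, max_dist: float = 40.0
) -> List[List[int]]:
    """For each residue, histogram its distances to every other residue.

    Hypothetical helper: coords holds one representative point per residue
    (for example an alpha-carbon position), which is an assumption here.
    """
    width = max_dist / n_bins
    rows = []
    for i, a in enumerate(coords):
        hist = [0] * n_bins
        for j, b in enumerate(coords):
            if i == j:
                continue  # skip the distance from a residue to itself
            d = math.dist(a, b)
            # Distances at or beyond max_dist are counted in the last bin.
            hist[min(int(d / width), n_bins - 1)] += 1
        rows.append(hist)
    return rows


if __name__ == "__main__":
    residues = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
    for row in distance_histograms(residues, n_bins=4, max_dist=8.0):
        print(row)
```

Stacking the per-residue rows gives a 2D feature map of size (number of residues) by (number of bins), which is the kind of image-like input a 2D CNN accepts; the angle-based features in the abstract could be binned in the same way.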