
    DeepRibo: a neural network for precise gene annotation of prokaryotes by combining ribosome profiling signal and binding site patterns

    Gene annotations in prokaryotes are often corrected owing to small variations in the annotated gene regions observed between different (sub)species. It has become apparent that traditional sequence alignment algorithms, used for the curation of genomes, are not able to map the full complexity of the genomic landscape. We present DeepRibo, a novel neural network that uses features extracted from ribosome profiling information and binding site sequence patterns and proves to be a precise tool for the delineation and annotation of expressed genes in prokaryotes. The neural network combines recurrent memory cells and convolutional layers, integrating the information gained from both the high-throughput ribosome profiling data and the ribosome binding/translation initiation sequence region into one model. DeepRibo is designed as a single model trained on a variety of ribosome profiling experiments and is used to identify open reading frames in prokaryotes without a priori knowledge of the translational landscape. Extensive validation of the model, trained on various datasets, using multi-species sequence similarity and proteins verified by mass spectrometry and Edman degradation, highlights the effectiveness of DeepRibo.
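    The abstract describes two input branches fused into one model: recurrent cells reading the ribosome profiling signal and convolutional layers reading the binding-site sequence. A minimal plain-Python sketch of that fusion follows, assuming a per-nucleotide read-coverage vector for a candidate ORF and a short upstream sequence. The single tanh recurrent unit is a simplified stand-in for GRU/LSTM memory cells, the weights are untrained placeholders, and all names (`one_hot`, `conv1d_max`, `rnn_summary`, `fuse`, `sd_filter`) are hypothetical rather than taken from DeepRibo's code.

```python
import math

NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as one 4-element vector per base, in A, C, G, T order."""
    return [[1.0 if base == n else 0.0 for n in NUCLEOTIDES] for base in seq.upper()]


def conv1d_max(encoded, kernel):
    """Slide one filter (width x 4 weights) along the sequence, apply ReLU, then global max-pool."""
    width = len(kernel)
    scores = [
        max(0.0, sum(kernel[k][c] * encoded[start + k][c] for k in range(width) for c in range(4)))
        for start in range(len(encoded) - width + 1)
    ]
    return max(scores, default=0.0)


def rnn_summary(signal, w_in=0.5, w_rec=0.8, bias=0.0):
    """Feed the signal through one tanh recurrent unit and return the last hidden state."""
    h = 0.0
    for x in signal:
        h = math.tanh(w_in * x + w_rec * h + bias)
    return h


def fuse(coverage, upstream_seq, kernels):
    """Concatenate the recurrent summary of the coverage with one pooled score per filter."""
    peak = max(coverage) or 1.0                # avoid dividing by zero when an ORF has no reads
    normalized = [c / peak for c in coverage]  # rescale read counts to [0, 1]
    encoded = one_hot(upstream_seq)
    return [rnn_summary(normalized)] + [conv1d_max(encoded, k) for k in kernels]


# Hypothetical filter that rewards an exact AGGAGG (Shine-Dalgarno-like) match.
sd_filter = [[1.0 if n == b else 0.0 for n in NUCLEOTIDES] for b in "AGGAGG"]
features = fuse([0, 2, 10, 8, 3], "TTAAGGAGGTTACATG", [sd_filter])
```

    In a trained network the filter weights and recurrent parameters would be learned, and the fused vector would feed a classifier that scores the candidate ORF.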

    Predictive design of sigma factor-specific promoters

    To engineer synthetic gene circuits, molecular building blocks are developed that can modulate gene expression without interference, either with each other or with the host's cellular machinery. As the complexity of gene circuits increases, automated design tools and tailored building blocks that ensure precise tuning of all components in the network are required. Despite efforts to develop prediction tools that allow forward engineering of promoter transcription initiation frequency (TIF), such a tool is still lacking. Here, we use libraries of E. coli sigma factor 70 (σ70)-dependent promoters and B. subtilis σB-, σF- and σW-dependent promoters to construct prediction models capable of predicting both promoter TIF and the orthogonality of the σ-specific promoters. This is achieved by training a convolutional neural network on high-throughput DNA sequencing data from fluorescence-activated cell sorted promoter libraries. This model forms the basis of the online promoter design tool (ProD), which provides tailored promoters for tailored genetic systems. Automated design tools and tailored subunits are beneficial in fine-tuning all components of a complex genetic circuit. Here the authors create E. coli and B. subtilis promoter libraries using FACS and HTS, from which an online promoter design tool has been developed using a CNN.
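    The training data in this abstract come from sequencing promoter libraries after fluorescence-activated cell sorting. One common way to turn such data into a per-promoter activity label, assumed here rather than taken from the paper, is the read-weighted mean of the sorted bin indices. The four-bin layout, the `min_reads` cutoff, the example sequences and the function names are all hypothetical.

```python
def expected_bin(bin_counts, min_reads=10):
    """Return the read-weighted mean bin index for one promoter, or None if coverage is too low.

    bin_counts[i] is the number of sequencing reads for this promoter found in
    FACS bin i, with bins ordered from lowest to highest fluorescence.
    """
    total = sum(bin_counts)
    if total < min_reads:
        return None  # too few reads to estimate activity reliably
    return sum(i * count for i, count in enumerate(bin_counts)) / total


def label_library(library, n_bins, min_reads=10):
    """Map promoter sequence -> activity label scaled to [0, 1] (0 = lowest bin, 1 = highest)."""
    labels = {}
    for sequence, counts in library.items():
        score = expected_bin(counts, min_reads)
        if score is not None:
            labels[sequence] = score / (n_bins - 1)
    return labels


library = {
    "TTGACATATAAT": [0, 0, 5, 15],  # hypothetical strong promoter: reads mostly in the top bin
    "TTTACATAGAAT": [18, 2, 0, 0],  # hypothetical weak promoter: reads mostly in the bottom bin
    "GGGCCCGGGCCC": [1, 1, 0, 0],   # dropped: only 2 reads in total
}
labels = label_library(library, n_bins=4)
```

    Labels of this kind could then serve as regression targets for a sequence model; how ProD actually derives its TIF values is not stated in the abstract.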

    Deep learning techniques for genome processing and annotation tasks in prokaryotes


    Novel transformer networks for improved sequence labeling in genomics

    Glycosylation of proteins in eukaryotic cells is an important and complicated post-translational modification owing to its pivotal role in, and association with, crucial physiological functions of most proteins. Identifying glycosylation sites in a polypeptide chain is not an easy task owing to multiple impediments, and analytical identification of these sites is expensive and laborious. There is a pressing need for a reliable computational method to determine such sites precisely, which can save researchers time and effort. Herein, we propose a novel predictor, iGlycoS-PseAAC, which integrates Chou's Pseudo Amino Acid Composition (PseAAC) with relative/absolute position-based features. In the self-consistency test on the benchmark dataset, the model predicts O-linked glycosylated serine sites with an accuracy of 98.8 percent. The overall accuracy achieved through 10-fold cross-validation, combining the positive and negative results, is 97.2 percent, and the overall accuracy of the jackknife test, aggregating all prediction results, is 96.195 percent. The proposed predictor can thus help predict O-linked glycosylated serine sites efficiently and accurately, and the overall results show that iGlycoS-PseAAC is more accurate than existing tools.
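    The abstract reports three evaluation regimes: self-consistency (training and testing on the same data), 10-fold cross-validation and the jackknife test. The sketch below shows how the latter two partition a dataset; the jackknife is simply k-fold cross-validation with k equal to the number of samples. The `train_and_score` callback and the majority-class toy model are hypothetical, and shuffling or stratifying the samples before folding is omitted for clarity.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds whose sizes differ by at most one."""
    if not 1 < k <= n_samples:
        raise ValueError("k must satisfy 1 < k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_accuracy(labels, k, train_and_score):
    """Pool correct predictions over all held-out folds and return the overall accuracy.

    train_and_score(train_idx, test_idx) is a hypothetical callback that fits a
    model on train_idx and returns the number of correct predictions on test_idx.
    """
    folds = k_fold_indices(len(labels), k)
    correct = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        correct += train_and_score(train_idx, test_idx)
    return correct / len(labels)


def jackknife_accuracy(labels, train_and_score):
    """Leave-one-out: every sample is predicted by a model trained on all the others."""
    return cross_validated_accuracy(labels, len(labels), train_and_score)


labels = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]


def majority_baseline(train_idx, test_idx):
    """Toy model: predict the most common label in the training fold for every test sample."""
    ones = sum(labels[i] for i in train_idx)
    guess = 1 if ones * 2 >= len(train_idx) else 0
    return sum(1 for i in test_idx if labels[i] == guess)


acc5 = cross_validated_accuracy(labels, 5, majority_baseline)
jack = jackknife_accuracy(labels, majority_baseline)
```

    Because every sample is held out exactly once, cross-validated and jackknife accuracies estimate performance on unseen data, unlike the self-consistency score.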

    Explainability in transformer models for functional genomics

    The effectiveness of deep learning methods can be largely attributed to the automated extraction of relevant features from raw data. In the field of functional genomics, this generally concerns the automatic selection of relevant nucleotide motifs from DNA sequences. To benefit from automated learning methods, new strategies are required that unveil the decision-making process of trained models. In this paper, we present a new approach that has successfully yielded insights into the transcription process in Escherichia coli. This work builds upon a transformer-based neural network framework designed for prokaryotic genome annotation. We find that the majority of the model's subunits (attention heads) are specialized towards identifying transcription factors and are able to successfully characterize both their binding sites and consensus sequences, uncovering both well-known and potentially novel elements involved in the initiation of transcription. Because the specialization of the attention heads occurs automatically, we believe transformer models are of high interest for the creation of explainable neural networks in this field.
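    Each attention head produces a weight matrix over sequence positions, and inspecting which positions receive the most attention is a natural first step in linking a head to a binding site. A minimal single-head sketch follows, assuming small hand-made query and key vectors; the paper's actual analysis pipeline is not reproduced, and the function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Scaled dot-product attention: row i is softmax over j of (q_i . k_j) / sqrt(d)."""
    d = len(keys[0])
    scale = math.sqrt(d)
    return [
        softmax([sum(q[c] * k[c] for c in range(d)) / scale for k in keys])
        for q in queries
    ]


def most_attended_positions(weights, top_n=3):
    """Sum the attention each key position receives over all queries and rank positions."""
    received = [sum(row[j] for row in weights) for j in range(len(weights[0]))]
    ranked = sorted(range(len(received)), key=lambda j: received[j], reverse=True)
    return ranked[:top_n]


# Toy head: every query looks for feature 0, which only the key at position 2 carries.
queries = [[1.0, 0.0] for _ in range(5)]
keys = [[0.0, 1.0], [0.0, 1.0], [3.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
weights = attention_weights(queries, keys)
top = most_attended_positions(weights, top_n=1)
```

    In a real analysis, the highly attended positions across many sequences would be aligned and compared with known transcription factor binding motifs.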

    CpG transformer for imputation of single-cell methylomes

    Motivation: The adoption of current single-cell DNA methylation sequencing protocols is hindered by incomplete coverage, underscoring the need for effective imputation techniques. Imputing single-cell (methylation) data requires models to build an understanding of the underlying biological processes. Results: We adapt the transformer neural network architecture to operate on methylation matrices by combining axial attention with sliding-window self-attention. The resulting CpG Transformer displays state-of-the-art performance on a wide range of scBS-seq and scRRBS-seq datasets. Furthermore, we demonstrate the interpretability of CpG Transformer and illustrate its rapid transfer learning properties, allowing practitioners to train models on new datasets with a limited computational and time budget.
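    The CpG Transformer abstract combines axial attention, which attends separately along each axis of a cells × CpG-sites matrix, with sliding-window self-attention. The sketch below only enumerates which entries a given entry attends to, assuming a row step in which a cell attends to its own sites within `window` positions and a column step in which all cells attend to one another at the same site. The window rule, function names and sizes are illustrative assumptions, not the published implementation.

```python
def row_neighbors(cell, site, n_sites, window):
    """Row step: positions in the same cell within `window` CpG sites of `site`, itself included."""
    lo = max(0, site - window)
    hi = min(n_sites - 1, site + window)
    return [(cell, s) for s in range(lo, hi + 1)]


def column_neighbors(cell, site, n_cells):
    """Column step: positions at the same CpG site across all cells, itself included."""
    return [(c, site) for c in range(n_cells)]


def axial_context(cell, site, n_cells, n_sites, window):
    """Return the (row, column) contexts one entry attends to in the two axial steps."""
    return row_neighbors(cell, site, n_sites, window), column_neighbors(cell, site, n_cells)


def attention_pair_counts(n_cells, n_sites, window):
    """Compare query/key pairs for full attention vs. one windowed-row plus one column step."""
    n = n_cells * n_sites
    full = n * n
    axial = sum(
        len(row_neighbors(c, s, n_sites, window)) + n_cells
        for c in range(n_cells)
        for s in range(n_sites)
    )
    return full, axial


row_ctx, col_ctx = axial_context(1, 0, n_cells=3, n_sites=6, window=2)
full, axial = attention_pair_counts(n_cells=3, n_sites=6, window=2)
```

    The saving grows with matrix size: full attention scales with the square of the number of entries, whereas the axial, windowed scheme scales roughly linearly in the number of sites.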

    The quest for deus ex machina: harnessing the power of machine learning for synthetic biology

    Machine learning is nowadays an ever-present part of many aspects of modern life and has increasingly been used in the field of synthetic biology as well. Studies that successfully harnessed the superb pattern recognition abilities of machine learning algorithms range from the molecular to the large-scale production level and include the prediction of transcription factor activity, enzyme expression balancing, gene annotation and the prediction of production parameters. However, a closer look reveals the lack of a standard for such published models, with no known guidelines on which metrics and analyses should be included in a publication. Studies often highlight that the community needs more data in machine-readable format, but it has rarely been discussed how such a study should be conducted and analyzed to obtain a meaningful, robust, high-quality and predictive model. Here, we present a guideline specifically aimed at synthetic biologists who wish to use machine learning in their research, in particular on smaller, in-house collected datasets. We discuss key aspects of how to evaluate and interpret a model's performance, with a focus on regression, and common problems and pitfalls that arise during the workflow. Together with the increasing availability of vast datasets, the implementation of such guidelines can contribute to the drive toward standardization and the rigorous application of engineering principles in the synthetic biology community.
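    The guideline focuses on evaluating regression models. As a concrete reference point, a few widely used regression metrics can be computed as below; the choice of exactly these four (R², RMSE, MAE and Pearson r) is an assumption here, not a list taken from the paper, and `regression_metrics` is a hypothetical helper.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE, MAE and Pearson r for paired lists of observed and predicted values."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # must be > 0 (observations not all equal)
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    sd_true = math.sqrt(ss_tot)
    sd_pred = math.sqrt(sum((p - mean_pred) ** 2 for p in y_pred))
    return {
        "r2": 1.0 - ss_res / ss_tot,  # negative if the model is worse than predicting the mean
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "pearson_r": cov / (sd_true * sd_pred),
    }


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

    In this toy example Pearson r is about 0.98 while R² is 0.8: correlation measures linear association, not agreement with the observed values, which is one reason to report more than one metric.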