73 research outputs found

    Do Natural Proteins Differ from Random Sequences Polypeptides? Natural vs. Random Proteins Classification Using an Evolutionary Neural Network

    Are extant proteins the exquisite result of natural selection or are they random sequences slightly edited by evolution? This question has puzzled biochemists for a long time, and several groups have addressed it by comparing natural protein sequences to completely random ones, reaching contradictory conclusions. Previous work in the literature focused on the analysis of primary structure in an attempt to identify possible signatures of evolutionary editing. Conversely, in this work we compare a set of 762 natural proteins with an average length of 70 amino acids and an equal number of completely random ones of comparable length on the basis of their structural features. We use an ad hoc Evolutionary Neural Network Algorithm (ENNA) to assess whether and to what extent natural proteins are edited from random polypeptides, employing 11 different structure-related variables (i.e. net charge, volume, surface area, coil, alpha helix, beta sheet, percentage of coil, percentage of alpha helix, percentage of beta sheet, percentage of secondary structure and surface hydrophobicity). The ENNA algorithm correctly distinguishes natural proteins from random ones with an accuracy of 94.36%. Furthermore, we study the structural features of 32 random polypeptides misclassified as natural ones to unveil any structural similarity to natural proteins. Results show that random proteins misclassified by the ENNA algorithm exhibit a significant fold similarity to portions or subdomains of extant proteins at atomic resolution. Altogether, our results suggest that natural proteins are significantly edited from random polypeptides and that evolutionary editing can be readily detected by analyzing structural features. Furthermore, we also show that the ENNA, employing simple structural descriptors, can predict whether a protein chain is natural or random.

    Explorative visual analytics on interval-based genomic data and their metadata

    Background: With the spread of public repositories of NGS processed data, the availability of user-friendly and effective tools for data exploration, analysis and visualization is becoming very relevant. These tools enable interactive analytics, an exploratory approach for the seamless "sense-making" of data through on-the-fly integration of analysis and visualization phases, suggested not only for evaluating processing results, but also for designing and adapting NGS data analysis pipelines. Results: This paper presents abstractions for supporting the early analysis of NGS processed data and their implementation in an associated tool, named GenoMetric Space Explorer (GeMSE). This tool serves the needs of the GenoMetric Query Language, an innovative cloud-based system for computing complex queries over heterogeneous processed data. It can also be used starting from any text files in standard BED, BroadPeak, NarrowPeak, GTF, or general tab-delimited format, containing numerical features of genomic regions; metadata can be provided as text files in tab-delimited attribute-value format. GeMSE allows interactive analytics, consisting of on-the-fly cycling among steps of data exploration, analysis and visualization that help biologists and bioinformaticians in making sense of heterogeneous genomic datasets. By means of an explorative interaction support, users can trace past activities and quickly recover their results, seamlessly going backward and forward in the analysis steps and comparative visualizations of heatmaps. Conclusions: GeMSE's effective application and practical usefulness are demonstrated through significant use cases of biological interest. GeMSE is available at http://www.bioinformatics.deib.polimi.it/GeMSE/ , and its source code is available at https://github.com/Genometric/GeMSE under GPLv3 open-source license.
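    The input formats named in the abstract above can be illustrated with a minimal sketch. It assumes only the standard BED column order (chromosome, start, end, then optional name and score) and a two-column, tab-delimited attribute-value metadata layout. The function names `parse_bed` and `parse_metadata` are hypothetical and are not part of GeMSE.

```python
def parse_bed(lines):
    """Parse BED-style lines into dictionaries (hypothetical helper, not GeMSE code)."""
    regions = []
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"BED line needs at least 3 columns: {line!r}")
        region = {
            "chrom": fields[0],
            "start": int(fields[1]),  # 0-based start coordinate
            "end": int(fields[2]),    # end coordinate (exclusive)
        }
        if len(fields) > 3:
            region["name"] = fields[3]
        # A "." in the score column means "no value"; leave it out.
        if len(fields) > 4 and fields[4] != ".":
            region["score"] = float(fields[4])
        regions.append(region)
    return regions


def parse_metadata(lines):
    """Parse tab-delimited attribute-value lines; an attribute may repeat."""
    metadata = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        attribute, _, value = line.partition("\t")
        metadata.setdefault(attribute, []).append(value)
    return metadata
```

    Keeping each attribute as a list of values is a design choice for this sketch, since the same attribute may appear on several metadata lines.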

    Integrative Gene Regulatory Network Analysis Reveals Light-Induced Regional Gene Expression Phase Shift Programs in the Mouse Suprachiasmatic Nucleus

    We use the multigenic pattern of gene expression across suprachiasmatic nuclei (SCN) regions and time to understand the dynamics within the SCN in response to a circadian phase-resetting light pulse. Global gene expression studies of the SCN indicate that circadian functions like phase resetting are complex multigenic processes. While the molecular dynamics of phase resetting are not well understood, it is clear they involve a “functional gene expression program”, e.g., the coordinated behavior of functionally related genes in space and time. In the present study we selected a set of 89 of these functionally related genes in order to further understand this multigenic program. By use of high-throughput qPCR we studied 52 small samples taken by anatomically precise laser capture from within the core and shell SCN regions, and taken at time points with and without phase resetting light exposure. The results show striking regional differences in light response to be present in the mouse SCN. By using network-based analyses, we are able to establish a highly specific multigenic correlation between genes expressed in response to light at night and genes normally activated during the day. The light pulse triggers a complex and highly coordinated network of gene regulation. The largest differences marking neuroanatomical location are in transmitter receptors, and the largest time-dependent differences occur in clock-related genes. Nighttime phase resetting appears to recruit transcriptional regulatory processes normally active in the day. This program, or mechanism, causes the pattern of core region gene expression to transiently shift to become more like that of the shell region

    Escherichia coli genome-wide promoter analysis: Identification of additional AtoC binding target elements

    Background: Studies on bacterial signal transduction systems have revealed complex networks of functional interactions, where the response regulators play a pivotal role. The AtoSC system of E. coli activates the expression of atoDAEB operon genes, and the subsequent catabolism of short-chain fatty acids, upon acetoacetate induction. Transcriptome and phenotypic analyses suggested that atoSC is also involved in several other cellular activities, although we have recently reported a palindromic repeat within the atoDAEB promoter as the single, cis-regulatory binding site of the AtoC response regulator. In this work, we used a computational approach to explore the presence of yet unidentified AtoC binding sites within other parts of the E. coli genome. Results: Through the implementation of a computational de novo motif detection workflow, a set of candidate motifs was generated, representing putative AtoC binding targets within the E. coli genome. In order to assess the biological relevance of the motifs and to select for experimental validation of those sequences related robustly with distinct cellular functions, we implemented a novel approach that applies Gene Ontology Term Analysis to the motif hits and selected those that were qualified through this procedure. The computational results were validated using Chromatin Immunoprecipitation assays to assess the in vivo binding of AtoC to the predicted sites. This process verified twenty-two additional AtoC binding sites, located not only within intergenic regions, but also within gene-encoding sequences. Conclusions: This study, by tracing a number of putative AtoC binding sites, has indicated an AtoC-related cross-regulatory function. This highlights the significance of computational genome-wide approaches in elucidating complex patterns of bacterial cell regulation.

    Proteomics-Based Systems Biology Modeling of Bovine Germinal Vesicle Stage Oocyte and Cumulus Cell Interaction

    BACKGROUND: Oocytes are the female gametes which establish the program of life after fertilization. Interactions between the oocyte and the surrounding cumulus cells at germinal vesicle (GV) stage are considered essential for proper maturation or 'programming' of oocytes, which is crucial for normal fertilization and embryonic development. However, despite its importance, little is known about the molecular events and pathways involved in this bidirectional communication. METHODOLOGY/PRINCIPAL FINDINGS: We used differential detergent fractionation multidimensional protein identification technology (DDF-MudPIT) on bovine GV oocyte and cumulus cells and identified 811 and 1247 proteins in GV oocyte and cumulus cells, respectively; 371 proteins were significantly differentially expressed between the two cell types. Systems biology modeling, which included Gene Ontology (GO) and canonical genetic pathway analysis, showed that cumulus cells have higher expression of proteins involved in cell communication, generation of precursor metabolites and energy, as well as transport than GV oocytes. Our data also suggest a hypothesis that oocytes may depend on the presence of cumulus cells to generate specific cellular signals to coordinate their growth and maturation. CONCLUSIONS/SIGNIFICANCE: Systems biology modeling of bovine oocytes and cumulus cells in the context of GO and protein interaction networks identified the signaling pathways associated with the proteins involved in the cell-to-cell signaling biological process that may have implications in oocyte competence and maturation. This first comprehensive systems biology modeling of bovine oocyte and cumulus cell proteomes not only provides a foundation for signaling and cell physiology at the GV stage of oocyte development, but is also valuable for comparative studies of other stages of oocyte development at the molecular level.

    Spinning Gland Transcriptomics from Two Main Clades of Spiders (Order: Araneae) - Insights on Their Molecular, Anatomical and Behavioral Evolution

    Characterized by distinctive evolutionary adaptations, spiders provide a comprehensive system for evolutionary and developmental studies of anatomical organs, including silk and venom production. Here we performed cDNA sequencing using massively parallel sequencers (454 GS-FLX Titanium) to generate ∼80,000 reads from the spinning gland of Actinopus spp. (infraorder: Mygalomorphae) and Gasteracantha cancriformis (infraorder: Araneomorphae, Orbiculariae clade). Actinopus spp. retains primitive characteristics on web usage and presents a single undifferentiated spinning gland while the orbiculariae spiders have seven differentiated spinning glands and complex patterns of web usage. MIRA, Celera Assembler and CAP3 software were used to cluster NGS reads for each spider. CAP3 unigenes passed through a pipeline for automatic annotation, classification by biological function, and comparative transcriptomics. Genes related to spider silks were manually curated and analyzed. Although a single spidroin gene family was found in Actinopus spp., a vast repertoire of specialized spider silk proteins was encountered in orbiculariae. Astacin-like metalloproteases (meprin subfamily) were shown to be some of the most sampled unigenes and duplicated gene families in G. cancriformis since its evolutionary split from mygalomorphs. Our results confirm that the evolution of the molecular repertoire of silk proteins was accompanied by the (i) anatomical differentiation of spinning glands and (ii) behavioral complexification in the web usage. Finally, a phylogenetic tree was constructed to cluster most of the known spidroins in gene clades. This is the first large-scale, multi-organism transcriptome for spider spinning glands and a first step into a broad understanding of spider web systems biology and evolution

    Functional transcription factor target discovery via compendia of binding and expression profiles

    Genome-wide experiments to map the DNA-binding locations of transcription-associated factors (TFs) have shown that the number of genes bound by a TF far exceeds the number of possible direct target genes. Distinguishing functional from non-functional binding is therefore a major challenge in the study of transcriptional regulation. We hypothesized that functional targets can be discovered by correlating binding and expression profiles across multiple experimental conditions. To test this hypothesis, we obtained ChIP-seq and RNA-seq data from matching cell types from the human ENCODE resource, considered promoter-proximal and distal cumulative regulatory models to map binding sites to genes, and used a combination of linear and non-linear measures to correlate binding and expression data. We found that a high degree of correlation between a gene's TF-binding and expression profiles was significantly more predictive of the gene being differentially expressed upon knockdown of that TF, compared to using binding sites in the cell type of interest only. Remarkably, TF targets predicted from correlation across a compendium of cell types were also predictive of functional targets in other cell types. Finally, correlation across a time course of ChIP-seq and RNA-seq experiments was also predictive of functional TF targets in that tissue. Comment: 15 pages + 8 pages supplementary material; 6 figures, 6 supplementary figures, 5 supplementary tables
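    The idea of correlating a gene's binding and expression profiles across conditions can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name its correlation measures, so Pearson (linear) and Spearman (rank-based) are stand-ins chosen here, and the function `rank_targets` and the per-gene score (the larger absolute correlation) are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rank_targets(binding, expression):
    """Rank genes by binding/expression correlation across conditions.

    `binding` and `expression` map gene -> list of values, one per condition,
    in the same condition order. The score is an assumption of this sketch.
    """
    scores = {}
    for gene in binding.keys() & expression.keys():
        b, e = binding[gene], expression[gene]
        scores[gene] = max(abs(pearson(b, e)), abs(spearman(b, e)))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

    Genes near the top of the returned list are those whose binding signal tracks expression most closely across the conditions supplied; whether that indicates a functional target is the hypothesis the abstract tests, not something the sketch establishes.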