86 research outputs found
The Escherichia coli transcriptome mostly consists of independently regulated modules
Underlying cellular responses is a transcriptional regulatory network (TRN) that modulates gene expression. A useful description of the TRN would decompose the transcriptome into targeted effects of individual transcriptional regulators. Here, we apply unsupervised machine learning to a diverse compendium of over 250 high-quality Escherichia coli RNA-seq datasets to identify 92 statistically independent signals that modulate the expression of specific gene sets. We show that 61 of these transcriptomic signals represent the effects of currently characterized transcriptional regulators. Condition-specific activation of signals is validated by exposure of E. coli to new environmental conditions. The resulting decomposition of the transcriptome provides: a mechanistic, systems-level, network-based explanation of responses to environmental and genetic perturbations; a guide to gene and regulator function discovery; and a basis for characterizing transcriptomic differences in multiple strains. Taken together, our results show that signal summation describes the composition of a model prokaryotic transcriptome
Co- and post-translational translocation through the protein-conducting channel: analogous mechanisms at work?
Many proteins are translocated across, or integrated into, membranes. Both functions are fulfilled by the 'translocon/translocase', which contains a membrane-embedded protein-conducting channel (PCC) and associated soluble factors that drive translocation and insertion reactions using nucleotide triphosphates as fuel. This perspective focuses on reinterpreting existing experimental data in light of a recently proposed PCC model comprising a front-to-front dimer of SecY or Sec61 heterotrimeric complexes. In this new framework, we propose (i) a revised model for SRP-SR-mediated docking of the ribosome-nascent polypeptide to the PCC; (ii) that the dynamic interplay between protein substrate, soluble factors and PCC controls the opening and closing of a transmembrane channel across, and/or a lateral gate into, the membrane; and (iii) that co- and post-translational translocation, involving the ribosome and SecA, respectively, not only converge at the PCC but also use analogous mechanisms for coordinating protein translocation
Factor analysis for gene regulatory networks and transcription factor activity profiles
BACKGROUND: Most existing algorithms for the inference of the structure of gene regulatory networks from gene expression data assume that the activity levels of transcription factors (TFs) are proportional to their mRNA levels. This assumption is invalid for most biological systems. However, one might be able to reconstruct unobserved activity profiles of TFs from the expression profiles of target genes. A simple model is a two-layer network with unobserved TF variables in the first layer and observed gene expression variables in the second layer. TFs are connected to regulated genes by weighted edges. The weights, known as factor loadings, indicate the strength and direction of regulation. Of particular interest are methods that produce sparse networks, networks with few edges, since it is known that most genes are regulated by only a small number of TFs, and most TFs regulate only a small number of genes. RESULTS: In this paper, we explore the performance of five factor analysis algorithms, Bayesian as well as classical, on problems with biological context using both simulated and real data. Factor analysis (FA) models are used in order to describe a larger number of observed variables by a smaller number of unobserved variables, the factors, whereby all correlation between observed variables is explained by common factors. Bayesian FA methods allow one to infer sparse networks by enforcing sparsity through priors. In contrast, in the classical FA, matrix rotation methods are used to enforce sparsity and thus to increase the interpretability of the inferred factor loadings matrix. However, we also show that Bayesian FA models that do not impose sparsity through the priors can still be used for the reconstruction of a gene regulatory network if applied in conjunction with matrix rotation methods. Finally, we show the added advantage of merging the information derived from all algorithms in order to obtain a combined result. 
CONCLUSION: Most of the algorithms tested are successful in reconstructing the connectivity structure as well as the TF profiles. Moreover, we demonstrate that if the underlying network is sparse it is still possible to reconstruct hidden activity profiles of TFs to some degree without prior connectivity information
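The two-layer model described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' code and does not reproduce any of the five algorithms they compare; it only generates data from a sparse factor model under assumed settings (the function name, network size, edge probability, and noise level are all hypothetical) and checks the defining property of factor analysis, namely that the covariance of the observed genes is explained by the loadings plus independent noise.

```python
import numpy as np

NOISE_SD = 0.1  # assumed measurement-noise standard deviation (hypothetical)


def simulate_two_layer_network(n_genes=12, n_tfs=3, n_samples=20000,
                               edge_prob=0.25, seed=0):
    """Simulate expression data from a sparse two-layer TF -> gene network.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Sparse factor loadings: most TF -> gene edges are absent (weight 0).
    mask = rng.random((n_genes, n_tfs)) < edge_prob
    loadings = mask * rng.normal(size=(n_genes, n_tfs))
    # Unobserved TF activity profiles (first layer), one column per sample.
    tf_activity = rng.normal(size=(n_tfs, n_samples))
    # Observed expression (second layer) = weighted TF activity + noise.
    noise = rng.normal(scale=NOISE_SD, size=(n_genes, n_samples))
    expression = loadings @ tf_activity + noise
    return loadings, tf_activity, expression


loadings, tf_activity, expression = simulate_two_layer_network()

# Under the factor model, Cov(expression) = loadings @ loadings.T + noise variance.
model_cov = loadings @ loadings.T + NOISE_SD**2 * np.eye(loadings.shape[0])
sample_cov = np.cov(expression)
max_abs_diff = float(np.abs(sample_cov - model_cov).max())
print("edges in network:", int(np.count_nonzero(loadings)))
print("largest covariance mismatch:", max_abs_diff)
```

A factor analysis algorithm works in the opposite direction: given only `expression`, it estimates `loadings` and `tf_activity`, with sparsity encouraged either through priors or through matrix rotation.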
Domain Organization of Long Signal Peptides of Single-Pass Integral Membrane Proteins Reveals Multiple Functional Capacity
Targeting signals direct proteins to their extra- or intracellular destination such as the plasma membrane or cellular organelles. Here we investigated the structure and function of exceptionally long signal peptides encompassing at least 40 amino acid residues. We discovered a two-domain organization (“NtraC model”) in many long signals from vertebrate precursor proteins. Accordingly, long signal peptides may contain an N-terminal domain (N-domain) and a C-terminal domain (C-domain) with different signal or targeting capabilities, separable by a presumably turn-rich transition area (tra). Individual domain functions were probed by cellular targeting experiments with fusion proteins containing parts of the long signal peptide of human membrane protein shrew-1 and secreted alkaline phosphatase as a reporter protein. As predicted, the N-domain of the fusion protein alone was shown to act as a mitochondrial targeting signal, whereas the C-domain alone functions as an export signal. Selective disruption of the transition area in the signal peptide impairs the export efficiency of the reporter protein. Altogether, the results of cellular targeting studies provide a proof-of-principle for our NtraC model and highlight the particular functional importance of the predicted transition area, which critically affects the rate of protein export. In conclusion, the NtraC approach enables the systematic detection and prediction of cryptic targeting signals present in one coherent sequence, and provides a structurally motivated basis for decoding the functional complexity of long protein targeting signals
PROlocalizer: integrated web service for protein subcellular localization prediction
Subcellular localization is an important protein property, which is related to function, interactions and other features. As experimental determination of the localization can be tedious, especially for large numbers of proteins, a number of prediction tools have been developed. We developed the PROlocalizer service that integrates 11 individual methods to predict altogether 12 localizations for animal proteins. The method allows the submission of a number of proteins and mutations and generates a detailed informative document of the prediction and obtained results. PROlocalizer is available at http://bioinf.uta.fi/PROlocalizer/
The Plasmodium Export Element Revisited
We performed a bioinformatical analysis of protein export elements (PEXEL) in the putative proteome of the malaria parasite Plasmodium falciparum. A protein family-specific conservation of physicochemical residue profiles was found for PEXEL-flanking sequence regions. We demonstrate that the family members can be clustered based on the flanking regions only and display characteristic hydrophobicity patterns. This raises the possibility that the flanking regions may contain additional information for a family-specific role of PEXEL. We further show that signal peptide cleavage results in a positional alignment of PEXEL from both proteins with, and without, a signal peptide
A Metabolomic Approach to the Study of Wine Micro-Oxygenation
Wine micro-oxygenation is a globally used treatment, and its effects were studied here by analysing the wine metabolomic fingerprint with untargeted LC-MS. Eight different procedural variations, marked by the addition of oxygen (four levels) and iron (two levels), were applied to Sangiovese wine, before and after malolactic fermentation
Large-Scale Discovery and Characterization of Protein Regulatory Motifs in Eukaryotes
The increasing ability to generate large-scale, quantitative proteomic data has brought with it the challenge of analyzing such data to discover the sequence elements that underlie systems-level protein behavior. Here we show that short, linear protein motifs can be efficiently recovered from proteome-scale datasets such as sub-cellular localization, molecular function, half-life, and protein abundance data using an information theoretic approach. Using this approach, we have identified many known protein motifs, such as phosphorylation sites and localization signals, and discovered a large number of candidate elements. We estimate that ∼80% of these are novel predictions in that they do not match a known motif in both sequence and biological context, suggesting that post-translational regulation of protein behavior is still largely unexplored. These predicted motifs, many of which display preferential association with specific biological pathways and non-random positioning in the linear protein sequence, provide focused hypotheses for experimental validation
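The abstract does not specify which information-theoretic quantity is used. One plausible reading, assumed here purely for illustration, is to score a candidate motif by the mutual information between its presence in each protein and a measured protein property. The sketch below uses made-up toy sequences and labels, and the helper name is hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # p_xy / (p_x * p_y), written in terms of raw counts.
        mi += p_xy * math.log2(p_xy * n * n / (px[x] * py[y]))
    return mi


# Toy data (invented for illustration): sequence and localization label.
proteins = [
    ("MKTAKDEL", "ER"),
    ("MSSGKDEL", "ER"),
    ("MAAPLLK", "cytosol"),
    ("MGGTRRS", "cytosol"),
]
labels = [label for _, label in proteins]

# Score two candidate motifs by how informative their presence is about the label.
scores = {}
for motif in ("KDEL", "G"):
    presence = [motif in seq for seq, _ in proteins]
    scores[motif] = mutual_information(presence, labels)

print(scores)
```

In this toy set, "KDEL" occurs only in the ER-labelled sequences and so carries the full one bit of information about the label, while "G" is spread evenly across both classes and carries none.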
Expression and Characterization of Drosophila Signal Peptide Peptidase-Like (sppL), a Gene That Encodes an Intramembrane Protease
Intramembrane proteases of the Signal Peptide Peptidase (SPP) family play important roles in developmental, metabolic and signaling pathways. Although vertebrates have one SPP and four SPP-like (SPPL) genes, we found that insect genomes encode one Spp and one SppL. Characterization of the Drosophila sppL gene revealed that the predicted SppL protein is a highly conserved structural homolog of the vertebrate SPPL3 proteases, with a predicted nine-transmembrane topology, an active site containing aspartyl residues within a transmembrane region, and a carboxy-terminal PAL domain. SppL protein localized to both the Golgi and ER. Whereas spp is an essential gene that is required during early larval stages and whereas spp loss-of-function reduced the unfolded protein response (UPR), sppL loss of function had no apparent phenotype. This was unexpected given that genetic knockdown phenotypes in other organisms suggested significant roles for Spp-related proteases
Identification of Single- and Multiple-Class Specific Signature Genes from Gene Expression Profiles by Group Marker Index
Informative genes from microarray data can be used to construct prediction models and investigate biological mechanisms. Differentially expressed genes, the main targets of most gene selection methods, can be classified as single- and multiple-class specific signature genes. Here, we present a novel gene selection algorithm based on a Group Marker Index (GMI), which is intuitive, of low computational complexity, and efficient in identifying both types of genes. Most gene selection methods identify only single-class specific signature genes and cannot easily identify multiple-class specific signature genes. Our algorithm can detect de novo certain conditions of multiple-class specificity of a gene and makes use of a novel non-parametric indicator to assess the discrimination ability between classes. Our method is effective even when the sample size is small and when the class sizes are significantly different. To compare effectiveness and robustness, we formulate an intuitive template-based method and use four well-known datasets. We demonstrate that our algorithm outperforms the template-based method in difficult cases with unbalanced distribution. Moreover, the multiple-class specific genes are good biomarkers and play important roles in biological pathways. Our literature survey indicates that the proposed method identifies unique multiple-class specific marker genes (not reported earlier to be related to cancer) in the Central Nervous System data. It also discovers unique biomarkers indicating the intrinsic difference between subtypes of lung cancer. We also associate the pathway information with the multiple-class specific signature genes and cross-reference to published studies. We find that the identified genes participate in the pathways directly involved in cancer development in leukemia data.
Our method offers a promising way to find genes that may be involved in pathways of multiple diseases, and hence opens up the possibility of using an existing drug on other diseases as well as designing a single drug for multiple diseases
