12 research outputs found

    Computational analysis of the interaction between the transcriptional factors and the predicted secreted proteome of the yeast Kluyveromyces lactis

    No full text
    In this work we built an in silico system to predict which proteins encoded in the Kluyveromyces lactis genome are secreted. The K. lactis protein database of 5327 sequences (http://cbi.labri.fr/Genolevures) was submitted to four prediction algorithms to identify the potential extracellular secretome. SignalP v3 (http://www.cbs.dtu.dk/services/SignalP-3.0), which detects an N-terminal signal peptide and the signal peptidase cleavage site, identified 698 putative proteins. Of this group, Phobius (http://phobius.sbc.su.se), which predicts transmembrane topology from the primary sequence, indicated 260 without transmembrane domains. The big-PI Predictor (http://mendel.imp.ac.at/gpi/gpi_server.html), which recognizes GPI (glycosylphosphatidylinositol) anchor modification sites, found 236 proteins without GPI anchors, and WoLF PSORT (http://www.genscript.com/psort/wolf_psort.html), which identifies signatures for subcellular localization, assigned 101 of them to the extracellular medium. The predicted K. lactis secretome therefore consists of up to 101 sequences. As a positive control, the same algorithms correctly predicted 95 secreted proteins of Saccharomycetes yeasts retrieved from public databases (NCBI, UniProt and MIPS) and annotated as extracellular. As a negative control, 95 random sequences from the K. lactis database were predicted as intracellular. To validate the analysis, the positive control group and the predicted group were compared by Hotelling's T2 test, which showed no significant differences between the mean values of the two groups.
The physiological conditions under which these extracellular proteins are expressed were investigated by relating their putative promoter sequences (i.e. 1 kb upstream) to the K. lactis orthologues of Saccharomyces cerevisiae transcriptional regulators. We applied the Yeastract methodology (http://www.yeastract.com), which locates DNA binding sites of S. cerevisiae transcription factors in the promoter sequences of the ORFs of the proteins predicted as extracellular, indicating associations between transcription factors and target genes. The physiological condition favoring expression to the extracellular medium was obtained by searching Gene Ontology terms (http://www.geneontology.org). Most of the transcriptional regulators associated with the predicted sequences are related to stress response, especially the presence of drugs in the medium. Acid (pH) stress and nitrogen (amino acid) limitation were also indicated to influence the expression of the extracellular proteins.
Conselho Nacional de Desenvolvimento Científico e Tecnológico
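
The four-step filter described in this abstract (signal peptide present, no transmembrane domain, no GPI anchor, extracellular localization) can be sketched as a chain of list filters. This is only an illustration: the `Prediction` record, its field names and the toy protein identifiers are hypothetical, and the sketch assumes the outputs of SignalP, Phobius, big-PI Predictor and WoLF PSORT have already been parsed into one record per protein.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical per-protein summary of the four predictors' outputs."""
    protein_id: str
    has_signal_peptide: bool    # from SignalP
    transmembrane_domains: int  # from Phobius
    has_gpi_anchor: bool        # from big-PI Predictor
    localization: str           # from WoLF PSORT


def predicted_secretome(predictions):
    """Apply the four filters in the order given in the abstract."""
    with_signal = [p for p in predictions if p.has_signal_peptide]
    no_tm = [p for p in with_signal if p.transmembrane_domains == 0]
    no_gpi = [p for p in no_tm if not p.has_gpi_anchor]
    return [p for p in no_gpi if p.localization == "extracellular"]


# Toy records (made up for illustration, not real K. lactis data).
toy = [
    Prediction("TOY_0001", True, 0, False, "extracellular"),
    Prediction("TOY_0002", True, 2, False, "plasma membrane"),
    Prediction("TOY_0003", True, 0, True, "extracellular"),
    Prediction("TOY_0004", False, 0, False, "cytosol"),
    Prediction("TOY_0005", True, 0, False, "vacuole"),
]
secreted = predicted_secretome(toy)
print([p.protein_id for p in secreted])
```

Each stage only narrows the previous one, which is why the counts reported in the abstract decrease monotonically (698, 260, 236, 101).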

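    Differential gene expression (DGE) by RNA sequencing and the development of a system that integrates these methods [Expressão Diferencial Gênica (EDG) por sequenciamento de RNA e o desenvolvimento de um sistema que integra esses métodos]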

    No full text
    Begomoviruses (whitefly-transmitted geminiviruses) cause severe diseases of high economic impact on a variety of agriculturally relevant crops in tropical and subtropical areas. Current climate change is expected to further alter the distribution of the whitefly around the globe, posing a major threat to agriculture worldwide. This is particularly true for tomato plants, which are afflicted by a complex population of emergent species of tomato-infecting begomoviruses. Here we uncovered a novel regulatory mechanism by which plant cells fight DNA virus infection. By mutating the immune receptor NIK, which is a target of the begomovirus protein NSP, we promoted the activation of an antiviral defense response in tomato that was effective in conferring tolerance to different species of begomoviruses. Our results also shed light on the mechanism underlying the NIK-mediated defense.
A comparison of the transcriptome induced by the gain-of-function mutant T474D (the NIK receptor in its active form) with the transcriptome of infected wild-type (WT) plants, using a combination of four hierarchical clustering methods and four normalization factors provided by the edgeR package, revealed that the T474D-induced transcriptome mimicked the infected transcriptome: the two clustered together with high confidence and differed from the normal NIK-induced and wild-type expression profiles, which formed a separate group. Furthermore, eliminating the mock T474D differentially expressed (DE) genes from the raw data of all treatments reinforced that the expression profile induced by the T474D mutant mimics the response to viral infection, as the mock and infected transcriptomes from each genotype clustered together with high significance. These results indicate that the viral infection was the trigger of the NIK-mediated antiviral response. We also applied four different methods of differential gene expression (DGE) analysis and gene set enrichment analysis (GSEA) to the RNA-seq data, which revealed that ectopic expression of T474D causes a massive down-regulation of translation-related genes and up-regulation of immune system-associated genes. The T474D-mediated down-regulation of translation-related genes was associated with suppression of global protein synthesis, decreased loading of viral mRNA in actively translating polysomes and enhanced tolerance to begomoviruses. Collectively, our data indicate that NIK-mediated antiviral signaling promotes a defense response by (i) suppressing global translation and (ii) up-regulating immune defense-related genes.
The high volume of data produced by next-generation RNA sequencing (RNA-seq) is now within reach of most research laboratories, and RNA-seq is quickly becoming a key tool in gene expression experiments. In an RNA-seq workflow, the reads (short sequences generated by RNA-seq technology) are mapped (aligned) to a reference sequence set (transcriptome or genome), a count table (number of reads per gene) is built, and downstream analysis with differential expression methods is then run to recover the biological meaning of the experiment. This protocol can be arduous for someone who is not an experienced R user: to find which genes are statistically differentially expressed among treatments and to evaluate their biological meaning through annotation, many R scripts must be written and run correctly. The variety of options for DGE methodology available in R/Bioconductor makes this task even more complicated.
We therefore developed a platform that makes this analysis faster and more user-friendly. It also allows the p-values generated by the available DGE methods (edgeR, DESeq2, baySeq and NBPSeq) to be combined. Inspired by meta-analysis, it computes a combined p-value by Fisher's method, the weighted Z-test, the truncated product method, the binomial test or a simple intersection (average or median) to support the decision on which genes are statistically significant DE genes across these methods. The user-friendly interface drives low-level R scripts, so an RNA-seq analysis can be completed in a few steps without handling the scripts directly. A directory named after the project stores the analysis information and all generated files, including the scripts, together with an SQLite database containing the DE genes with their expression values and annotation. The program is free and open source, open to contributions, and its methods were designed to be transparent to the end user.
Conselho Nacional de Desenvolvimento Científico e Tecnológico
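
Of the p-value combination rules named in this abstract, Fisher's method is the simplest to illustrate. The sketch below is not the platform's code; it is a standard-library Python version of the textbook formula, in which the statistic X = -2 Σ ln(p_i) is referred to a chi-square distribution with 2k degrees of freedom for k p-values. The function name and the example p-values are assumptions made for illustration.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine k p-values with Fisher's method.

    X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees
    of freedom under the null. For even degrees of freedom the survival
    function has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i  # (x/2)^i / i!
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values for one gene, one per DGE method.
gene_pvalues = [0.04, 0.03, 0.20, 0.06]
combined = fisher_combined_pvalue(gene_pvalues)
print(f"combined p-value: {combined:.4g}")
```

Fisher's method assumes independent tests. P-values obtained by running several methods on the same count table are correlated, so a combined value computed this way is best read as a ranking aid rather than an exact significance level.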

    Integrated analysis of individual codon contribution to protein biosynthesis reveals a new approach to improving the basis of rational gene design

    No full text
    Gene codon optimization may be impaired by the misinterpretation of frequency and optimality of codons. Although recent studies have revealed the effects of codon usage bias (CUB) on protein biosynthesis, an integrated perspective of the biological role of individual codons remains unknown. Unlike previous studies, we show through an integrated framework that codon attributes such as frequency, optimality and positional dependency should be combined to unveil the contribution of individual codons to protein biosynthesis. We designed a codon quantification method for assessing CUB as a function of position within genes with a novel constraint: the relativity of position-dependent codon usage shaped by coding sequence length. Thus, we propose a new way of identifying the enrichment, depletion and non-uniform positional distribution of codons in different regions of yeast genes. We clustered codons that shared attributes of frequency and optimality. The cluster of non-optimal codons with rare occurrence displayed two remarkable characteristics: a higher codon decoding time than the frequent, non-optimal cluster, and enrichment at the 5′-end region, where optimal codons with the highest frequency are depleted. Interestingly, frequent codons with non-optimal adaptation to tRNAs are uniformly distributed in Saccharomyces cerevisiae genes, suggesting their determinant role as a speed regulator in protein elongation.

    Inference of differentially expressed genes using generalized linear mixed models in a pairwise fashion

    No full text
    Background: Technological advances involving RNA-Seq and bioinformatics allow the transcriptional levels of genes to be quantified in cells, tissues and cell lines, permitting the identification of differentially expressed genes (DEGs). DESeq2 and edgeR are well-established computational tools used for this purpose; they are based on generalized linear models (GLMs) that consider only fixed effects. However, including random effects reduces the risk of missing potential DEGs that may be essential to the biological phenomenon under investigation. Generalized linear mixed models (GLMMs) can include both kinds of effect.
Methods: We present DEGRE (Differentially Expressed Genes with Random Effects), a user-friendly tool that infers DEGs when both fixed effects and random effects on individuals are part of the experimental design of an RNA-Seq study. DEGRE preprocesses the raw count matrices before fitting GLMMs to the genes, and the resulting regression coefficients are analyzed with the Wald test. DEGRE offers the Benjamini-Hochberg or Bonferroni techniques for P-value adjustment.
Results: The datasets used to assess DEGRE were simulated with known DEGs. They have fixed effects, and random effects were estimated and inserted to measure the impact of experimental designs with high biological variability. For DEG inference, the preprocessing effectively prepares the data and retains overdispersed genes. The biological coefficient of variation is inferred from the count matrices to assess variability before and after preprocessing. DEGRE is validated computationally by its performance on simulated count matrices whose biological variability is related to fixed and random effects. DEGRE also provides improved assessment measures for detecting DEGs in cases with higher biological variability. We show that the preprocessing established here effectively removes technical variation from those matrices. The tool also detects new candidate DEGs in transcriptome data from patients with bipolar disorder, making it a promising way to detect more relevant genes.
Conclusions: DEGRE provides data preprocessing and applies GLMMs for DEG inference. The preprocessing efficiently removes genes that could impair the inference. The computational and biological validation of DEGRE is promising for identifying possible DEGs in experiments with complex designs. The tool may help handle random effects on individuals in DEG inference and has potential for discovering new DEGs of interest for further biological investigation.
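
The two P-value adjustments named in this abstract can be sketched as follows. This is a generic standard-library Python version of the standard Benjamini-Hochberg and Bonferroni procedures, not code taken from DEGRE; the function names and example values are assumptions made for illustration.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by m, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    The i-th smallest p-value (rank i of m) is scaled by m / i, then a
    running minimum from the largest rank downward enforces monotonicity.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
bh = benjamini_hochberg(raw)
bonf = bonferroni(raw)
print(bh)
print(bonf)
```

Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini-Hochberg controls the false discovery rate, so its adjusted values are never larger than the Bonferroni ones.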

    Genomic growth curves of an outbred pig population

    Full text link
    In the current post-genomic era, the genetic basis of pig growth can be understood by assessing SNP marker effects and genomic breeding values (GEBV) using estimates of growth curve parameters as phenotypes. Although various statistical methods, such as random regression (RR-BLUP) and Bayesian LASSO (BL), have been applied to genomic selection (GS), none of these has yet been used in a growth curve approach. In this work, we compared the accuracies of RR-BLUP and BL using empirical weight-age data from an outbred F2 (Brazilian Piau X commercial) population. The phenotypes were determined by parameter estimates from a nonlinear logistic regression model, and the halothane gene was considered as a marker for evaluating the assumptions of the GS methods in relation to the genetic variation explained by each locus. BL yielded more accurate values for all of the phenotypes evaluated and was used to estimate SNP effects and GEBV vectors. The latter allowed the construction of genomic growth curves, which showed substantial genetic discrimination among animals in the final growth phase. The SNP effect estimates allowed identification of the most relevant markers for each phenotype, the positions of which coincided with reported QTL regions for growth traits.

    Expression of myogenes in longissimus dorsi muscle during prenatal development in commercial and local Piau pigs

    No full text
    This study used qRT-PCR to examine variation in the expression of 13 myogenes during muscle development in four prenatal periods (21, 40, 70 and 90 days post-insemination) in commercial (the three-way Duroc, Landrace and Large-White cross) and local Piau pig breeds that differ in muscle mass. There was no variation in the expression of the CHD8, EID2B, HIF1AN, IKBKB, RSPO3, SOX7 and SUFU genes at the various prenatal ages or between breeds. The MAP2K1 and RBM24 genes showed similar expression between commercial and Piau pigs but greater expression (p < 0.05) in at least one prenatal period. Pair-wise comparisons of prenatal periods in each breed showed that only the CSRP3, LEF1, MRAS and MYOG genes had higher expression (p < 0.05) in at least one prenatal period in commercial and Piau pigs. Overall, these results identified the LEF1 gene as a primary candidate to account for differences in muscle mass between the pig breeds, since activation of this gene may lead to greater myoblast fusion in the commercial breed compared to Piau pigs. Such fusion could explain the different muscularity between breeds in the postnatal periods.