5 research outputs found

    Leveraging expression and network data for protein function prediction

    2012 Summer. Includes bibliographical references. Protein function prediction is one of the prominent problems in bioinformatics today. Protein annotation is slowly falling behind as more and more genomes are being sequenced. Experimental methods are expensive and time consuming, which leaves computational methods to fill the gap. While computational methods are not yet accurate enough to be used without human supervision, that remains the goal. The Gene Ontology (GO) is a collection of terms that are the standard for protein function annotations. Because of the structure of GO, protein function prediction is a hierarchical multi-label classification problem. The classification method used in this thesis is GOstruct, which performs structured predictions that take into account all GO terms. GOstruct has been shown to work well, but there are still improvements to be made. In this thesis, I work to improve predictions by building new kernels from the data that are used by GOstruct. To do this, I find key representations of the data that help define which kernels perform best on the variety of data types. I apply this methodology to function prediction in two model organisms, Saccharomyces cerevisiae and Mus musculus, and find better methods for interpreting the data.

    An approach to improved microbial eukaryotic genome annotation

    New sequencing technologies have considerably accelerated the rate at which genomic data are being generated. One ongoing challenge is the accurate structural annotation of those novel genomes once sequenced and assembled, in particular if the organism does not have close relatives with well-annotated genomes. Whole-transcriptome sequencing (RNA-Seq) and assembly, which resemble whole-genome sequencing and assembly respectively, have been shown to dramatically increase the accuracy of gene annotation. Read coverage, inferred splice junctions and assembled transcripts can provide valuable information about gene structure. Several annotation pipelines have been developed to automate structural annotation by incorporating information from RNA-Seq, as well as protein sequence similarity data, with the goal of reaching the accuracy of an expert curator. Annotation pipelines follow a similar workflow. The first step is to identify repetitive regions to prevent misinformed sequence alignments and gene predictions. The next step is to construct a database of evidence from experimental data such as RNA-Seq mapping and assembly, and protein sequence alignments, which are used to inform the generalised Hidden Markov Models of gene prediction software. The final step is to consolidate sequence alignments and gene predictions into a high-confidence consensus set. However, existing automated pipelines are complex, and therefore susceptible to incomplete and erroneous use of information, which can poison gene predictions and consensus model building. Here, we present an improved approach to microbial eukaryotic genome annotation. Its conception was based on identifying and mitigating potential sources of error and bias that are present in available pipelines. Our approach has two main aspects. The first is to create a more complete and diverse set of extrinsic evidence to better inform gene predictions. The second is to use extrinsic evidence in tandem with Hidden Markov Model predictions such that the influence of their respective biases in the consensus gene models is reduced. We benchmarked our new tool against three popular pipelines, showing significant gains in gene, transcript, exon and intron sensitivity and specificity in the structural genome annotation of microbial eukaryotes.

    Effect of the host tree genotype and the fire recurrence on fungal communities in Mediterranean pine forests

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología. Date of defense: 10-03-2017. Full-text access to this thesis is embargoed until 10-03-201

    Studies into host macrophage transcriptional control by the African Swine Fever Virus protein A238L

    African swine fever virus (ASFV) is a large double-stranded DNA virus which causes a lethal haemorrhagic fever in domestic pigs. This virus primarily infects cells from the monocyte/macrophage lineage and its ability to manipulate the function of these cells is key to the pathogenesis of this disease. ASFV encodes several proteins involved in immune evasion. One of these proteins, A238L, has been shown to inhibit host macrophage gene transcription. This protein has been shown to interact with several cellular proteins involved in signal transduction: a serine/threonine protein phosphatase, calcineurin (CaN), the transcription factor NF-κB, and most recently the transcriptional co-activator CREB binding protein (CBP/P300). However, its exact mechanism of action is not fully understood. Previous work has been limited to the investigation of individual signaling pathways and/or the expression of individual host genes. The aim of this study was to investigate the global effect of A238L on host macrophage gene transcription and also to carry out further investigation into the mechanism by which this protein functions. To determine the global effect of A238L on host macrophage gene transcription, differential gene expression between porcine cells expressing A238L and control cells was examined using a porcine oligonucleotide microarray. These results demonstrated that A238L was a potent inhibitor of host macrophage gene expression. Functional characterisation of the annotated genes showed that a large proportion of A238L down-regulated genes are typically induced in response to cell stress. Significantly, genes regulated by the I kappa B kinase (IKK), mitogen-activated protein kinase (MAPK) and Janus kinase/signal transducers and activators of transcription (JAK/STAT) signaling pathways were all shown to be down-regulated by A238L. Genes associated with the MAPK pathways were particularly enriched. 
The transcription of A238L-regulated genes is controlled by numerous different transcription factors, including NF-κB. All of the transcription factors identified interact with the transcription co-activator CBP/P300. This provides a common link between these factors, and indicates that A238L may target CBP/P300 to inhibit gene transcription. This observation supports recent work demonstrating that A238L interacts with and inhibits CBP/P300 function. To explore the potential mechanisms involved in the nuclear localisation of A238L, ASFV-infected Vero cells, expressing A238L under the control of its own promoter, were examined under a range of conditions using confocal microscopy. The results demonstrated that A238L was actively imported into the nucleus and exported by a CRM1-mediated pathway, although a pool of A238L protein remained in the cytoplasm. Sequence analysis of A238L identified the presence of two putative nuclear localisation signals (NLS-1 and NLS-2). NLS-2 was located within A238L’s CaN docking motif. Mutation of these motifs indicated that both NLS-1 and NLS-2 are active and exhibit functional redundancy. Mutation of the CaN docking motif alone, in the presence of intact NLS-2, resulted in a dramatic increase in the nuclear localisation of A238L. These results are consistent with a model in which A238L functions within both the nucleus and the cytoplasm and suggest that binding of CaN to A238L masks NLS-2, contributing to the cytoplasmic retention of A238L.

    The characterisation of trypanosomal type 1 DnaJ-like proteins

    Trypanosomes are protozoans, many of which are parasitic, and possess complex lifecycles which alternate between mammalian and arthropod hosts. As is the case with most organisms, molecular chaperones and heat shock proteins are encoded within the genomes of these protozoans. These proteins are an integral part of maintaining the structural integrity of proteins during normal and stress conditions. Heat shock protein 40 (Hsp40) is a co-chaperone of heat shock protein 70 (Hsp70) and in some cases can act as a chaperone. These proteins work together to bind non-native polypeptide structures to prevent unfolded protein aggregate formation in times of stress, translocate proteins across organelle membranes, and transport unsalvageable proteins to proteolytic degradation by the cellular proteasome. Hsp40s are divided into four types based on their domain structure. Analysis of the nuclear genomes of eight trypanosomatid species revealed that fewer than 10 of the approximately 70 Hsp40 sequences per genome were Type 1 Hsp40s, many of which had putative orthologues in the other seven trypanosomatid genomes. One of these Type 1 Hsp40s, Trypanosoma brucei DnaJ 2 (Tbj2), was functionally characterised in T. brucei brucei. RNA interference knockdown of expression in T. brucei brucei showed that cells deficient in Tbj2 displayed a severe inhibition of the growth of the cell population. The levels of the Tbj2 protein in T. brucei brucei cells increased after exposure to 42°C and the protein was found to have a generalized cytoplasmic subcellular localization at 37°C. These findings provide evidence that Tbj2 is an orthologue of Yeast DnaJ 1 (Ydj1), an essential S. cerevisiae protein. Hsp40s interact with their partner Hsp70s through their J-domain. 
The amino acids of the J-domain important for a functional interaction with Hsp70 were examined in Trypanosoma cruzi DnaJ 2 (Tcj2) (the orthologue of Tbj2) and T. cruzi DnaJ protein 3 (Tcj3) by testing their ability to substitute for Ydj1 in Saccharomyces cerevisiae and for DnaJ in Escherichia coli. In both systems, substitution of the positively charged amino acids of Helices II and III of the J-domain disrupted the functional interaction of these Hsp40s with their partner Hsp70s. Substitutions in Helices I and IV of the J-domains of Tcj2 and Tcj3 produced varied results in the two different systems, possibly suggesting that these helices serve to define with which Hsp70s a given Hsp40 can interact. The inability of an Hsp40 and an Hsp70 to interact functionally does not necessarily mean a total absence of physical interaction between these proteins. The amino acid substitution of the histidine in the HPD motif (H34Q) of the J-domain of Tcj2 and Tcj3 removed the ability of these proteins to interact functionally with S. cerevisiae Hsp70 (Ssa1) in vivo. However, preliminary binding studies using the quartz crystal microbalance with dissipation monitoring (QCM-D) show that Tcj2 and Tcj2(H34Q) both physically interact with M. sativa Hsp70 in vitro. This study is the first report to provide evidence that certain trypanosomal Type 1 Hsp40s are essential proteins. Furthermore, the interaction of these Hsp40s with Hsp70 identified important features of the functional interface of this chaperone machinery.