5 research outputs found
Leveraging expression and network data for protein function prediction
2012 Summer. Includes bibliographical references. Protein function prediction is one of the prominent problems in bioinformatics today. Protein annotation is slowly falling behind as more and more genomes are being sequenced. Experimental methods are expensive and time consuming, which leaves computational methods to fill the gap. While computational methods are still not accurate enough to be used without human supervision, this is the goal. The Gene Ontology (GO) is a collection of terms that are the standard for protein function annotations. Because of the structure of GO, protein function prediction is a hierarchical multi-label classification problem. The classification method used in this thesis is GOstruct, which performs structured predictions that take into account all GO terms. GOstruct has been shown to work well, but there are still improvements to be made. In this thesis, I work to improve predictions by building new kernels from the data that are used by GOstruct. To do this, I find key representations of the data that help define what kernels perform best on the variety of data types. I apply this methodology to function prediction in two model organisms, Saccharomyces cerevisiae and Mus musculus, and find better methods for interpreting the data
An approach to improved microbial eukaryotic genome annotation
New sequencing technologies have considerably accelerated the rate at which genomic data is
being generated. One ongoing challenge is the accurate structural annotation of those novel
genomes once sequenced and assembled, in particular if the organism does not have close
relatives with well-annotated genomes. Whole-transcriptome sequencing (RNA-Seq) and
assembly—both of which share similarities to whole-genome sequencing and assembly,
respectively—have been shown to dramatically increase the accuracy of gene annotation. Read
coverage, inferred splice junctions and assembled transcripts can provide valuable information
about gene structure. Several annotation pipelines have been developed to automate structural
annotation by incorporating information from RNA-Seq, as well as protein sequence similarity
data, with the goal of reaching the accuracy of an expert curator. Annotation pipelines follow a
similar workflow. The first step is to identify repetitive regions to prevent misinformed sequence
alignments and gene predictions. The next step is to construct a database of evidence from
experimental data such as RNA-Seq mapping and assembly, and protein sequence alignments,
which are used to inform the generalised Hidden Markov Models of gene prediction software.
The final step is to consolidate sequence alignments and gene predictions into a high-confidence
consensus set. Thus, automated pipelines are complex, and therefore susceptible to incomplete
and erroneous use of information, which can poison gene predictions and consensus model
building. Here, we present an improved approach to microbial eukaryotic genome annotation.
Its conception was based on identifying and mitigating potential sources of error and bias that
are present in available pipelines. Our approach has two main aspects. The first is to create a
more complete and diverse set of extrinsic evidence to better inform gene predictions. The
second is to use extrinsic evidence in tandem with predictions such that the influence of their
respective biases in the consensus gene models is reduced. We benchmarked our new tool
against three known pipelines, showing significant gains in gene, transcript, exon and intron
sensitivity and specificity in the genome annotation of microbial eukaryotes
Effect of the host tree genotype and the fire recurrence on fungal communities in Mediterranean pine forests
Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología. Date of defence: 10-03-2017. Access to the full text of this thesis is embargoed until 10-03-201
Studies into host macrophage transcriptional control by the African Swine Fever Virus protein A238L
African swine fever virus (ASFV) is a large double-stranded DNA virus which causes
a lethal haemorrhagic fever in domestic pigs. This virus primarily infects cells from
the monocyte/macrophage lineage and its ability to manipulate the function of these
cells is key to the pathogenesis of this disease. ASFV encodes several proteins
involved in immune evasion. One of these proteins, A238L, has been shown to inhibit
host macrophage gene transcription. This protein has been shown to interact with
several cellular proteins involved in signal transduction: a serine/threonine protein
phosphatase, calcineurin (CaN), the transcription factor NF-κB, and most recently the
transcriptional co-activator CREB binding protein (CBP/P300). However, its exact
mechanism of action is not fully understood. Previous work has been limited to the
investigation of individual signaling pathways and/or the expression of individual host
genes. The aim of this study was to investigate the global effect of A238L on host
macrophage gene transcription and also to carry out further investigation into the
mechanism by which this protein functions.
To determine the global effect of A238L on host macrophage gene transcription,
differential gene expression between porcine cells expressing A238L and control cells
was examined using a porcine oligonucleotide microarray. These results demonstrated
that A238L was a potent inhibitor of host macrophage gene expression. Functional
characterisation of the annotated genes showed that a large proportion of A238L
down-regulated genes are typically induced in response to cell stress. Significantly,
genes regulated by the I kappa B kinase (IKK), mitogen-activated protein kinase
(MAPK) and Janus kinase/signal transducers and activators of transcription
(JAK/STAT) signaling pathways were all shown to be down-regulated by A238L.
Genes associated with the MAPK pathways were particularly enriched. The
transcription of A238L-regulated genes is controlled by numerous different
transcription factors, including NF-κB. All of the transcription factors identified
interact with the transcription co-activator CBP/P300. This provides a common link
between these factors, and indicates that A238L may target CBP/P300 to inhibit gene
transcription. This observation supports recent work demonstrating that A238L
interacts with and inhibits CBP/P300 function.
To explore the potential mechanisms involved in the nuclear localisation of A238L,
ASFV-infected Vero cells, expressing A238L under the control of its own promoter,
were examined under a range of conditions using confocal microscopy. The results
demonstrated that A238L was actively imported into the nucleus and exported by a
CRM1-mediated pathway, although a pool of A238L protein remained in the
cytoplasm. Sequence analysis of A238L identified the presence of two putative
nuclear localisation signals (NLS-1 and NLS-2). NLS-2 was located within A238L’s
CaN docking motif. Mutation of these motifs indicated that both NLS-1 and NLS-2
are active and exhibit functional redundancy. Mutation of the CaN docking motif
alone, in the presence of intact NLS-2, resulted in a dramatic increase in the nuclear
localisation of A238L. These results are consistent with a model in which A238L
functions within both the nucleus and the cytoplasm and suggest that binding of CaN
to A238L masks NLS-2, contributing to the cytoplasmic retention of A238L
The characterisation of trypanosomal type 1 DnaJ-like proteins
Trypanosomes are protozoans, many of which are parasitic, and possess complex life cycles that alternate between mammalian and arthropod hosts. As is the case with most organisms, molecular chaperones and heat shock proteins are encoded within the genomes of these protozoans. These proteins are an integral part of maintaining the structural integrity of proteins during normal and stress conditions. Heat shock protein 40 (Hsp40) is a co-chaperone of heat shock protein 70 (Hsp70) and in some cases can act as a chaperone. These proteins work together to bind non-native polypeptide structures to prevent unfolded protein aggregate formation in times of stress, translocate proteins across organelle membranes, and transport unsalvageable proteins to proteolytic degradation by the cellular proteasome. Hsp40s are divided into four types based on their domain structure. Analysis of the nuclear genomes of eight trypanosomatid species revealed that fewer than 10 of the approximately 70 Hsp40 sequences per genome were Type 1 Hsp40s, many of which had putative orthologues in the other seven trypanosomatid genomes. One of these Type 1 Hsp40s, Trypanosoma brucei DnaJ 2 (Tbj2), was functionally characterised in T. brucei brucei. RNA interference knockdown of expression in T. brucei brucei showed that cells deficient in Tbj2 displayed a severe inhibition of the growth of the cell population. Tbj2 protein levels in T. brucei brucei cells increase after exposure to 42°C, and the protein was found to have a generalized cytoplasmic subcellular localization at 37°C. These findings provide evidence that Tbj2 is an orthologue of Yeast DnaJ 1 (Ydj1), an essential S. cerevisiae protein. Hsp40s interact with their partner Hsp70s through their J-domain.
The amino acids of the J-domain important for a functional interaction with Hsp70 were examined in Trypanosoma cruzi DnaJ 2 (Tcj2) (the orthologue of Tbj2) and T. cruzi DnaJ protein 3 (Tcj3) by testing their ability to substitute for Ydj1 in Saccharomyces cerevisiae and for DnaJ in Escherichia coli. In both systems, substitution of the positively charged amino acids of Helix II and III of the J-domain disrupted the functional interaction of these Hsp40s with their partner Hsp70s. Substitutions in Helix I and IV of the J-domains of Tcj2 and Tcj3 produced varied results in the two different systems, possibly suggesting that these helices serve to define with which Hsp70s a given Hsp40 can interact. The inability of an Hsp40 and an Hsp70 to interact functionally does not necessarily mean a total absence of physical interaction between these proteins. The amino acid substitution of the histidine in the HPD motif (H34Q) of the J-domain of Tcj2 and Tcj3 removed the ability of these proteins to interact functionally with S. cerevisiae Hsp70 (Ssa1) in vivo. However, preliminary binding studies using the quartz crystal microbalance with dissipation monitoring (QCM-D) show that Tcj2 and Tcj2(H34Q) both physically interact with M. sativa Hsp70 in vitro. This study is the first report to provide evidence that certain trypanosomal Type 1 Hsp40s are essential proteins. Furthermore, the interaction of these Hsp40s with Hsp70 identified important features of the functional interface of this chaperone machinery