1,894 research outputs found

    Computational identification of transposable elements in the mouse genome

    Repeat sequences cover about 39 percent of the mouse genome, and completion of the mouse genome sequence [1] has enabled extensive research on the role of repeat sequences in mammalian genomics. This research covers the identification of transposable elements (TEs) within the mouse transcriptome, based on available sequence information on mouse cDNAs (complementary DNAs) from GenBank [28]. The transcripts are screened for repeats using RepeatMasker [23], whose results are filtered to retain only interspersed repeats (IRs). Using various bioinformatics software tools as well as tailor-made programs, the research establishes: (i) the absolute location coordinates of the TEs on the transcript; (ii) the location of the IRs with respect to the 5’UTR, CDS and 3’UTR sequence features; (iii) the quality of alignment of the TE’s consensus sequence on the transcripts where they exist; (iv) the frequencies and distributions of the TEs on the cDNAs; and (v) descriptions of the types and roles of transcripts containing TEs. This information has been collated and stored in a relational database (MTEDB) at http://warta.bio.psu.edu/htt_doc/M TEDB/homepage.htm
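    As a rough illustration of step (ii) in the abstract above, the sketch below labels a repeat by the transcript features it overlaps. It is not the authors' code: the function name `classify_repeat`, the 1-based inclusive coordinate convention, and the assumption that each transcript has a single annotated CDS are all hypothetical choices made for the example.

```python
from typing import List


def classify_repeat(start: int, end: int, cds_start: int, cds_end: int) -> List[str]:
    """Return the transcript features overlapped by a repeat at [start, end].

    Assumption: coordinates are 1-based and inclusive on the transcript,
    and the transcript carries exactly one CDS at [cds_start, cds_end].
    """
    regions = []
    # Any part of the repeat upstream of the CDS lies in the 5'UTR.
    if start < cds_start:
        regions.append("5'UTR")
    # Standard interval-overlap test against the CDS.
    if start <= cds_end and end >= cds_start:
        regions.append("CDS")
    # Any part of the repeat downstream of the CDS lies in the 3'UTR.
    if end > cds_end:
        regions.append("3'UTR")
    return regions


if __name__ == "__main__":
    # A repeat straddling the start codon of a CDS at 100..400.
    print(classify_repeat(90, 120, 100, 400))
```

    A repeat can span more than one feature, which is why the function returns a list and not a single label.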

    Pseudo–Messenger RNA: Phantoms of the Transcriptome

    The mammalian transcriptome harbours shadowy entities that resist classification and analysis. In analogy with pseudogenes, we define pseudo–messenger RNA to be RNA molecules that resemble protein-coding mRNA, but cannot encode full-length proteins owing to disruptions of the reading frame. Using a rigorous computational pipeline, which rules out sequencing errors, we identify 10,679 pseudo–messenger RNAs (approximately half of which are transposon-associated) among the 102,801 FANTOM3 mouse cDNAs: just over 10% of the FANTOM3 transcriptome. These comprise not only transcribed pseudogenes, but also disrupted splice variants of otherwise protein-coding genes. Some may encode truncated proteins, only a minority of which appear subject to nonsense-mediated decay. The presence of an excess of transcripts whose only disruptions are opal stop codons suggests that there are more selenoproteins than currently estimated. We also describe compensatory frameshifts, where a segment of the gene has changed frame but remains translatable. In summary, we survey a large class of non-standard but potentially functional transcripts that are likely to encode genetic information and effect biological processes in novel ways. Many of these transcripts do not correspond cleanly to any identifiable object in the genome, implying fundamental limits to the goal of annotating all functional elements at the genome sequence level

    Doctor of Philosophy

    Whole genome sequencing projects have expanded our understanding of evolution, organism development, and human disease. Now advances in second-generation technologies are making whole genome sequencing routine even for small laboratories. However, advances in annotation technology have not kept pace with genome sequencing, and annotation has become the major bottleneck for many genome projects (especially those with limited bioinformatics expertise). At the same time, challenges associated with genomics research extend beyond merely annotating genomes, as annotations must be subjected to diverse downstream analyses, the complexities of which can confound smaller research groups. Additionally, with improvements in genome assembly and the wide availability of next-generation transcriptome data (mRNA-seq), researchers have the opportunity to re-annotate previously published genomes, which creates new difficulties for data integration and management that are not well addressed by existing tools. In response to the challenges facing second-generation genome projects, I have developed the annotation pipeline MAKER2 together with accessory software for downstream analysis and data management. The MAKER2 annotation pipeline finds repeats within a genome, aligns ESTs and cDNAs, identifies sites of protein homology, and produces database-ready gene annotations in association with supporting evidence. However, MAKER2 can go beyond structural annotation to identify and integrate functional annotations. MAKER2 also provides researchers with the capability to re-annotate legacy genome datasets and to incorporate mRNA-seq. Additionally, MAKER2 supports distributed parallelization on computer clusters, thus providing a scalable solution for datasets of any size. Annotations produced by MAKER2 can be directly loaded into many popular downstream annotation analysis and management tools from the Generic Model Organism Database Project.
By using MAKER2 with these tools, research groups can quickly build genome annotations, perform analyses, and distribute their data to the wider scientific community. Here I describe the internal architecture of MAKER2, and document its computational capabilities. I also describe my work to annotate and analyze eight emerging model organism genomes in collaboration with their associated genome projects. Thus, in the course of my thesis work, I have addressed a specific need within the scientific community for easy-to-use annotation and analysis tools while also expanding our understanding of evolution and biology

    Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    Despite its economic, cultural, and biological importance, there has not been a large-scale sequencing project to date for Camelus dromedarius. With the goal of sequencing the complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera checking, repeat masking, clustering and assembly, we obtained 23,602 putative gene sequences, of which over 4,500 potentially novel or fast-evolving gene sequences show no homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full-length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF > 300 bp and ~40% of hits extending to the start codons of full-length cDNAs, suggesting successful characterization of camel genes. Similarity analyses were done separately for different organisms including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools, with the possibility of adding sequences from the public domain. We anticipate that our results will provide a home base for genomic studies of camel and other comparative studies, enabling a starting point for whole genome sequencing of the organism
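    The ORF length criterion mentioned in the abstract above (an ORF longer than 300 bp) could be checked along the following lines. This is a minimal sketch under stated assumptions, not the authors' method: it scans only the three forward frames, defines an ORF as running from an ATG to the first in-frame stop codon (stop included), and the names `longest_orf` and `has_long_orf` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> int:
    """Length in bp of the longest ATG-to-stop ORF in the three forward frames.

    Assumption: the stop codon is counted in the length, and ORFs that run
    off the end of the sequence without a stop codon are ignored.
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        orf_start = None
        # Walk the frame codon by codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == "ATG":
                orf_start = i  # first start codon opens the ORF
            elif orf_start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - orf_start)
                orf_start = None  # look for the next ORF in this frame
    return best


def has_long_orf(seq: str, min_len: int = 300) -> bool:
    """True if the sequence contains a forward-frame ORF longer than min_len bp."""
    return longest_orf(seq) > min_len


if __name__ == "__main__":
    contig = "CC" + "ATG" + "AAA" * 120 + "TGA"
    print(longest_orf(contig), has_long_orf(contig))
```

    A fuller analysis would also scan the reverse complement, since EST contigs can be assembled in either orientation.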

    PvRON2, a new Plasmodium vivax rhoptry neck antigen

    Background: Rhoptries are specialized organelles from parasites belonging to the phylum Apicomplexa; they secrete their protein content during invasion of host target cells and are sorted into discrete subcompartments within rhoptry neck or bulb. This distribution is associated with these proteins' role in tight junction (TJ) and parasitophorous vacuole (PV) formation, respectively. Methods: Plasmodium falciparum RON2 amino acid sequence was used as bait for screening the codifying gene for the homologous protein in the Plasmodium vivax genome. Gene synteny, as well as identity and similarity values, were determined for ron2 and its flanking genes among P. falciparum, P. vivax and other malarial parasite genomes available at PlasmoDB and Sanger Institute databases. Pvron2 gene transcription was determined by RT-PCR of cDNA obtained from the P. vivax VCG-1 strain. Protein expression and localization were assessed by Western blot and immunofluorescence using polyclonal anti-PvRON2 antibodies. Co-localization was confirmed using antibodies directed towards specific microneme and rhoptry neck proteins. Results and discussion: The first P. vivax rhoptry neck protein (named here PvRON2) has been identified in this study. PvRON2 is a 2,204 residue-long protein encoded by a single 6,615 bp exon containing a hydrophobic signal sequence towards the amino-terminus, a transmembrane domain towards the carboxy-terminus and two coiled coil α-helical motifs; these are characteristic features of several previously described vaccine candidates against malaria. This protein also contains two tandem repeats within the interspecies variable sequence possibly involved in evading a host's immune system. PvRON2 is expressed in late schizonts and localized in rhoptry necks similar to what has been reported for PfRON2, which suggests its participation during target cell invasion. Conclusions: The identification and partial characterization of the first P. vivax rhoptry neck protein are described in the present study. This protein is homologous to PfRON2 which has previously been shown to be associated with PfAMA-1, suggesting a similar role for PvRON2.

    Organisation of transcriptomes : searching for regulatory DNA elements involved in the correlated expression of genomic neighbours

    Since the thesis that every gene acts as a single unit whose transcription is solely regulated by promoter-binding transcription factors (TF), irrespective of the surrounding genomic landscape, has been rejected, transcriptional regulation of genes has become a field of ever-growing complexity. Factors like the “state” of chromatin and DNA positioning inside the nucleus have been shown to have a major impact on the activation and repression of the transcription of genes. Furthermore, it was discovered that the expression of individual adjacent genes in the genome is not independent: genomic neighbours are co-expressed more often than would be expected by chance. These neighbours form clusters of co-expressed genes that can be found all over the genome, containing from two to several adjacent entities. In this thesis a possible explanation of this observation was investigated, namely the active alteration of chromatin state by possible interaction of transcription factors or other genomic features. Sequence analysis methods were used to search for possible DNA-specific factors that could form “active chromatin hubs (ACH)” in the region of those co-expressed genes and therefore could lead to the observed correlated expression. The thesis is based on our earlier analysis of the expression of genomic neighbours in mouse/human and continues these investigations

    A Global View of Cancer-Specific Transcript Variants by Subtractive Transcriptome-Wide Analysis

    BACKGROUND: Alternative pre-mRNA splicing (AS) plays a central role in generating complex proteomes and influences development and disease. However, the regulation and etiology of AS in human tumorigenesis is not well understood. METHODOLOGY/PRINCIPAL FINDINGS: A Basic Local Alignment Search Tool database was constructed for the expressed sequence tags (ESTs) from all available databases of human cancer and normal tissues. An insertion or deletion in the alignment of EST/EST was used to identify alternatively spliced transcripts. Alignment of the ESTs with the genomic sequence was further used to confirm AS. Alternatively spliced transcripts in each tissue were then subtractively cross-screened to obtain tissue-specific variants. We systematically identified and characterized cancer/tissue-specific and alternatively spliced variants in the human genome based on a global view. We identified 15,093 cancer-specific variants of 9,989 genes from 27 types of human cancers and 14,376 normal tissue-specific variants of 7,240 genes from 35 normal tissues, which cover the main types of human tumors and normal tissues. Approximately 70% of these transcripts are novel. These data were integrated into a database HCSAS (http://202.114.72.39/database/human.html, pass:68756253). Moreover, we observed that the cancer-specific AS of both oncogenes and tumor suppressor genes are associated with specific cancer types. Cancer shows a preference in the selection of alternative splice-sites and utilization of alternative splicing types. CONCLUSIONS/SIGNIFICANCE: These features of human cancer, together with the discovery of huge numbers of novel splice forms for cancer-associated genes, suggest an important and global role of cancer-specific AS during human tumorigenesis. We advise the use of cancer-specific alternative splicing as a potential source of new diagnostic, prognostic, predictive, and therapeutic tools for human cancer. 
The global view of cancer-specific AS is not only useful for exploring the complexity of the cancer transcriptome but also broadens the perspective of clinical research