52 research outputs found

    Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments

    EVidenceModeler (EVM) is an automated annotation tool that predicts protein-coding regions, alternatively spliced transcripts and untranslated regions of eukaryotic genes

    The TIGR Rice Genome Annotation Resource: improvements and new features

    In The Institute for Genomic Research Rice Genome Annotation project, we have continued to update the rice genome sequence with new data and improve the quality of the annotation. In our current release of annotation (Release 4.0; January 12, 2006), we have identified 42 653 non-transposable element-related genes encoding 49 472 gene models as a result of the detection of alternative splicing. We have refined our identification methods for transposable element-related genes resulting in 13 237 genes that are related to transposable elements. Through incorporation of multiple transcript and proteomic expression data sets, we have been able to annotate 24 799 genes (31 739 gene models), representing ∼50% of the total gene models, as expressed in the rice genome. All structural and functional annotation is viewable through our Rice Genome Browser which currently supports 59 tracks. Enhanced data access is available through web interfaces, FTP downloads and a Data Extractor tool developed in order to support discrete dataset downloads

    Re-annotation of the Theileria parva genome refines 53% of the proteome and uncovers essential components of N-glycosylation, a conserved pathway in many organisms

    The apicomplexan parasite Theileria parva causes a livestock disease called East coast fever (ECF), with millions of animals at risk in sub-Saharan East and Southern Africa, the geographic distribution of T. parva. Over a million bovines die each year of ECF, with a tremendous economic burden to pastoralists in endemic countries. Comprehensive, accurate parasite genome annotation can facilitate the discovery of novel chemotherapeutic targets for disease treatment, as well as elucidate the biology of the parasite. However, genome annotation remains a significant challenge because of limitations in the quality and quantity of the data being used to inform the location and function of protein-coding genes and, when RNA data are used, the underlying biological complexity of the processes involved in gene expression. Here, we apply our recently published RNAseq dataset derived from the schizont life-cycle stage of T. parva to update structural and functional gene annotations across the entire nuclear genome. The re-annotation effort led to evidence-supported updates in over half of all protein-coding sequence (CDS) predictions, including exon changes, gene merges and gene splitting, an increase in average CDS length of approximately 50 base pairs, and the identification of 128 new genes. Among the new genes identified were those involved in N-glycosylation, a process previously thought not to exist in this organism and a potentially new chemotherapeutic target pathway for treating ECF. Alternatively-spliced genes were identified, and antisense and multi-gene family transcription were extensively characterized. The process of re-annotation led to novel insights into the organization and expression profiles of protein-coding sequences in this parasite, and uncovered a minimal N-glycosylation pathway that changes our current understanding of the evolution of this post-translational modification in apicomplexan parasites

    Genome-wide diversity and gene expression profiling of Babesia microti isolates identify polymorphic genes that mediate host-pathogen interactions

    Babesia microti, a tick-transmitted, intraerythrocytic protozoan parasite circulating mainly among small mammals, is the primary cause of human babesiosis. While most cases are transmitted by Ixodes ticks, the disease may also be transmitted through blood transfusion and perinatally. A comprehensive analysis of genome composition, genetic diversity, and gene expression profiling of seven B. microti isolates revealed that genetic variation in isolates from the Northeast United States is almost exclusively associated with genes encoding the surface proteome and secretome of the parasite. Furthermore, we found that polymorphism is restricted to a small number of genes, which are highly expressed during infection. In order to identify pathogen-encoded factors involved in host-parasite interactions, we screened a proteome array comprised of 174 B. microti proteins, including several predicted members of the parasite secretome. Using this immuno-proteomic approach we identified several novel antigens that trigger strong host immune responses during the onset of infection. The genomic and immunological data presented herein provide the first insights into the determinants of B. microti interaction with its mammalian hosts and their relevance for understanding the selective pressures acting on parasite evolution

    Capture-based enrichment of Theileria parva DNA enables full genome assembly of first buffalo-derived strain and reveals exceptional intra-specific genetic diversity

    Theileria parva is an economically important, intracellular, tick-transmitted parasite of cattle. A live vaccine against the parasite is effective against challenge from cattle-transmissible T. parva but not against genotypes originating from the African Cape buffalo, a major wildlife reservoir, prompting the need to characterize genome-wide variation within and between cattle- and buffalo-associated T. parva populations. Here, we describe a capture-based target enrichment approach that enables, for the first time, de novo assembly of nearly complete T. parva genomes derived from infected host cell lines. This approach has exceptionally high specificity and sensitivity and is successful for both cattle- and buffalo-derived T. parva parasites. De novo genome assemblies generated for cattle genotypes differ from the reference by ~54K single nucleotide polymorphisms (SNPs) throughout the 8.31 Mb genome, an average of 6.5 SNPs/kb. We report the first buffalo-derived T. parva genome, which is ~20 kb larger than the genome from the reference, cattle-derived, Muguga strain, and contains 25 new potential genes. The average non-synonymous nucleotide diversity (πN) per gene, between buffalo-derived T. parva and the Muguga strain, was 1.3%. This remarkably high level of genetic divergence is supported by an average Wright’s fixation index (FST), genome-wide, of 0.44, reflecting a degree of genetic differentiation between cattle- and buffalo-derived T. parva parasites more commonly seen between, rather than within, species. These findings present clear implications for vaccine development, further demonstrated by the ability to assemble nearly all known antigens in the buffalo-derived strain, which will be critical in design of next generation vaccines. The DNA capture approach used provides a clear advantage in specificity over alternative T. parva DNA enrichment methods used previously, such as those that utilize schizont purification, is less labor intensive, and enables in-depth comparative genomics in this apicomplexan parasite

    Rapid transcriptome sequencing of an invasive pest, the brown marmorated stink bug Halyomorpha halys

    Halyomorpha halys (Stål) (Insecta:Hemiptera;Pentatomidae), commonly known as the Brown Marmorated Stink Bug (BMSB), is an invasive pest of the mid-Atlantic region of the United States, causing economically important damage to a wide range of crops. Native to Asia, BMSB was first observed in Allentown, PA, USA, in 1996, and this pest is now well-established throughout the US mid-Atlantic region and beyond. In addition to the serious threat BMSB poses to agriculture, BMSB has become a nuisance to homeowners, invading home gardens and congregating in large numbers in human-made structures, including homes, to overwinter. Despite its significance as an agricultural pest with limited control options, only 100 bp of BMSB sequence data was available in public databases when this project began. Transcriptome sequencing was undertaken to provide a molecular resource to the research community to inform the development of pest control strategies and to provide molecular data for population genetics studies of BMSB. Using normalized, strand-specific libraries, we sequenced pools of all BMSB life stages on the Illumina HiSeq. Trinity was used to assemble 200,000 putative transcripts in >100,000 components. A novel bioinformatic method that analyzed the strand-specificity of the data reduced this to 53,071 putative transcripts from 18,573 components. By integrating multiple other data types, we narrowed this further to 13,211 representative transcripts. Bacterial endosymbiont genes were identified in this dataset, some of which have a copy number consistent with being lateral gene transfers between endosymbiont genomes and Hemiptera, including ankyrin-repeat related proteins, lysozyme, and mannanase. Such genes and endosymbionts may provide novel targets for BMSB-specific biocontrol. This study demonstrates the utility of strand-specific sequencing in generating shotgun transcriptomes and that rapid sequencing of shotgun transcriptomes is possible without the need for extensive inbreeding to generate homozygous lines. Such sequencing can provide a rapid response to pest invasions similar to that already described for disease epidemiology. https://doi.org/10.1186/1471-2164-15-73

    The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein and sequence information for the Aspergillus research community

    The Aspergillus Genome Database (AspGD) is an online genomics resource for researchers studying the genetics and molecular biology of the Aspergilli. AspGD combines high-quality manual curation of the experimental scientific literature examining the genetics and molecular biology of Aspergilli, cutting-edge comparative genomics approaches to iteratively refine and improve structural gene annotations across multiple Aspergillus species, and web-based research tools for accessing and exploring the data. All of these data are freely available at http://www.aspgd.org. We welcome feedback from users and the research community at [email protected]

    Horizontal gene transfer in Histophilus somni and its role in the evolution of pathogenic strain 2336, as determined by comparative genomic analyses

    Background: Pneumonia and myocarditis are the most commonly reported diseases due to Histophilus somni, an opportunistic pathogen of the reproductive and respiratory tracts of cattle. Thus far only a few genes involved in metabolic and virulence functions have been identified and characterized in H. somni using traditional methods. Analyses of the genome sequences of several Pasteurellaceae species have provided insights into their biology and evolution. In view of the economic and ecological importance of H. somni, the genome sequence of pneumonia strain 2336 has been determined and compared to that of commensal strain 129Pt and other members of the Pasteurellaceae. Results: The chromosome of strain 2336 (2,263,857 bp) contained 1,980 protein coding genes, whereas the chromosome of strain 129Pt (2,007,700 bp) contained only 1,792 protein coding genes. Although the chromosomes of the two strains differ in size, their average GC content, gene density (total number of genes predicted on the chromosome), and percentage of sequence (number of genes) that encodes proteins were similar. The chromosomes of these strains also contained a number of discrete prophage regions and genomic islands. One of the genomic islands in strain 2336 contained genes putatively involved in copper, zinc, and tetracycline resistance. Using the genome sequence data and comparative analyses with other members of the Pasteurellaceae, several H. somni genes that may encode proteins involved in virulence (e.g., filamentous haemagglutinins, adhesins, and polysaccharide biosynthesis/modification enzymes) were identified. The two strains contained a total of 17 ORFs that encode putative glycosyltransferases and some of these ORFs had characteristic simple sequence repeats within them. Most of the genes/loci common to both the strains were located in different regions of the two chromosomes and occurred in opposite orientations, indicating genome rearrangement since their divergence from a common ancestor. Conclusions: Since the genome of strain 129Pt was ~256,000 bp smaller than that of strain 2336, these genomes provide yet another paradigm for studying evolutionary gene loss and/or gain in regard to virulence repertoire and pathogenic ability. Analyses of the complete genome sequences revealed that bacteriophage- and transposon-mediated horizontal gene transfer had occurred at several loci in the chromosomes of strains 2336 and 129Pt. It appears that these mobile genetic elements have played a major role in creating genomic diversity and phenotypic variability between the two H. somni strains.

    A framework for human microbiome research

    A variety of microbial communities and their genes (the microbiome) exist throughout the human body, with fundamental roles in human health and disease. The National Institutes of Health (NIH)-funded Human Microbiome Project Consortium has established a population-scale framework to develop metagenomic protocols, resulting in a broad range of quality-controlled resources and data including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far. In parallel, approximately 800 reference strains isolated from the human body have been sequenced. Collectively, these data represent the largest resource describing the abundance and variety of the human microbiome, while providing a framework for current and future studies