Global comparative analysis of ESTs from the southern cattle tick, Rhipicephalus (Boophilus) microplus

Author: Guerrero Felix D
Nene Vishvanath M
Pertea Geo
Wang Minghua
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: The southern cattle tick, Rhipicephalus (Boophilus) microplus, is an economically important parasite of cattle and can transmit several pathogenic microorganisms to its cattle host during the feeding process. Understanding the biology and genomics of R. microplus is critical to developing novel methods for controlling these ticks. Results: We present a global comparative genomic analysis of a gene index of R. microplus comprised of 13,643 unique transcripts assembled from 42,512 expressed sequence tags (ESTs), a significant fraction of the complement of R. microplus genes. The source material for these ESTs consisted of polyA RNA from various tissues, lifestages, and strains of R. microplus, including larvae exposed to heat, cold, host odor, and acaricide. Functional annotation using RPS-Blast analysis identified conserved protein domains in the conceptually translated gene index and assigned GO terms to those database transcripts which had informative BlastX hits. Blast Score Ratio and SimiTri analysis compared the conceptual transcriptome of the R. microplus database to other eukaryotic proteomes and EST databases, including those from 3 ticks. The most abundant protein domains in BmiGI were also analyzed by SimiTri methodology. Conclusion: These results indicate that a large fraction of BmiGI entries have no homologs in other sequenced genomes. Analysis with the PartiGene annotation pipeline showed 64% of the members of BmiGI could not be assigned GO annotation, thus minimal information is available about a significant fraction of the tick genome. This highlights the important insights in tick biology which are likely to result from a tick genome sequencing project. Global comparative analysis identified some tick genes with unexpected phylogenetic relationships which detailed analysis attributed to gene losses in some members of the animal kingdom. Some tick genes were identified which had close orthologues to mammalian genes. Members of this group would likely be poor choices as targets for development of novel tick control technology.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions

Author: Kelley Ryan
Kim Daehwan
Pertea Geo
Pimentel Harold
Salzberg Steven L
Trapnell Cole
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

TopHat is a popular spliced aligner for RNA-sequence (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available at http://ccb.jhu.edu/software/tophat

Harvard University - DASH

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Digital Repository at the University of Maryland

Genome assembly and characterization of a complex zfBED-NLR gene-containing disease resistance locus in Carolina Gold Select rice with Nanopore sequencing

Author: Bogdanove Adam
Leach Jan
Meyer Rachel
Moscou Matthew
Pertea Geo
Purugganan Michael
Read Andrew
Salzberg Steven L.
Triplett Lindsay
Zimin Aleksey
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2020

Long-read sequencing facilitates assembly of complex genomic regions. In plants, loci containing nucleotide-binding, leucine-rich repeat (NLR) disease resistance genes are an important example of such regions. NLR genes constitute one of the largest gene families in plants and are often clustered, evolving via duplication, contraction, and transposition. We recently mapped the Xo1 locus for resistance to bacterial blight and bacterial leaf streak, found in the American heirloom rice variety Carolina Gold Select, to a region that in the Nipponbare reference genome is NLR gene-rich. Here, toward identification of the Xo1 gene, we combined Nanopore and Illumina reads and generated a high-quality Carolina Gold Select genome assembly. We identified 529 complete or partial NLR genes and discovered, relative to Nipponbare, an expansion of NLR genes at the Xo1 locus. One of these has high sequence similarity to the cloned, functionally similar Xa1 gene. Both harbor an integrated zfBED domain, and the repeats within each protein are nearly perfect. Across diverse Oryzeae, we identified two sub-clades of NLR genes with these features, varying in the presence of the zfBED domain and the number of repeats. The Carolina Gold Select genome assembly also uncovered at the Xo1 locus a rice blast resistance gene and a gene encoding a polyphenol oxidase (PPO). PPO activity has been used as a marker for blast resistance at the locus in some varieties; however, the Carolina Gold Select sequence revealed a loss-of-function mutation in the PPO gene that breaks this association. Our results demonstrate that whole genome sequencing combining Nanopore and Illumina reads effectively resolves NLR gene loci. Our identification of an Xo1 candidate is an important step toward mechanistic characterization, including the role(s) of the zfBED domain. Finally, the Carolina Gold Select genome assembly will facilitate identification of other useful traits in this historically important variety. 
Author summary Plants lack adaptive immunity, and instead contain repeat-rich, disease resistance genes that evolve rapidly through duplication, recombination, and transposition. The number, variation, and often clustered arrangement of these genes make them challenging to sequence and catalog. The US heirloom rice variety Carolina Gold Select has resistance to two important bacterial diseases. Toward identifying the responsible gene(s), we combined long- and short-read sequencing technologies to assemble the whole genome and identify the resistance gene repertoire. We previously narrowed the location of the gene(s) to a region on chromosome four. The region in Carolina Gold Select is larger than in the rice reference genome (Nipponbare) and contains twice as many resistance genes. One shares unusual features with a known bacterial disease resistance gene, suggesting that it confers the resistance. Across diverse varieties and related species, we identified two widely-distributed groups of such genes. The results are an important step toward mechanistic characterization and deployment of the bacterial disease resistance. The genome assembly also identified a resistance gene for a fungal disease and predicted a marker phenotype used in breeding for resistance. Thus, the Carolina Gold Select genome assembly can be expected to aid in the identification and deployment of other valuable traits

Directory of Open Access Journals

University of East Anglia digital repository

Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks

Author: Goff Loyal
Kelley David R.
Kim Daehwan
Pachter Lior
Pertea Geo
Pimentel Harold
Rinn John L.
Roberts Adam
Salzberg Steven L.
Trapnell Cole
Publication venue: Nature Publishing Group
Publication date: 01/03/2012

Recent advances in high-throughput cDNA sequencing (RNA-seq) can reveal new genes and splice variants and quantify expression genome-wide in a single assay. The volume and complexity of data from RNA-seq experiments necessitate scalable, fast and mathematically principled analysis software. TopHat and Cufflinks are free, open-source software tools for gene discovery and comprehensive expression analysis of high-throughput mRNA sequencing (RNA-seq) data. Together, they allow biologists to identify new genes and new splice variants of known ones, as well as compare gene and transcript expression under two or more conditions. This protocol describes in detail how to use TopHat and Cufflinks to perform such analyses. It also covers several accessory tools and utilities that aid in managing data, including CummeRbund, a tool for visualizing RNA-seq analysis results. Although the procedure assumes basic informatics skills, these tools assume little to no background with RNA-seq analysis and are meant for novices and experts alike. The protocol begins with raw sequencing reads and produces a transcriptome assembly, lists of differentially expressed and regulated genes and transcripts, and publication-quality visualizations of analysis results. The protocol's execution time depends on the volume of transcriptome sequencing data and available computing resources but takes less than 1 d of computer time for typical experiments and ~1 h of hands-on time

PubMed Central

Caltech Authors

The TIGR Maize Database

Author: Barbazuk William Brad
Bennetzen Jeffrey
Chan Agnes P.
Cheung Foo
Lee Dan
Pertea Geo
Pontaroli Ana C.
Quackenbush John
Rabinowicz Pablo D.
SanMiguel Phillip
Whitelaw Cathy
Yuan Yinan
Zheng Li
Publication venue: Oxford University Press
Publication date: 28/12/2005

Maize is a staple crop of the grass family and also an excellent model for plant genetics. Owing to the large size and repetitiveness of its genome, we previously investigated two approaches to accelerate gene discovery and genome analysis in maize: methylation filtration and high C(0)t selection. These techniques allow the construction of gene-enriched genomic libraries by minimizing repeat sequences due to either their methylation status or their copy number, yielding a 7-fold enrichment in genic sequences relative to a random genomic library. Approximately 900 000 gene-enriched reads from maize were generated and clustered into Assembled Zea mays (AZM) sequences. Here we report the current AZM release, which consists of ∼298 Mb representing 243 807 sequence assemblies and singletons. In order to provide a repository of publicly available maize genomic sequences, we have created the TIGR Maize Database. In this resource, we have assembled and annotated the AZMs and used available sequenced markers to anchor AZMs to maize chromosomes. We have constructed a maize repeat database and generated draft sequence assemblies of 287 maize bacterial artificial chromosome (BAC) clone sequences, which we annotated along with 172 additional publicly available BAC clones. All sequences, assemblies and annotations are available at the project website via web interfaces and FTP downloads.

Crossref

PubMed Central

The Douglas-Fir Genome Sequence Reveals Specialization of the Photosynthetic Apparatus in Pinaceae.

Author: Cardeno Charis
Casola Claudio
Crepeau Marc W
Cronn Richard
Gonzalez-Ibeas Daniel
Holt Carson
Koralewski Tomasz E
Langley Charles H
McGuire Patrick E
Neale David B
Paul Robin
Pertea Geo M
Puiu Daniela
Salzberg Steven L
Sezen U Uzay
Stevens Kristian A
Wegrzyn Jill L
Wheeler Nicholas C
Yandell Mark
Yorke James A
Zaman Sumaira
Zimin Aleksey V
Publication venue: eScholarship, University of California
Publication date: 01/09/2017

A reference genome sequence for Pseudotsuga menziesii var. menziesii (Mirb.) Franco (Coastal Douglas-fir) is reported, thus providing a reference sequence for a third genus of the family Pinaceae. The contiguity and quality of the genome assembly far exceeds that of other conifer reference genome sequences (contig N50 = 44,136 bp and scaffold N50 = 340,704 bp). Incremental improvements in sequencing and assembly technologies are in part responsible for the higher quality reference genome, but it may also be due to a slightly lower exact repeat content in Douglas-fir vs. pine and spruce. Comparative genome annotation with angiosperm species reveals gene-family expansion and contraction in Douglas-fir and other conifers which may account for some of the major morphological and physiological differences between the two major plant groups. Notable differences in the size of the NDH-complex gene family and genes underlying the functional basis of shade tolerance/intolerance were observed. This reference genome sequence not only provides an important resource for Douglas-fir breeders and geneticists but also sheds additional light on the evolutionary processes that have led to the divergence of modern angiosperms from the more ancient gymnosperms
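
The contig and scaffold N50 values quoted in this abstract are contiguity statistics. Under the commonly used definition, N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal Python sketch of that definition follows; the function name `n50` and the toy lengths are hypothetical illustrations, not data or code from the Douglas-fir assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumes the common definition: walk the lengths from longest to
    shortest and return the length at which the running total first
    reaches at least half of the summed length.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy contig lengths (hypothetical): total = 54, half = 27.
# 10 + 9 + 8 = 27 reaches the halfway point, so N50 is 8.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, a single very long sequence can dominate it, which is one reason contig and scaffold N50 are usually reported together, as they are here.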

Directory of Open Access Journals

eScholarship - University of California

Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation

Author: Kwan Gordon
Mortazavi Ali
Pachter Lior
Pertea Geo
Salzberg Steven L.
Trapnell Cole
van Baren Marijke J.
Williams Brian A.
Wold Barbara J.
Publication venue: Nature Publishing Group
Publication date: 01/05/2010

High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation

Detection of lineage-specific evolutionary changes among primate species

Abstract. Background: Comparison of the human genome with other primates offers the opportunity to detect evolutionary events that created the diverse phenotypes among the primate species. Because the primate genomes are highly similar to one another, methods developed for analysis of more divergent species do not always detect signs of evolutionary selection. Results: We have developed a new method, called DivE, specifically designed to find regions that have evolved either more or less rapidly than expected, for any clade within a set of very closely related species. Unlike some previous methods, DivE does not rely on rates of synonymous and nonsynonymous substitution, which enables it to detect evolutionary events in noncoding regions. We demonstrate using simulated data that DivE compares favorably to alternative methods, and we then apply DivE to the ENCODE regions in 14 primate species. We identify thousands of regions in these primates, ranging from 50 to >10000 bp in length, that appear to have experienced either constrained or accelerated rates of evolution. In particular, we detected 4942 regions that have potentially undergone positive selection in one or more primate species. Most of these regions occur outside of protein-coding genes, although we identified 20 proteins that have experienced positive selection. Conclusions: DivE provides an easy-to-use method to predict both positive and negative selection in noncoding DNA, that is particularly well-suited to detecting lineage-specific selection in large genomes.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Repository at the University of Maryland