Gene and repetitive sequence annotation in the Triticeae
The Triticeae tribe contains some of the world’s most important agricultural crops (wheat, barley and rye) and is perhaps one of the most challenging tribes for genome annotation, because Triticeae genomes are composed primarily of repetitive sequences. Polyploidy in wheat, particularly in the hexaploid bread wheat genome, complicates the task further. Sequence data for the Triticeae are available as large collections of Expressed Sequence Tags (>1.5 million) and a growing number of bacterial artificial chromosome (BAC) clone sequences. Because the high repetitive sequence content of the Triticeae confounds the annotation of protein-coding genes, repetitive sequences have been identified, annotated, and collated into public databases. Protein-coding genes in the Triticeae are structurally annotated using a combination of ab initio gene finders and experimental evidence. Functional annotation of protein-coding genes involves assessing sequence similarity to known proteins, expression evidence, and the presence of domains and motifs. Annotation methods and tools for Triticeae genomic sequences have been adapted from existing plant genome annotation projects. They were designed to allow flexible annotation of individual sequences while supporting the development of a community-wide annotation effort. As more annotated grass genomes become available, comparative genomics can be exploited to accelerate and improve the quality of Triticeae sequence annotation. This chapter gives a brief overview of the features of Triticeae genomes that make annotation challenging. It also describes the resources and methods available for sequence annotation, with particular emphasis on problems caused by the repetitive fraction of these genomes.
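Repeat annotations are commonly used to mask repeats before gene prediction on repeat-rich sequence such as a Triticeae BAC clone. The sketch below illustrates one common convention, soft-masking, in which bases covered by repeats are written in lowercase. It assumes the repeat annotations already exist as 0-based, half-open (start, end) intervals, for example from a search against a repeat database. The function names are hypothetical. This is not the specific pipeline described in the chapter.

```python
from typing import List, Tuple

# Hypothetical repeat annotation: 0-based, half-open (start, end) interval.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent half-open intervals, sorted by start."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def soft_mask(seq: str, repeats: List[Interval]) -> str:
    """Return seq in uppercase, with repeat-covered bases lowercased."""
    chars = list(seq.upper())
    for start, end in merge_intervals(repeats):
        # Clip intervals that extend past the sequence ends.
        for i in range(max(0, start), min(len(chars), end)):
            chars[i] = chars[i].lower()
    return "".join(chars)


def repeat_fraction(seq_len: int, repeats: List[Interval]) -> float:
    """Fraction of a sequence of length seq_len covered by repeats."""
    if seq_len == 0:
        return 0.0
    covered = 0
    for start, end in merge_intervals(repeats):
        s, e = max(0, start), min(seq_len, end)
        if e > s:
            covered += e - s
    return covered / seq_len


if __name__ == "__main__":
    seq = "ACGTACGTACGTACGT"
    repeats = [(2, 6), (4, 10), (14, 20)]  # toy intervals, not real data
    print(soft_mask(seq, repeats))
    print(repeat_fraction(len(seq), repeats))
```

Merging intervals first means that overlapping repeat hits, which are frequent when nested transposable elements are annotated, are counted only once in the repeat fraction.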