Abstract

The zebrafish genome, which consists of 25 linkage groups and is ~1.4 Gb in size, is being sequenced, finished and analysed in its entirety at the Wellcome Trust Sanger Institute. Manual annotation is provided by the Human and Vertebrate Analysis and Annotation (HAVANA) group, is released at regular intervals to the Vertebrate Genome Annotation (Vega) database ("http://vega.sanger.ac.uk":http://vega.sanger.ac.uk), and may be viewed as a DAS source in Ensembl ("http://www.ensembl.org/Danio_rerio":http://www.ensembl.org/Danio_rerio).

Our annotation is compiled in close collaboration with the Zebrafish Information Network (ZFIN) ("http://zfin.org/":http://zfin.org/), which has enabled us to provide an accurate, dynamic and distinct resource for the zebrafish community as a whole.

Annotation is based on the reference genome sequence, which is derived from a minimal tile path assembly composed of clones that have been mapped, sequenced and meticulously finished to a sequence accuracy of over 99.9% per 100 kb. We expect to have 90% of the zebrafish genome at a finished standard by the end of 2009. Our approach to annotation uses two strategies. The first is the generation and annotation of gene lists comprising cDNAs (8995 in total) found in ZFIN that map to our current reference assembly. The second is clone-by-clone annotation, through which we have annotated over 3200 genes, 1100 transcripts and 130 pseudogenes across 11 linkage groups and 3530 clones. In addition to our ongoing genome annotation, we welcome external annotation requests for specific genes and regions. These already include the annotation of 93 genes associated with human obesity and the scheduled annotation of the Major Histocompatibility Complex (MHC), which will use reference sequence taken from libraries of a double haploid fish and will complement our previously published work on the human and mouse MHC.
 
External annotation requests, feedback and questions can be sent to zfish-help [at] sanger.ac.uk.
