We have developed a novel method for estimating the parameters of hidden Markov models for gene finding in newly sequenced species. Our approach does not rely on curated training data sets, but instead uses extrinsic evidence (including paired-end ditags that have not been used in gene finding previously) and iterative training. This new method is particularly suitable for annotation of species with large evolutionary distance to the closest annotated species. We have used our approach to produce an initial annotation of more than 16 000 genes in the newly sequenced Schistosoma japonicum draft genome. We established the high quality of our predictions by comparison to full-length cDNAs (withdrawn from the extrinsic evidence) and to CEGMA core genes. We also evaluated the effectiveness of the new training procedure on Caenorhabditis elegans genome. ExonHunter and the newest parametric files for S. japonicum genome are available for download at www.bioinformatics.uwaterloo.ca/downloads/exonhunte
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.