6 research outputs found

Eval: A software package for analysis of genome annotations

Author: Brent Michael R
Keibler Evan
Publication venue: BioMed Central
Publication date: 01/01/2003

SUMMARY: Eval is a flexible tool for analyzing the performance of gene annotation systems. It provides summaries and graphical distributions for many descriptive statistics about any set of annotations, regardless of their source. It also compares sets of predictions to standard annotations and to one another. Input is in the standard Gene Transfer Format (GTF). Eval can be run interactively or via the command line, in which case output options include easily parsable tab-delimited files. AVAILABILITY: To obtain the module package with documentation, go to and follow links for Resources, then Software. Please contact [email protected]
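The comparison of a prediction set against a standard annotation described above can be illustrated with a minimal sketch. This is not Eval's code or its output format: the function names, the toy GTF records, and the choice of exact-match exon statistics are all assumptions made for illustration. It follows the common gene-prediction convention in which "specificity" means the fraction of predicted exons that are correct.

```python
# Hypothetical sketch: exact-match exon accuracy from two GTF inputs.
# None of these names come from Eval itself.

def parse_gtf_exons(text, feature="CDS"):
    """Collect (seqname, start, end, strand) for every record of one feature type."""
    exons = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != feature:
            continue  # GTF records have nine tab-separated columns
        exons.add((fields[0], int(fields[3]), int(fields[4]), fields[6]))
    return exons


def exon_accuracy(annotation, prediction):
    """Return (sensitivity, specificity) for exact exon matches."""
    true_pos = len(annotation & prediction)
    sensitivity = true_pos / len(annotation) if annotation else 0.0
    specificity = true_pos / len(prediction) if prediction else 0.0
    return sensitivity, specificity


def gtf_line(start, end, gene):
    """Build one toy GTF CDS record (illustrative data only)."""
    attrs = 'gene_id "%s"; transcript_id "%s.1";' % (gene, gene)
    return "\t".join(["chr1", "toy", "CDS", str(start), str(end), ".", "+", "0", attrs])


ANNOTATION = "\n".join(gtf_line(s, e, "g1") for s, e in [(100, 200), (300, 400), (500, 600)])
PREDICTION = "\n".join(
    gtf_line(s, e, "p1") for s, e in [(100, 200), (300, 410), (500, 600), (700, 800)]
)

sn, sp = exon_accuracy(parse_gtf_exons(ANNOTATION), parse_gtf_exons(PREDICTION))
# Tab-delimited output, in the spirit of the easily parsable files mentioned above.
print("exon_sensitivity\texon_specificity")
print("%.3f\t%.3f" % (sn, sp))
```

In this toy case two of the three annotated exons are predicted exactly, and two of the four predicted exons are correct.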

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Commons@Becker

Comparison of mouse and human genomes followed by experimental verification yields an estimated 1,019 additional genes.

Author: Abril Ferrando Josep Francesc, 1970-
Agarwal Pankaj
Antonarakis Stylianos E
Brent Michael R.
Dermitzakis E.T.
Guigó Roderic
Keibler Evan
Lyle Robert
Parra Genís
Ponting Chris P
Reymond Alexandre
Ucla Catherine
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 26/01/2003

A primary motivation for sequencing the mouse genome was to accelerate the discovery of mammalian genes by using sequence conservation between mouse and human to identify coding exons. Achieving this goal proved challenging because a large proportion of the mouse and human genomes appears to be conserved yet does not appear to code for protein. We developed a two-stage procedure that exploits the mouse and human genome sequences to produce a set of genes with a much higher rate of experimental verification than previously reported prediction methods. RT-PCR amplification and direct sequencing applied to an initial sample of mouse predictions that do not overlap previously known genes verified the regions flanking one intron in 139 predictions, with verification rates reaching 76%. On average, the confirmed predictions show more restricted expression patterns than the mouse orthologs of known human genes, and two-thirds lack homologs in fish genomes, demonstrating the sensitivity of this dual-genome approach to hard-to-find genes. We verified 112 previously unknown homologs of known proteins, including two homeobox proteins relevant to developmental biology, an aquaporin, and a homolog of dystrophin. We estimate that transcription and splicing can be verified for >1,000 gene predictions identified by this method that do not overlap known genes. This is likely to constitute a significant fraction of the previously unknown, multiexon mammalian genes.

Diposit Digital de la Universitat de Barcelona

The Treeterbi and Parallel Treeterbi algorithms: efficient, optimal decoding for ordinary, generalized and pair HMMs.

Author: Arumugam Manimozhiyan
Brent Michael R
Keibler Evan
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/03/2007

Copenhagen University Research Information System

The Treeterbi and Parallel Treeterbi algorithms: Efficient, optimal decoding for ordinary, generalized, and Pair HMMs

Author: Alex Bateman
Evan Keibler
Manimozhiyan Arumugam
Michael Brent

Motivation: Hidden Markov models (HMMs) and generalized HMMs have been successfully applied to many problems, but the standard Viterbi algorithm for computing the most probable interpretation of an input sequence (known as decoding) requires memory proportional to the length of the sequence, which can be prohibitive. Existing approaches to reducing memory usage either sacrifice optimality or trade increased running time for reduced memory. Results: We developed two novel decoding algorithms, Treeterbi and Parallel Treeterbi, and implemented them in the TWINSCAN/N-SCAN gene-prediction system. The worst-case asymptotic space and time are the same as for standard Viterbi, but in practice, Treeterbi optimally decodes arbitrarily long sequences with generalized HMMs in bounded memory without increasing running time. Parallel Treeterbi uses the same ideas to split optimal decoding across processors, dividing latency to completion by approximately the number of available processors with constant average overhead per processor. Using these algorithms, we were able to optimally decode all human chromosomes with N-SCAN, which increased its accuracy relative to heuristic solutions. We also implemented Treeterbi for Pairagon, our pair-HMM-based cDNA-to-genome aligner. Availability: The TWINSCAN/N-SCAN/PAIRAGON open source software package is available fro
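To make the memory problem stated in the motivation concrete, the following is a minimal sketch of the standard Viterbi algorithm for an ordinary HMM. It is not Treeterbi, and it is not taken from the TWINSCAN/N-SCAN code. The two-state model, its probabilities, and all names are assumptions chosen for illustration. The point to notice is that one table of backpointers is stored per sequence position, so memory grows linearly with sequence length.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Standard Viterbi decoding; returns (best_path, number_of_stored_tables)."""
    # scores[s]: best log-probability of any state path ending in s so far
    scores = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []  # one dict per position after the first: O(length) memory
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: scores[p] + log_trans[p][s])
            new_scores[s] = scores[best_prev] + log_trans[best_prev][s] + log_emit[s][obs]
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Traceback from the best final state through every stored table
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, len(backpointers)


# Toy model (illustrative values only): C = coding, N = noncoding
STATES = ["C", "N"]
LOG_START = {"C": math.log(0.5), "N": math.log(0.5)}
LOG_TRANS = {
    "C": {"C": math.log(0.9), "N": math.log(0.1)},
    "N": {"C": math.log(0.1), "N": math.log(0.9)},
}
LOG_EMIT = {
    "C": {"G": math.log(0.9), "A": math.log(0.1)},
    "N": {"G": math.log(0.1), "A": math.log(0.9)},
}

path, tables = viterbi("GGGAAA", STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print("".join(path), tables)
```

For a chromosome-length input, the `backpointers` list in this sketch would hold one entry per base, which is the cost the abstract describes as prohibitive.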

CiteSeerX

Leveraging the Mouse Genome for Gene Prediction in Human: From Whole-Genome Shotgun Reads to a Global Synteny Map

Author: Brent Michael R.
Flicek Paul
Hu Ping
Keibler Evan
Korf Ian
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 01/01/2003

The availability of draft sequences for both the mouse and human genomes makes it possible, for the first time, to annotate whole mammalian genomes using comparative methods. TWINSCAN is a gene-prediction system that combines the methods of single-genome predictors like GENSCAN with information derived from genome comparison, thereby improving accuracy. Because TWINSCAN uses genomic sequence only, it is less biased toward highly and/or ubiquitously expressed genes than GENEWISE, GENOMESCAN, and other methods based on evidence derived from transcripts. We show that TWINSCAN improves gene prediction in human using intermediate products from various stages of the sequencing and analysis of the mouse genome, from low-redundancy, whole-genome shotgun reads to the draft assembly and the synteny map. TWINSCAN improves on the prior state of the art even when alignments from only 1X coverage of the mouse genome are available. Gene prediction accuracy improves steadily from 1X through 3X, more slowly from 3X to 4X, and relatively little thereafter. The assembly and the synteny map greatly speed the computations, however. Our human annotation using the mouse assembly is conservative, predicting only 25,622 genes, and appears to be one of the best de novo annotations of the human genome to date

Crossref

PubMed Central

The Treeterbi and Parallel Treeterbi algorithms: efficient, optimal (Bioinformatics, original paper, doi:10.1093/bioinformatics/btl659)

Author: Evan Keibler
Manimozhiyan Arumugam
Michael R. Brent

Motivation: Hidden Markov models (HMMs) and generalized HMMs have been successfully applied to many problems, but the standard Viterbi algorithm for computing the most probable interpretation of an input sequence (known as decoding) requires memory proportional to the length of the sequence, which can be prohibitive. Existing approaches to reducing memory usage either sacrifice optimality or trade increased running time for reduced memory. Results: We developed two novel decoding algorithms, Treeterbi and Parallel Treeterbi, and implemented them in the TWINSCAN/N-SCAN gene-prediction system. The worst-case asymptotic space and time are the same as for standard Viterbi, but in practice, Treeterbi optimally decodes arbitrarily long sequences with generalized HMMs in bounded memory without increasing running time. Parallel Treeterbi uses the same ideas to split optimal decoding across processors, dividing latency to completion by approximately the number of available processors with constant average overhead per processor. Using these algorithms, we were able to optimally decode all human chromosomes with N-SCAN, which increased its accuracy relative to heuristic solutions. We also implemented Treeterbi for Pairagon, our pair-HMM-based cDNA-to-genome aligner. Availability: The TWINSCAN/N-SCAN/PAIRAGON open source software package is available fro

CiteSeerX