A quick guide for student-driven community genome annotation

Author: Benoit Joshua B.
Brown Susan J.
D'elia Tom
Flores Mirella
Hosmani Prashant S.
Miller Sherry
Mueller Lukas A.
Munoz-Torres Monica
Saha Surya
Shippy Teresa
Wiersma-Koch Helen
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 16/10/2018

High-quality gene models are necessary to expand the molecular and genetic tools available for a target organism, but these are available for only a handful of model organisms that have undergone extensive curation and experimental validation over the course of many years. The majority of gene models present in biological databases today have been identified in draft genome assemblies using automated annotation pipelines that are frequently based on orthologs from distantly related model organisms. Manual curation is time-consuming and often requires substantial expertise, but it is instrumental in improving gene model structure and identification. Manual annotation may seem a daunting and cost-prohibitive task for small research communities, but involving undergraduates in community genome annotation consortiums can benefit both education and genomic resources. We outline a workflow for efficient manual annotation driven by a team of primarily undergraduate annotators. This model can be scaled to large teams and includes quality control processes through incremental evaluation. Moreover, it gives students an opportunity to increase their understanding of genome biology and to participate in scientific research in collaboration with peers and senior researchers at multiple institutions.

arXiv.org e-Print Archive

Directory of Open Access Journals

eScholarship - University of California

FigShare

Are we there yet? : reliably estimating the completeness of plant genome sequences

Author: Ruttink Tom
Vandepoele Klaas
Veeckman Elisabeth
Publication venue: 'American Society of Plant Biologists (ASPB)'
Publication date: 01/01/2016

Genome sequencing is becoming cheaper and faster thanks to the introduction of next-generation sequencing techniques. Dozens of new plant genome sequences have been released in recent years, ranging from small to gigantic repeat-rich or polyploid genomes. Most genome projects have a dual purpose: delivering a contiguous, complete genome assembly and creating a full catalog of correctly predicted genes. Frequently, the completeness of a species' gene catalog is measured using a set of marker genes that are expected to be present. This expectation can be defined along an evolutionary gradient, ranging from highly conserved genes to species-specific genes. Large-scale population resequencing studies have revealed that gene space is fairly variable even between closely related individuals, which limits the definition of the expected gene space, and, consequently, the accuracy of estimates used to assess genome and gene space completeness. We argue that, based on the desired applications of a genome sequencing project, different completeness scores for the genome assembly and/or gene space should be determined. Using examples from several dicot and monocot genomes, we outline some pitfalls and recommendations regarding methods to estimate completeness during different steps of genome assembly and annotation.
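
As a rough illustration of the marker-gene idea described in this abstract, the sketch below computes a completeness score as the fraction of an expected marker set that is found in an annotation. The function name, the gene identifiers, and the simple presence/absence rule are all assumptions made for illustration; they are not taken from the paper, which discusses several ways of defining the expected gene space.

```python
def marker_completeness(expected_markers, annotated_genes):
    """Return (fraction of expected markers present, sorted missing markers).

    Hypothetical helper: a marker counts as "present" if its identifier
    appears in the annotation. Real tools use sequence similarity instead.
    """
    expected = set(expected_markers)
    if not expected:
        raise ValueError("expected marker set is empty")
    found = expected & set(annotated_genes)
    missing = sorted(expected - found)
    return len(found) / len(expected), missing


# Made-up identifiers, for illustration only.
score, missing = marker_completeness(
    ["markerA", "markerB", "markerC", "markerD"],
    ["markerA", "markerC", "markerD", "lineage_specific_1"],
)
print(f"completeness = {score:.2f}; missing = {missing}")
```

The choice of the expected set drives the result: a set of highly conserved genes and a set containing species-specific genes can give different scores for the same annotation, which is the limitation the abstract points out.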

Ghent University Academic Bibliography

PubMed Central

HiTRACE: High-throughput robust analysis for capillary electrophoresis

Author: Hanjoo Kim
Jinkyu Kim
Justine Hum
Rhiju Das
Seunghyun Park
Sungroh Yoon
Wipapat Kladwang
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2011

Motivation: Capillary electrophoresis (CE) of nucleic acids is a workhorse technology underlying high-throughput genome analysis and large-scale chemical mapping for nucleic acid structural inference. Despite the wide availability of CE-based instruments, there remain challenges in leveraging their full power for quantitative analysis of RNA and DNA structure, thermodynamics, and kinetics. In particular, the slow rate and poor automation of available analysis tools have bottlenecked a new generation of studies involving hundreds of CE profiles per experiment. Results: We propose a computational method called high-throughput robust analysis for capillary electrophoresis (HiTRACE) to automate the key tasks in large-scale nucleic acid CE analysis, including the profile alignment that has heretofore been a rate-limiting step in the highest throughput experiments. We illustrate the application of HiTRACE on thirteen data sets representing four different RNAs, three chemical modification strategies, and up to 480 single mutant variants; the largest data sets each include 87,360 bands. By applying a series of robust dynamic programming algorithms, HiTRACE outperforms prior tools in terms of alignment and fitting quality, as assessed by measures including the correlation of quantified band intensities between replicate data sets. Furthermore, while the smallest of these data sets required 7 to 10 hours of manual intervention using prior approaches, HiTRACE quantitation of even the largest data sets herein was achieved in 3 to 12 minutes. The HiTRACE method therefore resolves a critical barrier to the efficient and accurate analysis of nucleic acid structure in experiments involving tens of thousands of electrophoretic bands. Comment: Revised to include Supplement. Availability: HiTRACE is freely available for download at http://hitrace.stanford.ed
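
The abstract cites the correlation between quantified band intensities in replicate data sets as one quality measure. The sketch below shows one way such a check could be computed, assuming a Pearson correlation over two equal-length intensity lists; the abstract does not say which correlation measure HiTRACE uses, and the function name and sample values here are made up for illustration.

```python
import math


def replicate_correlation(intensities_a, intensities_b):
    """Pearson correlation between two replicate band-intensity profiles.

    Assumption: bands are already matched one-to-one by position, so the
    two lists have the same length and ordering.
    """
    if len(intensities_a) != len(intensities_b) or len(intensities_a) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(intensities_a)
    mean_a = sum(intensities_a) / n
    mean_b = sum(intensities_b) / n
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(intensities_a, intensities_b))
    var_a = sum((a - mean_a) ** 2 for a in intensities_a)
    var_b = sum((b - mean_b) ** 2 for b in intensities_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_a * var_b)


# Made-up intensities: the second replicate is the first scaled by two.
r = replicate_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(f"replicate correlation r = {r:.3f}")
```

A value near 1 indicates that the two replicates give consistent relative band intensities. Because Pearson correlation is insensitive to overall scale, it does not detect a uniform difference in signal between runs.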

arXiv.org e-Print Archive

CiteSeerX

Crossref

The Chlamydomonas genome project: A decade on

Author: Aksoy M
Blaby IK
Blaby-Haas CE
Dutcher S
Goodstein D
Grimwood J
Grossman A
Harris EH
Hom EFY
King S
Lopez D
Merchant SS
Porter M
Prochnik S
Schmutz J
Stanke M
Tourasse N
Umen J
Vallon O
Witman GB
Publication venue: eScholarship, University of California
Publication date: 01/10/2014

The green alga Chlamydomonas reinhardtii is a popular unicellular organism for studying photosynthesis, cilia biogenesis, and micronutrient homeostasis. Ten years after its genome project was initiated, an iterative process of improvements to the genome and gene predictions has propelled this organism to the forefront of the omics era. Housed at Phytozome, the plant genomics portal of the Joint Genome Institute (JGI), the most up-to-date genomic data include a genome arranged on chromosomes and high-quality gene models with alternative splice forms supported by an abundance of whole transcriptome sequencing (RNA-Seq) data. We present here the past, present, and future of Chlamydomonas genomics. Specifically, we detail progress on genome assembly and gene model refinement; discuss resources for gene annotations, functional predictions, and locus ID mapping between versions; and, importantly, outline a standardized framework for naming genes.

PubMed Central

eScholarship - University of California