
114 research outputs found

Genic regions of a large salamander genome contain long introns and novel genes

Author: Bryant Susan V
Gardiner David M
Harkins Timothy T
Hunter Tony
Pao Gerald M
Putta Srikrishna
Smith Jeramiah J
Verma Inder M
Voss S Randal
Zhu Wei
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: The basis of genome size variation remains an outstanding question because DNA sequence data are lacking for organisms with large genomes. Sixteen BAC clones from the Mexican axolotl (Ambystoma mexicanum; C-value = 32 × 10⁹ bp) were isolated and sequenced to characterize the structure of genic regions. RESULTS: Annotation of genes within the BACs showed that axolotl introns are on average 10× longer than orthologous vertebrate introns, and they are predicted to contain more functional elements, including miRNAs and snoRNAs. Loci were discovered within the BACs for two novel EST transcripts that are differentially expressed during spinal cord regeneration and skin metamorphosis. Unexpectedly, a third novel gene was also discovered during manual annotation of the BACs. Analysis of human-axolotl protein-coding sequences suggests there are 2% more lineage-specific genes in the axolotl genome than in the human genome, but the great majority (86%) of genes shared between axolotl and human are predicted to be 1:1 orthologs. Considering that axolotl genes are on average 5× larger than human genes, the genic component of the salamander genome is estimated to be very large, approximately 2.8 gigabases. CONCLUSION: This study shows that a large salamander genome has a correspondingly large genic component, primarily because its genes have exceptionally long introns. These intronic sequences may harbor novel coding and non-coding sequences that regulate biological processes unique to salamanders.

Crossref

Springer - Publisher Connector

PubMed Central

University of Kentucky

eScholarship - University of California

Transcriptome sequencing of the Microarray Quality Control (MAQC) RNA reference samples using next generation sequencing

Author: Clive Evans
Danielle Thierry-Mieg
Jean Thierry-Mieg
Kristal L Cooper
Mane Shrinivasrao
Oswald R Crasta
Otto Folkerts
Roderick V Jensen
Shrinivasrao P Mane
Stephen K Hutchison
Timothy T Harkins
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC) reference RNA samples using Roche's 454 Genome Sequencer FLX. Results: We generated more than 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well-annotated genes with e-values ≤ 10⁻²⁰. We measured gene expression levels in the A and B samples by counting the number of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy, and compared the results with DNA microarrays and quantitative RT-PCR (QRTPCR) from the MAQC studies. In addition, 88% of the reads were aligned directly to the human genome using the AceView alignment programs, with an average sequence similarity of 90%, identifying 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. Conclusion: Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy comparable to that of DNA microarrays and QRTPCR. In addition, careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome, including the discovery of thousands of new splice variants.
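The read-counting step described above can be sketched in a few lines. This is a minimal illustration, not the authors' ExpressSeq pipeline. It assumes a hypothetical list of BLAST hits given as (read_id, gene_id, e_value) tuples, keeps only the best hit per read that passes the e-value ≤ 10⁻²⁰ cutoff, and normalizes counts to reads per million. That normalization is one common, assumed way to compare samples sequenced to different depths.

```python
from collections import Counter

E_VALUE_CUTOFF = 1e-20  # cutoff quoted in the abstract


def count_reads_per_gene(hits, cutoff=E_VALUE_CUTOFF):
    """Count reads per gene from (read_id, gene_id, e_value) tuples (hypothetical format).

    Each read is assigned to its single lowest-e-value hit that passes the cutoff,
    so no read is counted toward more than one gene.
    """
    best = {}  # read_id -> (gene_id, e_value)
    for read_id, gene_id, e_value in hits:
        if e_value > cutoff:
            continue  # hit too weak to count
        if read_id not in best or e_value < best[read_id][1]:
            best[read_id] = (gene_id, e_value)
    return Counter(gene_id for gene_id, _ in best.values())


def reads_per_million(counts):
    """Scale raw counts so they sum to one million (an assumed normalization)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}


# Hypothetical hits: r3 fails the cutoff; r1 is assigned to its better hit (GENE_A).
hits = [
    ("r1", "GENE_A", 1e-50),
    ("r1", "GENE_B", 1e-30),
    ("r2", "GENE_B", 1e-25),
    ("r3", "GENE_B", 1e-5),
    ("r4", "GENE_A", 1e-40),
]
counts = count_reads_per_gene(hits)
rpm = reads_per_million(counts)
```

Assigning each read to one gene avoids double counting. The abstract does not say how the original pipeline handled reads with hits to several genes.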

Crossref

Springer

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Improving mammalian genome scaffolding using large insert mate-pair next-generation sequencing

Author: Cuppen Edwin
Goldstein Steve
Guryev Victor
Harkins Timothy T.
Kloosterman Wigard P.
Lansu Nico
Lee Clarence C.
Levandowsky Elizabeth
Ruzius Frans-Paul
Schwartz David C.
van Heesch Sebastiaan
Zhou Shiguo
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

BACKGROUND: Paired-tag sequencing approaches are commonly used for the analysis of genome structure. However, mammalian genomes have a complex organization with a variety of repetitive elements that complicate comprehensive genome-wide analyses. RESULTS: Here, we systematically assessed the utility of paired-end and mate-pair (MP) next-generation sequencing libraries with insert sizes ranging from 170 bp to 25 kb for genome coverage and for improving the scaffolding of a mammalian genome (Rattus norvegicus). Despite lower library complexity, large-insert MP libraries (20 or 25 kb) provided very high physical genome coverage and efficiently spanned repeat elements in the genome. Medium-sized (5, 8 or 15 kb) MP libraries were much more efficient for genome structure analysis than the more commonly used shorter-insert paired-end and 3 kb MP libraries. Furthermore, combining medium- and large-insert libraries resulted in a 3-fold increase in N50 during scaffolding. Finally, we show that our data can be used to evaluate and improve contig order and orientation in the current rat reference genome assembly. CONCLUSIONS: We conclude that combining mate-pair libraries with insert sizes that match the distributions of repetitive elements improves contig scaffolding and can contribute to the finishing of draft genomes.
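N50, the scaffolding metric reported above, is the length L such that sequences of length ≥ L make up at least half of the total assembly length. Below is a minimal Python sketch of that standard definition. The example lengths are hypothetical and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    lengths = list(lengths)
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down and stop once half the total length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical lengths: total = 54, and 10 + 9 + 8 = 27 reaches half, so N50 = 8.
example_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

In these terms, a "3-fold increase in N50" means this value tripled between the two scaffolding runs being compared.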

Proceedings - University of Groningen

Crossref

University of Groningen

Springer - Publisher Connector

ARTS repository - University of Groningen

PubMed Central

Dissertations of the University of Groningen

Exome Sequencing of a Multigenerational Human Pedigree

Author: Benjamin Boese
Cherylyn Almonte
Dale Hedges
Dan Burges
Eden Martin
Eric Powell
Jia Huang
Margaret A. Pericak-Vance
Mark A. Batzer
Mike Schmidt
Stephan Züchner
Stuart Young
Timothy T. Harkins
Xinmin Zhang
Publication venue: Public Library of Science
Publication date: 01/01/2009

Over the next few years, the efficient use of next-generation sequencing (NGS) in human genetics research will depend heavily upon effective mechanisms for the selective enrichment of genomic regions of interest. Recently, comprehensive exome capture arrays have become available for targeting approximately 33 Mb, or ∼180,000 coding exons, across the human genome. Selective genomic enrichment of the human exome offers an attractive option for new experimental designs aiming to quickly identify potential disease-associated genetic variants, especially in family-based studies. We have evaluated a 2.1M-feature human exome capture array on eight individuals from a three-generation family pedigree. We were able to cover up to 98% of the targeted bases at a long-read sequence read depth of ≥3, 86% at a read depth of ≥10, and over 50% of all targets were covered with ≥20 reads. We identified up to 14,284 SNPs and small indels per individual exome, with up to 1,679 of these representing putative novel polymorphisms. Using the conservative genotype calling approach HCDiff, the average detection rate of variant alleles, relative to Illumina 1M BeadChip genotypes, was 95.2% at ≥10× sequence coverage. Further, we propose an advantageous genotype calling strategy for low-coverage targets that empirically determines cut-off thresholds at a given coverage depth based on existing genotype data. This method detected >99% of SNPs covered at ≥8×. Our results offer guidance for “real-world” applications in human genetics and provide further evidence that microarray-based exome capture is an efficient and reliable method to enrich for chromosomal regions of interest in next-generation sequencing experiments.
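The coverage figures above (fraction of targeted bases at read depth ≥3, ≥10 and ≥20) come down to a threshold count over per-base depths. The sketch below assumes a hypothetical list of read depths, one per targeted base. It is not the authors' analysis code. The abstract states the ≥20 figure per target rather than per base, so applying the same function to per-target mean depths would be a further assumption.

```python
def fraction_at_depth(depths, thresholds=(3, 10, 20)):
    """Fraction of targeted bases whose read depth is >= each threshold.

    depths: iterable of per-base read depths over the targeted regions (hypothetical input).
    """
    depths = list(depths)
    if not depths:
        raise ValueError("no targeted bases supplied")
    n = len(depths)
    return {t: sum(1 for d in depths if d >= t) / n for t in thresholds}


# Hypothetical depths for eight targeted bases.
result = fraction_at_depth([0, 2, 3, 5, 10, 12, 20, 25])
```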

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

University of Miami: Scholarship Miami

Assessing the feasibility of GS FLX Pyrosequencing for sequencing the Atlantic salmon genome

Author: Boroevich Keith A
Bouffard Pascal
Chow William
Davidson William S
Desany Brian A
Harkins Timothy T
Jarvie Thomas P
Knight James R
Koop Ben F
Levenkova Natasha
Lubieniecki Krzysztof P
Quinn Nicole L
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: With a whole-genome duplication event and a wealth of biological data, salmonids are excellent model organisms for studying evolutionary processes, the fates of duplicated genes, and the genetic and physiological processes associated with complex behavioral phenotypes. It is therefore surprising that no salmonid genome has been sequenced. Atlantic salmon (Salmo salar) is a good representative salmonid for sequencing, given its importance in aquaculture and the genomic resources available. However, the size and complexity of the genome, combined with the lack of a sequenced reference genome from a closely related fish, make assembly challenging. Given the cost and time limitations of Sanger sequencing, as well as recent improvements to next-generation sequencing technologies, we examined the feasibility of using the Genome Sequencer (GS) FLX pyrosequencing system to obtain the sequence of a salmonid genome. Eight pooled BACs belonging to a minimum tiling path covering ~1 Mb of the Atlantic salmon genome were sequenced by GS FLX shotgun and Long Paired End sequencing and compared with a ninth BAC sequenced by Sanger sequencing of a shotgun library. Results: An initial assembly using only GS FLX shotgun sequences (average read length 248.5 bp) at ~30× coverage allowed gene identification but was incomplete, even when 126 Sanger-generated BAC-end sequences (~0.09× coverage) were incorporated. The addition of paired-end sequencing reads (an additional ~26× coverage) produced a final assembly comprising 175 contigs assembled into four scaffolds with 171 gaps. Sanger sequencing of the ninth BAC (~10.5× coverage) produced nine contigs and two scaffolds. The number of scaffolds produced by the GS FLX assembly was comparable to that from Sanger sequencing; however, the number of gaps was much higher in the GS FLX assembly. Conclusion: These results represent the first use of GS FLX paired-end reads for de novo sequence assembly. Our data demonstrated that this improved the GS FLX assemblies; however, for de novo sequencing of complex genomes, GS FLX technology is limited to gene mining and establishing a set of ordered sequence contigs. Currently, for a salmonid reference sequence, it appears that a substantial portion of sequencing should be done using Sanger technology.
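The coverage values quoted above follow the standard definition of mean sequencing depth, C = N × L / G, for N reads of average length L over a target of size G. The sketch below implements that formula and its inverse. The function names are hypothetical, and the inputs are illustrative rather than the study's read counts.

```python
import math


def mean_coverage(n_reads, read_length, target_size):
    """Mean depth C = N * L / G."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    return n_reads * read_length / target_size


def reads_needed(coverage, read_length, target_size):
    """Smallest whole number of reads giving at least the requested mean depth."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(coverage * target_size / read_length)


# Illustrative: 120,000 reads of 250 bp over 1 Mb gives a mean depth of 30x.
depth = mean_coverage(120_000, 250, 1_000_000)
```

For example, suppose ~30× coverage is measured against the ~1 Mb tiling path. Then reads_needed(30, 248.5, 1_000_000) gives 120,725 reads. This is a back-of-the-envelope estimate, not a figure the authors report.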

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Simon Fraser University Institutional Repository

Large‐Scale Discovery of Gene‐Enriched SNPs

Author: Apurva Narechania
Bonnie L. Hurwitz
Doreen H. Ware
Edward S. Buckler
Edward S. Szekeres
Elhan S. Ersoz
George S. Grills
Mark H. Wright
Michael A. Gore
Pascal Bouffard
Rabinowicz P.D.
Thomas P. Jarvie
Timothy T. Harkins
Publication venue: Crop Science Society of America

Crossref

Genome sequencing highlights the dynamic early history of dogs

Author: Adam H Freedman
Adam R Boyko
Adam Siepel
Alan Wilton
Belen Lorente-Galdos
Can Alkan
Carles Vilà
Carlos D Bustamante
Clarence Lee
Diego Ortega-Del Vecchyo
Elaine A Ostrander
Eli Geffen
Eunjung Han
Farhad Hormozdiari
Heidi G Parker
Holly Beale
Ilan Gronau
John Novembre
Josip Kusak
Kevin Squire
Marco Galaverni
Oscar Ramirez
Pedro M Silva
Peter Marx
Rena M Schweizer
Robert K Wayne
Stanley F Nelson
Timothy T Harkins
Tomas Marques-Bonet
Vasisht Tadigotla
Zhenxin Fan
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2014

To identify genetic changes underlying dog domestication and reconstruct their early evolutionary history, we generated high-quality genome sequences from three gray wolves (one from each of the three putative centers of dog domestication), two basal dog lineages (Basenji and Dingo), and a golden jackal as an outgroup. Analysis of these sequences supports a demographic model in which dogs and wolves diverged through a dynamic process involving population bottlenecks in both lineages and post-divergence gene flow. In dogs, the domestication bottleneck involved at least a 16-fold reduction in population size, a much more severe bottleneck than previously estimated. A sharp bottleneck in wolves occurred soon after their divergence from dogs, implying that the pool of diversity from which dogs arose was substantially larger than that represented by modern wolf populations. We narrow the plausible range for the date of initial dog domestication to an interval spanning 11–16 thousand years ago, predating the rise of agriculture. In light of this finding, we expand upon previous work regarding the increase in copy number of the amylase gene (AMY2B) in dogs, which is believed to have aided digestion of starch in agricultural refuse. We find standing variation in amylase copy number in wolves and little or no copy number increase in the Dingo and Husky lineages. In conjunction with the estimated timing of dog origins, these results provide additional support to archaeological finds suggesting that the earliest dogs arose alongside hunter-gatherers rather than agriculturists. Regarding the geographic origin of dogs, we find that, surprisingly, none of the extant wolf lineages from putative domestication centers is more closely related to dogs; instead, the sampled wolves form a sister monophyletic clade. This result, in combination with dog-wolf admixture during the process of domestication, suggests that a re-evaluation of past hypotheses regarding dog origins is necessary.

Cold Spring Harbor Laboratory Institutional Repository

Bilkent University Institutional Repository

Directory of Open Access Journals

PubMed Central

Digital.CSIC

UPF Digital Repository

FigShare