703 research outputs found

METHODS FOR HIGH-THROUGHPUT COMPARATIVE GENOMICS AND DISTRIBUTED SEQUENCE ANALYSIS

Author: Angiuoli Samuel Vincent
Publication date: 01/01/2011

High-throughput sequencing has accelerated applications of genomics throughout the world. The increased production and decentralization of sequencing has also created bottlenecks in computational analysis. In this dissertation, I provide novel computational methods to improve analysis throughput in three areas: whole genome multiple alignment, pan-genome annotation, and bioinformatics workflows. To aid in the study of populations, tools are needed that can quickly compare multiple genome sequences, millions of nucleotides in length. I present a new multiple alignment tool for whole genomes, named Mugsy, that implements a novel method for identifying syntenic regions. Mugsy is computationally efficient, does not require a reference genome, and is robust in identifying a rich complement of genetic variation including duplications, rearrangements, and large-scale gain and loss of sequence in mixtures of draft and completed genome data. Mugsy is evaluated on the alignment of several dozen bacterial chromosomes on a single computer and was the fastest program evaluated for the alignment of assembled human chromosome sequences from four individuals. A distributed version of the algorithm is also described and provides increased processing throughput using multiple CPUs. Numerous individual genomes are sequenced to study diversity and evolution and to classify pan-genomes. Pan-genome annotations contain inconsistencies and errors that hinder comparative analysis, even within a single species. I introduce a new tool, Mugsy-Annotator, that identifies orthologs and anomalous gene structure across a pan-genome using whole genome multiple alignments. Identified anomalies include inconsistently located translation initiation sites and disrupted genes due to draft genome sequencing or pseudogenes. An evaluation of pan-genomes indicates that such anomalies are common and alternative annotations suggested by the tool can improve annotation consistency and quality.
Finally, I describe the Cloud Virtual Resource, CloVR, a desktop application for automated sequence analysis that improves usability and accessibility of bioinformatics software and cloud computing resources. CloVR is installed on a personal computer as a virtual machine and requires minimal installation, addressing challenges in deploying bioinformatics workflows. CloVR also seamlessly accesses remote cloud computing resources for improved processing throughput. In a case study, I demonstrate the portability and scalability of CloVR and evaluate the costs and resources for microbial sequence analysis.

CiteSeerX

Digital Repository at the University of Maryland

Mugsy: fast multiple alignment of closely related whole genomes

Author: Ahn
Batzoglou
Blanchette
Bourque
Bradley
Bray
Chen
Corel
Darling
Darling
Deloger
Dewey
Dewey
Doring
Dubchak
Edgar
Edmonds
Elias
Ford
Gusfield
Hohl
IHGSC
Jacobson
Kent
Kurtz
Levy
Li
Margulies
Medini
Notredame
Paten
Paten
Pevzner
Raphael
Rausch
Rosenbloom
Samuel V. Angiuoli
Schwartz
Sherry
Steven L. Salzberg
Thompson
Treangen
Wang
Wheeler
Zhang
Publication venue: Oxford University Press

Motivation: The relative ease and low cost of current generation sequencing technologies has led to a dramatic increase in the number of sequenced genomes for species across the tree of life. This increasing volume of data requires tools that can quickly compare multiple whole-genome sequences, millions of base pairs in length, to aid in the study of populations, pan-genomes, and genome evolution.

Crossref

PubMed Central

High-coverage sequencing and annotated assemblies of the budgerigar genome

Author: Aboukhalil R.
Bukovnik L.
Fedrigo O.
Ganapathy G.
Howard J. T.
Jarvis E. D.
Knight J. R.
Koren S.
Li B.
Li J.
Li Y.
Phillippy A. M.
Rasolonjatovo I.
Schatz M.
Schwartz D. C.
Wang T.
Ward J. M.
Warren W. C.
Winer R.
Wray G.
Xiong Y.
Zhang G.
Zhang Y.
Zhou S.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

BACKGROUND: Parrots belong to a group of behaviorally advanced vertebrates and have an advanced ability of vocal learning relative to other vocal-learning birds. They can imitate human speech, synchronize their body movements to a rhythmic beat, and understand complex concepts of referential meaning to sounds. However, little is known about the genetics of these traits. Elucidating the genetic bases would require whole genome sequencing and a robust assembly of a parrot genome. FINDINGS: We present a genomic resource for the budgerigar, an Australian Parakeet (Melopsittacus undulatus), the most widely studied parrot species in neuroscience and behavior. We present genomic sequence data that includes over 300x raw read coverage from multiple sequencing technologies and chromosome optical maps from a single male animal. The reads and optical maps were used to create three hybrid assemblies representing some of the largest genomic scaffolds to date for a bird; two of which were annotated based on similarities to reference sets of non-redundant human, zebra finch and chicken proteins, and budgerigar transcriptome sequence assemblies. The sequence reads for this project were in part generated and used for both the Assemblathon 2 competition and the first de novo assembly of a giga-scale vertebrate genome utilizing PacBio single-molecule sequencing. CONCLUSIONS: Across several quality metrics, these budgerigar assemblies are comparable to or better than the chicken and zebra finch genome assemblies built from traditional Sanger sequencing reads, and are sufficient to analyze regions that are difficult to sequence and assemble, including those not yet assembled in prior bird genomes, and promoter regions of genes differentially regulated in vocal learning brain regions. This work provides valuable data and material for genome technology development and for investigating the genomics of complex behavioral traits.

Cold Spring Harbor Laboratory Institutional Repository

Springer - Publisher Connector

DukeSpace

PubMed Central

Digital Repository at the University of Maryland

University of Queensland eSpace

Construction of Red Fox Chromosomal Fragments from the Short-Read Genome Assembly

Author: Bastounes Estelle R.
Buch Ronak
Farré Marta
Feng Shaohong
Johnson Jennifer L.
Kim Jaebum
Kukekova Anna V.
Larkin Denis Mikhailovich
Liu Shiping
Rando Halie M.
Robson Michael P.
Trut Lyudmila N.
Won Naomi B.
Xiang Xueyan
Xiong Zijun
Zhang Guojie
Publication date: 01/01/2018

The genome of a red fox (Vulpes vulpes) was recently sequenced and assembled using next-generation sequencing (NGS). The assembly is of high quality, with 94X coverage and a scaffold N50 of 11.8 Mbp, but is split into 676,878 scaffolds, some of which are likely to contain assembly errors. Fragmentation and misassembly hinder accurate gene prediction and downstream analysis such as the identification of loci under selection. Therefore, assembly of the genome into chromosome-scale fragments was an important step towards developing this genomic model. Scaffolds from the assembly were aligned to the dog reference genome and compared to the alignment of an outgroup genome (cat) against the dog to identify syntenic sequences among species. The program Reference-Assisted Chromosome Assembly (RACA) then integrated the comparative alignment with the mapping of the raw sequencing reads generated during assembly against the fox scaffolds. The 128 sequence fragments RACA assembled were compared to the fox meiotic linkage map to guide the construction of 40 chromosomal fragments. This computational approach to assembly was facilitated by prior research in comparative mammalian genomics, and the continued improvement of the red fox genome can in turn offer insight into canid and carnivore chromosome evolution. This assembly is also necessary for advancing genetic research in foxes and other canids.

Aberystwyth Research Portal

Directory of Open Access Journals

Copenhagen University Research Information System

Kent Academic Repository

University of Queensland eSpace

Smith College: Smith ScholarWorks

Following Tetraploidy in Maize, a Short Deletion Mechanism Removed Genes Preferentially from One of the Two Homeologs

Author: A. H Paterson
B. C Thomas
B. J Haas
B. S Gaut
Brent S. Pedersen
C Simillion
D Lisch
D. A Petrov
D. R Schrider
Damon Lisch
E Lyons
E Lyons
E. R Liman
Eric Lyons
H Tang
J Lai
J. G Walling
J. L Pasieka
James C. Schnable
K. M Devos
Kenneth H. Wolfe
M Freeling
M Freeling
M Freeling
M Kasahara
M Lynch
M Lynch
M Lynch
Margaret R. Woodhouse
Michael Freeling
P SanMiguel
P. A Ziolkowski
P. S Schnable
R Fischer
R. J Langham
S Ahn
S Henikoff
S. F Altschul
Sb Needlema
Shabarinath Subramaniam
X Wang
Z Swigonova
Z. H Yang
Publication venue: Public Library of Science
Publication date: 29/06/2010

Following genome duplication and selfish DNA expansion, maize used a heretofore unknown mechanism to shed redundant genes and functionless DNA with bias toward one of the parental genomes.

Public Library of Science (PLOS)

Crossref

DigitalCommons@University of Nebraska

Directory of Open Access Journals

PubMed Central

Multiple whole genome alignments and novel biomedical applications at the VISTA portal

Author: Brudno Michael
Dubchak Inna
Minovitsky Simon
Poliakov Alexander
Ratnere Igor
Publication venue: Oxford University Press
Publication date: 01/02/2007

The VISTA portal for comparative genomics is designed to give biomedical scientists a unified set of tools to lead them from the raw DNA sequences through the alignment and annotation to the visualization of the results. The VISTA portal also hosts the alignments of a number of genomes computed by our group, allowing users to study the regions of their interest without having to manually download the individual sequences. Here we describe various algorithmic and functional improvements implemented in the VISTA portal over the last 2 years. The VISTA Portal is accessible at http://genome.lbl.gov/vista

Crossref

PubMed Central

eScholarship - University of California

UNT Digital Library

An Integrative Method for Accurate Comparative Genome Mapping

Author: Eduardo P. C Rocha
Firas Swidan
Michael Shmoish
Pavel Pevzner
Ron Y Pinter
Publication venue: Public Library of Science
Publication date: 01/01/2006

We present MAGIC, an integrative and accurate method for comparative genome mapping. Our method consists of two phases: preprocessing for identifying “maximal similar segments,” and mapping for clustering and classifying these segments. MAGIC's main novelty lies in its biologically intuitive clustering approach, which aims towards both calculating reorder-free segments and identifying orthologous segments. In the process, MAGIC efficiently handles ambiguities resulting from duplications that occurred before the speciation of the considered organisms from their most recent common ancestor. We demonstrate both MAGIC's robustness and scalability: the former is asserted with respect to its initial input and with respect to its parameters' values. The latter is asserted by applying MAGIC to distantly related organisms and to large genomes. We compare MAGIC to other comparative mapping methods and provide detailed analysis of the differences between them. Our improvements allow a comprehensive study of the diversity of genetic repertoires resulting from large-scale mutations, such as indels and duplications, including explicitly transposable and phagic elements. The strength of our method is demonstrated by detailed statistics computed for each type of these large-scale mutations. MAGIC enabled us to conduct a comprehensive analysis of the different forces shaping prokaryotic genomes from different clades, and to quantify the importance of novel gene content introduced by horizontal gene transfer relative to gene duplication in bacterial genome evolution. We use these results to investigate the breakpoint distribution in several prokaryotic genomes.

Public Library of Science (PLOS)

CiteSeerX

Crossref

Directory of Open Access Journals

PubMed Central

FluentDNA: Nucleotide Visualization of Whole Genomes, Annotations, and Alignments

Author: Arakawa
Bially
Bierkandt
Buels
Conti
Cortesi
Deschavanne
Egea
Gardner
Gómez
Haverkort
Hennig
Hossain
Imbeault
Jakubowska
Joseph
Katoh
Kaya
Kemkemer
Khouri-Saba
Khurana
Krzywinski
Kuhn
Laetsch
Leader
Lechat
Lieberman-Aiden
Lyons
Mehta
Miller
Neugebauer
Rasmussen
Resig
Robinson
Rombouts
Sagan
Schatz
Seaman
Serov
Smit
Stahl
Sussillo
Swarbreck
Sánchez
Waterhouse
Wright
Yachdav
Zerbino
Publication venue: Frontiers Media SA
Publication date: 30/04/2020

Researchers seldom look at naked genome assemblies: instead, the attributes of DNA sequences are mediated through statistics, annotations and high level summaries. Here we present software that visualizes the bare sequences of whole genome assemblies in a zoomable interface. This can assist in detection of chromosome architecture and contamination by the naked eye through changes in color patterns, in the absence of any other annotation. When available, annotations can be visualized alongside or on top of the naked sequence. Genome alignments can also be visualized, laying two genomes side by side in an alignment and highlighting their differences at nucleotide resolution. FluentDNA gives researchers direct visualization of whole genome assemblies, annotations and alignments, for quality control, hypothesis generation, and communicating results.

Crossref

Shared Research Repository

Queen Mary Research Online

Evolution of genes and genomes on the Drosophila phylogeny

Author: Clark Andrew G.
Pachter Lior
Publication venue: Nature Publishing Group
Publication date: 08/11/2007

Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.

Caltech Authors

A Bayesian Approach for Fast and Accurate Gene Tree Reconstruction

Author: Adams
Altschul
Arvestad
Butler
Chen
Ciccarelli
Clark
Dehal
Doyon
Dujon
Durand
Edgar
Eisen
Felsenstein
Gao
Gascuel
Hahn
Hahn
Hasegawa
Huerta-Cepas
Kellis
Kellis
Li
Li
M. D. Rasmussen
M. Kellis
Massey
Noonan
Page
Rannala
Richards
Rokas
Ronquist
Saitou
Sanderson
Shimodaira
Wapinski
Wolfe
Zmasek
Zmasek
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2010

Supplementary tables S1, sections 2.1–2.3, and figures S1–S11 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/). Recent sequencing and computing advances have enabled phylogenetic analyses to expand to both entire genomes and large clades, thus requiring more efficient and accurate methods designed specifically for the phylogenomic context. Here, we present SPIMAP, an efficient Bayesian method for reconstructing gene trees in the presence of a known species tree. We observe many improvements in reconstruction accuracy, achieved by modeling multiple aspects of evolution, including gene duplication and loss (DL) rates, speciation times, and correlated substitution rate variation across both species and loci. We have implemented and applied this method on two clades of fully sequenced species, 12 Drosophila and 16 fungal genomes as well as simulated phylogenies and find dramatic improvements in reconstruction accuracy as compared with the most popular existing methods, including those that take the species tree into account. We find that reconstruction inaccuracies of traditional phylogenetic methods overestimate the number of DL events by as much as 2–3-fold, whereas our method achieves significantly higher accuracy. We feel that the results and methods presented here will have many important implications for future investigations of gene evolution. National Science Foundation (U.S.) (CAREER award NSF 0644282)

CiteSeerX

DSpace@MIT

Crossref

Harvard University - DASH

PubMed Central