OMMA enables population-scale analysis of complex genomic features and phylogenomic relationships from nanochannel-based optical maps.

Author: Chan Ting-Fung
Chu Catherine
Ho Pak-Leung
Kwok Pui-Yan
Lai Yvonne Yuk-Yin
Leung Alden King-Yung
Li Le
Liu Melissa Chun-Jiao
Yip Kevin Y
Publication venue: eScholarship, University of California
Publication date: 01/07/2019

Background: Optical mapping is an emerging technology that complements sequencing-based methods in genome analysis. It is widely used to improve genome assemblies and detect structural variations by providing information over much longer (up to 1 Mb) reads. Current standards in optical mapping analysis involve assembling optical maps into contigs and aligning them to a reference, an approach that is limited to pairwise comparison and becomes bias-prone when analyzing multiple samples. Findings: We present a new method, OMMA, that extends optical mapping to the study of complex genomic features by simultaneously interrogating optical maps across many samples in a reference-independent manner. OMMA captures and characterizes complex genomic features, e.g., multiple haplotypes, copy number variations, and subtelomeric structures, when applied to 154 human samples across the 26 populations sequenced in the 1000 Genomes Project. For small genomes such as those of pathogenic bacteria, OMMA accurately reconstructs the phylogenomic relationships and identifies functional elements across 21 Acinetobacter baumannii strains. Conclusions: With the increasing data throughput of optical mapping systems, the use of this technology in comparative genome analysis across many samples will become feasible. OMMA is a timely solution that can address this computational need. The OMMA software is available at https://github.com/TF-Chan-Lab/OMTools.

eScholarship - University of California

Limited Lifespan of Fragile Regions in Mammalian Evolution

Author: A. Bergeron
A. Bhutkar
A. Kulemzina
A. Ruiz-Herrera
A.E. Wind van der
C. Webber
D. Larkin
D. Misceo
D. San Mauro
D. Sankoff
D.M. Larkin
E. Mlynarski
E. Mongin
E.E. Eichler
G. Fertin
H. Hinsch
H. Kikuta
H. Zhao
J. Ma
J.H. Nadeau
L. Armengol
L. Gordon
M. Caceres
M. Longo
M.A. Alekseyev
M.R. Mehan
O. Lecompte
P. Pevzner
P.A. Pevzner
R. Koszul
S. Myers
S. Ohno
S. Yancopoulos
S. Zhao
W.J. Kent
W.J. Murphy
Y. Yue
Z. Jiang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

An important question in genome evolution is whether there exist fragile regions (rearrangement hotspots) where chromosomal rearrangements occur repeatedly. Although nearly all recent studies supported the existence of fragile regions in mammalian genomes, the most comprehensive phylogenomic study of mammals (Ma et al. (2006) Genome Research 16, 1557-1565) raised some doubts about their existence. We demonstrate that fragile regions are subject to a "birth and death" process, implying that fragility has a limited evolutionary lifespan. This finding implies that fragile regions migrate to different locations in different mammals, explaining why only a few chromosomal breakpoints are shared between different lineages. The birth and death of fragile regions reinforces the hypothesis that rearrangements are promoted by matching segmental duplications and suggests putative locations of the currently active fragile regions in the human genome.

arXiv.org e-Print Archive

CiteSeerX

Crossref

The highly rearranged mitochondrial genomes of the crabs Maja crispata and Maja squinado (Majidae) and gene order evolution in Brachyura

Author: AD Miller
AE Smith
AH Sahyoun
AT Beckenbach
BQ Minh
C Hahn
C Moritz
DR Wolstenholme
DV Lavrov
E Negrisolo
F Jühling
F Kilpert
FJ Lin
G Shi
G Sotelo
G Tan
G Tang
H Ma
H Shen
H Shimodaira
H Sun
I Marcadé
IM Fearnley
J Yang
JD Thompson
JL Boore
JM Sung
JS Ki
K Strimmer
K Tamura
LM Tsang
LS Quang
LT Nguyen
M Babbucci
M Bernt
M Dowton
M Hui
MH Tan
MM Yamauchi
N Lartillot
NT Perna
O Rota-Stabelli
P Cantatore
P Salvato
R Lanfear
R Raimond
RD Segawa
S Grave De
S Montelli
S Saito
SF Altschul
SJ Kim
SL Cameron
TA Rawlings
TM Lowe
X Wang
Y Xing
YK Ji
YQ Yu
Z Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

We sequenced the mitochondrial genomes of the spider crabs Maja crispata and Maja squinado (Majidae, Brachyura). Both genomes contain the whole set of 37 genes characteristic of Bilaterian genomes, encoded on both α- and β-strands. Both species exhibit the same gene order, which is unique among known animal genomes. In particular, all the genes located on the β-strand form a single block. This gene order was analysed together with the other nine gene orders known for the Brachyura. Our study confirms that the most widespread gene order (BraGO) represents the plesiomorphic condition for Brachyura and was established at the onset of this clade. All other gene orders are the result of transformational pathways originating from BraGO. The different gene orders exhibit variable levels of gene rearrangement, which involve either only tRNAs or all types of genes. Local homoplastic arrangements were identified, while complete gene orders remain unique and represent signatures that can have a diagnostic value. Brachyura appear to be a hot-spot of gene order diversity within the phylum Arthropoda. Our analysis allowed us to track, for the first time, the full evolutionary pathways producing the Brachyuran gene orders. This goal was achieved by coupling sophisticated bioinformatic tools with phylogenetic analysis.

Crossref

Archivio istituzionale della ricerca - Università di Padova

The inference of gene trees with species trees

Author: Bastien Boussau
Eric Tannier
Gergely J. Szöllősi
Montbonnot France
Vincent Daubin
Publication date: 04/11/2013

Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can co-exist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. In this article we review the various models that have been used to describe the relationship between gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a better basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution. Comment: Review article in relation to the "Mathematical and Computational Evolutionary Biology" conference, Montpellier, 201

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

PubMed Central

HAL

Repository of the Academy's Library

ELTE Digital Institutional Repository (EDIT)

Hal-Diderot

A fast algorithm for the multiple genome rearrangement problem with weighted reversals and transpositions

Author: A Bergeron
A Caprara
B Bourque
B Moret
D Bader
D Sankoff
E Tannier
Enno Ohlebusch
G Fritzsch
J Tang
M Bader
M Bernt
M Blanchette
M Cosner
Martin Bader
Mohamed I Abouelhoda
N Eriksen
P Pevzner
S Hannenhalli
S Wu
T Hartman
T Liu
V Bafna
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Owing to recent progress in genome sequencing, more and more data for phylogenetic reconstruction based on rearrangement distances between genomes are becoming available. However, this phylogenetic reconstruction is a very challenging task. For the simplest distance measures (the breakpoint distance and the reversal distance), the problem is NP-hard even if one considers only three genomes. Results: In this paper, we present a new heuristic algorithm that directly constructs a phylogenetic tree w.r.t. the weighted reversal and transposition distance. Experimental results on previously published datasets show that constructing phylogenetic trees in this way results in better trees than constructing the trees w.r.t. the reversal distance and then recalculating the weight of the trees with the weighted reversal and transposition distance. An implementation of the algorithm can be obtained from the authors. Conclusion: The possibility of creating phylogenetic trees directly w.r.t. the weighted reversal and transposition distance results in biologically more realistic scenarios. Our algorithm can solve today's most challenging biological datasets in a reasonable amount of time.
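
As a small illustration of the simplest distance measure named in this abstract, the sketch below counts breakpoints between two signed linear gene orders. It is not the authors' heuristic or their weighted reversal and transposition distance; the function names and the framing convention (0 and n+1 as end markers) are assumptions chosen for illustration.

```python
def adjacencies(genome):
    """Return all adjacencies of a signed linear gene order.

    The genome is framed with 0 and n+1 (an assumed convention). Each
    adjacency (x, y) is stored together with its reading on the opposite
    strand, (-y, -x), so that reversed segments still match.
    """
    framed = [0] + list(genome) + [len(genome) + 1]
    adj = set()
    for x, y in zip(framed, framed[1:]):
        adj.add((x, y))
        adj.add((-y, -x))
    return adj


def breakpoint_distance(a, b):
    """Count adjacencies of genome a that are not conserved in genome b."""
    if sorted(map(abs, a)) != sorted(map(abs, b)):
        raise ValueError("genomes must contain the same genes")
    framed = [0] + list(a) + [len(a) + 1]
    adj_b = adjacencies(b)
    return sum((x, y) not in adj_b for x, y in zip(framed, framed[1:]))


if __name__ == "__main__":
    # Reversing the segment (2, 3) creates one breakpoint at each end of it.
    print(breakpoint_distance([1, 2, 3, 4], [1, -3, -2, 4]))
```

Computing this pairwise distance takes linear time; the NP-hardness mentioned in the abstract concerns the harder problem of finding an ancestral gene order that minimizes the total distance to three or more genomes.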

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Progressive Mauve: Multiple alignment of genomes with gene flux and rearrangement

Author: Darling Aaron E.
Mau Bob
Perna Nicole T.
Publication date: 01/01/2009

Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. We describe a method to align two or more genomes that have undergone large-scale recombination, particularly genomes that have undergone substantial amounts of gene gain and loss (gene flux). The method utilizes a novel alignment objective score, referred to as a sum-of-pairs breakpoint score. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The progressive genome alignment algorithm demonstrates markedly improved accuracy over previous approaches in situations where genomes have undergone realistic amounts of genome rearrangement, gene gain, loss, and duplication. We apply the progressive genome alignment algorithm to a set of 23 completely sequenced genomes from the genera Escherichia, Shigella, and Salmonella. The 23 enterobacteria have an estimated 2.46 Mbp of genomic content conserved among all taxa and total unique content of 15.2 Mbp. We document substantial population-level variability among these organisms driven by homologous recombination, gene gain, and gene loss. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve. Comment: Revision dated June 19, 200

arXiv.org e-Print Archive

CiteSeerX

The amphioxus genome and the evolution of the chordate karyotype

Lancelets ('amphioxus') are the modern survivors of an ancient chordate lineage, with a fossil record dating back to the Cambrian period. Here we describe the structure and gene content of the highly polymorphic, approximately 520-megabase genome of the Florida lancelet Branchiostoma floridae, and analyse it in the context of chordate evolution. Whole-genome comparisons illuminate the murky relationships among the three chordate groups (tunicates, lancelets and vertebrates), and allow not only reconstruction of the gene complement of the last common chordate ancestor but also partial reconstruction of its genomic organization, as well as a description of two genome-wide duplications and subsequent reorganizations in the vertebrate lineage. These genome-scale events shaped the vertebrate genome and provided additional genetic variation for exploitation during vertebrate evolution.

Serveur académique lausannois

Caltech Authors

Sunderland University Institutional Repository

Oxford University Research Archive

National Taiwan University Repository

University of St. Andrews - Pure

Integration of Alignment and Phylogeny in the Whole-Genome Era

Author: Sun Hongtao
Publication venue: Washington University Open Scholarship
Publication date: 15/05/2015

With the development of new sequencing techniques, whole genomes of many species have become available. This huge amount of data gives rise to new opportunities and challenges. These new sequences provide valuable information on relationships among species, e.g. genome recombination and conservation. One of the principal ways to investigate such information is multiple sequence alignment (MSA). Currently, there is a large amount of MSA data on the internet, such as the UCSC genome database, but how to use this information effectively to solve classical and new problems remains underexplored. In this thesis, we explored how to use this information in four problems: sequence orthology search, multiple alignment improvement, short read mapping, and genome rearrangement inference. For the first problem, we developed an EM algorithm to iteratively align a query with a multiple alignment database, using information from a phylogeny relating the query species and the species in the multiple alignment. We also infer the query's location in the phylogeny. We showed that by doing alignment and phylogeny inference together, we can improve the accuracy of both. For the second problem, we developed an optimization algorithm to iteratively refine the multiple alignment quality. Experimental results showed that our algorithm is very stable in terms of the resulting alignments and that it is more accurate than existing methods, i.e. Mafft, Clustal-O, and Mavid, on test data from three sets of species from the UCSC genome database. For the third problem, we developed a model, PhyMap, to align a read to a multiple alignment allowing mismatches and indels. PhyMap computes local alignments of a query sequence against a fixed multiple-genome alignment of closely related species. PhyMap uses a known phylogenetic tree on the species in the multiple alignment to improve the quality of its computed alignments while also estimating the placement of the query on this tree. Both theoretical computation and experimental results show that our model can differentiate between orthologous and paralogous alignments better than other popular short read mapping tools (BWA, BOWTIE and BLAST). For the fourth problem, we gave a simple genome recombination model which can express insertions, deletions, inversions, translocations and inverted translocations on aligned genome segments. We also developed an MCMC algorithm to infer the order of the query segments. We proved that using any Euclidean metric to measure the distance between two sequence orders in the tree optimization goal function will lead to a degenerate solution where the inferred order will be the order of one of the leaf nodes. We also gave a graph-based formulation of the problem which can represent the probability distribution of the order of the query sequences.

Washington University St. Louis: Open Scholarship