166 research outputs found

Maximum likelihood estimates of pairwise rearrangement distances

Author: Egri-Nagy Attila
Francis Andrew R.
Holland Barbara R.
Jarvis Peter D.
Serdoz Stuart
Sumner Jeremy
Tanaka Mark M.
Publication date: 01/01/2017

Accurate estimation of evolutionary distances between taxa is important for many phylogenetic reconstruction methods. In the case of bacteria, distances can be estimated using a range of different evolutionary models, from single nucleotide polymorphisms to large-scale genome rearrangements. For sequence evolution, models (such as the Jukes-Cantor model and associated metric) have been used to correct pairwise distances. Similar correction methods for genome rearrangement processes are required to improve inference. Current attempts at correction fall into three categories: empirical computational studies, Bayesian/MCMC approaches, and combinatorial approaches. Here we introduce a maximum likelihood estimator for the inversion distance between a pair of genomes, using the group-theoretic approach to modelling inversions introduced recently. This MLE functions as a corrected distance: in particular, we show that because of the way sequences of inversions interact with each other, it is quite possible for minimal distance and MLE distance to differently order the distances of two genomes from a third. This has obvious implications for the use of minimal distance in phylogeny reconstruction. The work also tackles the above problem allowing free rotation of the genome. Generally a frame of reference is locked, and all computation made accordingly. This work incorporates the action of the dihedral group so that distance estimates are free from any a priori frame of reference.
Comment: 21 pages, 7 figures. To appear in the Journal of Theoretical Biology.
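The abstract above cites the Jukes-Cantor model as the sequence-evolution analogue of the correction it proposes for inversions. As a minimal sketch of what such a correction looks like, the standard Jukes-Cantor formula converts an observed proportion of differing sites `p` into an estimated number of substitutions per site. The function name and the error handling are illustrative assumptions; this is not the paper's inversion MLE.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p of
    differing sites between two aligned nucleotide sequences.

    d = -(3/4) * ln(1 - (4/3) * p), defined only for 0 <= p < 3/4.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    # The corrected distance exceeds the raw proportion because
    # repeated substitutions at the same site hide earlier changes.
    for p in (0.05, 0.30, 0.60):
        print(p, jukes_cantor_distance(p))
```

The gap between `p` and the corrected value widens as `p` grows, which is the same saturation effect that motivates correcting minimal rearrangement distances.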

arXiv.org e-Print Archive

Crossref

University of Tasmania Open Access Repository

UNSWorks

Western Sydney ResearchDirect

Moments Of Genome Evolution By Double Cut-and-join

Author: Biller
Eric
Laurent
Priscila
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/06/2016

We study statistical estimators of the number of genomic events separating two genomes under a Double Cut-and-Join (DCJ) rearrangement model, by a method of moment estimation. We first propose an exact, closed, analytically invertible formula for the expected number of breakpoints after a given number of DCJs. This improves over the heuristic, recursive and computationally slower previously proposed one. Then we explore the analogies of genome evolution by DCJ with evolution of binary sequences under substitutions, permutations under transpositions, and random graphs. Each of these is presented in the literature with intuitive justifications, and is used to import results from better known fields. We formalize the relations by proving a correspondence between moments in sequence and genome evolution, provided substitutions appear four by four in the corresponding model. Finally we prove a bounded error on two estimators of the number of cycles in the breakpoint graph after a given number of rearrangements, by an analogy with cycles in permutations and components in random graphs.
Funding: Agence Nationale pour la Recherche, Ancestrome project [ANR-10-BINF-01-01]; Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) [2013/25084-2]

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio da Producao Cientifica e Intelectual da Unicamp

Models and analyses of chromosome evolution

Author: Guerrero Rafael Felipe
Publication date: 18/10/2013

At the core of evolutionary biology stands the study of divergence between populations and the formation of new species. This dissertation applies a diverse array of theoretical and statistical approaches to study how chromosomes evolve. In the first chapter, I build models that predict the amount of neutral genetic variation in chromosomal inversions involved in local adaptation, providing a foundation for future studies on the role of these rearrangements in population divergence. In the second chapter, I use a large dataset of the geographic variation in frequency of a chromosomal inversion to infer natural selection and non-random mating, revealing that this inversion could be implicated in strong reproductive isolation between subpopulations of a single species. In the third chapter, I use coalescent models for recombining sex chromosomes coupled with approximate Bayesian computation to estimate the recombination rate between X and Y chromosomes in European tree frogs. This novel approach allows me to infer a rate so low that it would have been hard to detect with empirical methods. In the fourth chapter, I study the theoretical conditions that favor the evolution of a chromosome fusion that reduces recombination between locally adapted alleles.
Ecology, Evolution and Behavior

Texas ScholarWorks

PerSVade: personalized structural variant detection in any species of interest

Author: Gabaldón Toni
Schikora Tamarit Miquel Àngel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2022

Structural variants (SVs) underlie genomic variation but are often overlooked due to difficult detection from short reads. Most algorithms have been tested on humans, and it remains unclear how applicable they are in other organisms. To solve this, we develop perSVade (personalized structural variation detection), a sample-tailored pipeline that provides optimally called SVs and their inferred accuracy, as well as small and copy number variants. PerSVade increases SV calling accuracy on a benchmark of six eukaryotes. We find no universal set of optimal parameters, underscoring the need for sample-specific parameter optimization. PerSVade will facilitate SV detection and study across diverse organisms.
Peer reviewed. Postprint (author's final draft).

UPCommons. Portal del coneixement obert de la UPC

PubMed Central

Evolution of whole genomes through inversions: models and algorithms for duplicates, ancestors, and edit scenarios

Author: Swenson Krister
Publication venue: Lausanne, EPFL
Publication date: 15/10/2009

Advances in sequencing technology are yielding DNA sequence data at an alarming rate – a rate reminiscent of Moore's law. Biologists' abilities to analyze this data, however, have not kept pace. On the other hand, the discrete and mechanical nature of the cell life-cycle has been tantalizing to computer scientists. Thus in the 1980s, pioneers of the field now called Computational Biology began to uncover a wealth of computer science problems, some confronting modern Biologists and some hidden in the annals of the biological literature. In particular, many interesting twists were introduced to classical string matching, sorting, and graph problems. One such problem, first posed in 1941 but rediscovered in the early 1980s, is that of sorting by inversions (also called reversals): given two permutations, find the minimum number of inversions required to transform one into the other, where an inversion inverts the order of a subpermutation. Indeed, many genomes have evolved mostly or only through inversions. Thus it becomes possible to trace evolutionary histories by inferring sequences of such inversions that led to today's genomes from a distant common ancestor. But unlike the classic edit distance problem where string editing was relatively simple, editing permutations in this way has proved to be more complex. In this dissertation, we extend the theory so as to make these edit distances more broadly applicable and faster to compute, and work towards more powerful tools that can accurately infer evolutionary histories. In particular, we present work that for the first time considers genomic distances between any pair of genomes, with no limitation on the number of occurrences of a gene. Next we show that there are conditions under which an ancestral genome (or one close to the true ancestor) can be reliably reconstructed.
Finally we present new methodology that computes a minimum-length sequence of inversions to transform one permutation into another in, on average, O(n log n) steps, whereas the best worst-case algorithm to compute such a sequence uses O(n√n log n) steps.
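The sorting-by-inversions problem defined in the abstract above can be made concrete with a brute-force sketch. The code below works on unsigned permutations and finds the minimum number of inversions by breadth-first search over all reachable permutations. It is exponential in the permutation length and is meant only to illustrate the problem statement; it is not the dissertation's O(n log n) method, and the function names are illustrative assumptions.

```python
from collections import deque
from typing import Sequence, Tuple


def apply_inversion(perm: Tuple[int, ...], i: int, j: int) -> Tuple[int, ...]:
    """Return perm with the segment perm[i..j] (inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


def inversion_distance(source: Sequence[int], target: Sequence[int]) -> int:
    """Minimum number of inversions turning source into target.

    Brute-force breadth-first search; only practical for short permutations.
    """
    src, tgt = tuple(source), tuple(target)
    if sorted(src) != sorted(tgt):
        raise ValueError("source and target must contain the same elements")
    if src == tgt:
        return 0
    n = len(src)
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        perm, dist = queue.popleft()
        # Try every inversion of a segment with at least two elements.
        for i in range(n):
            for j in range(i + 1, n):
                nxt = apply_inversion(perm, i, j)
                if nxt == tgt:
                    return dist + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise RuntimeError("target not reachable")  # cannot happen for valid input


if __name__ == "__main__":
    print(inversion_distance((2, 3, 1), (1, 2, 3)))
```

Because the search visits permutations in order of distance, the first time the target appears gives the minimum; the cost is that up to n! states may be stored, which is why efficient algorithms for this problem matter.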

Infoscience - École polytechnique fédérale de Lausanne

Inferring genome-scale rearrangement phylogeny and ancestral gene order: a Drosophila case study

Author: Bhutkar Arjun
Gelbart William M
Smith Temple F
Publication venue: BioMed Central
Publication date: 01/01/2007

A simple, fast, and biologically inspired computational approach to infer genome-scale rearrangement phylogeny and ancestral gene order has been developed and applied to eight Drosophila genomes, providing insights into evolutionary chromosomal dynamics.

Boston University Institutional Repository (OpenBU)

Harvard University - DASH

Springer - Publisher Connector

PubMed Central

Distance-Based Genome Rearrangement Phylogeny

Author: Jansen Robert K.
Moret Bernard M.E.
Raubeson Linda A.
Wang Li-San
Warnow Tandy
Publication venue: ScholarWorks@CWU
Publication date: 01/01/2006

Evolution operates on whole genomes through direct rearrangements of genes, such as inversions, transpositions, and inverted transpositions, as well as through operations, such as duplications, losses, and transfers, that also affect the gene content of the genomes. Because these events are rare relative to nucleotide substitutions, gene order data offer the possibility of resolving ancient branches in the tree of life; the combination of gene order data with sequence data also has the potential to provide more robust phylogenetic reconstructions, since each can elucidate evolution at different time scales. Distance corrections greatly improve the accuracy of phylogeny reconstructions from DNA sequences, enabling distance-based methods to approach the accuracy of the more elaborate methods based on parsimony or likelihood at a fraction of the computational cost. This paper focuses on developing distance correction methods for phylogeny reconstruction from whole genomes. The main question we investigate is how to estimate evolutionary histories from whole genomes with equal gene content, and we present a technique, the empirically derived estimator (EDE), that we have developed for this purpose. We study the use of EDE on whole genomes with identical gene content, and we explore the accuracy of phylogenies inferred using EDE with the neighbor joining and minimum evolution methods under a wide range of model conditions. Our study shows that tree reconstruction under these two methods is much more accurate when based on EDE distances than when based on other distances previously suggested for whole genomes.

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

ScholarWorks at Central Washington University

Genome dedoubling by DCJ and reversal

Author: Ouangraoua Aïda
Thomas Antoine
Varré Jean-Stéphane
Publication venue: BioMed Central
Publication date: 01/01/2011

Background. Segmental duplications in genomes have been studied for many years. Recently, several studies have highlighted a biological phenomenon called breakpoint-duplication that apparently associates a significant proportion of segmental duplications in mammals, and the Drosophila species group, with breakpoints in rearrangement events. Results. In this paper, we introduce and study a combinatorial problem, inspired by the breakpoint-duplication phenomenon, called the Genome Dedoubling Problem. It consists of finding a minimum-length rearrangement scenario required to transform a genome with duplicated segments into a non-duplicated genome such that duplications are caused by rearrangement breakpoints. We show that the problem, in the Double-Cut-and-Join (DCJ) and the reversal rearrangement models, can be reduced to an APX-complete problem, and we provide algorithms for the Genome Dedoubling Problem with 2-approximable parts. We apply the methods to the reconstruction of a non-duplicated ancestor of Drosophila yakuba. Conclusions. We present the Genome Dedoubling Problem, and describe two algorithms solving the problem in the DCJ model and the reversal model. The usefulness of the problems and the methods is shown through an application to real Drosophila data.

HAL - Lille 3

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

The Distance and Median Problems in the Single-Cut-Or-Join Model with Single-Gene Duplications

Author: Chauve Cedric
Feijao Pedro C.
Lafond Manuel
Mane Aniket C.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/05/2020

Background. In the field of genome rearrangement algorithms, models accounting for gene duplication often lead to hard problems. For example, while computing the pairwise distance is tractable in most duplication-free models, the problem is NP-complete for most extensions of these models accounting for duplicated genes. Moreover, problems involving more than two genomes, such as the genome median and the Small Parsimony problem, are intractable for most duplication-free models, with some exceptions, for example the Single-Cut-or-Join (SCJ) model. Results. We introduce a variant of the SCJ distance that accounts for duplicated genes, in the context of directed evolution from an ancestral genome to a descendant genome where orthology relations between ancestral genes and their descendants are known. Our model includes two duplication mechanisms: single-gene tandem duplication and the creation of single-gene circular chromosomes. We prove that in this model, computing the directed distance and a parsimonious evolutionary scenario in terms of SCJ and single-gene duplication events can be done in linear time. We also show that the directed median problem is tractable for this distance, while the rooted median problem, where we assume that one of the given genomes is ancestral to the median, is NP-complete. We also describe an Integer Linear Program for solving this problem. We evaluate the directed distance and rooted median algorithms on simulated data. Conclusion. Our results provide a simple genome rearrangement model, extending the SCJ model to account for single-gene duplications, for which we prove a mix of tractability and hardness results. For the NP-complete rooted median problem, we design a simple Integer Linear Program. Our publicly available implementation of these algorithms for the directed distance and median problems allows these problems to be solved efficiently on large instances.
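For orientation, the duplication-free SCJ distance that the abstract above extends can be sketched in a few lines. Under the standard SCJ model, a genome is a set of adjacencies between gene extremities, and the distance between two genomes on the same genes is the size of the symmetric difference of their adjacency sets. The sketch below covers only that baseline for linear chromosomes; it does not implement the paper's duplication-aware directed distance, and the representation (signed integers for genes, `"t"`/`"h"` for tail and head) and function names are illustrative assumptions.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Extremity = Tuple[int, str]          # (gene name, "t" for tail or "h" for head)
Adjacency = FrozenSet[Extremity]


def adjacencies(chromosome: List[int]) -> Set[Adjacency]:
    """Adjacencies of one linear chromosome given as a signed gene list.

    A positive gene g is read tail-to-head, a negative gene head-to-tail.
    """
    def ends(g: int) -> Tuple[Extremity, Extremity]:
        name = abs(g)
        if g > 0:
            return (name, "t"), (name, "h")
        return (name, "h"), (name, "t")

    result: Set[Adjacency] = set()
    for left, right in zip(chromosome, chromosome[1:]):
        # Join the right end of the left gene to the left end of the right gene.
        result.add(frozenset((ends(left)[1], ends(right)[0])))
    return result


def scj_distance(adj_a: Iterable[Adjacency], adj_b: Iterable[Adjacency]) -> int:
    """Duplication-free SCJ distance: cuts needed plus joins needed."""
    a, b = set(adj_a), set(adj_b)
    return len(a - b) + len(b - a)


if __name__ == "__main__":
    genome_a = adjacencies([1, 2, 3])
    genome_b = adjacencies([1, -2, 3])   # gene 2 inverted
    print(scj_distance(genome_a, genome_b))
```

Inverting a single internal gene destroys both adjacencies it took part in and creates two new ones, so the example pair is four cut-or-join operations apart. The set-difference form is what makes the baseline computable in linear time.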

Simon Fraser University Institutional Repository

The geography of recent genetic ancestry across Europe

Author: Graham Coop
Peter Ralph
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 07/05/2013

The recent genealogical history of human populations is a complex mosaic formed by individual migration, large-scale population movements, and other demographic events. Population genomics datasets can provide a window into this recent history, as rare traces of recent shared genetic ancestry are detectable due to long segments of shared genomic material. We make use of genomic data for 2,257 Europeans (the POPRES dataset) to conduct one of the first surveys of recent genealogical ancestry over the past three thousand years at a continental scale. We detected 1.9 million shared genomic segments, and used the lengths of these to infer the distribution of shared ancestors across time and geography. We find that a pair of modern Europeans living in neighboring populations share around 10-50 genetic common ancestors from the last 1500 years, and upwards of 500 genetic ancestors from the previous 1000 years. These numbers drop off exponentially with geographic distance, but since genetic ancestry is rare, individuals from opposite ends of Europe are still expected to share millions of common genealogical ancestors over the last 1000 years. There is substantial regional variation in the number of shared genetic ancestors: especially high numbers of common ancestors between many eastern populations likely date to the Slavic and/or Hunnic expansions, while much lower levels of common ancestry in the Italian and Iberian peninsulas may indicate weaker demographic effects of Germanic expansions into these areas and/or more stably structured populations. Recent shared ancestry in modern Europeans is ubiquitous, and clearly shows the impact of both small-scale migration and large historical events. 
Population genomic datasets have considerable power to uncover recent demographic history, and will allow a much fuller picture of the close genealogical kinship of individuals across the world.
Comment: Full size figures available from http://www.eve.ucdavis.edu/~plralph/research.html; or html version at http://ralphlab.usc.edu/ibd/ibd-paper/ibd-writeup.xhtm

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

FigShare