2,092 research outputs found

Comparing genomes with rearrangements and segmental duplications

Author: Moret Bernard M.E.
Shao Mingfu
Publication date: 02/08/2017

Motivation: Large-scale evolutionary events such as genomic rearrangements and segmental duplications form an important part of the evolution of genomes and are widely studied from both biological and computational perspectives. A basic computational problem is to infer these events in the evolutionary history for given modern genomes, a task for which many algorithms have been proposed under various constraints. Algorithms that can handle both rearrangements and content-modifying events such as duplications and losses remain few and limited in their applicability. Results: We study the comparison of two genomes under a model including general rearrangements (through double-cut-and-join) and segmental duplications. We formulate the comparison as an optimization problem and describe an exact algorithm to solve it by using an integer linear program. We also devise a sufficient condition and an efficient algorithm to identify optimal substructures, which can simplify the problem while preserving optimality. Using the optimal substructures with the integer linear program (ILP) formulation yields a practical and exact algorithm to solve the problem. We then apply our algorithm to assign in-paralogs and orthologs (a necessary step in handling duplications) and compare its performance with that of the state-of-the-art method MSOAR, using both simulations and real data. On simulated datasets, our method outperforms MSOAR by a significant margin, and on five well-annotated species, MSOAR achieves high accuracy, yet our method performs slightly better on each of the 10 pairwise comparisons. Availability and implementation: http://lcbb.epfl.ch/softwares/coser. Contact: [email protected] or [email protected]

RERO DOC Digital Library

Comparing genomes with rearrangements and segmental duplications

Author: Moret Bernard M. E.
Shao Mingfu
Publication venue: 'Oxford University Press (OUP)'
Publication date: 10/06/2015

Motivation: Large-scale evolutionary events such as genomic rearrangements and segmental duplications form an important part of the evolution of genomes and are widely studied from both biological and computational perspectives. A basic computational problem is to infer these events in the evolutionary history for given modern genomes, a task for which many algorithms have been proposed under various constraints. Algorithms that can handle both rearrangements and content-modifying events such as duplications and losses remain few and limited in their applicability. Results: We study the comparison of two genomes under a model including general rearrangements (through double-cut-and-join) and segmental duplications. We formulate the comparison as an optimization problem and describe an exact algorithm to solve it by using an integer linear program. We also devise a sufficient condition and an efficient algorithm to identify optimal substructures, which can simplify the problem while preserving optimality. Using the optimal substructures with the integer linear program (ILP) formulation yields a practical and exact algorithm to solve the problem. We then apply our algorithm to assign in-paralogs and orthologs (a necessary step in handling duplications) and compare its performance with that of the state-of-the-art method MSOAR, using both simulations and real data. On simulated datasets, our method outperforms MSOAR by a significant margin, and on five well-annotated species, MSOAR achieves high accuracy, yet our method performs slightly better on each of the 10 pairwise comparisons.

Infoscience - École polytechnique fédérale de Lausanne

PubMed Central

Limited Lifespan of Fragile Regions in Mammalian Evolution

Author: A. Bergeron
A. Bhutkar
A. Kulemzina
A. Ruiz-Herrera
A. Ruiz-Herrera
A.E. Wind van der
C. Webber
D. Larkin
D. Misceo
D. San Mauro
D. Sankoff
D. Sankoff
D.M. Larkin
D.M. Larkin
E. Mlynarski
E. Mongin
E.E. Eichler
G. Fertin
H. Hinsch
H. Kikuta
H. Zhao
J. Ma
J. Ma
J.H. Nadeau
L. Armengol
L. Gordon
M. Caceres
M. Longo
M.A. Alekseyev
M.A. Alekseyev
M.A. Alekseyev
M.A. Alekseyev
M.R. Mehan
O. Lecompte
P. Pevzner
P.A. Pevzner
R. Koszul
S. Myers
S. Ohno
S. Yancopoulos
S. Zhao
W.J. Kent
W.J. Murphy
Y. Yue
Z. Jiang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

An important question in genome evolution is whether there exist fragile regions (rearrangement hotspots) where chromosomal rearrangements are happening over and over again. Although nearly all recent studies supported the existence of fragile regions in mammalian genomes, the most comprehensive phylogenomic study of mammals (Ma et al. (2006) Genome Research 16, 1557-1565) raised some doubts about their existence. We demonstrate that fragile regions are subject to a "birth and death" process, implying that fragility has limited evolutionary lifespan. This finding implies that fragile regions migrate to different locations in different mammals, explaining why there exist only a few chromosomal breakpoints shared between different lineages. The birth and death of fragile regions phenomenon reinforces the hypothesis that rearrangements are promoted by matching segmental duplications and suggests putative locations of the currently active fragile regions in the human genome

arXiv.org e-Print Archive

CiteSeerX

Crossref

Discovery of large genomic inversions using long range information.

Author: Alkan Can
Amemiya Chris T
Antonacci Francesca
Chiatante Giorgia
Eichler Evan E
Eslami Rasekh Marzieh
Miroballo Mattia
Tang Joyce
Ventura Mario
Publication venue: eScholarship, University of California
Publication date: 01/01/2017

Background: Although many algorithms are now available that aim to characterize different classes of structural variation, discovery of balanced rearrangements such as inversions remains an open problem. This is mainly due to the fact that breakpoints of such events typically lie within segmental duplications or common repeats, which reduces the mappability of short reads. The algorithms developed within the 1000 Genomes Project to identify inversions are limited to relatively short inversions, and there are currently no available algorithms to discover large inversions using high throughput sequencing technologies. Results: Here we propose a novel algorithm, VALOR, to discover large inversions using new sequencing methods that provide long range information such as 10X Genomics linked-read sequencing, pooled clone sequencing, or other similar technologies that we commonly refer to as long range sequencing. We demonstrate the utility of VALOR using both pooled clone sequencing and 10X Genomics linked-read sequencing generated from the genome of an individual from the HapMap project (NA12878). We also provide a comprehensive comparison of VALOR against several state-of-the-art structural variation discovery algorithms that use whole genome shotgun sequencing data. Conclusions: In this paper, we show that VALOR is able to accurately discover all previously identified and experimentally validated large inversions in the same genome with a low false discovery rate. Using VALOR, we also predicted a novel inversion, which we validated using fluorescent in situ hybridization. VALOR is available at https://github.com/BilkentCompGen/VALOR

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Springer OAI

Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence

Author: Cheung Joseph
Estivill Xavier
Khaja Razi
Lau Ken
MacDonald Jeffrey R
Scherer Stephen W
Tsui Lap-Chee
Publication venue: BioMed Central
Publication date: 01/01/2003

BACKGROUND: Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental duplications, as well as regions containing potential sequence misassignments in the human genome assemblies. RESULTS: Our analysis of the June 2002 public human genome assembly revealed that 107.4 of 3,043.1 megabases (Mb) (3.53%) of sequence contained segmental duplications, each at least 5 kb in size and with at least 90% identity. We have also detected that 38.9 Mb (1.28%) of sequence within this assembly is likely to be involved in sequence misassignment errors. Furthermore, we have identified a significant subset (199,965 of 2,327,473 or 8.6%) of single-nucleotide polymorphisms (SNPs) in the public databases that are not true SNPs but are potential paralogous sequence variants. CONCLUSION: Using two distinct computational approaches, we have identified most of the sequences in the human genome that have undergone recent segmental duplications. Near-identical segmental duplications present a major challenge to the completion of the human genome sequence. Potential sequence misassignments detected in this study would require additional efforts to resolve.

Springer - Publisher Connector

PubMed Central

UPF Digital Repository

HKU Scholars Hub

Mechanisms and impact of alternative transposition-induced segmental duplications

Author: Zuo Tao
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2015

Segmental duplications are prevalent in both plant and animal genomes, and have played important roles in genome evolution. The focus of my project is to understand the transposition-mediated mechanisms that lead to the formation of segmental duplications, and the immediate impact of recently generated large (up to 14.6 Mb) tandem duplications in maize. We applied a variety of genetic, molecular, statistical and bioinformatics approaches, including genetic screening, PCR, Southern blotting, qRT-PCR, microarray, mRNA-sequencing, small RNA-sequencing, and a self-developed program (STRAND: Search for Transposon-Induced Tandem Direct Duplications) to study these questions. We discovered new genome rearrangement mechanisms, including transposition of paired DNA transposon termini that can generate tandem direct duplications (TDD) and novel structures termed Composite Insertions. Genomic study revealed that these mechanisms have played an important role in generating TDD in 8 of 22 examined plant genomes. We also found a significant dosage-dependent effect of a 14.6 Mb duplication on phenotypic variation, and expression of mRNA and small RNA transcripts. This work expands our current knowledge of how DNA transposons contribute to rapid genome expansion, extends our understanding of the significance of DNA transposons in altering genome structure, and provides new insight into the transcriptional expression and phenotypic effect of a specific and recent maize duplication

Digital Repository @ Iowa State University (ISU)

Drosophila Duplication Hotspots Are Associated with Late-Replicating Regions of the Genome

Author: A Aguilera
A Eyre-Walker
A Letessier
AG Clark
AG Clark
AJ Iafrate
AJ Sharp
Andrew G. Clark
AR Quinlan
AS Fiston-Lavier
B Vicoso
B Walsh
B Xu
C Laird
CJ Willer
CM Bergman
CM Egan
CS McBride
D Bourguet
D Wright
DF Conrad
DJ Begun
DJ Turner
DQ Nguyen
DR Schrider
E Eden
E Gonzalez
EB Dopman
G Rosengren Pielberg
G Sella
GH Perry
GH Perry
GH Perry
GR Bignell
H Akashi
H Innan
H Innan
H Kaessmann
Harmit S. Malik
HG Parker
J Sebat
J Sebat
J. J. Emerson
JA Lee
JD Wall
JJ Emerson
JM Chen
JM Chen
JM Cridland
JM Schmidt
JR Lupski
JR Pollack
JS Taylor
K Inoue
K Sankaranarayanan
Kimura
M Aguadé
M Ashburner
M Cardoso-Moreira
M Cardoso-Moreira
M Kreitman
M Long
M Schwaiger
Manyuan Long
Margarida Cardoso-Moreira
MF Arlt
MF Arlt
MF Arlt
ML Eaton
MM Le Beau
P Andolfatto
P Capy
PJ Hastings
PJ Hastings
PR Haddrill
Q Zhou
R de Cid
R Koszul
R Lyne
RC Moore
S Aulard
SA McCarroll
W Fu
W Gu
WJ Kent
Y Kuwada
Z Zhang
Publication venue: Public Library of Science
Publication date: 01/11/2011

Duplications play a significant role in both extremes of the phenotypic spectrum of newly arising mutations: they can have severe deleterious effects (e.g. duplications underlie a variety of diseases) but can also be highly advantageous. The phenotypic potential of newly arisen duplications has stimulated wide interest in both the mutational and selective processes shaping these variants in the genome. Here we take advantage of the Drosophila simulans–Drosophila melanogaster genetic system to further our understanding of both processes. Regarding mutational processes, the study of two closely related species allows investigation of the potential existence of shared duplication hotspots, and the similarities and differences between the two genomes can be used to dissect its underlying causes. Regarding selection, the difference in the effective population size between the two species can be leveraged to ask questions about the strength of selection acting on different classes of duplications. In this study, we conducted a survey of duplication polymorphisms in 14 different lines of D. simulans using tiling microarrays and combined it with an analogous survey for the D. melanogaster genome. By integrating the two datasets, we identified duplication hotspots conserved between the two species. However, unlike the duplication hotspots identified in mammalian genomes, Drosophila duplication hotspots are not associated with sequences of high sequence identity capable of mediating non-allelic homologous recombination. Instead, Drosophila duplication hotspots are associated with late-replicating regions of the genome, suggesting a link between DNA replication and duplication rates. We also found evidence supporting a higher effectiveness of selection on duplications in D. simulans than in D. melanogaster. This is also true for duplications segregating at high frequency, where we find evidence in D. simulans that a sizeable fraction of these mutations is being driven to fixation by positive selection.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Precise detection of rearrangement breakpoints in mammalian chromosomes

Author: A Bhutkar
A Ruiz-Herrera
ACE Darling
AU Sinha
BJ Haas
Christian Gautier
Claire Lemaitre
DG Albertson
EE Eichler
Eric Tannier
F Casals
F Swidan
G Bourque
G Bourque
H Hinsch
H Kehrer-Sawatzki
I Auger
J Ma
JA Bailey
JH Nadeau
JM Ranz
L Armengol
L Rieseberg
Marie-France Sagot
P Pevzner
P Pevzner
P Stankiewicz
P Trinh
PP Calabrese
R Bellman
RM Kuhn
RW DeBry
S Hampson
S Schwartz
V Choi
WJ Kent
WJ Murphy
X Pan
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Genomes undergo large structural changes that alter their organisation. The chromosomal regions affected by these rearrangements are called breakpoints, while those which have not been rearranged are called synteny blocks. We developed a method to precisely delimit rearrangement breakpoints on a genome by comparison with the genome of a related species. Contrary to current methods, which search for synteny blocks and simply return what remains in the genome as breakpoints, we propose to go further and to investigate the breakpoints themselves in order to refine them. Results: Given some reliable and non-overlapping synteny blocks, the core of the method consists of refining the regions that are not contained in them. By aligning each breakpoint sequence against its specific orthologous sequences in the other species, we can look for weak similarities inside the breakpoint, thus extending the synteny blocks and narrowing the breakpoints. The identification of the narrowed breakpoints relies on a segmentation algorithm and is statistically assessed. Since this method requires as input synteny blocks with some properties which, though they appear natural, are not verified by current methods for detecting such blocks, we further give a formal definition and provide an algorithm to compute them. The whole method is applied to delimit breakpoints on the human genome when compared to the mouse and dog genomes. Among the 355 human-mouse and 240 human-dog breakpoints, 168 and 146 respectively span less than 50 Kb. We compared the resulting breakpoints with some publicly available ones and show that we achieve a better resolution. Furthermore, we suggest that breakpoints are rarely reduced to a point, and instead often consist of large regions that can be distinguished from the surrounding sequences in terms of segmental duplications, similarity with related species, and transposable elements.
Conclusion: Our method leads to smaller breakpoints than already published ones and allows for a better description of their internal structure. In the majority of cases, our refined breakpoint regions exhibit specific biological properties (no similarity, presence of segmental duplications and of transposable elements). We hope that this new result may provide some insight into the mechanism and evolutionary properties of chromosomal rearrangements.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

INRIA a CCSD electronic archive server

PubMed Central

HAL Descartes