514 research outputs found

    Reconstructing the History of Syntenies Through Super-Reconciliation

    Classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is clearly not suited for genes grouped in syntenic blocks, which are more plausibly the result of a concerted evolution. Here, we introduce the Super-Reconciliation model, which extends the traditional Duplication-Loss model to the reconciliation of a set of trees, accounting for segmental duplications and losses. From a complexity point of view, we show that the associated decision problem is NP-hard. We then give an exact exponential-time algorithm for this problem, assess its time efficiency on simulated datasets, and give a proof of concept on the opioid receptor genes.
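As a point of reference for the classical single-family reconciliation that this abstract extends, the following is a minimal Python sketch of the standard LCA mapping for counting duplications. It is not the Super-Reconciliation algorithm from the paper. The tree encodings (nested tuples for the gene tree, a child-to-parent dictionary for the species tree) and all function names are assumptions made for illustration.

```python
# Illustrative sketch of classical LCA reconciliation (not Super-Reconciliation).
# Assumed encodings: a gene tree leaf is a gene name (str), an internal node is
# a pair (left, right); the species tree is a dict child -> parent, root -> None.

def ancestors(species, parent):
    """Return the path from a species up to the root, inclusive."""
    path = []
    while species is not None:
        path.append(species)
        species = parent[species]
    return path


def lca(a, b, parent):
    """Lowest common ancestor of two species-tree nodes."""
    seen = set(ancestors(a, parent))
    for s in ancestors(b, parent):
        if s in seen:
            return s
    raise ValueError("nodes are not in the same species tree")


def count_duplications(gene_tree, parent, species_of):
    """Return (species the node maps to, duplications in the subtree)."""
    if isinstance(gene_tree, str):
        return species_of[gene_tree], 0
    left, right = gene_tree
    m_left, d_left = count_duplications(left, parent, species_of)
    m_right, d_right = count_duplications(right, parent, species_of)
    m = lca(m_left, m_right, parent)
    # A node is a duplication when it maps to the same species as a child.
    is_duplication = m in (m_left, m_right)
    return m, d_left + d_right + int(is_duplication)


# Hypothetical species tree ((A, B)AB, C)ABC and gene-to-species assignment.
parent = {"A": "AB", "B": "AB", "AB": "ABC", "C": "ABC", "ABC": None}
species_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}

gene_tree = (("a1", "b1"), ("a2", "c1"))
print(count_duplications(gene_tree, parent, species_of))  # ('ABC', 1)
```

In this toy input the root of the gene tree maps to the same species node as its right child, so it is counted as one duplication. Losses and the joint treatment of neighboring genes are deliberately left out.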

    Evolution through segmental duplications and losses : A Super-Reconciliation approach

    The classical gene and species tree reconciliation, used to infer the history of gene gain and loss explaining the evolution of gene families, assumes an independent evolution for each family. While this assumption is reasonable for genes that are far apart in the genome, it is not appropriate for genes grouped into syntenic blocks, which are more plausibly the result of a concerted evolution. Here, we introduce the Super-Reconciliation problem, which consists in inferring a history of segmental duplication and loss events (involving a set of neighboring genes) leading to a set of present-day syntenies from a single ancestral one. In other words, we extend the traditional Duplication-Loss reconciliation problem of a single gene tree to a set of trees, accounting for segmental duplications and losses. Existence of a Super-Reconciliation depends on individual gene tree consistency. In addition, ignoring rearrangements implies that existence also depends on gene order consistency. We first show that the problem of reconstructing a most parsimonious Super-Reconciliation, if any, is NP-hard and give an exact exponential-time algorithm to solve it. Alternatively, we show that accounting for rearrangements in the evolutionary model, but still only minimizing segmental duplication and loss events, leads to an exact polynomial-time algorithm. We finally assess the time efficiency of the former exponential-time algorithm for the Duplication-Loss model on simulated datasets, and give a proof of concept on the opioid receptor genes.

    Modeling the evolution space of breakage fusion bridge cycles with a stochastic folding process

    Breakage-Fusion-Bridge cycles in cancer arise when a broken segment of DNA is duplicated and an end from each copy joined together. This structure then 'unfolds' into a new piece of palindromic DNA. This is one mechanism responsible for the localised amplicons observed in cancer genome data. The process has parallels with paper folding sequences that arise when a piece of paper is folded several times and then unfolded. Here we adapt such methods to study the breakage-fusion-bridge structures in detail. First, we consider discrete representations of this space with 2-d trees to demonstrate that there are 2^(n(n-1)/2) qualitatively distinct evolutions involving n breakage-fusion-bridge cycles. Second, we consider the stochastic nature of the fold positions to determine evolution likelihoods, and also describe how amplicons become localised. Finally, we illustrate these methods by inferring the evolution of breakage-fusion-bridge cycles with data from primary tissue cancer samples.
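To give a sense of how quickly the count 2^(n(n-1)/2) stated in this abstract grows, here is a small Python sketch that only evaluates the formula. The function name is an assumption for illustration, and the code does not enumerate the evolutions or model the folding process itself.

```python
def bfb_evolution_count(n: int) -> int:
    """Evaluate 2^(n(n-1)/2), the count quoted in the abstract for n cycles."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # n(n-1) is always even, so integer division is exact.
    return 2 ** (n * (n - 1) // 2)


for n in range(1, 6):
    print(n, bfb_evolution_count(n))
```

For n = 1 to 5 the values are 1, 2, 8, 64 and 1024.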

    Joint assembly and genetic mapping of the Atlantic horseshoe crab genome reveals ancient whole genome duplication

    Horseshoe crabs are marine arthropods with a fossil record extending back approximately 450 million years. They exhibit remarkable morphological stability over their long evolutionary history, retaining a number of ancestral arthropod traits, and are often cited as examples of "living fossils." As arthropods, they belong to the Ecdysozoa, an ancient super-phylum whose sequenced genomes (including insects and nematodes) have thus far shown more divergence from the ancestral pattern of eumetazoan genome organization than cnidarians, deuterostomes, and lophotrochozoans. However, much of ecdysozoan diversity remains unrepresented in comparative genomic analyses. Here we use a new strategy of combined de novo assembly and genetic mapping to examine the chromosome-scale genome organization of the Atlantic horseshoe crab Limulus polyphemus. We constructed a genetic linkage map of this 2.7 Gbp genome by sequencing the nuclear DNA of 34 wild-collected, full-sibling embryos and their parents at a mean redundancy of 1.1x per sample. The map includes 84,307 sequence markers and 5,775 candidate conserved protein coding genes. Comparison to other metazoan genomes shows that the L. polyphemus genome preserves ancestral bilaterian linkage groups, and that a common ancestor of modern horseshoe crabs underwent one or more ancient whole genome duplications (WGDs) ~300 MYA, followed by extensive chromosome fusion.

    Fast and accurate read mapping with approximate seeds and multiple backtracking

    We present Masai, a read mapper representing the state-of-the-art in terms of speed and accuracy. Our tool is an order of magnitude faster than RazerS 3 and mrFAST, 2-4 times faster and more accurate than Bowtie 2 and BWA. The novelties of our read mapper are filtration with approximate seeds and a method for multiple backtracking. Approximate seeds, compared with exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds by taking advantage of the repetitiveness of next-generation sequencing data. Combined together, these two methods significantly speed up approximate search on genomic data sets. Masai is implemented in C++ using the SeqAn library. The source code is distributed under the BSD license and binaries for Linux, Mac OS X and Windows can be freely downloaded from http://www.seqan.de/projects/masai

    Multiple reassortment events in the evolutionary history of H1N1 influenza A virus since 1918

    The H1N1 subtype of influenza A virus has caused substantial morbidity and mortality in humans, first documented in the global pandemic of 1918 and continuing to the present day. Despite this disease burden, the evolutionary history of the A/H1N1 virus is not well understood, particularly whether there is a virological basis for several notable epidemics of unusual severity in the 1940s and 1950s. Using a data set of 71 representative complete genome sequences sampled between 1918 and 2006, we show that segmental reassortment has played an important role in the genomic evolution of A/H1N1 since 1918. Specifically, we demonstrate that an A/H1N1 isolate from the 1947 epidemic acquired novel PB2 and HA genes through intra-subtype reassortment, which may explain the abrupt antigenic evolution of this virus. Similarly, the 1951 influenza epidemic may also have been associated with reassortant A/H1N1 viruses. Intra-subtype reassortment therefore appears to be a more important process in the evolution and epidemiology of H1N1 influenza A virus than previously realized.

    A Study on the Hierarchy Based on the Declarative Semantics of Tree Edit Distance and Its Computation

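    Comparing tree-structured data represented as rooted labeled trees (hereafter simply trees), such as HTML and XML data on the Web or RNA and glycan data in bioinformatics, is one of the important research topics in data mining and machine learning from structured data. One of the best-known distances between such trees is the tree edit distance. The tree edit distance is formulated as the minimum cost of a sequence of edit operations (node deletion, insertion, and substitution) needed to transform one rooted tree into the other. Because there are countless edit sequences between two trees, computing the distance by enumerating all of them is impractical. Tai therefore introduced, as a guide for computing the tree edit distance, the Tai mapping (hereafter also simply a mapping), which gives the tree edit distance a declarative semantics. A Tai mapping is a one-to-one correspondence between the nodes of two trees that preserves the ancestor-descendant relation (and, for ordered trees, the sibling relation), and the minimum cost of a Tai mapping coincides with the tree edit distance. The tree edit distance can be computed in O(n^3) time for ordered trees with n nodes, whereas the problem is MAX SNP-hard for unordered trees. On the other hand, in glycan data the connections between nodes are meaningful, so constraints that keep those connections intact are required; in XML data, the nodes within a certain range of the root may be common to every tree, so a distance that puts more weight on leaf nodes is required. The tree edit distance is thus too general for some applications, and, with the additional aim of improving computational efficiency, many variants of the tree edit distance have been studied by restricting the mapping that serves as its declarative semantics. In particular, the tree alignment distance, which is used in RNA analysis and elsewhere and is also a tree edit distance in which insertions are performed before deletions, can be computed in O(n^4) time for ordered trees with n nodes; for unordered trees it is MAX SNP-hard in general but computable in polynomial time when the degree is bounded. The alignment distance can be formulated as the minimum cost of an alignment tree, which is a supertree of the two trees, and coincides with the minimum cost of a less-constrained mapping, a restricted form of the Tai mapping. In this thesis, we first view the restrictions on mappings as a hierarchy of Tai mappings and re-examine this hierarchy in terms of common subforests, in particular the connectivity of nodes and the arrangement of subtrees within a common subforest, in order to study what is essential in computing the variants of the tree edit distance. We also analyze the time complexity of computing the edit distance variants given by the minimum cost of the mappings newly introduced from these viewpoints. For the tree alignment distance, the anchored alignment problem has been proposed with the aim of speeding up the construction of forest alignments. This problem takes as input a mapping called an anchor and constructs an alignment tree that preserves the correspondences in the anchor; however, the anchor is a Tai mapping, and no alignment tree can be constructed when the anchor given as input is not a less-constrained mapping. We therefore give an alternative, constructive proof that the declarative semantics of the tree alignment distance is the less-constrained mapping and, using that construction, reformulate the anchored alignment problem so that it returns "no" when no alignment tree can be constructed. We also formulate an anchored alignment distance on this basis and compare the anchored alignment distance with the alignment distance on real data. Furthermore, we propose cyclically ordered trees, which are more general than ordered trees and more restricted than unordered trees, and design an algorithm that computes the alignment distance between cyclically ordered trees. Finally, as further topics related to the tree edit distance, we design a dynamic A* algorithm for computing the unordered tree edit distance, extend the Tai mapping to unrooted trees, and design mapping kernels for cyclically ordered trees and bounded-degree unordered trees. An A* algorithm by Higuchi et al. that uses several lower-bound functions has already been introduced for computing the unordered tree edit distance, but it performs redundant computations and leaves room for improvement. We introduce a dynamic A* algorithm that eliminates these redundant computations by dynamic programming, and confirm the efficiency of the lower-bound functions experimentally. The Tai mapping for rooted trees is an important concept corresponding to the tree edit distance, but extending it to unrooted trees requires, in addition to injectivity, a condition that replaces the ancestor-descendant relation. Focusing on the center that Zhang et al. used when extending the LCA-preserving mapping to unrooted trees, we introduce mappings for unrooted trees. In particular, we modify the four-point and three-point conditions, which characterize phylogenetic trees (commonly represented as unrooted trees), into conditions that characterize tree topology, and introduce mappings that preserve each condition. Moreover, tree kernels, one of the basic methods for classifying trees with support vector machines, have been studied extensively for ordered trees, and most of them fall within the framework of mapping kernels, which count mappings between ordered trees. Kernels for unordered trees, by contrast, have hardly been studied because they are hard to compute. We therefore design mapping kernels for cyclically ordered trees and for unordered trees whose degree is bounded by a constant D, and discuss their computation time. Kyushu Institute of Technology doctoral dissertation. Degree number: 情工博甲第332号. Date conferred: March 23, 2018 (Heisei 30). Chapter 1 Introduction | Chapter 2 Tree Edit Distance and Tree Alignment Distance | Chapter 3 Tai Mapping Hierarchy Based on Common Subforests | Chapter 4 Computing the Tree Alignment Distance | Chapter 5 Various Extensions | Chapter 6 Conclusion and Future Work. Kyushu Institute of Technology, Heisei 29 (2017).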
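The ordered tree edit distance defined in this abstract (minimum cost of node deletions, insertions and substitutions) can be illustrated with a short Python sketch. It uses the textbook rightmost-root forest recursion with unit costs and memoization. It is not one of the thesis's algorithms and makes no claim to the O(n^3) bound quoted above; the tuple encoding and function names are assumptions for illustration.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple of
# trees; a forest is a tuple of trees. Tuples are hashable, so results can be
# memoized directly.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f and not g:
        return 0
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    v_label, v_children = f[-1]
    w_label, w_children = g[-1]
    # Delete the root of the rightmost tree of f; its children move up.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert the root of the rightmost tree of g.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Match the two rightmost roots, relabeling if the labels differ.
    match = (forest_distance(f[:-1], g[:-1])
             + forest_distance(v_children, w_children)
             + int(v_label != w_label))
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


t1 = ("a", (("b", ()), ("c", ())))
t2 = ("a", (("c", ()),))
print(tree_edit_distance(t1, t2))  # 1 (delete node b)
```

Each matched pair of roots in the recursion corresponds to one pair in a mapping between the two trees, which is the connection to the Tai mapping described in the abstract.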

    2R and remodeling of vertebrate signal transduction engine

    Background: Whole genome duplication (WGD) is a special case of gene duplication, observed rarely in animals, whereby all genes duplicate simultaneously through polyploidisation. Two rounds of WGD (2R-WGD) occurred at the base of vertebrates, giving rise to an enormous wave of genetic novelty, but a systematic analysis of functional consequences of this event has not yet been performed. Results: We show that 2R-WGD affected an overwhelming majority (74%) of signalling genes, in particular developmental pathways involving receptor tyrosine kinases, Wnt and transforming growth factor-β ligands, G protein-coupled receptors and the apoptosis pathway. 2R-retained genes, in contrast to tandem duplicates, were enriched in protein interaction domains and multifunctional signalling modules of Ras and mitogen-activated protein kinase cascades. 2R-WGD had a fundamental impact on the cell-cycle machinery, redefined molecular building blocks of the neuronal synapse, and was formative for vertebrate brains. We investigated 2R-associated nodes in the context of the human signalling network, as well as in an inferred ancestral pre-2R (AP2R) network, and found that hubs (particularly involving negative regulation) were preferentially retained, with high connectivity driving retention. Finally, microarrays and proteomics demonstrated a trend for gradual paralog expression divergence independent of the duplication mechanism, but inferred ancestral expression states suggested preferential subfunctionalisation among 2R-ohnologs (2ROs). Conclusions: The 2R event left an indelible imprint on vertebrate signalling and the cell cycle. 
We show that 2R-WGD preferentially retained genes are associated with higher organismal complexity (for example, locomotion, nervous system, morphogenesis), while genes associated with basic cellular functions (for example, translation, replication, splicing, recombination; with the notable exception of cell cycle) tended to be excluded. 2R-WGD set the stage for the emergence of key vertebrate functional novelties (such as complex brains, circulatory system, heart, bone, cartilage, musculature and adipose tissue). A full explanation of the impact of 2R on evolution, function and the flow of information in vertebrate signalling networks is likely to have practical consequences for regenerative medicine, stem cell therapies and cancer treatment.

    The Orthology Road: Theory and Methods in Orthology Analysis

    The evolution of biological species depends on changes in genes. Among these changes are the gradual accumulation of DNA mutations, insertions and deletions, duplication of genes, movements of genes within and between chromosomes, gene losses and gene transfer. As two populations of the same species evolve independently, they will eventually become reproductively isolated and become two distinct species. The evolutionary history of a set of related species through the repeated occurrence of this speciation process can be represented as a tree-like structure, called a phylogenetic tree or a species tree. Since duplicated genes in a single species also independently accumulate point mutations, insertions and deletions, they drift apart in composition in the same way as genes in two related species. The divergence of all the genes descended from a single gene in an ancestral species can also be represented as a tree, a gene tree that takes into account both speciation and duplication events. In order to reconstruct the evolutionary history from the study of extant species, we use sets of similar genes, with relatively high degree of DNA similarity and usually with some functional resemblance, that appear to have been derived from a common ancestor. The degree of similarity among different instances of the “same gene” in different species can be used to explore their evolutionary history via the reconstruction of gene family histories, namely gene trees. Orthology refers specifically to the relationship between two genes that arose by a speciation event, recent or remote, rather than duplication. Comparing orthologous genes is essential to the correct reconstruction of species trees, so that detecting and identifying orthologous genes is an important problem, and a longstanding challenge, in comparative and evolutionary genomics as well as phylogenetics. A variety of orthology detection methods have been devised in recent years. 
Although many of these methods are dependent on generating gene and/or species trees, it has been shown that orthology can be estimated at acceptable levels of accuracy without having to infer gene trees and/or reconciling gene trees with species trees. Therefore, there is good reason to look at the connection of trees and orthology from a different angle: How much information about the gene tree, the species tree, and their reconciliation is already contained in the orthology relation among genes? Intriguingly, a solution to the first part of this question has already been given by Boecker and Dress [Boecker and Dress, 1998] in a different context. In particular, they completely characterized certain maps which they called symbolic ultrametrics. Semple and Steel [Semple and Steel, 2003] then presented an algorithm that can be used to reconstruct a phylogenetic tree from any given symbolic ultrametric. In this thesis we investigate a new characterization of orthology relations, based on symbolic ultrametrics for recovering the gene tree. According to Fitch’s definition [Fitch, 2000], two genes are (co-)orthologous if their last common ancestor in the gene tree represents a speciation event. On the other hand, when their last common ancestor is a duplication event, the genes are paralogs. The orthology relation on a set of genes is therefore determined by the gene tree and an “event labeling” that identifies each interior vertex of that tree as either a duplication or a speciation event. In the context of analyzing orthology data, the problem of reconciling event-labeled gene trees with a species tree appears as a variant of the reconciliation problem where gene trees have no labels in their internal vertices. When reconciling a gene tree with a species tree, it can be assumed that the species tree is correct or, in the case of an unknown species tree, it can be inferred. Therefore it is crucial to know for a given gene tree whether there even exists a species tree. 
In this thesis we characterize event-labelled gene trees for which a species tree exists and species trees to which event-labelled gene trees can be mapped. Reconciliation methods are not always the best options for detecting orthology. A fundamental problem is that, aside from multicellular eukaryotes, evolution does not seem to have conformed to the descent-with-modification model that gives rise to tree-like phylogenies. Examples include many cases of prokaryotes and viruses whose evolution involved horizontal gene transfer. To treat the problem of distinguishing orthology and paralogy within a more general framework, graph-based methods have been proposed to detect and differentiate among evolutionary relationships of genes in those organisms. In this work we introduce a measure of orthology that can be used to test graph-based methods and reconciliation methods that detect orthology. Using these results, we devise a new algorithm, BOTTOM-UP, that determines whether a map from the set of vertices of a tree to a set of events is a symbolic ultrametric. Additionally, we present a simulation environment designed to generate large gene families with complex duplication histories, on which reconstruction algorithms can be tested and software tools benchmarked.
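Fitch's definition quoted in this abstract (two genes are orthologs when their last common ancestor in the gene tree is a speciation, paralogs when it is a duplication) can be made concrete with a short Python sketch. It reads the orthology relation off an event-labeled gene tree. The tree encoding, the event strings and the function names are assumptions for illustration; this is not the BOTTOM-UP algorithm mentioned above.

```python
from itertools import combinations, product

# Assumed encoding: a leaf is a gene name (str); an internal vertex is
# (event, children) with event in {"speciation", "duplication"}.


def leaves(tree):
    """All gene names below a vertex."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [leaf for child in children for leaf in leaves(child)]


def orthologs(tree):
    """Unordered gene pairs whose last common ancestor is a speciation."""
    if isinstance(tree, str):
        return set()
    event, children = tree
    pairs = set()
    for child in children:
        pairs |= orthologs(child)
    if event == "speciation":
        # Genes in different child subtrees have this vertex as their LCA.
        for c1, c2 in combinations(children, 2):
            for x, y in product(leaves(c1), leaves(c2)):
                pairs.add(frozenset((x, y)))
    return pairs


# Hypothetical gene family: a duplication followed by a speciation in each copy.
tree = ("duplication", [("speciation", ["a1", "b1"]),
                        ("speciation", ["a2", "b2"])])
print(sorted(sorted(p) for p in orthologs(tree)))  # [['a1', 'b1'], ['a2', 'b2']]
```

In this toy tree, a1 and b2 are paralogs because their last common ancestor is the duplication at the root, even though they come from different species.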