    TRACTION: Fast Non-Parametric Improvement of Estimated Gene Trees

    Gene tree correction aims to improve the accuracy of a gene tree by using computational techniques along with a reference tree (and in some cases available sequence data). It is an active area of research when dealing with gene tree heterogeneity due to duplication and loss (GDL). Here, we study the problem of gene tree correction where gene tree heterogeneity is instead due to incomplete lineage sorting (ILS, a common problem in eukaryotic phylogenetics) and horizontal gene transfer (HGT, a common problem in bacterial phylogenetics). We introduce TRACTION, a simple polynomial time method that provably finds an optimal solution to the RF-Optimal Tree Refinement and Completion Problem, which seeks a refinement and completion of an input tree t with respect to a given binary tree T so as to minimize the Robinson-Foulds (RF) distance. We present the results of an extensive simulation study evaluating TRACTION within gene tree correction pipelines on 68,000 estimated gene trees, using estimated species trees as reference trees. We explore accuracy under conditions with varying levels of gene tree heterogeneity due to ILS and HGT. We show that TRACTION matches or improves the accuracy of well-established methods from the GDL literature under conditions with HGT and ILS, and ties for best under the ILS-only conditions. Furthermore, TRACTION ties for fastest on these datasets. TRACTION is available at https://github.com/pranjalv123/TRACTION-RF and the study datasets are available at https://doi.org/10.13012/B2IDB-1747658_V1
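    The Robinson-Foulds (RF) distance that the abstract above names as the optimization criterion can be illustrated with a short sketch. This is not the TRACTION algorithm itself; it only computes the RF distance between two unrooted trees on the same leaf set. The nested-tuple tree encoding and the function names `leaves`, `bipartitions`, and `rf_distance` are assumptions made for illustration, not part of the TRACTION software.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a string label, an internal node is a
# tuple of subtrees. The position of the "root" is ignored (unrooted RF).
Tree = Union[str, Tuple["Tree", ...]]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the non-trivial bipartitions (splits) induced by the edges.

    Each split is stored as the side that does NOT contain a fixed anchor
    leaf, so the same split always has the same representation.
    """
    all_leaves = leaves(tree)
    anchor = min(all_leaves)
    splits: Set[FrozenSet[str]] = set()

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = below if anchor not in below else all_leaves - below
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Number of splits present in exactly one of the two trees."""
    if leaves(t1) != leaves(t2):
        raise ValueError("trees must have the same leaf set")
    return len(bipartitions(t1) ^ bipartitions(t2))


reference = (("A", "B"), ("C", ("D", "E")))   # binary reference tree T
conflicting = (("A", "C"), ("B", ("D", "E")))  # disagrees on one split
unresolved = ("A", "B", "C", ("D", "E"))       # contains a polytomy

print(rf_distance(reference, conflicting))
print(rf_distance(reference, unresolved))
```

    In this toy example the unresolved tree is missing one split of the reference tree, so refining its polytomy to add that split would bring the RF distance to zero; that is the intuition behind refining a collapsed gene tree with respect to a binary reference tree.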

    Algorithms for phylogenetic tree correction in species and cancer evolution

    Reconstructing evolutionary trees, also known as phylogenies, from molecular sequence data is a fundamental problem in computational biology. Classically, evolutionary trees have been estimated over a set of species, where leaves correspond to extant species and internal nodes correspond to ancestral species. This type of phylogeny is colloquially thought of as the “Tree of Life” and assembling it has been designated as a Grand Challenge by the National Science Foundation Advisory Committee for Cyberinfrastructure. However, processes other than speciation are also shaped by evolution. One notable example is in the development of a malignant tumor; tumor cells rapidly grow and divide, acquiring new mutations with each subsequent generation. Tumor cells then compete for resources, often resulting in selection for more aggressive cell types. Recent advancements in sequencing technology rapidly increased the amount of sequencing data taken from tumor biopsies. This development has allowed researchers to attempt reconstructing evolutionary histories for individual patient tumors, improving our understanding of cancer and laying the groundwork for precision therapy. Despite algorithmic improvements in the estimation of both species and tumor phylogenies from molecular sequence data, current approaches still suffer a number of limitations. Incomplete sampling and estimation error can lead to missing leaves and low-support branches in the estimated phylogenies. Moreover, commonly posed optimization problems are often under-determined given the limited amounts and low quality of input data, leading to large solution spaces of equally plausible phylogenies. In this dissertation, we explore current limitations in both species and tumor phylogeny estimation, connecting similarities and highlighting key differences. We then put forward four new methods that improve phylogeny estimation methods by incorporating auxiliary information: OCTAL, TRACTION, PhySigs, and RECAP. 
For each method, we present theoretical results (e.g., optimization problem complexity, algorithmic correctness, running time analysis) as well as empirical results on simulated and real datasets. Collectively, these methods show that we can significantly improve the accuracy of leading phylogeny estimation methods by leveraging additional signal in distinct but related datasets.

    Are Iberian endemics really Iberian? The case of the aquatic beetles of the family Dytiscidae (Coleoptera)

    The phylogenetic relationships and the geographical origin of 27 of the 34 species and of 3 of the 9 subspecies of Iberian endemic Dytiscidae are studied, based on species level phylogenies constructed with two mitochondrial gene fragments (16S rRNA and Cytochrome Oxidase I). All Iberian endemic species for which more than one specimen was included were monophyletic with the exception of the complex Deronectes aubei sanfilippoi Fery & Brancucci, 1997-D. delarouzei (Jac. Du Val, 1857). The genus Stictotarsus as presently defined is polyphyletic, containing three different lineages: the S. duodecimpustulatus group (including the Iberian endemic S. bertrandi (Legros, 1956)), Trichonectes otini (Guignot, 1941) (new combination) and the S. griseostriatus and S. roffii groups, which are in need of a new generic name. The genus Oreodytes is found to be paraphyletic, although with low bootstrap support. The species Nebrioporus (Nebrioporus) martinii (Fairmaire, 1858) (new combination) is transferred from the subgenus Zimmermannius to Nebrioporus. The Iberian populations of Stictotarsus griseostriatus (De Geer, 1774) and the endemic subspecies Oreodytes davisii rhianae Carr, 2001, O. sanmarkii alienus (Sharp, 1872) and Hydroporus normandi normandi Régimbart, 1903 do not form well characterised lineages, as measured with the mitochondrial markers used in this study. The Iberian endemic species of Dytiscidae are divided into three groups according to the type of vicariant origin: 1) within-Iberian species, when the sister species (or clade) of the Iberian endemic is also an Iberian endemic; 2) Iberian/European, when the sister occurs in Europe north of the Pyrenees; and 3) Iberian/North African, when the sister occurs in North Africa. 
Within-Iberian endemics are found to be on average older than Iberian/European and Iberian/North African species; they have more restricted distributions within the Iberian peninsula (they occur typically in only one of the main biogeographical regions), and tend to occur exclusively in running waters. The within-Iberian species are best represented by the “Iberian” clade of the genus Deronectes, formed by six endemic species plus two species with wider distributions. Most species in this group originated in rapid succession at the Late Miocene-Early Pliocene boundary by repeated vicariant events in the three main mountain massifs in the Iberian peninsula: the Pyrenees, the Baetic ranges, and the Sistema Central plus mountain massifs of the NW. In contrast, most of the Iberian/European species seem to be the recent (Pleistocene) vicariants of a species with a widespread distribution encompassing the Iberian peninsula, at present restricted to south and west of the Ebro valley. The results of these analyses suggest that the Iberian peninsula was an isolated refuge during the Quaternary glaciations, where allopatric speciation was frequent among some lineages of Dytiscidae diving beetles.

    Stretch-induced intussusceptive and sprouting angiogenesis in the chick chorioallantoic membrane

    Vascular systems grow and remodel in response not only to metabolic needs but also to mechanical influences. Here, we investigated the influence of tissue-level mechanical forces on the patterning and structure of the chick chorioallantoic membrane (CAM) microcirculation. A dipole stretch field was applied to the CAM using custom computer-controlled servomotors. The topography of the stretch field was mapped using finite element models. After 3 days of stretch, Sholl analysis of the CAM demonstrated a 7-fold increase in conducting vessel intersections within the stretch field (p 0.05). In contrast, corrosion casting and SEM of the stretch field capillary meshwork demonstrated intense sprouting and intussusceptive angiogenesis. Both planar surface area (p < 0.05) and pillar density (p < 0.01) were significantly increased relative to control regions of the CAM. We conclude that a uniaxial stretch field stimulates the axial growth and realignment of conducting vessels as well as intussusceptive and sprouting angiogenesis within the gas exchange capillaries of the ex ovo CAM. National Institutes of Health (U.S.) (NIH grant HL95678)

    Estimation and Detection


    Graph based fusion of high-dimensional gene- and microRNA expression data

    One of the main goals in cancer studies including high-throughput microRNA (miRNA) and mRNA data is to find and assess prognostic signatures capable of predicting clinical outcome. Both mRNA and miRNA expression changes in cancer are described to reflect clinical characteristics like staging and prognosis. Furthermore, miRNA abundance can directly affect target transcripts and translation in tumor cells. Prediction models are trained to identify either mRNA or miRNA signatures for patient stratification. With the increasing number of microarray studies collecting mRNA and miRNA from the same patient cohort, there is a need for statistical methods to integrate or fuse both kinds of data into one prediction model in order to find a combined signature that improves the prediction. Here, we propose a new method to fuse miRNA and mRNA data into one prediction model. Since miRNAs are known regulators of mRNAs, correlations between miRNA and mRNA expression data as well as target prediction information were used to build a bipartite graph representing the relations between miRNAs and mRNAs. Feature selection is a critical step when fitting prediction models to high-dimensional data. Most methods treat features, in this case genes or miRNAs, as independent, an assumption that does not hold true when dealing with combined gene and miRNA expression data. To improve prediction accuracy, a description of the correlation structure in the data is needed. In this work the bipartite graph was used to guide the feature selection and thereby improve prediction results and find a stable prognostic signature of miRNAs and genes. The method is evaluated on a prostate cancer data set comprising 98 patient samples with miRNA and mRNA expression data. The biochemical relapse, an important event in prostate cancer treatment, was used as clinical endpoint. 
Biochemical relapse denotes the renewed rise of the blood level of a prostate marker (PSA) after surgical removal of the prostate. The relapse is an indication of metastases and usually the point in clinical practice at which further treatment is decided. A boosting approach was used to predict the biochemical relapse. The bipartite graph in combination with miRNA and mRNA expression data was shown to improve prediction performance. Furthermore, the approach improved the stability of the feature selection and thereby yielded more consistent marker sets. As expected, the marker sets produced by this new method contain mRNAs as well as miRNAs. The new approach was compared to two state-of-the-art methods suited for high-dimensional data and showed better prediction performance in both cases.

    Phylogenetic Analyses: A Toolbox Expanding towards Bayesian Methods

    The reconstruction of phylogenies is becoming an increasingly simple activity. This is mainly due to two reasons: the democratization of computing power and the increased availability of sophisticated yet user-friendly software. This review describes some of the latest additions to the phylogenetic toolbox, along with some of their theoretical and practical limitations. It is shown that Bayesian methods are under heavy development, as they offer the possibility to solve a number of long-standing issues and to integrate several steps of the phylogenetic analyses into a single framework. Specific topics include not only phylogenetic reconstruction, but also the comparison of phylogenies, the detection of adaptive evolution, and the estimation of divergence times between species.

    Complex population genetic and demographic history of the Salangid, Neosalanx taihuensis, based on cytochrome b sequences

    Background: The Salangid icefish Neosalanx taihuensis (Salangidae) is an economically important fish that is endemic to China, restricted to large freshwater systems (e.g. lakes, large rivers and estuaries), and typically exhibits low vagility. Its continuous distribution ranges from the temperate region of the Huai and Yellow River basins to the subtropical region of the Pearl River basin. This wide-ranging distribution makes the species an ideal model for the study of palaeoclimatic effects on population genetic structure and phylogeography. Here, we aim to analyze population genetic differentiation within and between river basins and demographic history in order to understand how this species responded to severe climatic oscillations, declining sea levels during the Pleistocene ice ages, and tectonic activity.
    Results: We obtained the complete mtDNA cytochrome b sequences (1141 bp) of 354 individuals from 13 populations in the Pearl River, the Yangtze River and the Huai River basins. Thirty-six haplotypes were detected. Haplotype frequency distributions were strongly skewed, with most haplotypes (n = 24) each found in only a single sample and thus restricted to a single population. The most common haplotype (H36) was found in 49.15% of all individuals. Analysis of molecular variance (AMOVA) revealed a random pattern in the distribution of genetic diversity, which is inconsistent with contemporary hydrological structure. Significant levels of genetic subdivision were detected among populations within basins rather than between the three basins. Demographic analysis revealed that the population size in the Pearl River basin has remained relatively constant, whereas the populations in the Yangtze River and the Huai River basins expanded about 221 and 190 kyr ago, respectively, with the majority of mutations occurring after the last glacial maximum (LGM).
    Conclusion: The observed complex genetic pattern of N. taihuensis is consistent with a scenario of multiple unrelated founding events by long-distance colonization and dispersal combined with contiguous population expansion and locally restricted gene flow. We also found that this species was likely severely impacted by past glaciations. More favourable climate and the formation of large suitable habitats together facilitated population expansion after the late Quaternary (especially the LGM). We propose that all populations should be managed and conserved separately, especially for habitat protection.