TRACTION: Fast Non-Parametric Improvement of Estimated Gene Trees
Gene tree correction aims to improve the accuracy of a gene tree by using computational techniques along with a reference tree (and in some cases available sequence data). It is an active area of research when dealing with gene tree heterogeneity due to duplication and loss (GDL). Here, we study the problem of gene tree correction where gene tree heterogeneity is instead due to incomplete lineage sorting (ILS, a common problem in eukaryotic phylogenetics) and horizontal gene transfer (HGT, a common problem in bacterial phylogenetics). We introduce TRACTION, a simple polynomial time method that provably finds an optimal solution to the RF-Optimal Tree Refinement and Completion Problem, which seeks a refinement and completion of an input tree t with respect to a given binary tree T so as to minimize the Robinson-Foulds (RF) distance. We present the results of an extensive simulation study evaluating TRACTION within gene tree correction pipelines on 68,000 estimated gene trees, using estimated species trees as reference trees. We explore accuracy under conditions with varying levels of gene tree heterogeneity due to ILS and HGT. We show that TRACTION matches or improves the accuracy of well-established methods from the GDL literature under conditions with HGT and ILS, and ties for best under the ILS-only conditions. Furthermore, TRACTION ties for fastest on these datasets. TRACTION is available at https://github.com/pranjalv123/TRACTION-RF and the study datasets are available at https://doi.org/10.13012/B2IDB-1747658_V1
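The Robinson-Foulds (RF) distance that the abstract above uses as its optimization criterion can be illustrated with a small sketch. This is not TRACTION itself (it performs no refinement or completion); it only counts the bipartitions on which two trees over the same leaf set disagree. The nested-tuple tree encoding and the function names `leaf_set`, `bipartitions`, and `rf_distance` are assumptions made for this illustration, not part of the TRACTION code base.

```python
# Minimal sketch of the Robinson-Foulds (RF) distance between two trees
# on the same leaf set. Trees are nested tuples of leaf names (an
# encoding assumed here for illustration only).

def leaf_set(tree):
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions induced by the tree's edges.

    Each bipartition is stored as the side that does NOT contain a fixed
    anchor leaf, so the same split always gets the same representation.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - below if anchor in below else below
        # Skip trivial splits (a single leaf versus everything else).
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return below

    visit(tree)
    return splits


def rf_distance(t1, t2):
    """Number of bipartitions present in exactly one of the two trees."""
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("trees must have the same leaf set")
    return len(bipartitions(t1) ^ bipartitions(t2))


# Hypothetical example trees on five leaves.
gene_tree = (("A", "B"), ("C", ("D", "E")))
reference_tree = (("A", "C"), ("B", ("D", "E")))
print(rf_distance(gene_tree, reference_tree))
```

Because an unresolved node contributes fewer bipartitions, resolving a polytomy in an estimated gene tree so that its new bipartitions agree with the reference tree can only lower this count, which is the intuition behind refining against a binary reference tree.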
Algorithms for phylogenetic tree correction in species and cancer evolution
Reconstructing evolutionary trees, also known as phylogenies, from molecular sequence data is a fundamental problem in computational biology. Classically, evolutionary trees have been estimated over a set of species, where leaves correspond to extant species and internal nodes correspond to ancestral species. This type of phylogeny is colloquially thought of as the “Tree of Life” and assembling it has been designated as a Grand Challenge by the National Science Foundation Advisory Committee for Cyberinfrastructure. However, processes other than speciation are also shaped by evolution. One notable example is in the development of a malignant tumor; tumor cells rapidly grow and divide, acquiring new mutations with each subsequent generation. Tumor cells then compete for resources, often resulting in selection for more aggressive cell types. Recent advancements in sequencing technology rapidly increased the amount of sequencing data taken from tumor biopsies. This development has allowed researchers to attempt reconstructing evolutionary histories for individual patient tumors, improving our understanding of cancer and laying the groundwork for precision therapy.
Despite algorithmic improvements in the estimation of both species and tumor phylogenies from molecular sequence data, current approaches still suffer a number of limitations. Incomplete sampling and estimation error can lead to missing leaves and low-support branches in the estimated phylogenies. Moreover, commonly posed optimization problems are often under-determined given the limited amounts and low quality of input data, leading to large solution spaces of equally plausible phylogenies. In this dissertation, we explore current limitations in both species and tumor phylogeny estimation, connecting similarities and highlighting key differences. We then put forward four new methods that improve phylogeny estimation methods by incorporating auxiliary information: OCTAL, TRACTION, PhySigs, and RECAP. For each method, we present theoretical results (e.g., optimization problem complexity, algorithmic correctness, running time analysis) as well as empirical results on simulated and real datasets. Collectively, these methods show we can significantly improve the accuracy of leading phylogeny estimation methods by leveraging additional signal in distinct, but related datasets
Are Iberian endemics really Iberian? The case of the aquatic beetles of the family Dytiscidae (Coleoptera)
The phylogenetic relationships and the geographical origin of 27 of the 34 species and of 3 of the 9 subspecies of Iberian endemic Dytiscidae are studied, based on species-level phylogenies constructed with two mitochondrial gene fragments (16S rRNA and Cytochrome Oxidase I). All Iberian endemic species for which more than one specimen was included were monophyletic, with the exception of the complex Deronectes aubei sanfilippoi Fery & Brancucci, 1997-D. delarouzei (Jac. Du Val, 1857). The genus Stictotarsus as presently defined is polyphyletic, containing three different lineages: the S. duodecimpustulatus group, which includes the Iberian endemic S. bertrandi (Legros, 1956); Trichonectes otini (Guignot, 1941) (new combination); and the S. griseostriatus and S. roffii groups, which are in need of a new generic name. The genus Oreodytes is found to be paraphyletic, although with low bootstrap support. The species Nebrioporus (Nebrioporus) martinii (Fairmaire, 1858) (new combination) is transferred from the subgenus Zimmermannius to Nebrioporus. The Iberian populations of Stictotarsus griseostriatus (De Geer, 1774) and the endemic subspecies Oreodytes davisii rhianae Carr, 2001, O. sanmarkii alienus (Sharp, 1872) and Hydroporus normandi normandi Régimbart, 1903 do not form well-characterised lineages, as measured with the mitochondrial markers used in this study. The Iberian endemic species of Dytiscidae are divided into three groups according to the type of vicariant origin: 1) within-Iberian species, when the sister species (or clade) of the Iberian endemic is also an Iberian endemic; 2) Iberian/European, when the sister occurs in Europe north of the Pyrenees; and 3) Iberian/North African, when the sister occurs in North Africa.
Within-Iberian endemics are found to be on average older than Iberian/European and Iberian/North African species; they have more restricted distributions within the Iberian peninsula (they typically occur in only one of the main biogeographical regions) and tend to occur exclusively in running waters. The within-Iberian species are best represented by the “Iberian” clade of the genus Deronectes, formed by six endemic species plus two species with wider distributions. Most species in this group originated in rapid succession at the Late Miocene-Early Pliocene boundary by repeated vicariant events in the three main mountain massifs of the Iberian peninsula: the Pyrenees, the Baetic ranges, and the Sistema Central plus the mountain massifs of the NW. In contrast, most of the Iberian/European species seem to be the recent (Pleistocene) vicariants of a species with a widespread distribution encompassing the Iberian peninsula, at present restricted to south and west of the Ebro valley. The results of these analyses suggest that the Iberian peninsula was an isolated refuge during the Quaternary glaciations, where allopatric speciation was frequent among some lineages of Dytiscidae diving beetles.
Stretch-induced intussusceptive and sprouting angiogenesis in the chick chorioallantoic membrane
Vascular systems grow and remodel in response not only to metabolic needs but also to mechanical influences. Here, we investigated the influence of tissue-level mechanical forces on the patterning and structure of the chick chorioallantoic membrane (CAM) microcirculation. A dipole stretch field was applied to the CAM using custom computer-controlled servomotors. The topography of the stretch field was mapped using finite element models. After 3 days of stretch, Sholl analysis of the CAM demonstrated a 7-fold increase in conducting vessel intersections within the stretch field (p 0.05). In contrast, corrosion casting and SEM of the stretch field capillary meshwork demonstrated intense sprouting and intussusceptive angiogenesis. Both planar surface area (p < 0.05) and pillar density (p < 0.01) were significantly increased relative to control regions of the CAM. We conclude that a uniaxial stretch field stimulates the axial growth and realignment of conducting vessels as well as intussusceptive and sprouting angiogenesis within the gas exchange capillaries of the ex ovo CAM. National Institutes of Health (U.S.) (NIH grant HL95678)
Graph based fusion of high-dimensional gene- and microRNA expression data
One of the main goals in cancer studies including high-throughput microRNA (miRNA) and mRNA data is to find and assess prognostic signatures capable of predicting clinical outcome. Both mRNA and miRNA expression changes in cancer are described to reflect clinical characteristics like staging and prognosis. Furthermore, miRNA abundance can directly affect target transcripts and translation in tumor cells. Prediction models are trained to identify either mRNA or miRNA signatures for patient stratification. With the increasing number of microarray studies collecting mRNA and miRNA from the same patient cohort, there is a need for statistical methods to integrate or fuse both kinds of data into one prediction model in order to find a combined signature that improves the prediction.
Here, we propose a new method to fuse miRNA and mRNA data into one prediction model. Since miRNAs are known regulators of mRNAs, correlations between miRNA and mRNA expression data as well as target prediction information were used to build a bipartite graph representing the relations between miRNAs and mRNAs.
Feature selection is a critical part of fitting prediction models to high-dimensional data. Most methods treat features, in this case genes or miRNAs, as independent, an assumption that does not hold true when dealing with combined gene and miRNA expression data. To improve prediction accuracy, a description of the correlation structure in the data is needed. In this work the bipartite graph was used to guide the feature selection and thereby improve prediction results and find a stable prognostic signature of miRNAs and genes.
The method is evaluated on a prostate cancer data set comprising 98 patient samples with miRNA and mRNA expression data. The biochemical relapse, an important event in prostate cancer treatment, was used as the clinical endpoint. Biochemical relapse denotes the renewed rise of the blood level of a prostate marker (PSA) after surgical removal of the prostate. The relapse is an indication of metastases and is usually the point in clinical practice at which further treatment is decided.
A boosting approach was used to predict the biochemical relapse. It could be shown that the bipartite graph in combination with miRNA and mRNA expression data could improve prediction performance. Furthermore, the approach improved the stability of the feature selection and thereby yielded more consistent marker sets. The marker sets produced by this new method contain mRNAs as well as miRNAs.
The new approach was compared to two state-of-the-art methods suited for high-dimensional data and showed better prediction performance in both cases.
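The bipartite graph described in this abstract, which links miRNAs to mRNAs using expression correlations together with target prediction information, might be built along the following lines. This is only a sketch under stated assumptions: the abstract does not say how the two sources are combined, so the rule used here (keep an edge only when a predicted miRNA-target pair also shows a strong Pearson correlation), the threshold `min_abs_corr`, the function names, and the toy identifiers are all hypothetical.

```python
# Sketch of a miRNA-mRNA bipartite graph. The edge rule (predicted target
# AND |Pearson r| >= threshold) is an assumption for illustration; the
# abstract does not specify how correlation and prediction are combined.
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def build_bipartite_graph(mirna_expr, mrna_expr, predicted_targets,
                          min_abs_corr=0.5):
    """Return {(mirna, gene): r} for predicted pairs that also correlate."""
    edges = {}
    for mirna, targets in predicted_targets.items():
        for gene in targets:
            if mirna not in mirna_expr or gene not in mrna_expr:
                continue  # no expression data for this pair
            r = pearson(mirna_expr[mirna], mrna_expr[gene])
            if abs(r) >= min_abs_corr:
                edges[(mirna, gene)] = r
    return edges


# Toy data with hypothetical identifiers (four samples per feature).
mirna_expr = {"miR-a": [1, 2, 3, 4], "miR-b": [2, 2, 3, 3]}
mrna_expr = {"GENE1": [8, 6, 4, 2], "GENE2": [1, 3, 2, 4], "GENE3": [5, 5, 6, 6]}
predicted_targets = {"miR-a": ["GENE1", "GENE2"], "miR-b": ["GENE3"]}

graph = build_bipartite_graph(mirna_expr, mrna_expr, predicted_targets,
                              min_abs_corr=0.9)
for (mirna, gene), r in sorted(graph.items()):
    print(mirna, gene, round(r, 2))
```

A feature-selection step could then consult such an edge set, for example by favouring features connected to features that are already selected; how the original method uses the graph inside boosting is not described in the abstract.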
Phylogenetic Analyses: A Toolbox Expanding towards Bayesian Methods
The reconstruction of phylogenies is becoming an increasingly simple activity. This is mainly due to two reasons: the democratization of computing power and the increased availability of sophisticated yet user-friendly software. This review describes some of the latest additions to the phylogenetic toolbox, along with some of their theoretical and practical limitations. It is shown that Bayesian methods are under heavy development, as they offer the possibility to solve a number of long-standing issues and to integrate several steps of the phylogenetic analyses into a single framework. Specific topics include not only phylogenetic reconstruction, but also the comparison of phylogenies, the detection of adaptive evolution, and the estimation of divergence times between species
Complex population genetic and demographic history of the Salangid, Neosalanx taihuensis, based on cytochrome b sequences
Background: The salangid icefish Neosalanx taihuensis (Salangidae) is an economically important fish that is endemic to China, restricted to large freshwater systems (e.g. lakes, large rivers and estuaries), and typically exhibits low vagility. Its continuous distribution ranges from the temperate region of the Huai and Yellow River basins to the subtropical region of the Pearl River basin. This wide-ranging distribution makes the species an ideal model for the study of palaeoclimatic effects on population genetic structure and phylogeography. Here, we aim to analyze population genetic differentiation within and between river basins, as well as demographic history, in order to understand how this species responded to severe climatic oscillations, the decline of sea levels during the Pleistocene ice ages, and tectonic activity.
Results: We obtained the complete mtDNA cytochrome b sequences (1141 bp) of 354 individuals from 13 populations in the Pearl River, the Yangtze River and the Huai River basins. Thirty-six haplotypes were detected. Haplotype frequency distributions were strongly skewed, with most haplotypes (n = 24) represented only in single samples each and thus restricted to a single population. The most common haplotype (H36) was found in 49.15% of all individuals. Analysis of molecular variance (AMOVA) revealed a random pattern in the distribution of genetic diversity, which is inconsistent with contemporary hydrological structure. Significant levels of genetic subdivision were detected among populations within basins rather than between the three basins. Demographic analysis revealed that the population size in the Pearl River basin has remained relatively constant, whereas the populations in the Yangtze River and the Huai River basins expanded about 221 and 190 kyr ago, respectively, with the majority of mutations occurring after the last glacial maximum (LGM).
Conclusion: The observed complex genetic pattern of N. taihuensis is consistent with a scenario of multiple unrelated founding events by long-distance colonization and dispersal, combined with contiguous population expansion and locally restricted gene flow. We also found that this species was likely severely impacted by past glaciations. A more favourable climate and the formation of large suitable habitats together facilitated population expansion after the late Quaternary (especially the LGM). We propose that all populations should be managed and conserved separately, especially with regard to habitat protection.