2,109 research outputs found

    Data congruence, paedomorphosis and salamanders

    Background: The retention of ancestral juvenile characters by adult stages of descendants is called paedomorphosis. This process can mislead phylogenetic analyses based on morphological data, even in combination with molecular data, because it is difficult to assess whether a character is primarily absent or secondarily lost. Thus, detecting incongruence between morphological and molecular data is necessary to investigate the reliability of simultaneous analyses. Different methods have been proposed to detect data congruence or incongruence. Five of them (PABA, PBS, NDI, LILD, DRI) are used herein to assess incongruence between morphological and molecular data in a case study addressing salamander phylogeny, which comprises several supposedly paedomorphic taxa. For this purpose, previously published data sets were compiled. Furthermore, two strategies for ameliorating the effects of paedomorphosis on phylogenetic studies were tested with statistical rigor. Additionally, the efficiency of the different methods in assessing incongruence was analyzed using this empirical data set. Finally, a test statistic is presented for all these methods except DRI.

    Advances in Bayesian modelling of array structured data

    Data organized in array structures arise in various domains. Each entry of the array serves as a statistical unit, while the dimensions correspond to indexing attributes. The inherent dependence among statistical units along the indexing attributes makes the array representation more suitable than the usual tabular format. Models for this type of data typically employ probabilistic low-rank factorizations, where the latent factors attempt to capture patterns within the indexing attributes responsible for the values of the outcome. It is of primary importance to correctly model the dependence within the latent factors eliciting structural information available from data. Our contribution consists of novel structured Bayesian factorization models for array data, with applications to mortality forecasts and network analysis. We first address the problem of accurately forecasting future death-rate patterns for different age groups and time horizons for a country of interest. This type of data exhibits smooth structures of different natures across ages and years, which we flexibly account for in our model. We propose a novel B-spline process with locally-adaptive dynamic coefficients that outperforms state-of-the-art forecasting strategies by explicitly incorporating the core structures of period mortality trajectories within an interpretable formulation. Next, we consider the problem of learning the underlying structure responsible for the connectivity patterns in the human brain. We analyze a population of networks representing the connections between brain regions for a set of subjects. These networks are characterized by a hierarchical or multiresolution organization of the nodes responsible for the connectivity. We propose a phylogenetic latent position model that effectively learns the multiresolution structure. The model reveals a tree organization of the brain regions coherent with known hemisphere and lobe partitions. 
Such a result uncovers interesting new possible clusterings of the brain regions at different levels of resolution. Finally, we explore the potential to incorporate additional covariates to inform the tree structure of the model responsible for the latent positions. We have considered two settings of array data that exhibit distinct structural properties. Through Bayesian modelling, we have been able to leverage this information in the form of prior specification. Our results highlight the importance of incorporating these structures appropriately, leading to improved outcomes in both inferential and forecasting problems.

    Improving the Reliability of Decision-Support Systems for Nuclear Emergency Management by Leveraging Software Design Diversity

    This paper introduces a novel method of continuous verification of simulation software used in decision-support systems for nuclear emergency management (DSNE). The proposed approach builds on methods from the field of software reliability engineering, such as N-Version Programming, Recovery Blocks, and Consensus Recovery Blocks. We introduce a new acceptance test for dispersion simulation results and a new voting scheme based on taxonomies of simulation results rather than individual simulation results. The acceptance test and the voter are used in a new scheme, which extends the Consensus Recovery Block method by a database of result taxonomies to support machine learning. This enables the system to learn how to distinguish correct from incorrect results, with respect to the implemented numerical schemes. Considering that decision-support systems for nuclear emergency management are used in a safety-critical application context, the methods introduced in this paper help improve the reliability of the system and the trustworthiness of the simulation results used by emergency managers in the decision-making process. The effectiveness of the approach has been assessed using the atmospheric dispersion forecasts of two test versions of the widely used RODOS DSNE system.
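    For orientation, the generic Consensus Recovery Block pattern that the abstract builds on can be sketched as follows. This is a minimal illustration of the textbook pattern only, not the paper's taxonomy-based voter or its acceptance test. The function name, the `min_votes` threshold, and the assumption that each version's result has been reduced to a hashable label are all hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def consensus_recovery_block(
    results: Sequence[Hashable],
    accept: Callable[[Hashable], bool],
    min_votes: int = 2,
) -> Optional[Hashable]:
    """Generic consensus recovery block (hypothetical sketch).

    Stage 1 (N-version voting): return the most common result if at
    least `min_votes` versions agree on it.
    Stage 2 (recovery block): otherwise return the first result, in
    version order, that passes the acceptance test.
    Returns None if both stages fail.
    """
    if not results:
        return None

    # Stage 1: vote over the results of all versions.
    winner, votes = Counter(results).most_common(1)[0]
    if votes >= min_votes:
        return winner

    # Stage 2: no consensus, so fall back to the acceptance test.
    for result in results:
        if accept(result):
            return result
    return None
```

    With three versions returning "A", "B", "A", the voter returns "A" without consulting the acceptance test. With "A", "B", "C", the outcome depends entirely on which result, if any, the acceptance test admits.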

    Polyphemus: a Palaeolithic Tale?

    This paper presents an analysis of 56 variants of European and North American examples of the so-called Polyphemus tale (international tale type ATU 1137) using phylogenetic software according to 190 traits. Discussion addresses a number of points of comparative methodology while considering the historical implications of a relationship between different versions of this tale type recorded in diverse cultures.

    Molecular insights to crustacean phylogeny

    This thesis aims to resolve internal relationships of the major crustacean groups by inferring phylogenies with molecular data. New molecular and neuroanatomical data support the scenario that the Hexapoda might have evolved from Crustacea. Most molecular studies of crustaceans relied on single-gene or multigene analyses, in most cases using partly sequenced rRNA genes. However, intensive data quality and alignment assessments prior to phylogenetic reconstruction are not conducted in most studies. One methodological aim of this thesis was to implement new tools to infer data quality, to improve alignment quality and to test the impact of complex modeling of the data. Two of the three phylogenetic analyses in this thesis are also based on rRNA genes. In analysis (A), 16S rRNA, 18S rRNA and COI sequences were analyzed. RY coding of the COI fragment, an alignment procedure that considers the secondary structure of RNA molecules, and the exclusion of alignment positions of ambiguous positional homology were applied to improve data quality. However, extensive network reconstructions showed that the signal quality in the chosen and commonly used markers is not suitable to infer crustacean phylogeny, despite the extensive data processing and optimization. This result sheds new light on previous studies relying on these markers. In analysis (B), completely sequenced 18S and 28S rRNA genes were used to reconstruct the phylogeny. Base compositional heterogeneity was taken into account based on the findings of analysis (A), in addition to secondary structure alignment optimization and alignment assessment. The complex modeling needed to compare time-heterogeneous versus time-homogeneous processes, in combination with mixed models for an implementation of secondary structures, was only possible with the Bayesian software package PHASE. 
The results clearly demonstrated that complex modeling matters and that ignoring time-heterogeneous processes can mislead phylogenetic reconstructions. Some results shed light on crustacean phylogeny: for the first time, the Cephalocarida (Hutchinsoniella macracantha) were placed in a clade with the Branchiopoda, which is morphologically plausible. Compared to the time-homogeneous tree, the time-heterogeneous tree gives lower support values for some nodes. It can be suggested that the incorporation of base compositional heterogeneity in phylogenetic analysis improves the reliability of the topology. The Pancrustacea are supported maximally in both approaches, but internal relations are not reliably reconstructed. One result of this analysis is that the phylogenetic signal in rRNA data might be eroded for crustaceans. Recent publications presented analyses based on phylogenomic data, mainly to reconstruct metazoan phylogeny. The supermatrix method seems to outperform the supertree approach. In this analysis the supermatrix approach was applied. Crustaceans were collected to conduct EST sequencing projects and to include the resulting sequences, combined with public sequence data, in a phylogenomic analysis (C). New and innovative reduction heuristics were applied to condense the dataset. The results showed that the matrix implementation of the reduced dataset results in a more reliable topology in which most nodes are highly supported. In analysis (C) the Branchiopoda were positioned as sister group to Hexapoda, a result that differs from analyses (A) and (B) but is in line with other phylogenomic studies.
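    The RY coding mentioned for the COI fragment can be illustrated with a short sketch. It assumes the standard recoding, in which purines (A, G) become R and pyrimidines (C, T, U) become Y, applied to every position of a sequence. Whether the thesis recoded all positions or only some codon positions is not stated in the abstract, and the helper name `ry_code` is hypothetical.

```python
# Standard purine/pyrimidine recoding table; gaps and ambiguity
# codes such as "-" or "N" are not in the table and pass through.
RY_MAP = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"})


def ry_code(sequence: str) -> str:
    """Recode a nucleotide sequence into purine (R) / pyrimidine (Y) states.

    Hypothetical helper: recodes every position of the input.
    """
    return sequence.upper().translate(RY_MAP)
```

    Recoding to two states discards transitions (A/G and C/T changes) and keeps only transversions, which is commonly done to reduce the effect of saturation and base compositional bias.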

    Split Analysis Methods and Parametric Bootstrapping in Molecular Phylogenetics : Taking a closer look at model adequacy

    Even though the size of datasets in molecular analyses has increased rapidly in recent years, undetected systematic errors as well as unsolved problems concerning the evaluation of data quality and adequate substitution model selection still persist. This not only hampers the correct analysis of these datasets but leads to undetectable effects in phylogenetic tree reconstruction. Model-based tree reconstruction methods like maximum likelihood estimation and Bayesian inference have become the methods of choice for reconstruction of phylogenetic trees. Although maximum likelihood methods are known to be consistent if all necessary conditions are met, their performance depends strongly on the quality of the multiple sequence alignment and the ability of the chosen evolutionary model to reflect the underlying historical processes. This thesis addresses the assessment of model adequacy of estimated evolutionary models to multiple sequence alignments in the light of parametric bootstrapping and aims to find new methods for detection of model misspecifications with the help of split analyses. The second chapter focuses on the influence of the number of gamma rate categories used in modelling among-site rate variation when trying to assess model adequacy using an absolute goodness-of-fit test. The analyses of simulated alignments show that the Goldman-Cox test rejects models which were only approximated by four discrete gamma rate categories for various tree shapes and branch length setups, if they were simulated with a continuous gamma distribution. Increasing the number of discrete rate categories leads to an acceptance of model adequacy for stationary datasets and a correct detection of non-stationarity and inhomogeneity in simulated data. The results illustrate that the application of the proposed Goldman-Cox test to evaluate model adequacy might be too strict and rigorous with empirical data, in particular for large phylogenomic datasets. 
Approaches such as the Goldman-Cox test evaluate the absolute fit of data and model but do not deliver a deeper insight into the structure of the misfit. The third chapter presents the visualisation of overrepresented splits within splits graphs, which provides a good tool for gaining an overview of possible patterns and contradictory signal or noise within datasets. The analysis of these split residuals, observed by comparison to parametric bootstrap datasets based on the estimated models, can help to gain a deeper insight into model adequacy. Highly overrepresented splits can hint at whether heterotachy or non-symmetric substitution processes apply. The fourth chapter aims to define a new split weighting scheme by formalising aspects like 'contrast of character states' or 'character state homogeneity' within split subsets. Splits which are detected by the proposed SAMS (Splits Analysis MethodS) algorithm are re-evaluated for a more objective and formal split weighting. A comparison of the published and the new approach showed that the developed weighting scheme delivers reasonable results but needs further improvement. The development of a new GUI offers a much more capable tool to perform a split analysis and visualise the results. The shape of a visualised split spectrum can indicate whether a dataset delivers a clear split signal or whether a lot of noise is present.

    Reassessment of the evolutionary history of the late Triassic and early Jurassic sauropodomorph dinosaurs through comparative cladistics and the supermatrix approach

    Non-sauropod sauropodomorphs, also known as 'basal sauropodomorphs' or 'prosauropods', have been thoroughly studied in recent years. Several hypotheses on the interrelationships within this group have been proposed, ranging from complete paraphyly, where the group represents a grade from basal saurischians to Sauropoda, to a group on its own. The grade-like hypothesis is the most accepted; however, the relationships between the different taxa are not consistent amongst the proposed scenarios. These inconsistencies have been attributed to missing data and unstable (i.e., poorly preserved) taxa. Nevertheless, an extensive comparative cladistic analysis has found that these inconsistencies instead come from the character coding and character selection, plus the strategies for merging data sets. Furthermore, a detailed character analysis using information theory and mathematical topology as an approach for character delineation is explored here to operationalise characters and reduce the potential impact of missing data. After the reassessment and operationalisation of every character, this analysis also produced the largest and most comprehensive matrix applied to this group so far. Additionally, partition analyses performed on this data set have found consistencies in the interrelationships within non-sauropod Sauropodomorpha and strong support for smaller clades such as Plateosauridae, Riojasauridae, Anchisauridae, Massospondylinae and Lufengosarinae. The results of these analyses also highlight a different scenario for how quadrupedality evolved, independently originating twice within the group, and provide a better framework to understand the palaeo-biogeography and diversification rate of the first herbivore radiation of dinosaurs.

    Ancestral sequence reconstruction as an accessible tool for the engineering of biocatalyst stability

    Synthetic biology is the engineering of life to imbue non-natural functionality. As such, synthetic biology has considerable commercial potential, where synthetic metabolic pathways are utilised to convert low value substrates into high value products. High temperature biocatalysis offers several system-level benefits to synthetic biology, including increased dilution of substrate, increased reaction rates and decreased contamination risk. However, the current gamut of tools available for the engineering of thermostable proteins are either expensive, unreliable, or poorly understood, meaning their adoption into synthetic biology workflows is treacherous. This thesis focuses on the development of an accessible tool for the engineering of protein thermostability, based on the evolutionary biology tool ancestral sequence reconstruction (ASR). ASR allows researchers to walk back in time along the branches of a phylogeny and predict the most likely representation of a protein family’s ancestral state. It also has simple input requirements, and its output proteins are often observed to be thermostable, making ASR tractable to protein engineering. Chapter 2 explores the applicability of multiple ASR methods to the engineering of a carboxylic acid reductase (CAR) biocatalyst. Despite the family emerging only 500 million years ago, ancestors presented considerable improvements in thermostability over their modern counterparts. We proceed to thoroughly characterise the ancestral enzymes for their inclusion into the CAR biocatalytic toolbox. Chapter 3 explores why ASR derived proteins may be thermostable despite a mesophilic history. An in silico toolbox for tracking models of protein stability over simulated evolutionary time at the sequence, protein and population level is built. We provide considerable evidence that the sequence alignments of simulated protein families that evolved at marginal stability are saturated with stabilising residues. 
ASR therefore derives sequences from a dataset biased toward stabilisation. Importantly, while ASR is accessible, it still has a steep learning curve because it requires phylogenetic expertise. In chapter 4, we utilise the evolutionary model produced in chapter 3 to develop a highly simplified and accessible ASR protocol. This protocol was then applied to engineer CAR enzymes that displayed dramatic increases in thermostability compared to both modern CARs and the thermostable AncCARs presented in chapter 2.