
    A Surrogate Function for One-Dimensional Phylogenetic Likelihoods

    © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. Phylogenetics has seen a steady increase in data set size and substitution model complexity, which require increasing amounts of computational power to compute likelihoods. This motivates strategies to approximate the likelihood functions for branch length optimization and Bayesian sampling. In this article, we develop an approximation to the 1D likelihood function as parametrized by a single branch length. Our method uses a four-parameter surrogate function abstracted from the simplest phylogenetic likelihood function, the binary symmetric model. We show that it offers a surrogate that can be fit over a variety of branch lengths, that it is applicable to a wide variety of models and trees, and that it can be used effectively as a proposal mechanism for Bayesian sampling. The method is implemented as a stand-alone open-source C library for calling from phylogenetics algorithms; it has proven essential for good performance of our online phylogenetic algorithm sts.

    Computational analysis of genetic variation

    High-throughput sequencers are generating increasingly detailed catalogues of genetic variation both in human disease and within the larger population. To effectively utilise this rich data set for maximum research benefit, as a discipline we require robust, flexible, and reproducible analysis pipelines capable of accurately detecting and prioritising variants. While data-specific computational algorithms aimed at deriving accurate data from these technologies have reached maturity, two major challenges remain in order to realise the goals of elucidating the underlying genetic causes of disease as a means of developing custom treatment options. The first challenge is the creation of high-throughput variant detection pipelines able to reliably detect sample variation from a variety of sequence data types. Such a system needs to be scalable, flexible, robust, highly automated, and able to support reproducible analyses in order to support both default and custom variant detection workflows. The second challenge is the effective prioritisation of the huge number of variants detected in each sample, a task required to reduce the large search space for causal variants down to variant lists suitable for manual interrogation. This thesis presents six publications that describe components of the larger informatics framework I have developed over the last four years to address these challenges, a framework designed from the outset to effectively manage and process large data sets with an end goal of utilising computational analysis of sequence data to further understand the relationship between genetic variation and human disease. The first publication, “Reliably detecting clinically important variants requires both combined variant calls and optimized filtering strategies”, describes a variant detection strategy designed to minimise false-negative variants, as is desired when utilising patient variation data in the clinic.
    The next four publications describe custom workflows developed for detecting variants in sequence data from different sample types, namely paired cancer samples (“Tumour procurement, DNA extraction, coverage analysis and optimisation of mutation-calling algorithms for human melanoma genomes”), pedigrees (“Reducing the search space for causal genetic variants with VASP: Variant Analysis of Sequenced Pedigrees”), mixed cell populations containing ultra-rare mutations (“DeepSNVMiner: A sequence analysis tool to detect emergent, rare mutations in sub-sets of cell populations”), and mouse exome data containing ENU mutations (“Massively parallel sequencing of the mouse exome to accurately identify rare, induced mutations: an immediate source for thousands of new mouse models”). The last publication, “Comparison of predicted and actual consequences of missense mutations”, focuses on the validation of computational tools that predict the functional impact of missense mutations and further attempts to explain why many missense mutations predicted to be damaging do not result in an observable phenotype as might be expected. Collectively, these publications detail efforts to reliably detect and prioritise variants across a wide variety of data types, efforts all based around the significant underlying software framework I have developed to better elucidate the link between genetic variation and disease.