306 research outputs found

    Ancient Admixture into Africa from the ancestors of non-Africans

    No full text

    Accurate reconstruction of insertion-deletion histories by statistical phylogenetics

    Get PDF
    The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history, or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of several methods for reconstruction of indel history. The methods tested include a relatively new algorithm for statistical marginalization of MSAs that sums over a stochastically-sampled ensemble of the most probable evolutionary histories. For mammalian evolutionary parameters on several different trees, the single most likely history sampled by our algorithm appears less biased than histories reconstructed by other MSA methods. The algorithm can also be used for alignment-free inference, where the MSA is explicitly summed out of the analysis. As an illustration of our method, we discuss reconstruction of the evolutionary histories of human protein-coding genes.Comment: 28 pages, 15 figures. arXiv admin note: text overlap with arXiv:1103.434

    Reliability of panel-based mutational signatures for immune-checkpoint-inhibition efficacy prediction in non-small cell lung cancer

    Get PDF
    OBJECTIVES: Mutational signatures (MS) are gaining traction for deriving therapeutic insights for immune checkpoint inhibition (ICI). We asked if MS attributions from comprehensive targeted sequencing assays are reliable enough for predicting ICI efficacy in non-small cell lung cancer (NSCLC).METHODS: Somatic mutations of m = 126 patients were assayed using panel-based sequencing of 523 cancer-related genes. In silico simulations of MS attributions for various panels were performed on a separate dataset of m = 101 whole genome sequenced patients. Non-synonymous mutations were deconvoluted using COSMIC v3.3 signatures and used to test a previously published machine learning classifier.RESULTS: The ICI efficacy predictor performed poorly with an accuracy of 0.51 -0.09 +0.09, average precision of 0.52 -0.11 +0.11, and an area under the receiver operating characteristic curve of 0.50 -0.09 +0.10. Theoretical arguments, experimental data, and in silico simulations pointed to false negative rates (FNR) related to panel size. A secondary effect was observed, where deconvolution of small ensembles of point mutations lead to reconstruction errors and misattributions. CONCLUSION: MS attributions from current targeted panel sequencing are not reliable enough to predict ICI efficacy. We suggest that, for downstream classification tasks in NSCLC, signature attributions be based on whole exome or genome sequencing instead.</p

    The variant call format and VCFtools

    Get PDF
    Summary: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API

    Resonances in a spring-pendulum: algorithms for equivariant singularity theory

    Get PDF
    A spring-pendulum in resonance is a time-independent Hamiltonian model system for formal reduction to one degree of freedom, where some symmetry (reversibility) is maintained. The reduction is handled by equivariant singularity theory with a distinguished parameter, yielding an integrable approximation of the Poincaré map. This makes a concise description of certain bifurcations possible. The computation of reparametrizations from normal form to the actual system is performed by Gröbner basis techniques.

    Short and long-read genome sequencing methodologies for somatic variant detection; genomic analysis of a patient with diffuse large B-cell lymphoma

    Get PDF
    Recent advances in throughput and accuracy mean that the Oxford Nanopore Technologies PromethiON platform is a now a viable solution for genome sequencing. Much of the validation of bioinformatic tools for this long-read data has focussed on calling germline variants (including structural variants). Somatic variants are outnumbered many-fold by germline variants and their detection is further complicated by the effects of tumour purity/subclonality. Here, we evaluate the extent to which Nanopore sequencing enables detection and analysis of somatic variation. We do this through sequencing tumour and germline genomes for a patient with diffuse B-cell lymphoma and comparing results with 150 bp short-read sequencing of the same samples. Calling germline single nucleotide variants (SNVs) from specific chromosomes of the long-read data achieved good specificity and sensitivity. However, results of somatic SNV calling highlight the need for the development of specialised joint calling algorithms. We find the comparative genome-wide performance of different tools varies significantly between structural variant types, and suggest long reads are especially advantageous for calling large somatic deletions and duplications. Finally, we highlight the utility of long reads for phasing clinically relevant variants, confirming that a somatic 1.6 Mb deletion and a p.(Arg249Met) mutation involving TP53 are oriented in trans

    Whole-genome sequencing of bladder cancers reveals somatic CDKN1A mutations and clinicopathological associations with mutation burden

    Get PDF
    Bladder cancers are a leading cause of death from malignancy. Molecular markers might predict disease progression and behaviour more accurately than the available prognostic factors. Here we use whole-genome sequencing to identify somatic mutations and chromosomal changes in 14 bladder cancers of different grades and stages. As well as detecting the known bladder cancer driver mutations, we report the identification of recurrent protein-inactivating mutations in CDKN1A and FAT1. The former are not mutually exclusive with TP53 mutations or MDM2 amplification, showing that CDKN1A dysfunction is not simply an alternative mechanism for p53 pathway inactivation. We find strong positive associations between higher tumour stage/grade and greater clonal diversity, the number of somatic mutations and the burden of copy number changes. In principle, the identification of sub-clones with greater diversity and/or mutation burden within early-stage or low-grade tumours could identify lesions with a high risk of invasive progression

    Multi-level evidence of an allelic hierarchy of USH2A variants in hearing, auditory processing and speech/language outcomes.

    Get PDF
    Language development builds upon a complex network of interacting subservient systems. It therefore follows that variations in, and subclinical disruptions of, these systems may have secondary effects on emergent language. In this paper, we consider the relationship between genetic variants, hearing, auditory processing and language development. We employ whole genome sequencing in a discovery family to target association and gene x environment interaction analyses in two large population cohorts; the Avon Longitudinal Study of Parents and Children (ALSPAC) and UK10K. These investigations indicate that USH2A variants are associated with altered low-frequency sound perception which, in turn, increases the risk of developmental language disorder. We further show that Ush2a heterozygote mice have low-level hearing impairments, persistent higher-order acoustic processing deficits and altered vocalizations. These findings provide new insights into the complexity of genetic mechanisms serving language development and disorders and the relationships between developmental auditory and neural systems

    Alignment and Prediction of cis-Regulatory Modules Based on a Probabilistic Model of Evolution

    Get PDF
    Cross-species comparison has emerged as a powerful paradigm for predicting cis-regulatory modules (CRMs) and understanding their evolution. The comparison requires reliable sequence alignment, which remains a challenging task for less conserved noncoding sequences. Furthermore, the existing models of DNA sequence evolution generally do not explicitly treat the special properties of CRM sequences. To address these limitations, we propose a model of CRM evolution that captures different modes of evolution of functional transcription factor binding sites (TFBSs) and the background sequences. A particularly novel aspect of our work is a probabilistic model of gains and losses of TFBSs, a process being recognized as an important part of regulatory sequence evolution. We present a computational framework that uses this model to solve the problems of CRM alignment and prediction. Our alignment method is similar to existing methods of statistical alignment but uses the conserved binding sites to improve alignment. Our CRM prediction method deals with the inherent uncertainties of binding site annotations and sequence alignment in a probabilistic framework. In simulated as well as real data, we demonstrate that our program is able to improve both alignment and prediction of CRM sequences over several state-of-the-art methods. Finally, we used alignments produced by our program to study binding site conservation in genome-wide binding data of key transcription factors in the Drosophila blastoderm, with two intriguing results: (i) the factor-bound sequences are under strong evolutionary constraints even if their neighboring genes are not expressed in the blastoderm and (ii) binding sites in distal bound sequences (relative to transcription start sites) tend to be more conserved than those in proximal regions. Our approach is implemented as software, EMMA (Evolutionary Model-based cis-regulatory Module Analysis), ready to be applied in a broad biological context

    Unlocking the bottleneck in forward genetics using whole-genome sequencing and identity by descent to isolate causative mutations

    No full text
    Forward genetics screens with N-ethyl-N-nitrosourea (ENU) provide a powerful way to illuminate gene function and generate mouse models of human disease; however, the identification of causative mutations remains a limiting step. Current strategies depend on conventional mapping, so the propagation of affected mice requires non-lethal screens; accurate tracking of phenotypes through pedigrees is complex and uncertain; out-crossing can introduce unexpected modifiers; and Sanger sequencing of candidate genes is inefficient. Here we show how these problems can be efficiently overcome using whole-genome sequencing (WGS) to detect the ENU mutations and then identify regions that are identical by descent (IBD) in multiple affected mice. In this strategy, we use a modification of the Lander-Green algorithm to isolate causative recessive and dominant mutations, even at low coverage, on a pure strain background. Analysis of the IBD regions also allows us to calculate the ENU mutation rate (1.54 mutations per Mb) and to model future strategies for genetic screens in mice. The introduction of this approach will accelerate the discovery of causal variants, permit broader and more informative lethal screens to be used, reduce animal costs, and herald a new era for ENU mutagenesis.The High-Throughput Genomics Group at the Wellcome Trust Centre for Human Genetics is funded by Wellcome Trust grant reference 090532/Z/09/Z and MRC Hub grant G0900747 91070. This study was supported by Wellcome Trust Strategic Award 082030 (CCG), Wellcome Trust Studentship 094446/Z/10/Z (KRB), the Oxford NIHR Biomedical Research Centre, and the MRC Human Immunology Unit (RJC). AJR and GL were supported by Wellcome Trust grant 090532/Z/ 09/Z, CCG and AE by a Major initiative Award from the Clive and Vera Ramaciotti Foundation, and AE by an NHMRC Career Development Award. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript
    corecore