
    Modeling the evolution space of breakage fusion bridge cycles with a stochastic folding process

    Breakage-fusion-bridge (BFB) cycles in cancer arise when a broken segment of DNA is duplicated and an end from each copy is fused to the other. This structure then 'unfolds' into a new piece of palindromic DNA. BFB cycles are one mechanism responsible for the localised amplicons observed in cancer genome data. The process has parallels with the paper-folding sequences that arise when a piece of paper is folded several times and then unfolded. Here we adapt such methods to study BFB structures in detail. First, we consider discrete representations of this space with 2-d trees to demonstrate that there are 2^(n(n-1)/2) qualitatively distinct evolutions involving n BFB cycles. Second, we consider the stochastic nature of the fold positions to determine evolution likelihoods, and describe how amplicons become localised. Finally, we illustrate these methods by inferring the evolution of BFB cycles using data from primary tissue cancer samples.
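To make the folding analogy concrete, here is a minimal sketch under simplifying assumptions of our own. A chromosome arm is a list of signed segment labels (negative means inverted), each cycle fuses the arm with its inverted copy, and the bridge then breaks at a position drawn uniformly at random. The function names are hypothetical and the uniform break distribution is an illustrative assumption, not the authors' fitted model; `n_distinct_evolutions` only evaluates the stated count 2^(n(n-1)/2) and does not reproduce the 2-d tree construction.

```python
import random
from collections import Counter


def bfb_cycle(arm, break_pos):
    """Apply one BFB cycle, viewed as a single fold (illustrative model).

    arm: signed segment labels, centromere-proximal end first;
         a negative label marks an inverted copy of that segment.
    break_pos: number of segments this chromosome keeps when the
         dicentric bridge breaks (1 <= break_pos < 2 * len(arm)).
    """
    # Fusion: the arm followed by its inverted copy (a palindrome).
    dicentric = arm + [-s for s in reversed(arm)]
    if not 1 <= break_pos < len(dicentric):
        raise ValueError("break must leave at least one segment on each side")
    # This daughter keeps the centromere-side prefix of the bridge.
    return dicentric[:break_pos]


def simulate_bfb(n_segments, n_cycles, rng):
    """Run n_cycles cycles with uniformly random breaks (an assumption)."""
    arm = list(range(1, n_segments + 1))
    for _ in range(n_cycles):
        arm = bfb_cycle(arm, rng.randint(1, 2 * len(arm) - 1))
    return arm


def copy_numbers(arm):
    """Copy number of each original segment, ignoring orientation."""
    return Counter(abs(s) for s in arm)


def n_distinct_evolutions(n):
    """Qualitatively distinct evolutions for n cycles: 2^(n(n-1)/2)."""
    return 2 ** (n * (n - 1) // 2)


if __name__ == "__main__":
    print(bfb_cycle([1, 2, 3], 5))  # [1, 2, 3, -3, -2]
    print(copy_numbers(simulate_bfb(5, 3, random.Random(0))))
    print([n_distinct_evolutions(n) for n in range(1, 5)])  # [1, 2, 8, 64]
```

A `break_pos` larger than `len(arm)` retains part of the inverted copy (a copy-number gain), while smaller values give a terminal deletion, which is how repeated folds concentrate amplification near the fold points.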

    cn.FARMS: a latent variable model to detect copy number variations in microarray data with a low false discovery rate

    Cost-effective oligonucleotide genotyping arrays such as the Affymetrix SNP 6.0 are still the predominant technique for measuring DNA copy number variations (CNVs). However, CNV detection methods for microarrays overestimate both the number and the size of CNV regions and consequently suffer from a high false discovery rate (FDR). A high FDR means that many detected CNVs are false positives that cannot be associated with a disease in a clinical study; yet correction for multiple testing still counts them, which decreases the study's discovery power. To control the FDR, we propose a probabilistic latent variable model, ‘cn.FARMS’, which is optimized by a Bayesian maximum a posteriori approach. cn.FARMS controls the FDR through the information gain of the posterior over the prior. The prior represents the null hypothesis of copy number 2 for all samples, and the posterior can deviate from it only in response to strong and consistent signals in the data. On HapMap data, cn.FARMS clearly outperformed the two most prevalent methods with respect to sensitivity and FDR. The software cn.FARMS is publicly available as an R package at http://www.bioinf.jku.at/software/cnfarms/cnfarms.html
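As a toy illustration of how an information-gain criterion suppresses weak calls, the sketch below replaces cn.FARMS's factor-analysis model with a one-dimensional conjugate Gaussian model (our simplification, not the cn.FARMS package API). A latent log2 ratio with a prior centred on copy number 2 (log-ratio 0) is updated by the probe data, and the KL divergence from prior to posterior measures how far the data pulled it away. The prior and noise standard deviations are arbitrary assumptions.

```python
import math


def gaussian_posterior(x, prior_sd, noise_sd):
    """Conjugate update for a latent log2 copy-number ratio z.

    Prior: z ~ N(0, prior_sd^2), i.e. copy number 2 (log-ratio 0).
    Likelihood: each observation x_i ~ N(z, noise_sd^2).
    Returns the posterior mean and standard deviation.
    """
    precision = 1.0 / prior_sd**2 + len(x) / noise_sd**2
    mean = (sum(x) / noise_sd**2) / precision
    return mean, math.sqrt(1.0 / precision)


def kl_gaussian(mu1, sd1, mu0, sd0):
    """KL(N(mu1, sd1^2) || N(mu0, sd0^2)) in nats."""
    return (math.log(sd0 / sd1)
            + (sd1**2 + (mu1 - mu0) ** 2) / (2 * sd0**2)
            - 0.5)


def information_gain(x, prior_sd=0.3, noise_sd=0.5):
    """Information gain of the posterior over the copy-number-2 prior."""
    mu, sd = gaussian_posterior(x, prior_sd, noise_sd)
    return kl_gaussian(mu, sd, 0.0, prior_sd)


if __name__ == "__main__":
    weak = [0.0, 0.05, -0.05]       # noise around copy number 2
    strong = [0.8, 0.9, 0.85, 0.8]  # consistent gain signal
    print(information_gain(weak), information_gain(strong))
```

Because the posterior always narrows somewhat, the gain is never exactly zero even for null data; in this toy version a call would be made only above a threshold, which would be a free, user-chosen parameter.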

    Primary brain calcification: an international study reporting novel variants and associated phenotypes.

    Primary familial brain calcification (PFBC) is a rare cerebral microvascular calcifying disorder with a wide spectrum of motor, cognitive, and neuropsychiatric symptoms. It is typically inherited as an autosomal-dominant trait, with four causative genes identified so far: SLC20A2, PDGFRB, PDGFB, and XPR1. Our study aimed to screen the coding regions of these genes in a series of 177 unrelated probands who fulfilled the diagnostic criteria for primary brain calcification, regardless of their family history. Sequence variants were classified as pathogenic, likely pathogenic, or of uncertain significance (VUS) based on the ACMG-AMP recommendations. We identified 45 probands (25.4%) carrying either pathogenic or likely pathogenic variants (n = 34, 19.2%) or VUS (n = 11, 6.2%). SLC20A2 made the largest contribution (16.9%), followed by XPR1 and PDGFB (3.4% each) and PDGFRB (1.7%). Overall, 81.5% of carriers were symptomatic, and the most frequent symptoms were parkinsonism, cognitive impairment, and psychiatric disturbances (52.3%, 40.9%, and 38.6% of symptomatic individuals, respectively), with a wide range of age at onset (from childhood to 81 years). While the pathogenic and likely pathogenic variants identified in this study can be used for genetic counseling, the VUS will require additional evidence, such as recurrence in unrelated patients, before they can be classified as pathogenic.
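The reported yields can be checked against the cohort size. The short sketch below recomputes the stated percentages from the stated counts; the per-gene counts are our own inference (percentage × 177, rounded) and are not reported in the abstract, though they happen to sum to the 45 carriers.

```python
PROBANDS = 177

# Counts stated in the abstract.
stated = {"P/LP or VUS": 45, "P/LP": 34, "VUS": 11}

# Per-gene percentages stated in the abstract. The counts derived from
# them below are an inference (rounded), NOT figures reported directly.
gene_pct = {"SLC20A2": 16.9, "XPR1": 3.4, "PDGFB": 3.4, "PDGFRB": 1.7}


def pct(count, total=PROBANDS):
    """Percentage of probands, rounded to one decimal place."""
    return round(100 * count / total, 1)


inferred_counts = {g: round(p * PROBANDS / 100) for g, p in gene_pct.items()}

if __name__ == "__main__":
    for label, n in stated.items():
        print(f"{label}: {n}/{PROBANDS} = {pct(n)}%")
    print(inferred_counts, sum(inferred_counts.values()))
```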

    Capturing the cloud of diversity reveals complexity and heterogeneity of MRSA carriage, infection and transmission.

    Genome sequencing is revolutionizing clinical microbiology and our understanding of infectious diseases. Previous studies have largely relied on sequencing a single isolate from each individual. However, it is not clear what degree of bacterial diversity exists within, and is transmitted between, individuals. Understanding this 'cloud of diversity' is key to accurately identifying transmission pathways. Here, we report the deep sequencing of methicillin-resistant Staphylococcus aureus (MRSA) among staff and animal patients involved in a transmission network at a veterinary hospital. We demonstrate considerable within-host diversity and show that this diversity may rise and fall over time. Isolates from invasive disease contained multiple mutations in the same genes, including inactivation of a global regulator of virulence and changes in phage copy number. This study highlights the need to sequence multiple isolates from each individual to gain an accurate picture of transmission networks and to further understand the basis of pathogenesis. Thanks to Dr Alex O’Neill, University of Leeds and Dr Matthew Ellington, Public Health England for provision of RN4220 and RN4200mutS. We thank the core sequencing and informatics team at the Wellcome Trust Sanger Institute for sequencing of the isolates described in this study. This work was supported by a Medical Research Council Partnership grant (G1001787/1) held between the Department of Veterinary Medicine, University of Cambridge (M.A.H.), the School of Clinical Medicine, University of Cambridge (S.J.P.), the Moredun Research Institute, and the Wellcome Trust Sanger Institute (J.P. and S.J.P). S.J.P. receives support from the NIHR Cambridge Biomedical Research Centre. M.T.G.H., S.R.H. and J.P. were funded by Wellcome Trust grant no. 098051. G.G.R.M. was funded by an MRC studentship. This is the final version of the article. It first appeared from Nature Publishing Group via http://dx.doi.org/10.1038/ncomms756
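One common way to quantify a 'cloud of diversity' is the pairwise SNP distance between isolates from the same host, compared with distances across hosts. The sketch below assumes isolates are already aligned to a common reference as equal-length strings; the function names and toy sequences are hypothetical, and the abstract does not state which diversity metric the study used.

```python
from itertools import combinations
from statistics import mean


def snp_distance(a, b):
    """Number of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def within_host_diversity(isolates):
    """Mean pairwise SNP distance among one host's isolates (0.0 if fewer than 2)."""
    pairs = list(combinations(isolates, 2))
    return mean(snp_distance(a, b) for a, b in pairs) if pairs else 0.0


def min_between_hosts(host_a, host_b):
    """Smallest SNP distance between any isolate of host A and any of host B."""
    return min(snp_distance(a, b) for a in host_a for b in host_b)


if __name__ == "__main__":
    host_1 = ["AAAA", "AAAT", "AATT"]  # toy aligned isolates
    host_2 = ["TTTT", "AATT"]
    print(within_host_diversity(host_1), min_between_hosts(host_1, host_2))
```

Sampling a single isolate per host forces `within_host_diversity` to zero and can only raise (never lower) the minimum between-host distance, since it searches fewer pairs; this is one concrete reason why a single isolate can misrepresent transmission links.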

    Joint sequencing of human and pathogen genomes reveals the genetics of pneumococcal meningitis.

    Streptococcus pneumoniae is a common nasopharyngeal colonizer, but it can also cause life-threatening invasive diseases such as empyema, bacteremia, and meningitis. Genetic variation in both host and pathogen is known to play a role in invasive pneumococcal disease, though the extent of its contribution is unknown. In a genome-wide association study of human and pathogen, we show that human variation explains almost half of the variation in susceptibility to pneumococcal meningitis and one-third of the variation in severity, and we identify variants in CCDC33 associated with susceptibility. Pneumococcal genetic variation explains a large proportion (70%) of invasive potential but has no effect on severity. Serotype alone is insufficient to explain invasiveness, suggesting that other pneumococcal factors are involved in progression to invasive disease. We identify pneumococcal genes involved in invasiveness, including pspC and zmpD, and perform a human-bacteria interaction analysis. These genes are potential candidates for the development of more broadly acting pneumococcal vaccines.
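The per-variant test at the core of a case-control association study can be illustrated with a basic allelic chi-square test on a 2×2 table of allele counts. This is a generic textbook sketch, not the study's pipeline: the abstract does not say which test was used, and large human and bacterial GWAS typically rely on regression or mixed models that adjust for population structure. The counts in the usage note are made up.

```python
import math


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p_value). Assumes every expected count is non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Chi-square survival function with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, `allelic_chi2(60, 40, 40, 60)` gives a chi-square of 8.0 (p ≈ 0.0047), whereas identical allele frequencies in cases and controls give 0.0 (p = 1.0).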

    PGen: large-scale genomic variations analysis workflow and browser in SoyKB

    Background: With advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of crop germplasm to detect genome-scale genetic variations and to apply that knowledge to trait improvement. To efficiently facilitate large-scale analysis of genomic variations from NGS resequencing data, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources, and the Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single-nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations, and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. Results: We have developed both a Linux version on GitHub (https://github.com/pegasus-isi/PGen-GenomicVariationsWorkflow) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB) (http://soykb.org/Pegasus/index.php). Using PGen, we identified 10,218,140 SNPs and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. A total of 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects, bringing the total to more than 500 soybean germplasm lines, have also been integrated. These SNPs are being used for trait improvement with genotype-to-phenotype prediction approaches developed in-house. To make NGS data easy to browse and access, we have also developed an NGS resequencing data browser (http://soykb.org/NGS_Resequence/NGS_index.php) within SoyKB that gives soybean researchers easy access to SNP and downstream analysis results.
Conclusion: The PGen workflow has been optimized, through thorough testing and validation, for efficient analysis of soybean data. This research serves as an example of best practices for developing genomics data analysis workflows that integrate remote HPC resources and efficient data management with ease of use for biological users. The PGen workflow can also be easily customized to analyze data from other species. Missouri Soybean Merchandising Council [368]; United Soybean Board [1320-532-5615]. This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries.
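SNP and indel totals like those reported for PGen are typically tallied from variant-call (VCF) output. The following minimal sketch is not part of PGen; it classifies REF/ALT pairs from tab-separated VCF lines with a simple length-based rule. Real pipelines would normally use a VCF library and normalize variants first, and the example records (including the `Gm01`-style chromosome names) are invented.

```python
from collections import Counter


def variant_type(ref, alt):
    """Classify one REF/ALT allele pair from a VCF record.

    SNP: both alleles are a single base. Indel: allele lengths differ.
    Equal-length multi-base changes and symbolic alleles count as 'other'.
    """
    if alt == "*" or alt.startswith("<"):
        return "other"  # spanning deletions / symbolic alleles not classified here
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"


def count_variant_types(vcf_lines):
    """Tally variant types in plain-text VCF lines, skipping header lines.

    Multi-allelic sites (comma-separated ALT) contribute one tally per ALT allele.
    """
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]  # VCF columns REF and ALT
        for alt in alts.split(","):
            counts[variant_type(ref, alt)] += 1
    return counts
```

Counting per ALT allele rather than per record is a design choice; a per-site count would give different totals at multi-allelic positions.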

    Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen

    The effectiveness of most targeted cancer therapies is short-lived. Tumors often develop resistance that might be overcome with drug combinations. However, the number of possible combinations is vast, necessitating data-driven approaches to find optimal patient-specific treatments. Here we report AstraZeneca’s large drug combination dataset, consisting of 11,576 experiments from 910 combinations across 85 molecularly characterized cancer cell lines, and the results of a DREAM Challenge to evaluate computational strategies for predicting synergistic drug pairs and biomarkers. A total of 160 teams participated, providing comprehensive methodological development and benchmarking. Winning methods incorporate prior knowledge of drug-target interactions. Synergy is predicted with an accuracy matching that of biological replicates for >60% of combinations. However, 20% of drug combinations are poorly predicted by all methods. Genomic rationales for synergy predictions are identified, including antagonism between an ADAM17 inhibitor and PIK3CB/D inhibition, in contrast to its synergy with other PI3K-pathway inhibitors, in PIK3CA-mutant cells. Peer reviewed.
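Synergy and antagonism are defined relative to a reference model of non-interaction. As an illustration only, the sketch below uses Bliss independence, one widely used reference model; the abstract does not state which synergy score the challenge used, so this should not be read as the challenge's metric, and the effect values in the usage note are made up.

```python
def bliss_expected(effect_a, effect_b):
    """Expected combined effect under Bliss independence.

    Effects are fractional inhibitions in [0, 1]: E_AB = E_A + E_B - E_A * E_B.
    """
    for e in (effect_a, effect_b):
        if not 0.0 <= e <= 1.0:
            raise ValueError("effects must be fractions in [0, 1]")
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(observed_ab, effect_a, effect_b):
    """Observed minus expected inhibition: > 0 suggests synergy, < 0 antagonism."""
    return observed_ab - bliss_expected(effect_a, effect_b)
```

For example, if drug A alone gives 50% inhibition and drug B gives 40%, Bliss predicts 70% for the pair; an observed 80% would be scored as synergy (excess of about 0.1) and an observed 60% as antagonism.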