1,180 research outputs found

    A randomized approach to speed up the analysis of large-scale read-count data in the application of CNV detection

    Abstract. Background: The application of high-throughput sequencing in a broad range of quantitative genomic assays (e.g., DNA-seq, ChIP-seq) has created a high demand for the analysis of large-scale read-count data. Typically, the genome is divided into tiling windows, and windowed read-count data are generated for the entire genome, from which genomic signals are detected (e.g., copy number changes in DNA-seq, enrichment peaks in ChIP-seq). For accurate analysis of read-count data, many state-of-the-art statistical methods use generalized linear models (GLM) coupled with the negative-binomial (NB) distribution, leveraging its ability to perform bias correction and signal detection simultaneously. However, although statistically powerful, the GLM+NB method has quadratic computational complexity and therefore suffers from slow running times when applied to large-scale windowed read-count data. In this study, we aimed to substantially speed up the GLM+NB method by using a randomized algorithm, and we demonstrate the utility of our approach in detecting copy number variants (CNVs) using a real example. Results: We propose an efficient estimator, the randomized GLM+NB coefficients estimator (RGE), for speeding up the GLM+NB method. RGE samples the read-count data and solves the estimation problem on a smaller scale. We first theoretically validated the consistency and variance properties of RGE. We then applied RGE to GENSENG, a GLM+NB-based method for detecting CNVs, and named the resulting method "R-GENSENG". Based on extensive evaluation using both simulated and empirical data, we concluded that R-GENSENG is ten times faster than the original GENSENG while maintaining GENSENG's accuracy in CNV detection. Conclusions: Our results suggest that the RGE strategy developed here could be applied to other GLM+NB-based read-count analyses, e.g., ChIP-seq data analysis, to substantially improve their computational efficiency while preserving their analytic power.
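The core RGE idea, solving the GLM+NB estimation problem on a sample of windows rather than on all of them, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' RGE: it assumes uniform sampling of windows without replacement, a log-link NB GLM with a fixed, known dispersion `theta`, and plain IRLS fitting. The function names (`fit_nb_glm`, `randomized_estimate`) and the GC covariate in the toy data are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_nb_glm(X, y, theta, iters=25):
    """Fit log-link NB GLM coefficients by IRLS, holding the dispersion theta fixed.

    Column 0 of X is assumed to be the intercept.
    """
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y) + 0.1)  # start from the log of the mean count
    for _ in range(iters):
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            mu = math.exp(eta)
            w = mu / (1.0 + mu / theta)  # IRLS weight mu^2 / Var(y), with Var(y) = mu + mu^2/theta
            z = eta + (yi - mu) / mu     # working response for the log link
            for j in range(p):
                xtwz[j] += w * xi[j] * z
                for k in range(p):
                    xtwx[j][k] += w * xi[j] * xi[k]
        beta = solve(xtwx, xtwz)
    return beta


def randomized_estimate(X, y, theta, sample_size, seed=0):
    """Hypothetical RGE-style estimator: fit the GLM on a uniform sample of windows."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(y)), sample_size)
    return fit_nb_glm([X[i] for i in idx], [y[i] for i in idx], theta)


# Toy, noise-free data: 2,000 windows with an intercept and a hypothetical GC covariate.
X = [[1.0, (i % 100) / 100.0] for i in range(2000)]
y = [round(math.exp(3.0 + 1.0 * x[1])) for x in X]
full = fit_nb_glm(X, y, theta=10.0)
sub = randomized_estimate(X, y, theta=10.0, sample_size=200)
print("full:", [round(b, 3) for b in full], "sampled:", [round(b, 3) for b in sub])
```

Each IRLS iteration in this sketch touches every row, so fitting on m sampled windows instead of all n cuts that work by roughly a factor of n/m. How the sample should be drawn and how large it must be to keep the estimator consistent are exactly the questions the abstract says the authors analyzed; this sketch does not reproduce that analysis.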

    Visual Impairment and Blindness

    Blindness and vision impairment affect at least 2.2 billion people worldwide, most of whom have a preventable vision impairment. The majority of people with vision impairment are older than 50 years; however, vision loss can affect people of all ages. Reduced eyesight can have major and long-lasting effects on all aspects of life, including daily personal activities, interacting with the community, school and work opportunities, and the ability to access public services. This book provides an overview of the effects of blindness and visual impairment in the context of the most common causes of blindness in older adults and children, including retinal disorders, cataracts, glaucoma, and macular or corneal degeneration.

    Massively parallel sequencing in preimplantation and prenatal genetic diagnosis

    Bayesian localization of CNV candidates in WGS data within minutes

    Background: Full Bayesian inference for detecting copy number variants (CNVs) from whole-genome sequencing (WGS) data is still largely infeasible due to computational demands. A recently introduced approach to perform Forward-Backward Gibbs sampling using dynamic Haar wavelet compression has alleviated issues of convergence and, to some extent, speed. Yet, the problem remains challenging in practice. Results: In this paper, we propose an improved algorithmic framework for this approach. We provide new space-efficient data structures to query sufficient statistics in logarithmic time, based on a linear-time, in-place transform of the data, which also improves on the compression ratio. We also propose a new approach to efficiently store and update marginal state counts obtained from the Gibbs sampler. Conclusions: Using this approach, we discover several CNV candidates in two rat populations divergently selected for tame and aggressive behavior, consistent with earlier results concerning the domestication syndrome as well as experimental observations. Computationally, we observe a 29.5-fold decrease in memory, an average 5.8-fold speedup, and a 191-fold decrease in minor page faults. We also observe that these metrics varied greatly in the old implementation but not in the new one. We conjecture that this is due to the better compression scheme. The fully Bayesian segmentation of the entire WGS data set required 3.5 min and 1.24 GB of memory and can hence be performed on a commodity laptop.
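The idea of answering sufficient-statistic queries over arbitrary runs of windows in logarithmic time, after a linear-time in-place transform, can be illustrated with a simpler structure. The approach described above builds on Haar-wavelet-based compression; the sketch below instead swaps in a Fenwick (binary indexed) tree, a different, well-known structure with the same build and query costs, and does not reproduce the authors' compression. It assumes Gaussian-style segment statistics (count, sum, sum of squares); the names `fenwick_build`, `fenwick_prefix`, and `SegmentStats` are hypothetical.

```python
def fenwick_build(a):
    """Transform list a in place into a Fenwick tree in O(n).

    Afterwards, a[i] holds the sum of the original values at positions (i & (i + 1)) .. i.
    """
    n = len(a)
    for i in range(n):
        parent = i | (i + 1)
        if parent < n:
            a[parent] += a[i]


def fenwick_prefix(tree, r):
    """Sum of the original values at positions 0 .. r-1, in O(log n)."""
    total = 0
    i = r - 1
    while i >= 0:
        total += tree[i]
        i = (i & (i + 1)) - 1
    return total


class SegmentStats:
    """Return (count, sum, sum of squares) for any half-open window range [lo, hi) in O(log n)."""

    def __init__(self, values):
        self.sums = list(values)
        self.squares = [v * v for v in values]
        fenwick_build(self.sums)
        fenwick_build(self.squares)

    def query(self, lo, hi):
        s = fenwick_prefix(self.sums, hi) - fenwick_prefix(self.sums, lo)
        ss = fenwick_prefix(self.squares, hi) - fenwick_prefix(self.squares, lo)
        return hi - lo, s, ss


stats = SegmentStats([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
n, s, ss = stats.query(2, 7)
mean = s / n
variance = ss / n - mean ** 2  # population variance of the segment, from the sufficient statistics
print(n, s, ss)  # 5 21 127
```

Each interval query costs two prefix queries, which is what keeps it logarithmic. Unlike a wavelet-compressed representation, a Fenwick tree stores one value per window, so it offers no memory savings.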

    Integrated Genomics Of Susceptibility To Therapy-Related Leukemia

    Therapy-related acute myeloid leukemia (t-AML) is a secondary, generally incurable malignancy attributable to the chemotherapeutic treatment of an initial disease. Although there is a genetic component to susceptibility to therapy-related leukemias in mice, little is understood about either the contributing loci or the mechanisms by which susceptibility factors mediate their effect. An improved understanding of susceptibility factors and the biological processes in which they act may lead to the development of t-AML prevention strategies. In this thesis work, we identified expression networks that are associated with t-AML susceptibility in mice. These networks are robust in that they emerge from distinct methods of analysis and from different gene expression data sets of hematopoietic stem and progenitor lineages. These networks are enriched in genes involved in cell cycle and DNA repair, suggesting that these processes play a role in susceptibility. By integrating gene expression and genetic information, we prioritized network nodes for experimental validation as contributors to expression networks and t-AML susceptibility. Network analysis and node prioritization required a comprehensive map of genetic variation in mouse, which was not available at the outset of this thesis work. Specifically, DNA copy number variations (CNVs), defined as genomic sequences that are polymorphic in copy number and range in length from 1,000 to several million base pairs, were largely uncharacterized in inbred mice. We developed a computational approach, the Washington University Hidden Markov Model (wuHMM), to identify CNVs from high-density array comparative genomic hybridization data, accounting for the high degree of polymorphism that occurs between mouse strains. Using wuHMM, we analyzed the copy number content of the mouse genome (20 strains) to a sub-10-kb resolution, finding over 1,300 CNV regions (CNVRs), most of which are < 10 kb in length, are found in more than one strain, and span 3.2% (85 Mb) of the reference genome. These CNVRs, along with haplotype blocks we derived from publicly available SNP data, were integrated into the susceptibility expression network analysis. In addition to addressing questions regarding t-MDS/AML susceptibility, we also used these data to assess the potential functional impact of copy number variation by mapping expression profiles to CNVRs. In hematopoietic stem and progenitor cells, up to 28% of strain-dependent expression variation is associated with copy number variation, supporting the role of germline CNVs as key contributors to natural phenotypic variation.
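wuHMM is described here only as a hidden Markov model for calling CNVs from array CGH data. As a generic illustration of how an HMM segments such data, and not wuHMM's actual model, the sketch below runs Viterbi decoding over log2 ratios with three hypothetical states (loss, normal, gain), Gaussian emissions, and made-up parameter values (`MEANS`, `SD`, `STAY`).

```python
import math

STATES = ("loss", "normal", "gain")
MEANS = (-0.6, 0.0, 0.45)  # hypothetical log2-ratio mean for each state
SD = 0.2                   # hypothetical emission standard deviation shared by all states
STAY = 0.98                # hypothetical probability of remaining in the same state


def log_gauss(x, mu, sd):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def viterbi(ratios):
    """Return the most likely state label for each probe's log2 ratio."""
    k = len(STATES)
    move = (1 - STAY) / (k - 1)  # leaving probability split evenly between the other states
    log_t = [[math.log(STAY if i == j else move) for j in range(k)] for i in range(k)]
    score = [math.log(1 / k) + log_gauss(ratios[0], MEANS[s], SD) for s in range(k)]
    back = []  # back[t][j]: best predecessor of state j at probe t + 1
    for x in ratios[1:]:
        prev = score
        ptr, score = [], []
        for j in range(k):
            best = max(range(k), key=lambda i: prev[i] + log_t[i][j])
            ptr.append(best)
            score.append(prev[best] + log_t[best][j] + log_gauss(x, MEANS[j], SD))
        back.append(ptr)
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):  # trace the best path backwards
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [STATES[s] for s in path]


calls = viterbi([0.02, -0.05, 0.01, 0.03, 0.47, 0.44, 0.46, 0.43, 0.45, -0.02, 0.0, 0.04])
print(calls)
```

The self-transition probability acts as a smoothing penalty: a short run of elevated ratios is called a gain only when its emission evidence outweighs the cost of two state switches.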

    Detect Copy Number Variations from Read-depth of High-throughput Sequencing Data

    Copy-number variation (CNV) is a major form of genetic variation and a risk factor for various human diseases, so it is crucial to detect and characterize CNVs accurately. High-throughput sequencing (HTS) technologies promise to revolutionize CNV detection but present substantial analytic challenges. This dissertation investigates improving CNV detection using HTS data, mainly in the following respects. First, various sources of experimental bias in HTS confound read-depth estimation, and bias correction has not been adequately addressed by existing methods. This dissertation presents a novel read-depth-based method, GENSENG, which identifies regions of discrete copy-number changes while simultaneously accounting for the effects of multiple confounders. Second, allele-specific reads from HTS data could conceivably be leveraged both to enhance CNV detection and to produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. This dissertation presents an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. Third, although statistically powerful, the GLM+NB method used in GENSENG and AS-GENSENG has quadratic computational complexity and therefore suffers from slow running times when applied to large-scale sequencing data. This dissertation aims to substantially speed up the GLM+NB method by using a randomized algorithm and demonstrates the utility of this approach with R-GENSENG, a sped-up version of GENSENG. Doctor of Philosophy

    Age-Related Macular Degeneration and Diabetic Retinopathy

    This reprint includes contributions from leaders in the field of personalized medicine in ophthalmology. The contributions are diverse and cover pre-clinical and clinical topics. We hope you enjoy reading the articles.

    Data analysis methods for copy number discovery and interpretation

    Copy number variation (CNV) is an important type of genetic variation that can give rise to a wide variety of phenotypic traits. Differences in copy number are thought to play major roles in processes that involve dosage-sensitive genes, providing beneficial, deleterious, or neutral modifications to individual phenotypes. Copy number analysis has long been a standard in clinical cytogenetic laboratories. Gene deletions and duplications can often be linked with genetic syndromes, such as the 7q11.23 deletion of Williams-Beuren syndrome, the 22q11 deletion of DiGeorge syndrome, and the 17p11.2 duplication of Potocki-Lupski syndrome. Interestingly, copy-number-based genomic disorders often display reciprocal deletion/duplication syndromes, with the latter frequently exhibiting milder symptoms. Moreover, the study of chromosomal imbalances plays a key role in cancer research. The datasets used for the development of analysis methods during this project are generated as part of the cutting-edge translational project Deciphering Developmental Disorders (DDD). The DDD project is the first of its kind and will directly apply state-of-the-art technologies, in the form of ultra-high-resolution microarrays and next-generation sequencing (NGS), to real-time clinical genetic practice. It is a collaboration between the Wellcome Trust Sanger Institute (WTSI) and the National Health Service (NHS), involving the 24 regional genetic services across the UK and Ireland. Although the application of DNA microarrays for the detection of CNVs is well established, individual change-point detection algorithms often display variable performance. The definition of an optimal set of parameters for achieving a certain level of performance is rarely straightforward, especially where data qualities vary ... [cont.]

    Development and analysis of the Software Implemented Fault-Tolerance (SIFT) computer

    SIFT (Software Implemented Fault Tolerance) is an experimental, fault-tolerant computer system designed to meet the extreme reliability requirements for safety-critical functions in advanced aircraft. Errors are masked by performing a majority voting operation over the results of identical computations, and faulty processors are removed from service by reassigning computations to the nonfaulty processors. This scheme has been implemented in a special architecture using a set of standard Bendix BDX930 processors, augmented by a special asynchronous-broadcast communication interface that provides direct processor-to-processor communication among all processors. Fault isolation is accomplished in hardware; all other fault-tolerance functions, together with scheduling and synchronization, are implemented exclusively by executive system software. The system reliability is predicted by a Markov model. Mathematical consistency of the system software with respect to the reliability model has been partially verified using recently developed tools for machine-aided proof of program correctness.
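The masking-and-reconfiguration scheme described above can be illustrated with a small majority vote over replicated results. This is a hypothetical sketch, not SIFT's executive software: the processor names and the functions `vote` and `reconfigure` are invented, and it assumes that replicated results can be compared for exact equality.

```python
from collections import Counter


def vote(results):
    """Majority-vote over replicated results.

    results maps a processor id to the value it computed. Returns the majority
    value and the sorted ids of processors that disagreed with it.
    """
    counts = Counter(results.values())
    value, n = counts.most_common(1)[0]
    if 2 * n <= len(results):
        raise RuntimeError("no strict majority; the error cannot be masked")
    dissenters = sorted(p for p, v in results.items() if v != value)
    return value, dissenters


def reconfigure(active, dissenters):
    """Remove processors that produced a minority result from the active set."""
    return [p for p in active if p not in dissenters]


active = ["p1", "p2", "p3"]
value, faulty = vote({"p1": 42, "p2": 42, "p3": 7})
active = reconfigure(active, faulty)
print(value, active)  # 42 ['p1', 'p2']
```

For brevity, this sketch removes a dissenting processor after a single vote; the abstract does not say what policy SIFT uses to decide that a processor is faulty.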