Search CORE

1,180 research outputs found

A randomized approach to speed up the analysis of large-scale read-count data in the application of CNV detection

Author: Sun Wei
Szatkiewicz Jin
Wang Wei
Wang WeiBo
Publication venue: BioMed Central
Publication date: 01/03/2018

Background: The application of high-throughput sequencing in a broad range of quantitative genomic assays (e.g., DNA-seq, ChIP-seq) has created a high demand for the analysis of large-scale read-count data. Typically, the genome is divided into tiling windows and windowed read-count data are generated for the entire genome, from which genomic signals are detected (e.g., copy number changes in DNA-seq, enrichment peaks in ChIP-seq). For accurate analysis of read-count data, many state-of-the-art statistical methods use generalized linear models (GLM) coupled with the negative-binomial (NB) distribution, leveraging its ability to perform bias correction and signal detection simultaneously. However, although statistically powerful, the GLM+NB method has quadratic computational complexity and therefore runs slowly when applied to large-scale windowed read-count data. In this study, we aimed to substantially speed up the GLM+NB method by using a randomized algorithm, and we demonstrate the utility of our approach in detecting copy number variants (CNVs) using a real example. Results: We propose an efficient estimator, the randomized GLM+NB coefficients estimator (RGE), for speeding up the GLM+NB method. RGE samples the read-count data and solves the estimation problem on a smaller scale. We first theoretically validated the consistency and the variance properties of RGE. We then applied RGE to GENSENG, a GLM+NB based method for detecting CNVs, and named the resulting method "R-GENSENG". Based on extensive evaluation using both simulated and empirical data, we concluded that R-GENSENG is ten times faster than the original GENSENG while maintaining GENSENG's accuracy in CNV detection. Conclusions: Our results suggest that the RGE strategy developed here could be applied to other GLM+NB based read-count analyses, e.g., ChIP-seq data analysis, to substantially improve their computational efficiency while preserving the analytic power.

Directory of Open Access Journals

Carolina Digital Repository

eScholarship - University of California
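
The abstract above describes RGE as sampling the read-count data and solving the estimation problem on a smaller scale. The idea can be illustrated with a minimal sketch, under loud assumptions: it swaps in a plain Poisson log-linear model with a single covariate (not the paper's negative-binomial GLM), uses uniform sampling without replacement, and every function name here (`simulate_windows`, `fit_poisson_loglinear`, `randomized_fit`) is hypothetical rather than taken from GENSENG or R-GENSENG.

```python
import math
import random


def poisson_draw(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's multiplication method."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= limit:
            return k - 1


def simulate_windows(n, b0, b1, rng):
    """Simulate n windows: covariate x in [0, 1), count y ~ Poisson(exp(b0 + b1*x))."""
    xs = [rng.random() for _ in range(n)]
    ys = [poisson_draw(math.exp(b0 + b1 * x), rng) for x in xs]
    return xs, ys


def fit_poisson_loglinear(xs, ys, iters=50, tol=1e-10):
    """Fit log(mu) = b0 + b1*x by Newton's method (Poisson GLM, log link)."""
    b0 = math.log(max(sum(ys) / len(ys), 1e-8))
    b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            g0 += y - mu            # score for b0
            g1 += (y - mu) * x      # score for b1
            h00 += mu               # Fisher information entries
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0:
            break  # degenerate design (e.g. all x identical)
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def randomized_fit(xs, ys, m, rng):
    """Estimate the coefficients from a uniform random subsample of m windows."""
    idx = rng.sample(range(len(xs)), m)
    return fit_poisson_loglinear([xs[i] for i in idx], [ys[i] for i in idx])
```

Calling `randomized_fit(xs, ys, m, rng)` with `m` much smaller than `len(xs)` gives coefficient estimates at a fraction of the cost of the full fit; the price is a larger variance, which is the trade-off the abstract says the authors analyzed theoretically.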

Visual Impairment and Blindness

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

Blindness and vision impairment affect at least 2.2 billion people worldwide, with most individuals having a preventable vision impairment. The majority of people with vision impairment are older than 50 years; however, vision loss can affect people of all ages. Reduced eyesight can have major and long-lasting effects on all aspects of life, including daily personal activities, interacting with the community, school and work opportunities, and the ability to access public services. This book provides an overview of the effects of blindness and visual impairment in the context of the most common causes of blindness in older adults as well as children, including retinal disorders, cataracts, glaucoma, and macular or corneal degeneration.

Directory of Open Access Books (DOAB)

Massively parallel sequencing in preimplantation and prenatal genetic diagnosis

Author: Deleye Lieselot
Publication venue: Ghent University. Faculty of Pharmaceutical Sciences
Publication date: 01/01/2017

Ghent University Academic Bibliography

Bayesian localization of CNV candidates in WGS data within minutes

Author: Cagan Alex
Gulevich Rimma
Kozhemyakina Rimma
Schliep Alexander
Wiedenhoeft John
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Background: Full Bayesian inference for detecting copy number variants (CNV) from whole-genome sequencing (WGS) data is still largely infeasible due to computational demands. A recently introduced approach to perform Forward-Backward Gibbs sampling using dynamic Haar wavelet compression has alleviated issues of convergence and, to some extent, speed. Yet, the problem remains challenging in practice. Results: In this paper, we propose an improved algorithmic framework for this approach. We provide new space-efficient data structures to query sufficient statistics in logarithmic time, based on a linear-time, in-place transform of the data, which also improves on the compression ratio. We also propose a new approach to efficiently store and update marginal state counts obtained from the Gibbs sampler. Conclusions: Using this approach, we discover several CNV candidates in two rat populations divergently selected for tame and aggressive behavior, consistent with earlier results concerning the domestication syndrome as well as experimental observations. Computationally, we observe a 29.5-fold decrease in memory, an average 5.8-fold speedup, as well as a 191-fold decrease in minor page faults. We also observe that metrics varied greatly in the old implementation, but not the new one. We conjecture that this is due to the better compression scheme. The fully Bayesian segmentation of the entire WGS data set required 3.5 min and 1.24 GB of memory, and can hence be performed on a commodity laptop.

Chalmers Research

MPG.PuRe
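
The abstract above relies on Haar wavelet compression of the read-depth signal. As a rough illustration of why piecewise-constant data such as copy-number segments compress well, here is a textbook unnormalized Haar transform (pairwise averages and half-differences). It is a generic sketch, not the paper's in-place transform or its query data structure, and the function names `haar_forward` and `haar_inverse` are hypothetical.

```python
def haar_forward(data):
    """Unnormalized Haar transform of a sequence whose length is a power of two.

    Returns [overall average, detail coefficients from coarse to fine].
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    out = [float(v) for v in data]
    length = n
    while length > 1:
        half = length // 2
        averages = [(out[2 * i] + out[2 * i + 1]) / 2 for i in range(half)]
        details = [(out[2 * i] - out[2 * i + 1]) / 2 for i in range(half)]
        out[:length] = averages + details
        length = half
    return out


def haar_inverse(coeffs):
    """Invert haar_forward, reconstructing the original sequence."""
    out = list(coeffs)
    n = len(out)
    length = 1
    while length < n:
        merged = []
        for avg, det in zip(out[:length], out[length:2 * length]):
            merged.extend([avg + det, avg - det])
        out[:2 * length] = merged
        length *= 2
    return out
```

For a signal that is constant within segments, most detail coefficients are exactly zero, so only the few nonzero ones need to be stored; for example, `haar_forward([4, 4, 8, 8])` yields one average, one nonzero detail coefficient, and two zeros.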

Integrated Genomics Of Susceptibility To Therapy-Related Leukemia

Author: Cahan Patrick
Publication venue: Washington University Open Scholarship
Publication date: 24/05/2009

Therapy-related acute myeloid leukemia (t-AML) is a secondary, generally incurable, malignancy attributable to the chemotherapeutic treatment of an initial disease. Although there is a genetic component to susceptibility to therapy-related leukemias in mice, little is understood either about the contributing loci, or the mechanisms by which susceptibility factors mediate their effect. An improved understanding of susceptibility factors and the biological processes in which they act may lead to the development of t-AML prevention strategies. In this thesis work, we identified expression networks that are associated with t-AML susceptibility in mice. These networks are robust in that they emerge from distinct methods of analysis and from different gene expression data sets of hematopoietic stem and progenitor lineages. These networks are enriched in genes involved in cell cycle and DNA repair, suggesting that these processes play a role in susceptibility. By integrating gene expression and genetic information we prioritized network nodes for experimental validation as contributors to expression networks and t-AML susceptibility. Network analysis and node prioritization required a comprehensive map of genetic variation in mouse, which was not available at the outset of this thesis work. Specifically, DNA copy number variations (CNVs), defined as genomic sequences that are polymorphic in copy number and range in length from 1,000 to several million base pairs, were largely uncharacterized in inbred mice. We developed a computational approach, Washington University Hidden Markov Model (wuHMM), to identify CNVs from high-density array comparative genomic hybridization data, accounting for the high degree of polymorphism that occurs between mouse strains.
Using wuHMM we analyzed the copy number content of the mouse genome (20 strains) to a sub-10-kb resolution, finding over 1,300 CNV-regions (CNVRs), most of which are < 10 kb in length, are found in more than one strain, and span 3.2% (85 Mb) of the reference genome. These CNVRs, along with haplotype blocks we derived from publicly available SNP data, were integrated into susceptibility expression network analysis. In addition to addressing questions regarding t-MDS/AML susceptibility, we also used this data to assess the potential functional impact of copy number variation by mapping expression profiles to CNVRs. In hematopoietic stem and progenitor cells, up to 28% of strain-dependent expression variation is associated with copy number variation, supporting the role of germline CNVs as key contributors to natural phenotypic variation.

Washington University St. Louis: Open Scholarship

Detect Copy Number Variations from Read-depth of High-throughput Sequencing Data

Author: Wang Weibo
Publication venue: University of North Carolina at Chapel Hill Graduate School
Publication date: 01/01/2015

Copy-number variation (CNV) is a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize CNVs. High-throughput sequencing (HTS) technologies promise to revolutionize CNV detection but present substantial analytic challenges. This dissertation investigates improving CNV detection using HTS data, mainly from the following aspects. It is observed that various sources of experimental biases in HTS confound read-depth estimation, and bias correction has not been adequately addressed by existing methods. This dissertation presents a novel read-depth-based method, GENSENG, which identifies regions of discrete copy-number changes while simultaneously accounting for the effects of multiple confounders. It is conceivable that allele-specific reads from HTS data could be leveraged both to enhance CNV detection and to produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. This dissertation presents an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. Although statistically powerful, the GLM+NB method used in GENSENG and AS-GENSENG has quadratic computational complexity and therefore runs slowly when applied to large-scale sequencing data. This dissertation aims to substantially speed up the GLM+NB method by using a randomized algorithm and to demonstrate the utility of our approach by providing R-GENSENG, a sped-up version of GENSENG. Doctor of Philosoph

Carolina Digital Repository

Gene Copy Number Variation in Natural Populations of Plasmodium falciparum

Author: Simam Joan Jebet
Publication date: 01/01/2015

Gene copy number variants (CNVs), which consist of gene deletions and amplifications, contribute to the great diversity in the Plasmodium falciparum genome. CNVs may influence the expression of genes and hence may affect important parasite phenotypes such as virulence, drug resistance, persistence and transmissibility. The hypothesis underlying the studies in this thesis is that CNVs may be important for adaptation of the parasite to its variable environments. To investigate this hypothesis, a population-wide survey of CNVs in 183 fresh field isolates from four populations with different transmission intensities was conducted. To detect CNVs, comparative genome hybridization was performed using a 70mer microarray. This is the first large-scale survey for CNVs in natural populations of parasites. A total of 98 different CNVs, consisting of 225 genes, were identified. Various systematic aspects that could affect detection of CNVs were explored, and the population of origin of the isolate was found to be the only factor that affects CNV detection. Some of these CNVs showed high differentiation in frequency between populations, suggestive of the action of directional selection. Other CNVs showed no or low differentiation in frequencies between populations, indicative of the action of neutral evolutionary processes. Validation of the CNVs identified using microarrays was done using whole genome sequencing. Very low concordance was observed between the CNVs identified by the two technologies. These differences may be attributed to technical and analytic differences between the two technologies. Furthermore, the effect of CNVs on gene expression levels was analysed. A number of CNVs were found to be significantly associated (positively or negatively) with the expression levels of genes located inside and also outside the CNVs.

Open Research Online (The Open University)

Age-Related Macular Degeneration and Diabetic Retinopathy

Publication venue: 'MDPI AG'
Publication date: 21/06/2022

This reprint includes contributions from leaders in the field of personalized medicine in ophthalmology. The contributions are diverse and cover pre-clinical and clinical topics. We hope you enjoy reading the articles

Directory of Open Access Books (DOAB)

Data analysis methods for copy number discovery and interpretation

Author: Fitzgerald Tomas W.
Publication venue: Cranfield University
Publication date: 01/10/2014

Copy number variation (CNV) is an important type of genetic variation that can give rise to a wide variety of phenotypic traits. Differences in copy number are thought to play major roles in processes that involve dosage-sensitive genes, providing beneficial, deleterious or neutral modifications to individual phenotypes. Copy number analysis has long been a standard in clinical cytogenetic laboratories. Gene deletions and duplications can often be linked with genetic syndromes such as the 7q11.23 deletion of Williams-Beuren syndrome, the 22q11 deletion of DiGeorge syndrome and the 17q11.2 duplication of Potocki-Lupski syndrome. Interestingly, copy number based genomic disorders often display reciprocal deletion / duplication syndromes, with the latter frequently exhibiting milder symptoms. Moreover, the study of chromosomal imbalances plays a key role in cancer research. The datasets used for the development of analysis methods during this project are generated as part of the cutting-edge translational project, Deciphering Developmental Disorders (DDD). This project, the DDD, is the first of its kind and will directly apply state-of-the-art technologies, in the form of ultra-high resolution microarray and next generation sequencing (NGS), to real-time genetic clinical practice. It is a collaboration between the Wellcome Trust Sanger Institute (WTSI) and the National Health Service (NHS) involving the 24 regional genetic services across the UK and Ireland. Although the application of DNA microarrays for the detection of CNVs is well established, individual change point detection algorithms often display variable performances. The definition of an optimal set of parameters for achieving a certain level of performance is rarely straightforward, especially where data qualities vary ... [cont.]

Cranfield CERES

Development and analysis of the Software Implemented Fault-Tolerance (SIFT) computer

Author: Goldberg J.
Green M. W.
Kautz W. H.
Levitt K. N.
Melliar-Smith P. M.
Schwartz R. L.
Weinstock C. B.

SIFT (Software Implemented Fault Tolerance) is an experimental, fault-tolerant computer system designed to meet the extreme reliability requirements for safety-critical functions in advanced aircraft. Errors are masked by performing a majority voting operation over the results of identical computations, and faulty processors are removed from service by reassigning computations to the nonfaulty processors. This scheme has been implemented in a special architecture using a set of standard Bendix BDX930 processors, augmented by a special asynchronous-broadcast communication interface that provides direct, processor-to-processor communication among all processors. Fault isolation is accomplished in hardware; all other fault-tolerance functions, together with scheduling and synchronization, are implemented exclusively by executive system software. The system reliability is predicted by a Markov model. Mathematical consistency of the system software with respect to the reliability model has been partially verified, using recently developed tools for machine-aided proof of program correctness.

NASA Technical Reports Server
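
The SIFT abstract above describes masking errors by majority voting over the results of identical computations and removing faulty processors from service. A minimal sketch of that voting step follows, assuming results arrive as a mapping from a processor identifier to its reported value; the function name `majority_vote` and the processor identifiers are hypothetical and do not come from the SIFT software.

```python
from collections import Counter


def majority_vote(results):
    """Vote over replicated results.

    results maps a processor id to the value that processor computed.
    Returns (value, faulty_ids): the value reported by a strict majority,
    and the sorted ids of processors that disagreed with it.
    Raises ValueError when no value has a strict majority.
    """
    if not results:
        raise ValueError("no results to vote on")
    value, count = Counter(results.values()).most_common(1)[0]
    if 2 * count <= len(results):
        raise ValueError("no strict majority")
    faulty = sorted(pid for pid, reported in results.items() if reported != value)
    return value, faulty
```

With three replicas, `majority_vote({"p1": 42, "p2": 42, "p3": 7})` masks the single wrong answer and flags `p3`, which a scheduler could then exclude when reassigning computations.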