A randomized approach to speed up the analysis of large-scale read-count data in the application of CNV detection
Background: The application of high-throughput sequencing in a broad range of quantitative genomic assays (e.g., DNA-seq, ChIP-seq) has created a high demand for the analysis of large-scale read-count data. Typically, the genome is divided into tiling windows and windowed read-count data is generated for the entire genome, from which genomic signals are detected (e.g., copy number changes in DNA-seq, enrichment peaks in ChIP-seq). For accurate analysis of read-count data, many state-of-the-art statistical methods use generalized linear models (GLM) coupled with the negative-binomial (NB) distribution, leveraging its ability for simultaneous bias correction and signal detection. However, although statistically powerful, the GLM+NB method has a quadratic computational complexity and therefore suffers from slow running time when applied to large-scale windowed read-count data. In this study, we aimed to substantially speed up the GLM+NB method by using a randomized algorithm, and we demonstrate the utility of our approach in detecting copy number variants (CNVs) using a real example. Results: We propose an efficient estimator, the randomized GLM+NB coefficients estimator (RGE), for speeding up the GLM+NB method. RGE samples the read-count data and solves the estimation problem on a smaller scale. We first theoretically validated the consistency and the variance properties of RGE. We then applied RGE to GENSENG, a GLM+NB based method for detecting CNVs. We named the resulting method “R-GENSENG”. Based on extensive evaluation using both simulated and empirical data, we concluded that R-GENSENG is ten times faster than the original GENSENG while maintaining GENSENG’s accuracy in CNV detection. Conclusions: Our results suggest that the RGE strategy developed here could be applied to other GLM+NB based read-count analyses, e.g., ChIP-seq data analysis, to substantially improve their computational efficiency while preserving the analytic power
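The two data-handling steps named in the abstract above (tiling the genome into windows to obtain read counts, then sampling those counts to work on a smaller problem) can be sketched as follows. This is a minimal illustration, not the RGE estimator itself: the function names, the fixed non-overlapping windows, and the uniform sampling scheme are all assumptions, and the GLM+NB fit that would run on the sample is omitted.

```python
import random

def windowed_counts(read_starts, genome_length, window_size):
    """Count reads per fixed-size, non-overlapping tiling window (assumed layout)."""
    n_windows = (genome_length + window_size - 1) // window_size
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window_size] += 1
    return counts

def subsample_windows(counts, fraction, seed=0):
    """Uniformly sample a fraction of windows (hypothetical scheme, not
    necessarily the one used by RGE); returns sorted (window_index, count) pairs."""
    rng = random.Random(seed)
    k = max(1, int(len(counts) * fraction))
    chosen = sorted(rng.sample(range(len(counts)), k))
    return [(i, counts[i]) for i in chosen]

# Toy data: 7 read start positions on a 100 bp "genome", 10 bp windows.
counts = windowed_counts([5, 12, 13, 47, 48, 49, 95], genome_length=100, window_size=10)
sample = subsample_windows(counts, fraction=0.3)
```

A model would then be fitted on `sample` rather than on all of `counts`, which is where any speedup would come from.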
Visual Impairment and Blindness
Blindness and vision impairment affect at least 2.2 billion people worldwide, with most individuals having a preventable vision impairment. The majority of people with vision impairment are older than 50 years; however, vision loss can affect people of all ages. Reduced eyesight can have major and long-lasting effects on all aspects of life, including daily personal activities, interacting with the community, school and work opportunities, and the ability to access public services. This book provides an overview of the effects of blindness and visual impairment in the context of the most common causes of blindness in older adults as well as children, including retinal disorders, cataracts, glaucoma, and macular or corneal degeneration
Bayesian localization of CNV candidates in WGS data within minutes
Background: Full Bayesian inference for detecting copy number variants (CNV) from whole-genome sequencing (WGS) data is still largely infeasible due to computational demands. A recently introduced approach to perform Forward-Backward Gibbs sampling using dynamic Haar wavelet compression has alleviated issues of convergence and, to some extent, speed. Yet, the problem remains challenging in practice. Results: In this paper, we propose an improved algorithmic framework for this approach. We provide new space-efficient data structures to query sufficient statistics in logarithmic time, based on a linear-time, in-place transform of the data, which also improves on the compression ratio. We also propose a new approach to efficiently store and update marginal state counts obtained from the Gibbs sampler. Conclusions: Using this approach, we discover several CNV candidates in two rat populations divergently selected for tame and aggressive behavior, consistent with earlier results concerning the domestication syndrome as well as experimental observations. Computationally, we observe a 29.5-fold decrease in memory, an average 5.8-fold speedup, as well as a 191-fold decrease in minor page faults. We also observe that metrics varied greatly in the old implementation, but not the new one. We conjecture that this is due to the better compression scheme. The fully Bayesian segmentation of the entire WGS data set required 3.5 min and 1.24 GB of memory, and can hence be performed on a commodity laptop
Integrated Genomics Of Susceptibility To Therapy-Related Leukemia
Therapy-related acute myeloid leukemia (t-AML) is a secondary, generally incurable, malignancy attributable to the chemotherapeutic treatment of an initial disease. Although there is a genetic component to susceptibility to therapy-related leukemias in mice, little is understood either about the contributing loci or about the mechanisms by which susceptibility factors mediate their effect. An improved understanding of susceptibility factors and the biological processes in which they act may lead to the development of t-AML prevention strategies. In this thesis work, we identified expression networks that are associated with t-AML susceptibility in mice. These networks are robust in that they emerge from distinct methods of analysis and from different gene expression data sets of hematopoietic stem and progenitor lineages. These networks are enriched in genes involved in cell cycle and DNA repair, suggesting that these processes play a role in susceptibility. By integrating gene expression and genetic information, we prioritized network nodes for experimental validation as contributors to expression networks and t-AML susceptibility. Network analysis and node prioritization required a comprehensive map of genetic variation in mouse, which was not available at the outset of this thesis work. Specifically, DNA copy number variations (CNVs), defined as genomic sequences that are polymorphic in copy number and range in length from 1,000 to several million base pairs, were largely uncharacterized in inbred mice. We developed a computational approach, Washington University Hidden Markov Model (wuHMM), to identify CNVs from high-density array comparative genomic hybridization data, accounting for the high degree of polymorphism that occurs between mouse strains.
Using wuHMM, we analyzed the copy number content of the mouse genome (20 strains) to a sub-10-kb resolution, finding over 1,300 CNV-regions (CNVRs), most of which are < 10 kb in length, are found in more than one strain, and span 3.2% (85 Mb) of the reference genome. These CNVRs, along with haplotype blocks we derived from publicly available SNP data, were integrated into susceptibility expression network analysis. In addition to addressing questions regarding t-MDS/AML susceptibility, we also used this data to assess the potential functional impact of copy number variation by mapping expression profiles to CNVRs. In hematopoietic stem and progenitor cells, up to 28% of strain-dependent expression variation is associated with copy number variation, supporting the role of germline CNVs as key contributors to natural phenotypic variation
Detect Copy Number Variations from Read-depth of High-throughput Sequencing Data
Copy-number variation (CNV) is a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize CNVs. High-throughput sequencing (HTS) technologies promise to revolutionize CNV detection but present substantial analytic challenges. This dissertation investigates improving CNV detection using HTS data, mainly from the following aspects. It is observed that various sources of experimental biases in HTS confound read-depth estimation, and bias correction has not been adequately addressed by existing methods. This dissertation presents a novel read-depth-based method, GENSENG, which identifies regions of discrete copy-number changes while simultaneously accounting for the effects of multiple confounders. It is conceivable that allele-specific reads from HTS data could be leveraged both to enhance CNV detection and to produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. This dissertation presents an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. Although statistically powerful, the GLM+NB method used in GENSENG and AS-GENSENG has a quadratic computational complexity and therefore suffers from slow running time when applied to large-scale sequencing data. This dissertation aims to substantially speed up the GLM+NB method by using a randomized algorithm and demonstrates the utility of our approach by providing R-GENSENG, a sped-up version of GENSENG. Doctor of Philosophy
Gene Copy Number Variation in Natural Populations of Plasmodium falciparum
Gene copy number variants (CNVs), which consist of gene deletions and amplifications, contribute to the great diversity in the Plasmodium falciparum genome. CNVs may influence the expression of genes and hence may affect important parasite phenotypes such as virulence, drug resistance, persistence and transmissibility. The hypothesis underlying the studies in this thesis is that CNVs may be important for adaptation of the parasite to its variable environments. To investigate this hypothesis, a population-wide survey of CNVs in 183 fresh field isolates from four populations with different transmission intensities was conducted. To detect CNVs, comparative genome hybridization was performed using a 70mer microarray. This is the first large-scale survey for CNVs in natural populations of parasites. A total of 98 different CNVs, consisting of 225 genes, were identified. Various systematic aspects that could affect detection of CNVs were explored, and the population of origin of the isolate was found to be the only factor that affects CNV detection. Some of these CNVs showed high differentiation in frequency between populations, suggestive of the action of directional selection. Other CNVs showed no or low differentiation in frequencies between populations, indicative of the action of neutral evolutionary processes. Validation of the CNVs identified using microarrays was done using whole genome sequencing. Very low concordance was observed between the CNVs identified by the two technologies. These differences may be attributed to technical and analytic differences between the two technologies. Furthermore, the effect of CNVs on gene expression levels was analysed. A number of CNVs were found to be significantly associated (positively or negatively) with the expression levels of genes located inside and also outside the CNVs
Age-Related Macular Degeneration and Diabetic Retinopathy
This reprint includes contributions from leaders in the field of personalized medicine in ophthalmology. The contributions are diverse and cover pre-clinical and clinical topics. We hope you enjoy reading the articles
Data analysis methods for copy number discovery and interpretation
Copy number variation (CNV) is an important type of genetic variation that can give rise to a wide variety of phenotypic traits. Differences in copy number are thought to play major roles in processes that involve dosage sensitive genes, providing beneficial, deleterious or neutral modifications to individual phenotypes. Copy number analysis has long been a standard in clinical cytogenetic laboratories. Gene deletions and duplications can often be linked with genetic syndromes such as the 7q11.23 deletion of Williams-Beuren syndrome, the 22q11 deletion of DiGeorge syndrome and the 17q11.2 duplication of Potocki-Lupski syndrome. Interestingly, copy number based genomic disorders often display reciprocal deletion / duplication syndromes, with the latter frequently exhibiting milder symptoms. Moreover, the study of chromosomal imbalances plays a key role in cancer research. The datasets used for the development of analysis methods during this project are generated as part of the cutting-edge translational project, Deciphering Developmental Disorders (DDD). This project, the DDD, is the first of its kind and will directly apply state of the art technologies, in the form of ultra-high resolution microarray and next generation sequencing (NGS), to real-time genetic clinical practice. It is a collaboration between the Wellcome Trust Sanger Institute (WTSI) and the National Health Service (NHS) involving the 24 regional genetic services across the UK and Ireland. Although the application of DNA microarrays for the detection of CNVs is well established, individual change point detection algorithms often display variable performances. The definition of an optimal set of parameters for achieving a certain level of performance is rarely straightforward, especially where data qualities vary ... [cont.]
Development and analysis of the Software Implemented Fault-Tolerance (SIFT) computer
SIFT (Software Implemented Fault Tolerance) is an experimental, fault-tolerant computer system designed to meet the extreme reliability requirements for safety-critical functions in advanced aircraft. Errors are masked by performing a majority voting operation over the results of identical computations, and faulty processors are removed from service by reassigning computations to the nonfaulty processors. This scheme has been implemented in a special architecture using a set of standard Bendix BDX930 processors, augmented by a special asynchronous-broadcast communication interface that provides direct, processor-to-processor communication among all processors. Fault isolation is accomplished in hardware; all other fault-tolerance functions, together with scheduling and synchronization, are implemented exclusively by executive system software. The system reliability is predicted by a Markov model. Mathematical consistency of the system software with respect to the reliability model has been partially verified, using recently developed tools for machine-aided proof of program correctness
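The error-masking idea in the abstract above (vote over the results of identical computations, then take disagreeing processors out of service) can be illustrated with a small sketch. This is only an illustration of majority voting under simple assumptions, not SIFT's executive software: the function name, the processor identifiers, and the strict-majority rule are hypothetical.

```python
from collections import Counter

def majority_vote(results):
    """Vote over results of identical computations.

    results: dict mapping a (hypothetical) processor id to its computed value.
    Returns (voted_value, ids_of_disagreeing_processors).
    Raises ValueError when no value is held by a strict majority.
    """
    tally = Counter(results.values())
    value, votes = tally.most_common(1)[0]
    if votes * 2 <= len(results):
        raise ValueError("no strict majority")
    disagreeing = sorted(p for p, v in results.items() if v != value)
    return value, disagreeing

# Three processors run the same task; one returns a wrong result.
value, faulty = majority_vote({"p1": 42, "p2": 42, "p3": 41})
# Later computations would be assigned only to the remaining processors.
active = [p for p in ("p1", "p2", "p3") if p not in faulty]
```

With three replicas a single faulty result is outvoted; with only two replicas a disagreement cannot be resolved, which is why the sketch raises an error in that case.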