A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments.
Background: PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing. PCR amplification introduces redundant reads in the sequence data, and estimating the PCR duplication rate is important to assess the frequency of such reads. Existing computational methods do not distinguish PCR duplicates from "natural" read duplicates that represent independent DNA fragments and therefore overestimate the PCR duplication rate for DNA-seq and RNA-seq experiments.
Results: In this paper, we present a computational method to estimate the average PCR duplication rate of high-throughput sequence datasets that accounts for natural read duplicates by leveraging heterozygous variants in an individual genome. Analysis of simulated data and exome sequence data from the 1000 Genomes project demonstrated that our method can accurately estimate the PCR duplication rate on paired-end as well as single-end read datasets which contain a high proportion of natural read duplicates. Further, analysis of exome datasets prepared using the Nextera library preparation method indicated that 45-50% of read duplicates correspond to natural read duplicates, likely due to fragmentation bias. Finally, analysis of RNA-seq datasets from individuals in the 1000 Genomes project demonstrated that 70-95% of read duplicates observed in such datasets correspond to natural duplicates sampled from genes with high expression, and identified outlier samples with a 2-fold greater PCR duplication rate than other samples.
Conclusions: The method described here is a useful tool for estimating the PCR duplication rate of high-throughput sequence datasets and for assessing the fraction of read duplicates that correspond to natural read duplicates. An implementation of the method is available at https://github.com/vibansal/PCRduplicates
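The central idea can be illustrated with a small sketch (this is not the authors' implementation; see the GitHub link above for that). Reads that share a mapping position but carry different alleles at a heterozygous site must originate from independent DNA fragments, so they are natural duplicates; reads carrying the same allele are merely consistent with PCR duplication. The function name, input format, and hard classification rule below are illustrative assumptions; the paper estimates the rate statistically rather than labeling individual reads.

```python
# Simplified sketch, not the authors' method: classify read duplicates
# overlapping one heterozygous site by the allele each read carries.
from collections import Counter, defaultdict

def classify_duplicates(reads):
    """reads: iterable of (start_pos, allele) for reads covering one
    heterozygous site. Returns (n_pcr_consistent, n_natural)."""
    groups = defaultdict(list)
    for start, allele in reads:
        groups[start].append(allele)  # duplicates share a mapping position
    n_pcr_consistent = n_natural = 0
    for alleles in groups.values():
        counts = Counter(alleles)
        # extra reads carrying the same allele are consistent with PCR copies
        n_pcr_consistent += sum(c - 1 for c in counts.values())
        # each additional distinct allele proves an independent DNA fragment
        n_natural += len(counts) - 1
    return n_pcr_consistent, n_natural

# Three reads start at position 1000: two carry 'A', one carries 'G'.
reads = [(1000, "A"), (1000, "A"), (1000, "G"), (1042, "A")]
print(classify_duplicates(reads))  # -> (1, 1)
```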
Progressive Mauve: Multiple alignment of genomes with gene flux and rearrangement
Multiple genome alignment remains a challenging problem. Effects of
recombination including rearrangement, segmental duplication, gain, and loss
can create a mosaic pattern of homology even among closely related organisms.
We describe a method to align two or more genomes that have undergone
large-scale recombination, particularly genomes that have undergone substantial
amounts of gene gain and loss (gene flux). The method utilizes a novel
alignment objective score, referred to as a sum-of-pairs breakpoint score. We
also apply a probabilistic alignment filtering method to remove erroneous
alignments of unrelated sequences, which are commonly observed in other genome
alignment methods. We describe new metrics for quantifying genome alignment
accuracy which measure the quality of rearrangement breakpoint predictions and
indel predictions. The progressive genome alignment algorithm demonstrates
markedly improved accuracy over previous approaches in situations where genomes
have undergone realistic amounts of genome rearrangement, gene gain, loss, and
duplication. We apply the progressive genome alignment algorithm to a set of 23
completely sequenced genomes from the genera Escherichia, Shigella, and
Salmonella. The 23 enterobacteria have an estimated 2.46Mbp of genomic content
conserved among all taxa and total unique content of 15.2Mbp. We document
substantial population-level variability among these organisms driven by
homologous recombination, gene gain, and gene loss. Free, open-source software
implementing the described genome alignment approach is available from
http://gel.ahabs.wisc.edu/mauve .
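As a rough illustration of a breakpoint-style objective (a sketch of the general idea only, not ProgressiveMauve's actual sum-of-pairs breakpoint score, which also weights alignments), the snippet below counts, for each pair of genomes, adjacencies of signed homologous blocks present in one genome but absent in the other, and sums the counts over all pairs; the block encoding and function names are assumptions.

```python
# Toy sum-of-pairs breakpoint count over signed block orders; assumed
# encoding: each genome is a list of signed ints, negative = inverted.
from itertools import combinations

def breakpoints(order_a, order_b):
    """Count adjacencies in order_a that are absent from order_b."""
    adj = set()
    for x, y in zip(order_b, order_b[1:]):
        adj.add((x, y))
        adj.add((-y, -x))  # the same adjacency read on the reverse strand
    return sum(1 for x, y in zip(order_a, order_a[1:]) if (x, y) not in adj)

def sum_of_pairs_breakpoints(genomes):
    """Total breakpoint count over all genome pairs (lower is better)."""
    return sum(breakpoints(a, b) for a, b in combinations(genomes, 2))

# Genome B carries an inversion of blocks 2..3 relative to genome A.
A = [1, 2, 3, 4]
B = [1, -3, -2, 4]
print(sum_of_pairs_breakpoints([A, B]))  # -> 2 (breakpoints flank the inversion)
```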
Refactoring, reengineering and evolution: paths to Geant4 uncertainty quantification and performance improvement
Ongoing investigations into improving Geant4 accuracy and computational
performance by refactoring and reengineering parts of
the code are discussed. Issues in refactoring that are specific to the domain
of physics simulation are identified and their impact is elucidated.
Preliminary quantitative results are reported.
Making Every Dollar Count: How Expected Return Can Transform Philanthropy
Describes the benefits and methods of a quantitative process for evaluating potential program investments (based on benefit, likelihood of success, the foundation's contribution, and cost) to maximize return on resources.
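The summary names four factors but no formula; a common way to combine such factors is multiplicatively, divided by cost. The sketch below assumes that combination purely for illustration and should not be read as the foundation's published method.

```python
# Illustrative only: the multiplicative combination is an assumption,
# not the published expected-return calculation.
def expected_return(benefit, p_success, contribution_share, cost):
    """Expected benefit per dollar spent.
    benefit: estimated social benefit if the program fully succeeds ($)
    p_success: probability the program succeeds (0..1)
    contribution_share: fraction of the outcome attributable to the funder
    cost: the funder's cost ($)"""
    return benefit * p_success * contribution_share / cost

# Example: $50M benefit, 40% success odds, 25% attributable, $2M cost.
print(expected_return(50e6, 0.40, 0.25, 2e6))  # -> 2.5 (benefit per dollar)
```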
A Dynamic Knowledge Management Framework for the High Value Manufacturing Industry
Dynamic Knowledge Management (KM) is a combination of cultural and technological factors: the cultural factors of people and their motivations, the technological factors of content and infrastructure, and, where the two come together, interface factors. In this paper a Dynamic KM framework is described in the context of employees being motivated to create profit for their company through product development in high value manufacturing. We report how the framework was discussed during a meeting of project stakeholders at the collaborating company (BAE Systems). Participants agreed the framework would have most benefit at the start of the product lifecycle, before key decisions were made. The framework has been designed to support organisational learning and to reward employees who improve the position of the company in the marketplace.
Development of a Security-Focused Multi-Channel Communication Protocol and Associated Quality of Secure Service (QoSS) Metrics
Eavesdropping, together with the corruption or suppression of information in communication systems, poses a persistent challenge. Effectively managing protection mechanisms requires the ability to accurately gauge the likelihood or severity of a threat and to adapt the security features available in a system to mitigate it. This research focuses on the design and development of a security-focused session-layer communication protocol based on a re-prioritized communication architecture model and associated metrics. From a probabilistic model that treats data leakage and data corruption as surrogates for breaches of confidentiality and integrity, a set of metrics allows the direct and repeatable quantification of the security available in single- or multi-channel networks. The quantification of security is based directly on the probabilities that adversarial listeners and malicious disruptors are able to gain access to or change the original message. Fragmenting data across multiple channels demonstrates potential improvements to confidentiality, while duplication improves the integrity of the data against disruptions. Finally, the model and metrics are exercised in simulation. The ultimate goal is to minimize the information available to adversaries.
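The confidentiality/integrity trade-off described here can be sketched with a toy independent-channel model (an assumption for illustration, not the paper's probabilistic model): with per-channel intercept probability p, a message fragmented across n channels leaks completely only if all n channels are intercepted, and with per-channel corruption probability q, a message duplicated on n channels is lost only if all n copies are corrupted.

```python
# Toy model assuming independent channels; not the paper's metrics.
# p: probability an eavesdropper reads a given channel
# q: probability a disruptor corrupts a given channel

def leak_prob_fragmented(p, n):
    """Message split into n fragments, one per channel: the eavesdropper
    must capture every channel to reconstruct it."""
    return p ** n

def loss_prob_duplicated(q, n):
    """Message duplicated on n channels: it is lost only if every copy
    is corrupted."""
    return q ** n

p, q = 0.3, 0.2
for n in (1, 2, 3):
    print(f"n={n}: leak (fragmented) = {leak_prob_fragmented(p, n):.3f}, "
          f"loss (duplicated) = {loss_prob_duplicated(q, n):.3f}")
# n=1 -> leak 0.300, loss 0.200; n=3 -> leak 0.027, loss 0.008
```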
Overregulation of Health Care: Musings on Disruptive Innovation Theory
Disruptive innovation theory provides one lens through which to describe how regulations may stifle innovation and increase costs. Basing their discussion on this theory, Curtis and Schulman consider some of the effects that regulatory controls may have on innovation in the health sector.
An evaluation framework to drive future evolution of a research prototype
The Open Source Component Artefact Repository (OSCAR) requires
evaluation to confirm its suitability as a development environment
for distributed software engineers. The evaluation will take note of
several factors, including the usability of OSCAR as a stand-alone
system, the scalability and maintainability of the system, and novel
features not provided by existing artefact management systems.
Additionally, the evaluation design attempts to address some of the
omissions (due to time constraints) from the industrial partner
evaluations. This evaluation is intended to be a prelude to the
evaluation of the awareness support being added to OSCAR, thus
establishing a baseline against which the effects of awareness
support may be compared.