
    A computational method for estimating the PCR duplication rate in DNA and RNA-seq experiments.

    Background: PCR amplification is an important step in the preparation of DNA sequencing libraries prior to high-throughput sequencing. PCR amplification introduces redundant reads in the sequence data and estimating the PCR duplication rate is important to assess the frequency of such reads. Existing computational methods do not distinguish PCR duplicates from "natural" read duplicates that represent independent DNA fragments and therefore over-estimate the PCR duplication rate for DNA-seq and RNA-seq experiments. Results: In this paper, we present a computational method to estimate the average PCR duplication rate of high-throughput sequence datasets that accounts for natural read duplicates by leveraging heterozygous variants in an individual genome. Analysis of simulated data and exome sequence data from the 1000 Genomes project demonstrated that our method can accurately estimate the PCR duplication rate on paired-end as well as single-end read datasets which contain a high proportion of natural read duplicates. Further, analysis of exome datasets prepared using the Nextera library preparation method indicated that 45-50% of read duplicates correspond to natural read duplicates, likely due to fragmentation bias. Finally, analysis of RNA-seq datasets from individuals in the 1000 Genomes project demonstrated that 70-95% of read duplicates observed in such datasets correspond to natural duplicates sampled from genes with high expression, and identified outlier samples with a 2-fold greater PCR duplication rate than other samples. Conclusions: The method described here is a useful tool for estimating the PCR duplication rate of high-throughput sequence datasets and for assessing the fraction of read duplicates that correspond to natural read duplicates. An implementation of the method is available at https://github.com/vibansal/PCRduplicates.
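
    The key idea is that, at a heterozygous variant, duplicate reads carrying different alleles cannot be PCR copies of the same fragment. The sketch below illustrates that intuition only; the function names and the naive counting are assumptions made for illustration, not the authors' actual estimator (which is statistical and is implemented at the repository linked above).

```python
# Illustrative sketch (assumed names, not the paper's implementation):
# classify duplicate groups overlapping a heterozygous variant by the
# alleles their reads carry, and derive a rough PCR duplication estimate.
from collections import Counter

def classify_duplicate_group(alleles):
    """A duplicate group is a set of reads with identical mapping coordinates;
    `alleles` lists the allele ('R' reference, 'A' alternate) each read carries
    at an overlapping heterozygous site. Mixed alleles imply independent DNA
    fragments (natural duplicates); a single allele is merely consistent with
    PCR duplication."""
    return "natural" if len(Counter(alleles)) > 1 else "consistent_with_pcr"

def naive_pcr_duplication_rate(duplicate_groups):
    """Rough upper bound on the PCR duplication rate: the fraction of redundant
    reads that sit in single-allele duplicate groups."""
    extra = sum(len(g) - 1 for g in duplicate_groups)
    pcr_like = sum(len(g) - 1 for g in duplicate_groups
                   if classify_duplicate_group(g) == "consistent_with_pcr")
    return pcr_like / extra if extra else 0.0

# Three duplicate groups observed at heterozygous SNVs:
groups = [["R", "R", "R"], ["R", "A"], ["A", "A"]]
print(naive_pcr_duplication_rate(groups))  # 0.75
```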

    Progressive Mauve: Multiple alignment of genomes with gene flux and rearrangement

    Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. We describe a method to align two or more genomes that have undergone large-scale recombination, particularly genomes that have undergone substantial amounts of gene gain and loss (gene flux). The method utilizes a novel alignment objective score, referred to as a sum-of-pairs breakpoint score. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The progressive genome alignment algorithm demonstrates markedly improved accuracy over previous approaches in situations where genomes have undergone realistic amounts of genome rearrangement, gene gain, loss, and duplication. We apply the progressive genome alignment algorithm to a set of 23 completely sequenced genomes from the genera Escherichia, Shigella, and Salmonella. The 23 enterobacteria have an estimated 2.46 Mbp of genomic content conserved among all taxa and total unique content of 15.2 Mbp. We document substantial population-level variability among these organisms driven by homologous recombination, gene gain, and gene loss. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve . Comment: Revision dated June 19, 200
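
    As a rough illustration of the sum-of-pairs breakpoint idea, the sketch below counts, over every pair of genomes, the block adjacencies of one genome that are not conserved in the other. The block representation and function names are assumptions; progressiveMauve's actual objective also balances breakpoint penalties against the alignment's anchor and substitution scores.

```python
# Hypothetical sketch of a sum-of-pairs breakpoint count (lower is better).
# Each genome is a signed permutation of homologous block IDs; a negative ID
# means the block occurs in reverse orientation.
from itertools import combinations

def adjacencies(blocks):
    """Oriented adjacencies of a genome, symmetric under strand reversal."""
    adj = set()
    for a, b in zip(blocks, blocks[1:]):
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read on the opposite strand
    return adj

def breakpoints(g1, g2):
    """Number of adjacencies in g1 that are not conserved in g2."""
    conserved = adjacencies(g2)
    return sum(1 for pair in zip(g1, g1[1:]) if pair not in conserved)

def sum_of_pairs_breakpoints(genomes):
    """Total breakpoint count summed over all genome pairs."""
    return sum(breakpoints(x, y) + breakpoints(y, x)
               for x, y in combinations(genomes, 2))

# Three toy genomes: the second carries an inversion of blocks 2-3,
# the third has lost block 3.
genomes = [[1, 2, 3, 4], [1, -3, -2, 4], [1, 2, 4]]
print(sum_of_pairs_breakpoints(genomes))
```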

    Wales report : review of employment and skills


    Refactoring, reengineering and evolution: paths to Geant4 uncertainty quantification and performance improvement

    Ongoing investigations into improving Geant4 accuracy and computational performance by refactoring and reengineering parts of the code are discussed. Issues in refactoring that are specific to the domain of physics simulation are identified and their impact is elucidated. Preliminary quantitative results are reported. Comment: To be published in the Proc. CHEP (Computing in High Energy Physics) 201

    Making Every Dollar Count: How Expected Return Can Transform Philanthropy

    Describes the benefits and methods of a quantitative process for evaluating potential program investments -- based on benefit, likelihood of success, the foundation's contribution, and cost -- to maximize return on resources.
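
    A minimal sketch of the kind of expected-return arithmetic described, assuming the common formulation of projected benefit discounted by the likelihood of success and the foundation's share of the contribution, divided by cost; the function name and the numbers are illustrative only, not taken from the report.

```python
# Hypothetical expected-return calculation; names and units are assumptions.
def expected_return(benefit, likelihood_of_success, contribution_share, cost):
    """Expected benefit per dollar: projected benefit if the program succeeds,
    discounted by the probability of success and by the share of the outcome
    attributable to the foundation, divided by the cost of the grant."""
    return (benefit * likelihood_of_success * contribution_share) / cost

# A grant projected to produce $50M in social benefit, with a 40% chance of
# success and a 25% foundation contribution, costing $2M:
print(expected_return(benefit=50e6, likelihood_of_success=0.4,
                      contribution_share=0.25, cost=2e6))  # 2.5 ($ per $ spent)
```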

    A Dynamic Knowledge Management Framework for the High Value Manufacturing Industry

    Dynamic Knowledge Management (KM) is a combination of cultural and technological factors, including the cultural factors of people and their motivations, the technological factors of content and infrastructure and, where these come together, interface factors. In this paper a Dynamic KM framework is described in the context of employees being motivated to create profit for their company through product development in high value manufacturing. It is reported how the framework was discussed during a meeting of project stakeholders at the collaborating company (BAE Systems). Participants agreed the framework would have the most benefit at the start of the product lifecycle, before key decisions were made. The framework has been designed to support organisational learning and to reward employees who improve the position of the company in the marketplace.

    Development of a Security-Focused Multi-Channel Communication Protocol and Associated Quality of Secure Service (QoSS) Metrics

    Eavesdropping, together with corrupted or suppressed information that must be recognized and corrected for, poses a persistent challenge in communication systems. Effectively managing protection mechanisms requires an ability to accurately gauge the likelihood or severity of a threat, and to adapt the security features available in a system to mitigate it. This research focuses on the design and development of a security-focused, session-layer communication protocol based on a re-prioritized communication architecture model and associated metrics. From a probabilistic model that treats data leakage and data corruption as surrogates for breaches of confidentiality and integrity, a set of metrics allows the direct and repeatable quantification of the security available in single- or multi-channel networks. The quantification of security is based directly upon the probabilities that adversarial listeners and malicious disruptors are able to gain access to or change the original message. Fragmenting data across multiple channels demonstrates potential improvements to confidentiality, while duplication improves the integrity of the data against disruptions. Finally, the model and metrics are exercised in simulation. The ultimate goal is to minimize the information available to adversaries.
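
    The intuition about fragmentation and duplication can be shown with a toy probability calculation; the sketch below relies on assumed names and an independence assumption between channels, and is not the thesis's actual QoSS metric definitions.

```python
# Toy model (assumed, not the thesis's definitions): independent channels,
# each with its own probability of being eavesdropped or disrupted.
from math import prod

def leakage_probability(eavesdrop_probs):
    """Fragmentation: the eavesdropper must capture every channel carrying a
    fragment, so leakage is the product of per-channel interception odds."""
    return prod(eavesdrop_probs)

def loss_probability(disrupt_probs):
    """Duplication: the disruptor must corrupt every copy of the message, so
    total loss is the product of per-channel disruption odds."""
    return prod(disrupt_probs)

# One channel vs. spreading the message over three channels, each with a 10%
# chance of interception or disruption:
print(leakage_probability([0.1]))            # 0.1    single channel
print(leakage_probability([0.1, 0.1, 0.1]))  # 0.001  fragmented across 3
print(loss_probability([0.1, 0.1, 0.1]))     # 0.001  duplicated across 3
```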

    Overregulation of Health Care: Musings on Disruptive Innovation Theory

    Disruptive innovation theory provides one lens through which to describe how regulations may stifle innovation and increase costs. Basing their discussion on this theory, Curtis and Schulman consider some of the effects that regulatory controls may have on innovation in the health sector.

    An evaluation framework to drive future evolution of a research prototype

    The Open Source Component Artefact Repository (OSCAR) requires evaluation to confirm its suitability as a development environment for distributed software engineers. The evaluation will take note of several factors, including the usability of OSCAR as a stand-alone system, the scalability and maintainability of the system, and novel features not provided by existing artefact management systems. Additionally, the evaluation design attempts to address some of the omissions (due to time constraints) from the industrial partner evaluations. This evaluation is intended to be a prelude to the evaluation of the awareness support being added to OSCAR, thus establishing a baseline against which the effects of awareness support may be compared.