
    BFAST: An Alignment Tool for Large Scale Genome Resequencing

    BACKGROUND: The new generation of massively parallel DNA sequencers, combined with the challenge of whole human genome resequencing, results in the need for rapid and accurate alignment of billions of short DNA sequence reads to a large reference genome. Speed is obviously of great importance, but equally important is maintaining alignment accuracy of short reads, in the 25-100 base range, in the presence of errors and true biological variation. METHODOLOGY: We introduce a new algorithm specifically optimized for this task, as well as a freely available implementation, BFAST, which can align data produced by any of the current sequencing platforms, allows for user-customizable levels of speed and accuracy, supports paired-end data, and provides for efficient parallel and multi-threaded computation on a computer cluster. The new method is based on creating flexible, efficient whole-genome indexes to rapidly map reads to candidate alignment locations, with arbitrary multiple independent indexes allowed to achieve robustness against read errors and sequence variants. The final local alignment uses a Smith-Waterman method, with gaps to support the detection of small indels. CONCLUSIONS: We compare BFAST to a selection of large-scale alignment tools -- BLAT, MAQ, SHRiMP, and SOAP -- in terms of both speed and accuracy, using simulated and real-world datasets. We show BFAST can achieve substantially greater sensitivity of alignment in the context of errors and true variants, especially insertions and deletions, and minimize false mappings, while maintaining adequate speed compared to other current methods. We show BFAST can align the amount of data needed to fully resequence a human genome, one billion reads, with high sensitivity and accuracy, on a modest computer cluster in less than 24 hours. BFAST is available at http://bfast.sourceforge.net
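    The final alignment step the abstract describes can be illustrated with a minimal Smith-Waterman score computation. This is a sketch only, not BFAST's optimized implementation; the scoring parameters below are arbitrary assumptions chosen for illustration.

```python
def smith_waterman(ref, read, match=2, mismatch=-1, gap=-2):
    """Minimal Smith-Waterman local alignment score (linear gap penalty).

    Illustrative sketch only: parameter values are assumptions, not
    BFAST's defaults, and BFAST's production code is far more optimized.
    """
    rows, cols = len(read) + 1, len(ref) + 1
    # H[i][j] = best local alignment score ending at read[:i], ref[:j]
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if read[i - 1] == ref[j - 1] else mismatch)
            # Local alignment: scores are floored at 0
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

An affine-gap variant of this recurrence is what supports the small-indel detection mentioned above; this linear-gap sketch shows only the core idea of locally maximizing a similarity score.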

    Assessing the capacity of local ecosystems to meet industrial demand for ecosystem services

    Despite the importance of ecosystems, engineering activities continue to ignore or greatly undervalue their role. Consequently, engineered systems often overshoot nature's capacity to support them, causing ecological degradation. Such systems tend to be inherently unsustainable, and they often fail to benefit from nature's ability to provide essential goods and services. This work explores the idea of including ecosystems in chemical processes, and assesses whether such a techno-ecological synergistic system can operate within ecological constraints. The demand for ecosystem services is quantified by emissions and resources used, while the supply is provided by ecosystems on the manufacturing site. Application to a biodiesel manufacturing site demonstrates that ecosystems can be economically and environmentally superior to conventional technologies for making progress toward zero emissions and net positive impact manufacturing. These results highlight the need for shifting the paradigm of engineering from that of dominating nature to embracing nature and respecting its limits
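    The supply/demand comparison described above can be sketched as a simple per-service balance check. All service names and numbers below are hypothetical placeholders, not values from the study.

```python
def ecosystem_balance(demand, supply):
    """Compare site-level demand for each ecosystem service against the
    supply from on-site ecosystems. A positive surplus means the activity
    stays within ecological constraints for that service; a negative value
    signals an overshoot. Service names and units are hypothetical."""
    return {svc: supply.get(svc, 0.0) - d for svc, d in demand.items()}

# Hypothetical figures (e.g. tonnes CO2-eq/yr, m3 water/yr):
demand = {"carbon_sequestration": 120.0, "water_provision": 5000.0}
supply = {"carbon_sequestration": 150.0, "water_provision": 3500.0}
balance = ecosystem_balance(demand, supply)
```

In this toy case the site stays within the carbon constraint but overshoots water provision, which is the kind of result the assessment framework is meant to surface.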

    Caught you: threats to confidentiality due to the public release of large-scale genetic data sets

    BACKGROUND: Large-scale genetic data sets are frequently shared with other research groups and even released on the Internet to allow for secondary analysis. Study participants are usually not informed about such data sharing because data sets are assumed to be anonymous after stripping off personal identifiers. DISCUSSION: The assumption of anonymity of genetic data sets, however, is tenuous because genetic data are intrinsically self-identifying. Two types of re-identification are possible: the "Netflix" type and the "profiling" type. The "Netflix" type needs another small genetic data set, usually with fewer than 100 SNPs but including a personal identifier. This second data set might originate from another clinical examination, a study of leftover samples or forensic testing. When merged with the primary, unidentified set it will re-identify all samples of that individual. Even with no second data set at hand, a "profiling" strategy can be developed to extract as much information as possible from a sample collection. Starting with the identification of ethnic subgroups along with predictions of body characteristics and diseases, the asthma kids case as a real-life example is used to illustrate that approach. SUMMARY: Depending on the degree of supplemental information, there is a good chance that at least a few individuals can be identified from an anonymized data set. Any re-identification, however, may potentially harm study participants because it will release individual genetic disease risks to the public.
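    The "Netflix"-type attack described above amounts to joining an anonymized genotype collection against a small identified SNP panel. The toy sketch below uses fabricated data and placeholder SNP identifiers to show why a handful of shared markers suffices.

```python
def link_samples(anonymous, identified):
    """Toy illustration of the 'Netflix-type' re-identification: match
    anonymized genotype profiles against a small identified SNP panel by
    exact genotype agreement at every shared marker. All data fabricated."""
    matches = {}
    for anon_id, profile in anonymous.items():
        for person, panel in identified.items():
            if all(profile.get(snp) == gt for snp, gt in panel.items()):
                matches[anon_id] = person  # profile is consistent with panel
    return matches

# "Anonymized" study data: sample ID -> genotypes at several SNPs
anonymous = {
    "S1": {"rs1": "AA", "rs2": "AG", "rs3": "TT"},
    "S2": {"rs1": "AG", "rs2": "GG", "rs3": "CT"},
}
# Second, identified data set covering only a few of the same SNPs
identified = {"Alice": {"rs1": "AG", "rs2": "GG"}}
```

With enough SNPs in the panel, an exact-match join like this singles out one sample; real attacks additionally tolerate genotyping error, which only raises the number of markers needed.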

    Local alignment of two-base encoded DNA sequence

    BACKGROUND: DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. RESULTS: We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two-base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. CONCLUSION: The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, facilitating genome re-sequencing efforts based on this form of sequence data.
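    For context, the two-base ("color space") encoding that this alignment method operates on can be sketched directly: each color is the XOR of the 2-bit codes of adjacent bases, so decoding is deterministic given the first base. This shows only the encoding, not the paper's joint decode-and-align dynamic program.

```python
# Two-base (color space) encoding in the SOLiD style: a read is a known
# primer base followed by color digits, where each color is the XOR of
# the 2-bit codes of two adjacent bases. A single color error corrupts
# every downstream decoded base, which is why decoding and alignment
# must be considered jointly rather than decoding first.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"

def encode(seq):
    """Base sequence -> list of color digits (0-3) for adjacent pairs."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]

def decode(first_base, colors):
    """Deterministic decode: each color fixes the next base given the last."""
    bases = [first_base]
    for c in colors:
        bases.append(BASE[CODE[bases[-1]] ^ c])
    return "".join(bases)
```

Note that a true SNP changes two adjacent colors while a measurement error changes one, which is the signal the weighted error/edit score in the paper exploits.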

    U87MG Decoded: The Genomic Sequence of a Cytogenetically Aberrant Human Cancer Cell Line

    U87MG is a commonly studied grade IV glioma cell line that has been analyzed in at least 1,700 publications over four decades. In order to comprehensively characterize the genome of this cell line and to serve as a model of broad cancer genome sequencing, we have generated greater than 30× genomic sequence coverage using a novel 50-base mate-paired strategy with a 1.4 kb mean insert library. A total of 1,014,984,286 mate-paired and 120,691,623 single-end two-base encoded reads were generated from five slides. All data were aligned using a custom designed tool called BFAST, allowing optimal color space read alignment and accurate identification of DNA variants. The aligned sequence reads and mate-pair information identified 35 interchromosomal translocation events, 1,315 structural variations (>100 bp), 191,743 small (<21 bp) insertions and deletions (indels), and 2,384,470 single nucleotide variations (SNVs). Among these observations, the known homozygous mutation in PTEN was robustly identified, and genes involved in cell adhesion were overrepresented in the mutated gene list. Data were compared to 219,187 heterozygous single nucleotide polymorphisms assayed by the Illumina 1M Duo genotyping array to assess accuracy: 93.83% of all SNPs were reliably detected at filtering thresholds that yield greater than 99.99% sequence accuracy. In this cancer cell line, protein-coding sequences were disrupted predominantly by small indels, large deletions, and translocations. In total, 512 genes were homozygously mutated, including 154 by SNVs, 178 by small indels, 145 by large microdeletions, and 35 by interchromosomal translocations, revealing a highly mutated cell line genome. Of the small homozygously mutated variants, 8 SNVs and 99 indels were novel events not present in dbSNP. These data demonstrate that routine generation of broad cancer genome sequence is possible outside of genome centers. The sequence analysis of U87MG provides an unparalleled level of mutational resolution compared to any cell line to date

    First administration to man of Org 25435, an intravenous anaesthetic: A Phase 1 Clinical Trial

    BACKGROUND: Org 25435 is a new water-soluble alpha-amino acid ester intravenous anaesthetic which proved satisfactory in animal studies. This study aimed to assess the safety, tolerability and efficacy of Org 25435 and to obtain preliminary pharmacodynamic and pharmacokinetic data. METHODS: In the Short Infusion study 8 healthy male volunteers received a 1 minute infusion of 0.25, 0.5, 1.0, or 2.0 mg/kg (n = 2 per group); a further 10 received 3.0 mg/kg (n = 5) or 4.0 mg/kg (n = 5). Following preliminary pharmacokinetic modelling 7 subjects received a titrated 30 minute Target Controlled Infusion (TCI), total dose 5.8-20 mg/kg. RESULTS: Within the Short Infusion study, all subjects were successfully anaesthetised at 3 and 4 mg/kg. Within the TCI study 5 subjects were anaesthetised and 2 showed signs of sedation. Org 25435 caused hypotension and tachycardia at doses over 2 mg/kg. Recovery from anaesthesia after a 30 min administration of Org 25435 was slow (13.7 min). Pharmacokinetic modelling suggests that the context sensitive half-time of Org 25435 is slightly shorter than that of propofol in infusions up to 20 minutes but progressively longer thereafter. CONCLUSIONS: Org 25435 is an effective intravenous anaesthetic in man at doses of 3 and 4 mg/kg given over 1 minute. Longer infusions can maintain anaesthesia but recovery is slow. Hypotension and tachycardia during anaesthesia and slow recovery of consciousness after cessation of drug administration suggest this compound has no advantages over currently available intravenous anaesthetics

    Routes for breaching and protecting genetic privacy

    We are entering the era of ubiquitous genetic information for research, clinical care, and personal curiosity. Sharing these datasets is vital for rapid progress in understanding the genetic basis of human diseases. However, one growing concern is the ability to protect the genetic privacy of the data originators. Here, we technically map threats to genetic privacy and discuss potential mitigation strategies for privacy-preserving dissemination of genetic data.

    How can photo sharing inspire sharing genomes?

    People usually are aware of the privacy risks of publishing photos online, but these risks are less evident when sharing human genomes. Modern photos and sequenced genomes are both digital representations of real lives. They contain private information that may compromise people's privacy, and still, their highest value is most often achieved only when sharing them with others. In this work, we present an analogy between the privacy aspects of sharing photos and sharing genomes, which clarifies the privacy risks in the latter to the general public. Additionally, we illustrate an alternative informed model to share genomic data according to the privacy-sensitivity level of each portion. This article is a call to arms for collaborative work between geneticists and security experts to build more effective methods to systematically protect privacy, whilst promoting the accessibility and sharing of genomes
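    The informed, per-portion sharing model mentioned above can be caricatured as a simple threshold filter over annotated genome regions. Region names and sensitivity levels below are entirely hypothetical, invented for illustration.

```python
def share_by_sensitivity(regions, max_level):
    """Toy sketch of an informed sharing model: release only genome
    portions whose privacy-sensitivity level is at or below the donor's
    chosen threshold. Region names and levels are hypothetical."""
    return [name for name, level in regions.items() if level <= max_level]

# Hypothetical annotation: region -> sensitivity level (1 = low, 3 = high)
regions = {
    "blood_group_locus": 1,
    "pharmacogenomic_panel": 2,
    "disease_risk_loci": 3,
}
shared = share_by_sensitivity(regions, max_level=2)
```

A real model would need a principled, community-agreed sensitivity annotation rather than these placeholder levels, which is exactly the geneticist/security-expert collaboration the article calls for.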

    Cloud-Assisted Read Alignment and Privacy

    Thanks to the rapid advances in sequencing technologies, genomic data is now being produced at an unprecedented rate. To adapt to this growth, several algorithms and paradigm shifts have been proposed to increase the throughput of the classical DNA workflow, e.g. by relying on the cloud to perform CPU intensive operations. However, the scientific community raised an alarm due to the possible privacy-related attacks that can be executed on genomic data. In this paper we review the state of the art in cloud-based alignment algorithms that have been developed for performance. We then present several privacy-preserving mechanisms that have been, or could be, used to align reads at an incremental performance cost. We finally argue for the use of risk analysis throughout the DNA workflow, to strike a balance between performance and protection of data
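    One family of privacy-preserving mechanisms in this literature replaces raw seed lookup with keyed hashing, so the cloud can match hashed k-mers against an equally hashed reference index without seeing sequence. The sketch below is a generic illustration under that assumption, not a specific published protocol; key distribution and the refinement step are out of scope, and the parameters are arbitrary.

```python
import hashlib
import hmac

def masked_seeds(read, key, k=12):
    """Produce keyed hashes of all k-mer seeds of a read, so seed matching
    can be outsourced without revealing raw sequence. Illustrative sketch:
    k and the key are assumptions, not values from any specific system."""
    return [
        hmac.new(key, read[i:i + k].encode(), hashlib.sha256).hexdigest()
        for i in range(len(read) - k + 1)
    ]

key = b"shared-secret"  # hypothetical pre-shared key, for illustration only
hashes = masked_seeds("ACGTACGTACGTACGT", key)
```

Identical seeds hash identically under the same key, which preserves exact seed matching; the cost is that inexact seeding and the final gapped extension must be handled locally or by heavier cryptographic machinery, which is the performance/protection trade-off the paper's risk analysis weighs.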

    Public Access to Genome-Wide Data: Five Views on Balancing Research with Privacy and Protection

    Introductory paragraph: Just over twelve months ago, PLoS Genetics published a paper [1] demonstrating that, given genome-wide genotype data from an individual, it is, in principle, possible to ascertain whether that individual is a member of a larger group defined solely by aggregate genotype frequencies, such as a forensic sample or a cohort of participants in a genome-wide association study (GWAS). As a consequence, the National Institutes of Health (NIH) and Wellcome Trust agreed to shut down public access not just to individual genotype data but even to aggregate genotype frequency data from each study published using their funding. Reactions to this decision span the full breadth of opinion, from "too little, too late—the public trust has been breached" to "a heavy-handed bureaucratic response to a practically minimal risk that will unnecessarily inhibit scientific research." Scientific concerns have also been raised over the conditions under which individual identity can truly be accurately determined from GWAS data. These concerns are addressed in two papers published in this month's issue of PLoS Genetics [2,3]. We received several submissions on this topic and decided to assemble these viewpoints as a contribution to the debate and ask readers to contribute their thoughts through the PLoS online commentary features. Five viewpoints are included. The Public Population Project in Genomics (P3G) is calling for a universal researcher ID with an access permit mechanism for bona fide researchers. The contribution by Catherine Heeney, Naomi Hawkins, Jantina de Vries, Paula Boddington, and Jane Kaye of the University of Oxford Ethox Centre outlines some of the concerns over possible misuse of individual identification in conjunction with medical and family history data, and points out that if geneticists mishandle public trust, it will backfire on their ability to conduct further research.
George Church posits that actions directed toward restricting data access are likely to exclude researchers who might provide the most novel insights into the data and instead makes the argument that full disclosure and consent to the release of genomic information should be sought from study participants, rather than making difficult-to-guarantee promises of anonymity. Martin Bobrow weighs the risks and benefits and proposes four steps that represent a middle ground: Retain restricted access for now, make malicious de-identification practices illegal, increase public awareness of the issues, and encourage recognition that scientists have a special professional relationship of trust with study participants. Finally, Bruce Weir provides a commentary on the contribution of the two research articles from Braun et al. [2] and Visscher and Hill [3]