Efficient counting of k-mers in DNA sequences using a bloom filter
Background: Counting k-mers (substrings of length k in DNA sequence data) is an essential component of many methods in bioinformatics, including genome and transcriptome assembly, metagenomic sequencing, and error correction of sequence reads. Although simple in principle, counting k-mers in large modern sequence data sets can easily overwhelm the memory capacity of standard computers. In current data sets, a large fraction (often more than 50%) of the storage capacity may be spent on storing k-mers that contain sequencing errors and which are typically observed only a single time in the data. These singleton k-mers are uninformative for many algorithms without some kind of error correction.
Results: We present a new method that identifies all the k-mers that occur more than once in a DNA sequence data set. Our method does this using a Bloom filter, a probabilistic data structure that stores all the observed k-mers implicitly in memory with greatly reduced memory requirements. We then make a second sweep through the data to provide exact counts of all nonunique k-mers. For example data sets, we report up to 50% savings in memory usage compared to current software, with modest costs in computational speed. This approach may reduce memory requirements for any algorithm that starts by counting k-mers in sequence data with errors.
Conclusions: A reference implementation for this methodology, BFCounter, is written in C++ and is GPL licensed. It is available for free download at http://pritch.bsd.uchicago.edu/bfcounter.html
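The two-sweep idea described in this abstract can be illustrated with a minimal Python sketch. This is not BFCounter's code: the names `BloomFilter`, `kmers`, and `count_nonunique_kmers`, the SHA-256 double-hashing scheme, and the default filter size are all assumptions chosen for illustration, and the sketch ignores reverse complements. The first sweep uses the Bloom filter to flag k-mers that appear to have been seen before; the second sweep counts only those candidates exactly and then drops any that turn out to occur once (Bloom filter false positives).

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Tiny Bloom filter backed by a bytearray (illustrative, not tuned)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from one SHA-256 digest
        # via double hashing (an assumed scheme, chosen for brevity).
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(item)
        )


def kmers(read, k):
    """Yield every substring of length k in read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def count_nonunique_kmers(reads, k, num_bits=1 << 20, num_hashes=4):
    """Return exact counts of k-mers occurring more than once.

    reads must be re-iterable (e.g. a list), since it is swept twice.
    """
    seen = BloomFilter(num_bits, num_hashes)
    candidates = set()

    # Sweep 1: a k-mer already reported by the filter is a candidate repeat;
    # otherwise it is recorded in the filter only, not in the set.
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in seen:
                candidates.add(kmer)
            else:
                seen.add(kmer)

    # Sweep 2: count candidates exactly, then discard false positives.
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in candidates:
                counts[kmer] += 1
    return {kmer: n for kmer, n in counts.items() if n > 1}
```

In this sketch the final counts stay exact even when the filter is badly undersized, because a false positive only adds an extra candidate that the second sweep filters out; the price of a small filter is a larger candidate set, and therefore more memory.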
Inheritance of Acquired Behaviour Adaptations and Brain Gene Expression in Chickens
Background: Environmental challenges may affect both the exposed individuals and their offspring. We investigated possible adaptive aspects of such cross-generation transmissions, and hypothesized that chronic unpredictable food access would cause chickens to show a more conservative feeding strategy and to be more dominant, and that these adaptations would be transmitted to the offspring.
Methodology/Principal Findings: Parents were raised in either an unpredictable (UL) or a predictable (PL, 12:12 h light:dark) diurnal light rhythm. In a foraging test, UL birds pecked more at freely available food than at hidden, more attractive food, compared to birds from the PL group. Female offspring of UL birds, raised in predictable light conditions without parental contact, showed similar foraging behavior, differing from offspring of PL birds. Furthermore, adult offspring of UL birds performed more food pecks in a dominance test, showed a higher preference for high-energy food, survived better, and were heavier than offspring of PL parents. Using cDNA microarrays, we found that the differential brain gene expression caused by the challenge was mirrored in the offspring. In particular, several immunoglobulin genes seemed to be affected similarly in both UL parents and their offspring. Estradiol levels were significantly higher in egg yolk from UL birds, suggesting one possible mechanism for these effects.
Conclusions/Significance: Our findings suggest that unpredictable food access caused seemingly adaptive responses in feeding behavior, which may have been transmitted to the offspring by means of epigenetic mechanisms, including regulation of immune genes. This may have prepared the offspring for coping with an unpredictable environment.
Citation: Nätt D, Lindqvist N, Stranneheim H, Lundeberg J, Torjesen PA, et al. (2009) Inheritance of Acquired Behaviour Adaptations and Brain Gene Expression in Chickens. PLoS ONE 4(7): e6405. doi:10.1371/journal.pone.0006405
Editor: Tom Pizzari, University of Oxford, United Kingdom
Received: March 26, 2009; Accepted: June 30, 2009; Published: July 28, 2009
Copyright: © 2009 Nätt et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This project was funded by the Swedish Research Council (VR; www.vr.se; grant nos. 50280101 and 50280102) and the Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning (Formas; www.formas.se; grant no. 221-2005-270). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Original Publication: Daniel Nätt, Niclas Lindqvist, Henrik Stranneheim, Joakim Lundeberg, Peter A. Torjesen and Per Jensen, Inheritance of Acquired Behaviour Adaptations and Brain Gene Expression in Chickens, 2009, PLoS ONE, (4), 7, e6405. http://dx.doi.org/10.1371/journal.pone.0006405 Copyright: Author
An international effort towards developing standards for best practices in analysis, interpretation and reporting of clinical genome sequencing results in the CLARITY Challenge
BACKGROUND:
There is tremendous potential for genome sequencing to improve clinical diagnosis and care once it becomes routinely accessible, but this will require formalizing research methods into clinical best practices in the areas of sequence data generation, analysis, interpretation and reporting. The CLARITY Challenge was designed to spur convergence in methods for diagnosing genetic disease starting from clinical case history and genome sequencing data. DNA samples were obtained from three families with heritable genetic disorders and genomic sequence data were donated by sequencing platform vendors. The challenge was to analyze and interpret these data with the goals of identifying disease-causing variants and reporting the findings in a clinically useful format. Participating contestant groups were solicited broadly, and an independent panel of judges evaluated their performance.
RESULTS:
A total of 30 international groups were engaged. The entries reveal a general convergence of practices on most elements of the analysis and interpretation process. However, even given this commonality of approach, only two groups identified the consensus candidate variants in all disease cases, demonstrating a need for consistent fine-tuning of the generally accepted methods. There was greater diversity of the final clinical report content and in the patient consenting process, demonstrating that these areas require additional exploration and standardization.
CONCLUSIONS:
The CLARITY Challenge provides a comprehensive assessment of current practices for using genome sequencing to diagnose and report genetic diseases. There is remarkable convergence in bioinformatic techniques, but medical interpretation and reporting are areas that require further development by many groups.