14 research outputs found
Harvard Personal Genome Project: lessons from participatory public research
Background: Since its initiation in 2005, the Harvard Personal Genome Project has enrolled thousands of volunteers interested in publicly sharing their genome, health and trait data. Because these data are highly identifiable, we use an ‘open consent’ framework that purposefully excludes promises about privacy and requires participants to demonstrate comprehension prior to enrollment. Discussion: Our model of non-anonymous, public genomes has led us to a highly participatory model of researcher-participant communication and interaction. The participants, who are highly committed volunteers, self-pursue and donate research-relevant datasets, and are actively engaged in conversations with both our staff and other Personal Genome Project participants. We have quantitatively assessed these communications and donations, and report our experiences with returning research-grade whole genome data to participants. We also observe some of the community growth and discussion that has occurred related to our project. Summary: We find that public non-anonymous data is valuable and leads to a participatory research model, which we encourage others to consider. The implementation of this model is greatly facilitated by web-based tools and methods and participant education. Project results are long-term proactive participant involvement and the growth of a community that benefits both researchers and participants.
Highly parallel assays of tissue-specific enhancers in whole Drosophila embryos
Transcriptional enhancers are a primary mechanism by which tissue-specific gene expression is achieved. Despite the importance of these regulatory elements in development, responses to environmental stresses, and disease, testing enhancer activity in animals remains tedious, with a minority of enhancers having been characterized. Here, we have developed ‘enhancer-FACS-Seq’ (eFS) technology for highly parallel identification of active, tissue-specific enhancers in Drosophila embryos. Analysis of enhancers identified by eFS to be active in mesodermal tissues revealed enriched DNA binding site motifs of known and putative, novel mesodermal transcription factors (TFs). Naïve Bayes classifiers using TF binding site motifs accurately predicted mesodermal enhancer activity. Application of eFS to other cell types and organisms should accelerate the cataloging of enhancers and the understanding of how transcriptional regulation is encoded within them.
Extensive sequencing of seven human genomes to characterize benchmark reference materials
The Genome in a Bottle Consortium, hosted by the National Institute of Standards and Technology (NIST), is creating reference materials and data for human genome sequencing, as well as methods for genome comparison and benchmarking. Here, we describe a large, diverse set of sequencing data for seven human genomes; five are current or candidate NIST Reference Materials. The pilot genome, NA12878, has been released as NIST RM 8398. We also describe data from two Personal Genome Project trios, one of Ashkenazim Jewish ancestry and one of Chinese ancestry. The data come from 12 technologies: BioNano Genomics, Complete Genomics paired-end and LFR, Ion Proton exome, Oxford Nanopore, Pacific Biosciences, SOLiD, 10X Genomics GemCode WGS, and Illumina exome and WGS paired-end, mate-pair, and synthetic long reads. Cell lines, DNA, and data from these individuals are publicly available. Therefore, we expect these data to be useful for revealing novel information about the human genome and improving sequencing technologies, SNP, indel, and structural variant calling, and de novo assembly.
The complete genome and genetic characteristics of SRV-4 isolated from cynomolgus monkeys (Macaca fascicularis)
Abstract: At least 5 serotypes of exogenous simian retrovirus type D (SRV/D) have been found in nonhuman primates, but only SRV-1, 2 and 3 have been completely sequenced. SRV-4 was recovered once from cynomolgus macaques in California in 1984, but its genome sequences are unknown. Here we report the second identification of SRV-4 and its complete genome from infected cynomolgus macaques with Indochinese and Indonesian/Indochinese mixed ancestry. Phylogenetic analysis demonstrated that SRV-4 was distantly related to SRV-1, 2, 3, 5, 6 and 7. SRV/D-T, a new SRV/D recovered in 2005 from cynomolgus monkeys at Tsukuba Primate Center in Japan, clustered with the SRV-4 isolates from California and Texas and was shown to be another occurrence of SRV-4 infection. The repeated occurrence of SRV-4 in cynomolgus monkeys in different areas of the world and across 25 years suggests that this species is the natural host of SRV-4.
The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes
Background: Since the completion of the Human Genome Project in 2003, it is estimated that more than 200,000 individual whole human genomes have been sequenced, a stunning accomplishment in such a short period of time. However, most of these were sequenced without experimental haplotype data and are therefore missing an important aspect of genome biology. In addition, much of the genomic data is not available to the public and lacks phenotypic information. Findings: As part of the Personal Genome Project, blood samples from 184 participants were collected and processed using Complete Genomics’ Long Fragment Read technology. Here, we present the experimental whole genome haplotyping and sequencing of these samples to an average read coverage depth of 100X. This is approximately three-fold higher than the read coverage applied to most whole human genome assemblies and ensures the highest quality results. Currently, 114 genomes from this dataset are freely available in the GigaDB repository and are associated with rich phenotypic data; the remaining 70 should be added in the near future as they are approved through the PGP data release process. For reproducibility analyses, 20 genomes were sequenced at least twice using independent LFR barcoded libraries. Seven genomes were also sequenced using Complete Genomics’ standard non-barcoded library process. In addition, we report 2.6 million high-quality, rare variants not previously identified in the Single Nucleotide Polymorphisms database or the 1000 Genomes Project Phase 3 data. Conclusions: These genomes represent a unique source of haplotype and phenotype data for the scientific community and should help to expand our understanding of human genome evolution and function. Electronic supplementary material: The online version of this article (doi:10.1186/s13742-016-0148-z) contains supplementary material, which is available to authorized users.