
    Benchmarking long-read genome sequence alignment tools for human genomics applications

    Background: The utility of long-read genome sequencing platforms has been shown in many fields, including whole-genome assembly, metagenomics, and amplicon sequencing. Less clear is the applicability of long reads to reference-guided human genomics, which is the foundation of genomic medicine. Here, we benchmark available platform-agnostic alignment tools on datasets from nanopore and single-molecule real-time platforms to understand their suitability in producing a genome representation.
    Results: For this study, we leveraged publicly available data from sample NA12878 generated on the Oxford Nanopore platform and sample NA24385 generated on the Pacific Biosciences platform. We employed state-of-the-art sequence alignment tools, including GraphMap2, long-read aligner (LRA), Minimap2, CoNvex Gap-cost alignMents for Long Reads (NGMLR), and Winnowmap2. Minimap2 and Winnowmap2 were computationally lightweight enough for use at scale, while GraphMap2 was not. NGMLR was slow and resource-intensive but produced alignments each time. LRA was fast but only worked on Pacific Biosciences data. The tools widely disagreed on which reads to leave unaligned, affecting the resulting genome coverage and the number of discoverable breakpoints. No alignment tool independently resolved all large structural variants (1,001–100,000 base pairs) present in the Database of Genomic Variants (DGV) for sample NA12878 or in the truth set for NA24385.
    Conclusions: These results suggest that a combined approach to long-read sequencing (LRS) alignment is needed for human genomics. Specifically, leveraging alignments from three tools will be more effective in generating a complete picture of genomic variability. Best practice should be an analysis pipeline that generates alignments with both Minimap2 and Winnowmap2, as they are lightweight and yield different views of the genome. Depending on the question at hand, the data available, and the time constraints, NGMLR and LRA are good options for a third tool. If computational resources and time are not limiting, NGMLR will provide another view and another chance to resolve a case. LRA, while fast, did not work on the nanopore data on our cluster, but results on PacBio data were promising in that those computations completed faster than Minimap2. Due to its significant computational burden and slow run time, GraphMap2 is not an ideal tool for exploration of a whole human genome generated on a long-read sequencing platform.
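
    A minimal Python sketch of the dual-aligner practice recommended in the conclusions is shown below. It assumes minimap2, winnowmap, meryl, and samtools are installed and on PATH; the reference and read file names, thread count, and output names are placeholders, and the meryl/winnowmap steps follow the Winnowmap documentation rather than the study's exact pipeline.

        import subprocess

        REF = "GRCh38.fa"                 # reference genome (placeholder path)
        READS = "nanopore_reads.fastq"    # long-read FASTQ (placeholder path)
        THREADS = 8

        def run(cmd):
            """Run a shell command; raise if it exits non-zero."""
            print("+", cmd)
            subprocess.run(cmd, shell=True, check=True)

        # 1) Minimap2 alignment (map-ont preset for nanopore reads).
        run(f"minimap2 -ax map-ont -t {THREADS} {REF} {READS} | samtools sort -o minimap2.bam -")

        # 2) Winnowmap2 alignment: build the repetitive k-mer list with meryl,
        #    then align with the same preset.
        run(f"meryl count k=15 output merylDB {REF}")
        run("meryl print greater-than distinct=0.9998 merylDB > repetitive_k15.txt")
        run(f"winnowmap -W repetitive_k15.txt -ax map-ont -t {THREADS} {REF} {READS} | samtools sort -o winnowmap2.bam -")

        # The two BAMs give different views of coverage and breakpoints; downstream
        # variant calls from each can be compared or merged.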

    Open Science? Conceptualizing Openness as an Emerging Moral Economy of Science

    In this paper we aim to address a few of the complexities that revolve around “openness” of science as an emerging moral economy of science. First, we briefly assess the current state of discussion of Open Science in the academic literature. We show that these discussions have begun to take a more analytical look at Open Science, yet the term remains tied to opinions and emotional responses. Accordingly, we posit that a more distant perspective is needed. We establish that, since openness is the goal of Open Science, it provides a useful term around which discussion can coalesce. Indeed, this term can be used to identify an emerging moral economy within science. We then discuss why this is the case: the changing context, as well as the dynamics inherent in science as an enterprise. We finish this article with an initial discussion of how this mode of thinking will impact science and the study of science. This positions us to consider what is needed for the study of openness in science as a moral economy, the potential models that could be used to assess different interpretations of openness, and finally the questions that this mode of thinking may help us ask.

    Using a chat-based informed consent tool in large-scale genomic research

    OBJECTIVE: We implemented a chatbot consent tool to shift the time burden from study staff in support of a national genomics research study.
    MATERIALS AND METHODS: We created an Institutional Review Board-approved script for automated chat-based consent. We compared data from prospective participants who used the tool with data from those who had traditional consent conversations with study staff.
    RESULTS: Chat-based consent, completed on a user's schedule, was shorter than the traditional conversation. This did not lead to a significant change in affirmative consents. Among both affirmative consents and declines, more prospective participants completed the chat-based process. A quiz to assess users' understanding of the chat-based consent had a high pass rate, with no reported negative experiences.
    CONCLUSION: Our report shows that a structured script can convey important information while realizing the benefits of automation and burden shifting. Analysis suggests that it may be advantageous to use chatbots to scale this rate-limiting step in large research projects.
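
    The abstract does not include the consent script itself; the Python sketch below is a hypothetical, minimal illustration of the general pattern it describes: a fixed, scripted message sequence delivered in order, an explicit decision point, and a short comprehension quiz gating the recorded consent. All questions, answers, and outcome labels are placeholders, not the study's chatbot.

        CONSENT_SCRIPT = [
            "This study will sequence your genome and store de-identified data for research.",
            "Participation is voluntary; you may withdraw at any time without penalty.",
            "Do you agree to participate? (yes/no)",
        ]

        QUIZ = [
            ("Can you withdraw from the study at any time? (yes/no)", "yes"),
            ("Will your data be de-identified? (yes/no)", "yes"),
        ]

        def run_consent(ask=input, say=print):
            """Deliver the scripted consent, record the decision, then check understanding."""
            for line in CONSENT_SCRIPT[:-1]:
                say(line)
            if ask(CONSENT_SCRIPT[-1]).strip().lower() != "yes":
                return "declined"
            for question, expected in QUIZ:
                if ask(question).strip().lower() != expected:
                    return "refer_to_study_staff"   # comprehension not demonstrated
            return "consented"

        if __name__ == "__main__":
            print("Outcome:", run_consent())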

    Increased diagnostic yield from negative whole genome-slice panels using automated reanalysis

    We evaluated the diagnostic yield of genome-slice panel reanalysis in the clinical setting using an automated phenotype/gene ranking system. We analyzed whole genome sequencing (WGS) data produced from clinically ordered panels built as bioinformatic slices for 16 clinically diverse, undiagnosed cases referred to the Pediatric Mendelian Genomics Research Center, an NHGRI-funded GREGoR Consortium site. Genome-wide reanalysis was performed using Moon™, a machine-learning-based tool for variant prioritization. In five of the 16 cases, we discovered a potentially clinically significant variant. In four of these cases, the variant was found in a gene not included in the original panel, either because of phenotypic expansion of a disorder or because of incomplete initial phenotyping of the patient. In the fifth case, the gene containing the variant was included in the original panel, but the variant, a complex structural rearrangement with intronic breakpoints outside the clinically analyzed regions, was not initially identified. Automated genome-wide reanalysis of clinical WGS data generated during targeted panel testing yielded a 25% increase in diagnostic findings and a possibly clinically relevant finding in one additional case, underscoring the added value of such analyses over those routinely performed in the clinical setting.
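
    Moon™ is a commercial tool and its ranking method is not described here; as a loose illustration of the genome-wide, phenotype-driven reprioritization idea, the Python sketch below scores every candidate variant by overlap between the patient's HPO terms and placeholder gene-phenotype annotations, rather than restricting attention to the genes on the ordered panel. Gene names, positions, and annotations are invented for the example.

        from dataclasses import dataclass

        # Placeholder gene -> HPO-term associations (in practice drawn from a
        # resource such as the HPO gene-phenotype annotations).
        GENE_PHENOTYPES = {
            "GENE_A": {"HP:0001250", "HP:0001263"},   # seizures, developmental delay
            "GENE_B": {"HP:0000510"},                  # rod-cone dystrophy
        }

        @dataclass
        class Variant:
            chrom: str
            pos: int
            gene: str

        def rank_variants(variants, patient_hpo_terms):
            """Score each variant by shared phenotype terms and sort best-first."""
            def score(v):
                return len(GENE_PHENOTYPES.get(v.gene, set()) & patient_hpo_terms)
            return sorted(variants, key=score, reverse=True)

        # Example: genome-wide candidates, including a gene outside the original panel.
        candidates = [Variant("chr2", 166001234, "GENE_A"),
                      Variant("chr14", 21567890, "GENE_B")]
        for v in rank_variants(candidates, {"HP:0001250", "HP:0001263"}):
            print(v.gene, v.chrom, v.pos)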

    Baseline human gut microbiota profile in healthy people and standard reporting template

    A comprehensive knowledge of the types and ratios of microbes that inhabit the healthy human gut is necessary before any kind of pre-clinical or clinical study can be performed that attempts to alter the microbiome to treat a condition or improve therapy outcome. To address this need we present an innovative, scalable, comprehensive analysis workflow, a healthy human reference microbiome list and abundance profile (GutFeelingKB), and a novel Fecal Biome Population Report (FecalBiome) with clinical applicability. GutFeelingKB provides a list of 157 organisms (8 phyla, 18 classes, 23 orders, 38 families, 59 genera and 109 species) that forms the baseline biome and therefore can be used as a healthy control for studies related to dysbiosis. This list can be expanded to 863 organisms if closely related proteomes are considered. The incorporation of microbiome science into routine clinical practice necessitates a standard report for comparison of an individual's microbiome to the growing knowledgebase of “normal” microbiome data. The FecalBiome report and the underlying technology of GutFeelingKB address this need. The knowledgebase can be useful to regulatory agencies for the assessment of fecal transplant and other microbiome products, as it contains a list of organisms from healthy individuals. In addition to the list of organisms and their abundances, this study also generated a collection of assembled contiguous sequences (contigs) of metagenomic dark matter. In this study, metagenomic dark matter represents sequences that cannot be mapped to any known sequence but can be assembled into contigs of 10,000 nucleotides or longer. These sequences can be used to design primers to study potential novel organisms.
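
    As a small illustration of the dark-matter definition used above (sequences that do not map to any known sequence but assemble into contigs of at least 10,000 nucleotides), the Python sketch below filters an assembly of unmapped reads by that length cutoff. The input file name is a placeholder, and the upstream read-mapping and assembly steps are assumed to have been run separately; this is not the paper's workflow.

        MIN_LEN = 10_000   # dark-matter contig length cutoff from the abstract

        def read_fasta(path):
            """Yield (header, sequence) pairs from a FASTA file."""
            header, seq = None, []
            with open(path) as fh:
                for line in fh:
                    line = line.rstrip()
                    if line.startswith(">"):
                        if header is not None:
                            yield header, "".join(seq)
                        header, seq = line[1:], []
                    else:
                        seq.append(line)
                if header is not None:
                    yield header, "".join(seq)

        def dark_matter_contigs(assembly_fasta):
            """Return contigs (from an assembly of unmapped reads) meeting the length cutoff."""
            return [(h, s) for h, s in read_fasta(assembly_fasta) if len(s) >= MIN_LEN]

        if __name__ == "__main__":
            for header, seq in dark_matter_contigs("unmapped_reads_assembly.fasta"):
                print(header, len(seq))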