
    Benchmarking long-read genome sequence alignment tools for human genomics applications

    Background: The utility of long-read genome sequencing platforms has been shown in many fields, including whole-genome assembly, metagenomics, and amplicon sequencing. Less clear is the applicability of long reads to reference-guided human genomics, which is the foundation of genomic medicine. Here, we benchmark available platform-agnostic alignment tools on datasets from nanopore and single-molecule real-time platforms to understand their suitability in producing a genome representation.
    Results: For this study, we leveraged publicly available data from sample NA12878 generated on the Oxford Nanopore platform and sample NA24385 generated on the Pacific Biosciences platform. We employed state-of-the-art sequence alignment tools, including GraphMap2, long-read aligner (LRA), Minimap2, CoNvex Gap-cost alignMents for Long Reads (NGMLR), and Winnowmap2. Minimap2 and Winnowmap2 were computationally lightweight enough for use at scale, while GraphMap2 was not. NGMLR was slow and resource-intensive but produced alignments each time. LRA was fast but only worked on Pacific Biosciences data. The tools widely disagreed on which reads to leave unaligned, affecting the resulting genome coverage and the number of discoverable breakpoints. No alignment tool independently resolved all large structural variants (1,001–100,000 base pairs) present in the Database of Genomic Variants (DGV) for sample NA12878 or in the truth set for NA24385.
    Conclusions: These results suggest that a combined approach to long-read sequencing (LRS) alignment is needed for human genomics. Specifically, leveraging alignments from three tools will be more effective in generating a complete picture of genomic variability. Best practice should be an analysis pipeline that generates alignments with both Minimap2 and Winnowmap2, as they are lightweight and yield different views of the genome. Depending on the question at hand, the data available, and the time constraints, NGMLR and LRA are good options for a third tool. If computational resources and time are not limiting, NGMLR will provide another view and another chance to resolve a case. LRA, while fast, did not work on the nanopore data on our cluster, but results on PacBio data were promising in that those computations completed faster than Minimap2. Due to its significant computational burden and slow run time, GraphMap2 is not an ideal tool for exploration of a whole human genome generated on a long-read sequencing platform.
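
    A minimal Python sketch of the dual-aligner practice recommended in the conclusions is shown below. It assumes minimap2, winnowmap, meryl, and samtools are installed and on PATH; the reference and read file names, thread count, and output names are placeholders, and the meryl/winnowmap steps follow the Winnowmap documentation rather than the study's exact pipeline.

        import subprocess

        REF = "GRCh38.fa"                 # reference genome (placeholder path)
        READS = "nanopore_reads.fastq"    # long-read FASTQ (placeholder path)
        THREADS = 8

        def run(cmd):
            """Run a shell command; raise if it exits non-zero."""
            print("+", cmd)
            subprocess.run(cmd, shell=True, check=True)

        # 1) Minimap2 alignment (map-ont preset for nanopore reads).
        run(f"minimap2 -ax map-ont -t {THREADS} {REF} {READS} | samtools sort -o minimap2.bam -")

        # 2) Winnowmap2 alignment: build the repetitive k-mer list with meryl,
        #    then align with the same preset.
        run(f"meryl count k=15 output merylDB {REF}")
        run("meryl print greater-than distinct=0.9998 merylDB > repetitive_k15.txt")
        run(f"winnowmap -W repetitive_k15.txt -ax map-ont -t {THREADS} {REF} {READS} | samtools sort -o winnowmap2.bam -")

        # The two BAMs give different views of coverage and breakpoints; downstream
        # variant calls from each can be compared or merged.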

    Open Science? Conceptualizing Openness as an Emerging Moral Economy of Science

    In this paper we aim to address a few of the complexities that revolve around “openness” of science as an emerging moral economy of science. First, we briefly assess the current state of discussion of Open Science in the academic literature. We show that these discussions have begun to take a more analytical look at Open Science, yet the term remains tied to opinions and emotional responses. Accordingly, we posit that a more distant perspective is needed. We establish that, since openness is the goal of Open Science, it provides a useful term around which discussion can coalesce. Indeed, this term can be used to identify an emerging moral economy within science. We then discuss why this is the case: the changing context, as well as the dynamics inherent in science as an enterprise. We finish this article with an initial discussion of how this mode of thinking will impact science and the study of science. This positions us to consider what is needed for the study of openness in science as a moral economy, the potential models that could be used to assess different interpretations of openness, and finally the questions that this mode of thinking may help us ask.

    Using a chat-based informed consent tool in large-scale genomic research

    OBJECTIVE: We implemented a chatbot consent tool to shift the time burden from study staff in support of a national genomics research study.
    MATERIALS AND METHODS: We created an Institutional Review Board-approved script for automated chat-based consent. We compared data from prospective participants who used the tool with data from those who had traditional consent conversations with study staff.
    RESULTS: Chat-based consent, completed on a user's schedule, was shorter than the traditional conversation. This did not lead to a significant change in affirmative consents. Among both affirmative consents and declines, more prospective participants completed the chat-based process. A quiz to assess users' understanding of the chat-based consent had a high pass rate, with no reported negative experiences.
    CONCLUSION: Our report shows that a structured script can convey important information while realizing the benefits of automation and burden shifting. Analysis suggests that it may be advantageous to use chatbots to scale this rate-limiting step in large research projects.
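
    The abstract does not include the consent script itself; the Python sketch below is a hypothetical, minimal illustration of the general pattern it describes: a fixed, scripted message sequence delivered in order, an explicit decision point, and a short comprehension quiz gating the recorded consent. All questions, answers, and outcome labels are placeholders, not the study's chatbot.

        CONSENT_SCRIPT = [
            "This study will sequence your genome and store de-identified data for research.",
            "Participation is voluntary; you may withdraw at any time without penalty.",
            "Do you agree to participate? (yes/no)",
        ]

        QUIZ = [
            ("Can you withdraw from the study at any time? (yes/no)", "yes"),
            ("Will your data be de-identified? (yes/no)", "yes"),
        ]

        def run_consent(ask=input, say=print):
            """Deliver the scripted consent, record the decision, then check understanding."""
            for line in CONSENT_SCRIPT[:-1]:
                say(line)
            if ask(CONSENT_SCRIPT[-1]).strip().lower() != "yes":
                return "declined"
            for question, expected in QUIZ:
                if ask(question).strip().lower() != expected:
                    return "refer_to_study_staff"   # comprehension not demonstrated
            return "consented"

        if __name__ == "__main__":
            print("Outcome:", run_consent())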

    Increased diagnostic yield from negative whole genome-slice panels using automated reanalysis

    We evaluated the diagnostic yield of genome-slice panel reanalysis in the clinical setting using an automated phenotype/gene ranking system. We analyzed whole genome sequencing (WGS) data produced from clinically ordered panels built as bioinformatic slices for 16 clinically diverse, undiagnosed cases referred to the Pediatric Mendelian Genomics Research Center, an NHGRI-funded GREGoR Consortium site. Genome-wide reanalysis was performed using Moon™, a machine-learning-based tool for variant prioritization. In five of the 16 cases, we discovered a potentially clinically significant variant. In four of these cases, the variant was found in a gene not included in the original panel, either because of phenotypic expansion of a disorder or because of incomplete initial phenotyping of the patient. In the fifth case, the gene containing the variant was included in the original panel, but the variant, a complex structural rearrangement with intronic breakpoints outside the clinically analyzed regions, was not initially identified. Automated genome-wide reanalysis of clinical WGS data generated during targeted panel testing yielded a 25% increase in diagnostic findings and a possibly clinically relevant finding in one additional case, underscoring the added value of such analyses over those routinely performed in the clinical setting.
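
    Moon™ is a commercial tool and its ranking method is not described here; as a loose illustration of the genome-wide, phenotype-driven reprioritization idea, the Python sketch below scores every candidate variant by overlap between the patient's HPO terms and placeholder gene-phenotype annotations, rather than restricting attention to the genes on the ordered panel. Gene names, positions, and annotations are invented for the example.

        from dataclasses import dataclass

        # Placeholder gene -> HPO-term associations (in practice drawn from a
        # resource such as the HPO gene-phenotype annotations).
        GENE_PHENOTYPES = {
            "GENE_A": {"HP:0001250", "HP:0001263"},   # seizures, developmental delay
            "GENE_B": {"HP:0000510"},                  # rod-cone dystrophy
        }

        @dataclass
        class Variant:
            chrom: str
            pos: int
            gene: str

        def rank_variants(variants, patient_hpo_terms):
            """Score each variant by shared phenotype terms and sort best-first."""
            def score(v):
                return len(GENE_PHENOTYPES.get(v.gene, set()) & patient_hpo_terms)
            return sorted(variants, key=score, reverse=True)

        # Example: genome-wide candidates, including a gene outside the original panel.
        candidates = [Variant("chr2", 166001234, "GENE_A"),
                      Variant("chr14", 21567890, "GENE_B")]
        for v in rank_variants(candidates, {"HP:0001250", "HP:0001263"}):
            print(v.gene, v.chrom, v.pos)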

    Baseline human gut microbiota profile in healthy people and standard reporting template

    A comprehensive knowledge of the types and ratios of microbes that inhabit the healthy human gut is necessary before any kind of pre-clinical or clinical study can be performed that attempts to alter the microbiome to treat a condition or improve therapy outcome. To address this need we present an innovative, scalable, comprehensive analysis workflow, a healthy human reference microbiome list and abundance profile (GutFeelingKB), and a novel Fecal Biome Population Report (FecalBiome) with clinical applicability. GutFeelingKB provides a list of 157 organisms (8 phyla, 18 classes, 23 orders, 38 families, 59 genera and 109 species) that forms the baseline biome and therefore can be used as a healthy control for studies related to dysbiosis. This list can be expanded to 863 organisms if closely related proteomes are considered. The incorporation of microbiome science into routine clinical practice necessitates a standard report for comparison of an individual's microbiome to the growing knowledgebase of “normal” microbiome data. The FecalBiome report and the underlying technology of GutFeelingKB address this need. The knowledgebase can be useful to regulatory agencies for the assessment of fecal transplant and other microbiome products, as it contains a list of organisms from healthy individuals. In addition to the list of organisms and their abundances, this study also generated a collection of assembled contiguous sequences (contigs) of metagenomic dark matter. In this study, metagenomic dark matter represents sequences that cannot be mapped to any known sequence but can be assembled into contigs of 10,000 nucleotides or longer. These sequences can be used to design primers to study potential novel organisms.
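
    As a small illustration of the dark-matter definition used above (sequences that do not map to any known sequence but assemble into contigs of at least 10,000 nucleotides), the Python sketch below filters an assembly of unmapped reads by that length cutoff. The input file name is a placeholder, and the upstream read-mapping and assembly steps are assumed to have been run separately; this is not the paper's workflow.

        MIN_LEN = 10_000   # dark-matter contig length cutoff from the abstract

        def read_fasta(path):
            """Yield (header, sequence) pairs from a FASTA file."""
            header, seq = None, []
            with open(path) as fh:
                for line in fh:
                    line = line.rstrip()
                    if line.startswith(">"):
                        if header is not None:
                            yield header, "".join(seq)
                        header, seq = line[1:], []
                    else:
                        seq.append(line)
                if header is not None:
                    yield header, "".join(seq)

        def dark_matter_contigs(assembly_fasta):
            """Return contigs (from an assembly of unmapped reads) meeting the length cutoff."""
            return [(h, s) for h, s in read_fasta(assembly_fasta) if len(s) >= MIN_LEN]

        if __name__ == "__main__":
            for header, seq in dark_matter_contigs("unmapped_reads_assembly.fasta"):
                print(header, len(seq))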