Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space

Author: Abel Haley J
Afgan Enis
Baker Dannon
Banasiewicz M Katie
Banks Eric
Baumann Alexander
Baumann Michael
Bernard Clare
Blauvelt Lon
Cabansay Louise
Caetano-Anollés Derek
Canas Justin
Carey Vincent J
Carroll Robert J
Chaluvadi Sushma
Chilton John
Clements Dave
Cox Katherine EL
Culotti Alessandro
Di Francesco Valentina
Disman William
Ellrott Kyle
Geistlinger Ludwig
Ghanaim Elena M
Goecks Jeremy
Golitsynskiy Sergey
Grossman Robert L
Gupta Namrata
Hajian Allie
Hall Ira M
Hannafious Brian
Hansen Kasper D
Harris Tim
Hastie Mim
Herman Kate
Hutter Carolyn
Jalili Vahid
Kammers Kai
Kiernan Elizabeth
Kovalsky Anton
Kucher Nataliya
Lawson Jonathan
Leek Jeffrey T
Lucas Julian
Luria Anne O’Donnell
Mahmoud Alexandru
McDade Frances
Morgan Martin
Mosher Stephen
Munshi Ruchi
Nekrutenko Anton
Oh Sehyun
Osborn Kevin
Ostrovsky Alexander
Overbeck Charles
O’Connor Brian D
O’Farrell Ash
Paten Benedict
Patterson Candace
Philippakis Anthony A
Ramos Marcel
Reddy Radhika
Reeves Valerie
Reid Charles
Rogers Dave
Rula Andrew
Yuen Denis
Sargent Luke
Schatz Michael C
Sen Shurjo K
Sheets Elizabeth A
Shepherd Lori
Simeon Marianie
Steinberg David Charles
Stevens Ana
Stubbs BJ
Suderman Keith
Tan Frederick J
Taylor Casey Overby
Taylor M Morgan
Thomas Salin
Title Robert
Torstenson Eric
Turaga Nitesh
Van der Auwera Geraldine A
Vessio Jennifer
Vizzier Benton A
Vosburg Trish
Waldron Levi
Walker Jason
Walsh Brian
Wang Qi
Wang Ting
Warren Noah
Wellington Christopher
Wheelan Sarah J
Wiley Ken L
Wuichet Kristin
Yuksel Kaan
Zarate Samantha
Publication venue: 'Elsevier BV'
Publication date: 12/01/2022

The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL; https://anvilproject.org) was developed to address a widespread community need for a unified computing environment for genomics data storage, management, and analysis. In this perspective, we present AnVIL, describe its ecosystem and interoperability with other platforms, and highlight how this platform and associated initiatives contribute to improved genomic data sharing efforts. The AnVIL is a federated cloud platform designed to manage and store genomics and related data, enable population-scale analysis, and facilitate collaboration through the sharing of data, code, and analysis results. By inverting the traditional model of data sharing, the AnVIL eliminates the need for data movement, adds security measures for active threat detection and monitoring, and provides scalable, shared computing resources for any researcher. We describe the core data management and analysis components of the AnVIL, which currently consists of Terra, Gen3, Galaxy, RStudio/Bioconductor, Dockstore, and Jupyter, and describe several flagship genomics datasets available within the AnVIL. We continue to extend and innovate the AnVIL ecosystem by implementing new capabilities, including mechanisms for interoperability and responsible data sharing, while streamlining access management. The AnVIL opens many new opportunities for analysis, collaboration, and data sharing that are needed to drive research and to make discoveries through the joint analysis of hundreds of thousands to millions of genomes along with associated clinical and molecular data types.

Cold Spring Harbor Laboratory Institutional Repository

broadinstitute/viral-pipelines: v2.1.19.0

Author: Aaron Lin
Chris Tomkins-Tinch
cmloreth
Daniel Park
Hanna
Hayden Metsky
Ilya Shlyakhter
Irwin Jungreis
Katrin Leinweber
llangit-broad
Lydia Andreyevna Krasilnikova
Mike Lin
pvanheus
Simon Ye
Sushma Chaluvadi
Vang Le
Publication date: 26/01/2021

Added new workflow: sarscov2_sra_to_genbank -- this takes sequencing reads from INSDC (via NCBI SRA), assembles, annotates, and QCs genomes, and produces Genbank and GISAID submission bundles based on the metadata in NCBI (SRA and BioSample). The Genbank submission will be tied to the same source BioProject and BioSamples that the reads were linked to in SRA. This workflow is able to merge together multiple read sets (SRA records) from the same BioSample and produce one assembly per BioSample. It will automatically detect sequencing platform (only Illumina and Oxford Nanopore currently supported) as well as amplicon vs metagenomic library designs based on the SRA metadata, and assemble appropriately. This has been tested on Illumina reads, ONT reads, amplicon libraries, metagenomic libraries, reads submitted to NCBI SRA, and reads originally submitted to ENA and synced with NCBI. [#197, #200]

Minor changes and fixes to sarscov2_illumina_full:
- filter genbank/gisaid submission packages to only sequences present in biosample attributes file [#200]
- relax minimum genome unambig bp cutoff from 20kb to 15kb [#200]
- allow for merging multiple biosample attributes tsvs together in sarscov2_illumina_full [#200]
- add "Sequencing Technology" column to both genbank and gisaid submission packages [#200]
- greatly simplify the final assembly metrics metadata output from both workflows (single tsv instead of compound array structures) [#200]
- makes filename outputs a bit more organized [#200]
- exposes cleaned_bam_uris text file output for easy SRA submission [#200]
- replace the first several steps with an invocation of demux_deplete as a subworkflow to reduce code duplication [#197]

Other minor changes:
- sarscov2_lineages and sarscov2_illumina_full: rename output variable pangolin_clade to pango_lineage to stay in line with the nomenclature of the PANGOLIN authors. [#197]
- increase default RAM for GATK UG consensus calling in assemble_refbased from 7GB to 15GB. [#200]
- bump nextclade image and pangoLEARN database to latest [#198]. nextclade update improves deletion variant naming. pangolin update keeps up with latest lineage assignments.
- bump viral-core docker 2.1.18 to 2.1.19 to fix demux scenario with single-index/paired-reads [#199
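
Two behaviours in these notes can be illustrated with a short Python sketch: merging read sets so that each BioSample yields one assembly, and the relaxed 15kb unambiguous-base cutoff. This is not the workflow's own code (the release does not show its implementation). The record layout, accession values, platform strings, and function names below are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical run records; keys and accessions are illustrative only,
# not the schema used by sarscov2_sra_to_genbank.
RUNS = [
    {"run": "SRR0000001", "biosample": "SAMN00000001", "platform": "ILLUMINA"},
    {"run": "SRR0000002", "biosample": "SAMN00000001", "platform": "ILLUMINA"},
    {"run": "SRR0000003", "biosample": "SAMN00000002", "platform": "OXFORD_NANOPORE"},
    {"run": "SRR0000004", "biosample": "SAMN00000003", "platform": "PACBIO_SMRT"},
]

# Assumed platform labels for the two platforms the notes say are supported.
SUPPORTED_PLATFORMS = {"ILLUMINA", "OXFORD_NANOPORE"}

# Cutoff from the notes: relaxed from 20kb to 15kb unambiguous bases.
MIN_UNAMBIG_BP = 15_000


def group_runs_by_biosample(runs):
    """Collect run accessions per BioSample, skipping unsupported platforms."""
    groups = defaultdict(list)
    for rec in runs:
        if rec["platform"] not in SUPPORTED_PLATFORMS:
            continue
        groups[rec["biosample"]].append(rec["run"])
    return dict(groups)


def count_unambiguous(seq):
    """Count A/C/G/T bases, ignoring N, gaps, and other ambiguity codes."""
    return sum(1 for base in seq.upper() if base in "ACGT")


def passes_length_cutoff(seq, min_bp=MIN_UNAMBIG_BP):
    """Return True when the assembly has at least min_bp unambiguous bases."""
    return count_unambiguous(seq) >= min_bp


if __name__ == "__main__":
    for biosample, run_ids in group_runs_by_biosample(RUNS).items():
        print(biosample, "->", ", ".join(run_ids))
```

Each key in the grouped dictionary stands for one assembly to be produced from the listed runs. A caller could then apply passes_length_cutoff to each resulting consensus sequence before including it in a submission package.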
