26 research outputs found

    LiFSBrowse: a Visual, User Environment for the Linking File System

    The Linking File System (LiFS) introduces a new storage paradigm that enhances user productivity through relationships between files, yet it opens up new challenges in usability. LiFSBrowse is a GUI for LiFS that attempts to meet those challenges by providing customizable graphical views of the file system. LiFSBrowse supports interaction through link manipulation and file system querying. We describe the layout of LiFSBrowse in detail and illustrate its usability with a sample file system view.

    Coordinating an operational data distribution network for CMIP6 data

    Data contributed to the Coupled Model Intercomparison Project Phase 6 (CMIP6) are distributed via the Earth System Grid Federation (ESGF). The ESGF is a network of internationally distributed sites that together work as a federated data archive. Data records from climate modelling institutes are published to the ESGF and then shared around the world. It is anticipated that CMIP6 will produce approximately 20 PB of data to be published and distributed via the ESGF. In addition to this large volume of data, a number of value-added CMIP6 services are required to interact with the ESGF; for example, the citation and errata services both interact with the ESGF but are not a core part of its infrastructure. With a number of interacting services and a large volume of data anticipated for CMIP6, the CMIP Data Node Operations Team (CDNOT) was formed. The CDNOT coordinated and implemented a series of CMIP6 preparation data challenges to test all the interacting components in the ESGF CMIP6 software ecosystem. This ensured that when CMIP6 data were released they could be reliably distributed.
    This international collaborative work was funded through various agencies. Co-authors at Lawrence Berkeley National Laboratory were funded under contract no. DE-AC02-05CH11231, and co-authors at Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344 with the US Department of Energy. European co-authors were supported by the European Union Horizon 2020 IS-ENES3 project (grant agreement no. 824084). CNRM participants were additionally funded by the French National Research Agency project CONVERGENCE (grant ANR-13-MONU-0008-02). Co-authors from NCI were supported by the National Collaborative Research Infrastructure Strategy (NCRIS)-funded National Computational Infrastructure (NCI) Australia and the Australian Research Data Commons (ARDC).
    Peer reviewed. Article signed by 38 authors: Ruth Petrie, Sébastien Denvil, Sasha Ames, Guillaume Levavasseur, Sandro Fiore, Chris Allen, Fabrizio Antonio, Katharina Berger, Pierre-Antoine Bretonnière, Luca Cinquini, Eli Dart, Prashanth Dwarakanath, Kelsey Druken, Ben Evans, Laurent Franchistéguy, Sébastien Gardoll, Eric Gerbier, Mark Greenslade, David Hassell, Alan Iwi, Martin Juckes, Stephan Kindermann, Lukasz Lacinski, Maria Mirto, Atef Ben Nasser, Paola Nassisi, Eric Nienhouse, Sergey Nikonov, Alessandra Nuzzo, Clare Richards, Syazwan Ridzwan, Michel Rixen, Kim Serradell, Kate Snow, Ag Stephens, Martina Stockhause, Hans Vahlenkamp, and Rick Wagner.
    Postprint (published version)

    Design and Optimization of a Metagenomics Analysis Workflow for NVRAM (2014 IEEE 28th International Parallel & Distributed Processing Symposium Workshops)

    Metagenomic analysis, the study of microbial communities found in environmental samples, presents considerable challenges in quantity of data and computational cost. We present a novel metagenomic analysis pipeline that leverages emerging large-address-space compute nodes with NVRAM to hold a searchable, memory-mapped "k-mer" database of all known genomes and their taxonomic lineage. We describe the challenges of creating databases many hundreds of gigabytes in size, and the database organization optimizations that enable our Livermore Metagenomic Analysis Toolkit (LMAT) software to effectively query the k-mer key-value store, which resides in high-performance flash storage, as if it were fully in memory. To make database creation tractable, we have designed, implemented, and evaluated an optimized ingest pipeline. To optimize query performance for the database, we present a two-level index scheme that yields speedups of 8.4× to 74× over a conventional hash table index. LMAT, including the ingest pipeline, is available as open source at SourceForge.