
    User Applications Driven by the Community Contribution Framework MPContribs in the Materials Project

    This work discusses how the MPContribs framework in the Materials Project (MP) allows user-contributed data to be shown and analyzed alongside the core MP database. The Materials Project is a searchable database of electronic structure properties of over 65,000 bulk solid materials, accessible through a web-based science gateway. We describe the motivation for enabling user contributions to the materials data and present the framework's features and challenges in the context of two real applications. These use cases illustrate how scientific collaborations can build applications with their own "user-contributed" data using MPContribs. The Nanoporous Materials Explorer application provides a unique search interface to a novel dataset of hundreds of thousands of materials, each with tables of user-contributed values related to material adsorption and density at varying temperature and pressure. The Unified Theoretical and Experimental X-ray Spectroscopy application demonstrates a full workflow for the association, dissemination and combined analysis of experimental data from the Advanced Light Source with MP's theoretical core data, using MPContribs tools for data formatting, management and exploration. The capabilities being developed for these collaborations serve as the model for how new materials data can be incorporated into the Materials Project website with minimal staff overhead, while giving the user community powerful tools for data search and display.
    Comment: 12 pages, 5 figures, Proceedings of the 10th Gateway Computing Environments Workshop (2015), to be published in "Concurrency and Computation: Practice and Experience"
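
    To make the contribution model concrete, here is a minimal, hypothetical sketch of what submitting a user contribution tied to an existing MP material might look like. The endpoint URL, API key header, and payload fields below are illustrative assumptions, not the actual MPContribs API.

    import requests

    # Hypothetical endpoint and credentials; the real MPContribs API may differ.
    API_URL = "https://example.org/mpcontribs/api/contributions"  # assumption
    API_KEY = "YOUR_API_KEY"                                      # assumption

    # A contribution attaches user data to a host material via its MP identifier.
    contribution = {
        "identifier": "mp-1234",            # Materials Project ID (illustrative)
        "project": "nanoporous_materials",  # collaboration namespace (assumption)
        "data": {
            "density": {"value": 0.52, "unit": "g/cm^3"},
        },
        "tables": [
            {
                "name": "CO2 adsorption isotherm",
                "columns": ["pressure [bar]", "uptake [mmol/g]"],
                "rows": [[0.1, 0.3], [1.0, 1.4], [10.0, 2.1]],
            }
        ],
    }

    resp = requests.post(API_URL, json=contribution,
                         headers={"X-API-KEY": API_KEY}, timeout=30)
    resp.raise_for_status()
    print("created contribution:", resp.json())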

    An ECOOP web portal for visualising and comparing distributed coastal oceanography model and in situ data

    As part of a large European coastal operational oceanography project (ECOOP), we have developed a web portal for the display and comparison of model and in situ marine data. The distributed model and in situ datasets are accessed via an Open Geospatial Consortium (OGC) Web Map Service (WMS) and Web Feature Service (WFS), respectively. These services were developed independently and readily integrated for the purposes of the ECOOP project, illustrating the ease of interoperability that results from adherence to international standards. The key feature of the portal is the ability to display co-plotted time series of the in situ and model data and to quantify the misfits between the two. By using standards-based web technology we allow the user to quickly and easily explore over twenty model data feeds and compare these with dozens of in situ data feeds, without being concerned with the low-level details of differing file formats or the physical location of the data. Scientific and operational benefits of this work include model validation, quality control of observations, data assimilation and decision support in near real time. In these areas it is essential to be able to bring together different data streams from often disparate locations.
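
    Because both services follow OGC standards, any standards-aware client can consume them. A minimal sketch using the OWSLib Python library, where the service URL, layer name, and bounding box are illustrative assumptions:

    from owslib.wms import WebMapService

    # Hypothetical WMS endpoint; a real ECOOP data provider would publish its own.
    WMS_URL = "https://example.org/ecoop/wms"  # assumption

    wms = WebMapService(WMS_URL, version="1.1.1")

    # Request a rendered map of an assumed sea-surface-temperature layer
    # over a lon/lat box covering the North Sea.
    img = wms.getmap(
        layers=["sea_surface_temperature"],  # assumed layer name
        styles=[""],
        srs="EPSG:4326",
        bbox=(-4.0, 50.0, 10.0, 60.0),
        size=(600, 400),
        format="image/png",
        transparent=True,
    )

    with open("sst.png", "wb") as f:
        f.write(img.read())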

    Serving GODAE Data and Products to the Ocean Community

    The Global Ocean Data Assimilation Experiment (GODAE, http://www.godae.org) has spanned a decade of rapid technological development. The ever-increasing volume and diversity of oceanographic data produced by in situ instruments, remote-sensing platforms, and computer simulations have driven the development of a number of innovative technologies that are essential for connecting scientists with the data that they need. This paper gives an overview of the technologies that have been developed and applied in the course of GODAE, which now provide users of oceanographic data with the capability to discover, evaluate, visualize, download, and analyze data from all over the world. The key to this capability is the ability to reduce the inherent complexity of oceanographic data by providing a consistent, harmonized view of the various data products. The challenges of data serving have been addressed over the last 10 years through the cooperative skills and energies of many individuals.
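
    One concrete example of this "consistent, harmonized view" is the aggregation protocols of that era, such as OPeNDAP, which let a remote model dataset be opened and subset as if it were a local file. A minimal sketch with xarray, assuming a hypothetical server URL and variable/coordinate names:

    import xarray as xr

    # Hypothetical OPeNDAP endpoint; real servers publish catalogues of such
    # URLs (e.g. via THREDDS).
    URL = "https://example.org/thredds/dodsC/ocean/model_output.nc"  # assumption

    # Opening is lazy: only the subset requested below is transferred.
    ds = xr.open_dataset(URL)

    # Assumed variable and coordinate names; subset a region, take the
    # latest time step, and pull the data across the network.
    sst = (
        ds["sea_water_temperature"]
        .sel(lat=slice(30, 60), lon=slice(-40, 0))
        .isel(time=-1)
        .load()
    )
    print(float(sst.mean()))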

    The Design and Operation of The Keck Observatory Archive

    The Infrared Processing and Analysis Center (IPAC) and the W. M. Keck Observatory (WMKO) operate the Keck Observatory Archive (KOA). At the end of 2013, KOA completed the ingestion of data from all eight active observatory instruments. KOA will continue to ingest all newly obtained observations, at an anticipated volume of 4 TB per year. The data are transmitted electronically from WMKO to IPAC for storage and curation. Access to data is governed by a data use policy, and approximately two-thirds of the data in the archive are public.
    Comment: 12 pages, 4 figures, 4 tables. Presented at Software and Cyberinfrastructure for Astronomy III, SPIE Astronomical Telescopes + Instrumentation 2014, June 2014, Montreal, Canada

    The LIGO Open Science Center

    The LIGO Open Science Center (LOSC) fulfills LIGO's commitment to release, archive, and serve LIGO data in a broadly accessible way to the scientific community and to the public, and to provide the information and tools necessary to understand and use the data. In August 2014, the LOSC published the full dataset from Initial LIGO's "S5" run at design sensitivity, the first such large-scale release and a valuable testbed for exploring the use of LIGO data by non-LIGO researchers and by the public, and for helping teach gravitational-wave data analysis to students across the world. In addition to serving the S5 data, the LOSC web portal (losc.ligo.org) now offers documentation, data-location and data-quality queries, tutorials and example code, and more. We review the mission and plans of the LOSC, focusing on the S5 data release.
    Comment: 8 pages, 1 figure, proceedings of the 10th LISA Symposium, University of Florida, Gainesville, May 18-23, 2014; final published version; see losc.ligo.org for the S5 data release and more information about the LIGO Open Science Center
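
    The released strain data can be retrieved programmatically; a minimal sketch using the gwpy library, which fetches open data from the LOSC archive. The GPS interval below is an illustrative window assumed to lie inside the S5 run; a real analysis should first check data availability.

    from gwpy.timeseries import TimeSeries

    # Fetch 32 s of open H1 strain data from the LOSC archive.
    start = 815411200          # GPS seconds (illustrative S5-era time)
    strain = TimeSeries.fetch_open_data("H1", start, start + 32)
    print(strain.sample_rate)

    # Basic conditioning: band-pass the data and notch the 60 Hz power line.
    filtered = strain.bandpass(50, 250).notch(60)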

    Web-Based Visualization of Very Large Scientific Astronomy Imagery

    Visualizing and navigating through large astronomy images from a remote location with current astronomy display tools can be a frustrating experience in terms of speed and ergonomics, especially on mobile devices. In this paper, we present a high-performance, versatile and robust client-server system for remote visualization and analysis of extremely large scientific images. Applications of this work include survey image quality control, interactive data query and exploration, citizen science, and public outreach. The proposed software is entirely open source and is designed to be generic and applicable to a variety of datasets. It provides access to floating-point data at terabyte scales, with the ability to precisely adjust image settings in real time. The proposed clients are lightweight, platform-independent web applications built on standard HTML5 web technologies and compatible with both touch- and mouse-based devices. We assess the performance of the system and show that a single server can comfortably handle more than a hundred simultaneous users accessing full-precision 32-bit astronomy data.
    Comment: Published in Astronomy & Computing. IIPImage server available from http://iipimage.sourceforge.net. VisiOmatic code and demos available from http://www.visiomatic.org
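
    The client-server dialogue is plain HTTP: the browser repeatedly asks the IIPImage server for rescaled regions of the large image. A rough sketch of one such request from Python, where the server URL and image path are assumptions and the query parameters follow our reading of the IIP protocol:

    import requests

    # Hypothetical IIPImage endpoint and image path (assumptions).
    BASE = "https://example.org/fcgi-bin/iipsrv.fcgi"
    IMAGE = "/data/survey/field_001.tif"

    # Ask for a JPEG rendering of the central quarter of the image, resized
    # to 512 px wide; RGN gives x,y,w,h as fractions of the full image.
    params = {"FIF": IMAGE, "RGN": "0.25,0.25,0.5,0.5", "WID": "512", "CVT": "jpeg"}
    resp = requests.get(BASE, params=params, timeout=30)
    resp.raise_for_status()

    with open("cutout.jpg", "wb") as f:
        f.write(resp.content)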

    SciTokens: Capability-Based Secure Access to Remote Scientific Data

    The management of security credentials (e.g., passwords, secret keys) for computational science workflows is a burden for scientists and information security officers. Problems with credentials (e.g., expiration, privilege mismatch) cause workflows to fail to fetch needed input data or store valuable scientific results, distracting scientists from their research by requiring them to diagnose the problems, re-run their computations, and wait longer for their results. In this paper, we introduce SciTokens, open source software to help scientists manage their security credentials more reliably and securely. We describe the SciTokens system architecture, design, and implementation, addressing use cases from the Laser Interferometer Gravitational-Wave Observatory (LIGO) Scientific Collaboration and the Large Synoptic Survey Telescope (LSST) projects. We also present our integration with widely used software that supports distributed scientific computing, including HTCondor, CVMFS, and XrootD. SciTokens uses IETF-standard OAuth tokens for capability-based secure access to remote scientific data. The access tokens convey the specific authorizations needed by the workflows, rather than general-purpose authentication impersonation credentials, to address the risks of scientific workflows running on distributed infrastructure including NSF resources (e.g., LIGO Data Grid, Open Science Grid, XSEDE) and public clouds (e.g., Amazon Web Services, Google Cloud, Microsoft Azure). By improving the interoperability and security of scientific workflows, SciTokens 1) enables the use of distributed computing for scientific domains that require greater data protection and 2) enables the use of more widely distributed computing resources by reducing the risk of credential abuse on remote systems.
    Comment: 8 pages, 6 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22-26, 2018, Pittsburgh, PA, USA
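
    Since SciTokens are standard JWTs, an enforcement point boils down to a signature check plus inspection of the scope claims. A minimal sketch using the generic PyJWT library (the dedicated scitokens library offers a higher-level interface); the audience and scope values are illustrative assumptions:

    import jwt  # PyJWT

    def authorize_read(token: str, public_key: str, path: str) -> bool:
        """Return True if the token's scopes permit reading `path`.

        Illustrative sketch: real deployments discover the issuer's public
        key via its published metadata rather than passing it in directly.
        """
        claims = jwt.decode(
            token,
            public_key,
            algorithms=["RS256"],
            audience="https://storage.example.org",  # assumption
        )
        # SciTokens express capabilities as scopes such as "read:/frames".
        for scope in claims.get("scope", "").split():
            op, _, prefix = scope.partition(":")
            if op == "read" and path.startswith(prefix):
                return True
        return False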

    Cache policies for cloud-based systems: To keep or not to keep

    In this paper, we study cache policies for cloud-based caching. Cloud-based caching uses cloud storage services such as Amazon S3 as a cache for data items that would otherwise have to be recomputed. Cloud-based caching departs from classical caching: cloud resources are potentially infinite and paid for only when used, while classical caching relies on a fixed storage capacity whose main monetary cost comes from the initial investment. To deal with this new context, we design and evaluate a new caching policy that minimizes the overall cost of a cloud-based system. The policy takes into account the frequency of consumption of an item and the cloud cost model. We show that this policy is easier to operate, that it scales with demand, and that it outperforms classical policies managing a fixed capacity.
    Comment: Proceedings of the IEEE International Conference on Cloud Computing 2014 (CLOUD '14)
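
    The abstract does not give the exact policy, but the core trade-off can be sketched: keep an item only while the expected cost of storing it until its next access is below the cost of recomputing it. A plausible sketch, assuming roughly Poisson accesses and a per-GB-hour storage price:

    def keep_in_cache(storage_price_gb_hour: float, size_gb: float,
                      recompute_cost: float, accesses_per_hour: float) -> bool:
        """Cost-based keep/evict rule (illustrative, not the paper's exact policy).

        With roughly Poisson accesses, the expected wait until the next access
        is 1/rate hours, so keeping the item costs storage for that long on
        average; keep it only if that beats recomputing on the next request.
        """
        if accesses_per_hour <= 0:
            return False  # never accessed again: storing it is pure cost
        expected_storage_cost = storage_price_gb_hour * size_gb / accesses_per_hour
        return expected_storage_cost < recompute_cost

    # Example: a 5 GB derived dataset at $0.00004/GB/hour, costing $0.50 to
    # recompute and accessed about once a day -> cheaper to keep it cached.
    print(keep_in_cache(0.00004, 5.0, 0.50, 1 / 24))  # True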

    funcX: A Federated Function Serving Fabric for Science

    Exploding data volumes and velocities, new computational methods and platforms, and ubiquitous connectivity demand new approaches to computation in the sciences. These new approaches must enable computation to be mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), be offloaded to specialized accelerators, or run remotely where resources are available. They also require new design approaches in which monolithic applications can be decomposed into smaller components that may in turn be executed separately and on the most suitable resources. To address these needs we present funcX, a distributed function-as-a-service (FaaS) platform that enables flexible, scalable, and high-performance remote function execution. funcX's endpoint software can transform existing clouds, clusters, and supercomputers into function-serving systems, while funcX's cloud-hosted service provides transparent, secure, and reliable function execution across a federated ecosystem of endpoints. We motivate the need for funcX with several scientific case studies, present our prototype design and implementation, show optimizations that deliver throughput in excess of 1 million functions per second, and demonstrate, via experiments on two supercomputers, that funcX can scale to more than 130,000 concurrent workers.
    Comment: Accepted to ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020). arXiv admin note: substantial text overlap with arXiv:1908.0490
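
    As a flavor of the programming model, here is a rough sketch of registering and remotely invoking a function with the funcX SDK as it looked around the time of this paper (the project has since evolved, so class and method names are assumptions); the endpoint UUID is a placeholder:

    from funcx.sdk.client import FuncXClient  # circa-2020 SDK (assumption)

    def double(x):
        return 2 * x

    fxc = FuncXClient()

    # Register the function once; the service returns a UUID for it.
    func_id = fxc.register_function(double)

    # Placeholder endpoint UUID; in practice this names a cluster, cloud,
    # or supercomputer running the funcX endpoint software.
    endpoint_id = "00000000-0000-0000-0000-000000000000"

    # Fire-and-forget remote invocation; results are fetched by task ID
    # (get_result raises while the task is still pending).
    task_id = fxc.run(21, endpoint_id=endpoint_id, function_id=func_id)
    print(fxc.get_result(task_id))  # -> 42 once complete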