User Applications Driven by the Community Contribution Framework MPContribs in the Materials Project
This work discusses how the MPContribs framework in the Materials Project
(MP) allows user-contributed data to be shown and analyzed alongside the core
MP database. The Materials Project is a searchable database of electronic
structure properties of over 65,000 bulk solid materials that is accessible
through a web-based science-gateway. We describe the motivation for enabling
user contributions to the materials data and present the framework's features
and challenges in the context of two real applications. These use-cases
illustrate how scientific collaborations can build applications with their own
"user-contributed" data using MPContribs. The Nanoporous Materials Explorer
application provides a unique search interface to a novel dataset of hundreds
of thousands of materials, each with tables of user-contributed values related
to material adsorption and density at varying temperature and pressure. The
Unified Theoretical and Experimental x-ray Spectroscopy application discusses a
full workflow for the association, dissemination and combined analyses of
experimental data from the Advanced Light Source with MP's theoretical core
data, using MPContribs tools for data formatting, management and exploration.
The capabilities being developed for these collaborations are serving as the
model for how new materials data can be incorporated into the Materials Project
website with minimal staff overhead while giving powerful tools for data search
and display to the user community.
Comment: 12 pages, 5 figures, Proceedings of 10th Gateway Computing
Environments Workshop (2015), to be published in "Concurrency in Computation:
Practice and Experience"
An ECOOP web portal for visualising and comparing distributed coastal oceanography model and in situ data
As part of a large European coastal operational oceanography project (ECOOP), we have developed a web portal for the display and comparison of model and in situ marine data. The distributed model and in situ datasets are accessed via an Open Geospatial Consortium Web Map Service (WMS) and Web Feature Service (WFS) respectively. These services were developed independently and readily integrated for the purposes of the ECOOP project, illustrating the ease of interoperability resulting from adherence to international standards. The key feature of the portal is the ability to display co-plotted timeseries of the in situ and model data and the quantification of misfits between the two. By using standards-based web technology we allow the user to quickly and easily explore over twenty model data feeds and compare these with dozens of in situ data feeds without being concerned with the low level details of differing file formats or the physical location of the data. Scientific and operational benefits to this work include model validation, quality control of observations, data assimilation and decision support in near real time. In these areas it is essential to be able to bring different data streams together from often disparate locations
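The abstract does not say which misfit statistics the portal reports. As a minimal sketch, assuming mean bias and root-mean-square error over time-aligned samples, the comparison of a model series with an in situ series could look like the following. The function name and the sample values are illustrative only and do not come from the ECOOP portal.

```python
import math


def misfit_stats(model, obs):
    """Return (bias, rmse) for paired model and in situ values.

    Assumes both series are already aligned on the same timestamps.
    """
    if len(model) != len(obs) or not model:
        raise ValueError("series must be non-empty and of equal length")
    diffs = [m - o for m, o in zip(model, obs)]
    bias = sum(diffs) / len(diffs)                            # mean model-minus-observation error
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))  # root-mean-square error
    return bias, rmse


# Hypothetical sea-surface temperatures (degrees C) at four matching times.
model_sst = [10.0, 11.0, 12.0, 13.0]
buoy_sst = [9.0, 11.0, 13.0, 13.0]
bias, rmse = misfit_stats(model_sst, buoy_sst)
print(f"bias={bias:.3f} rmse={rmse:.3f}")
```

In this example the positive and negative errors cancel, so the bias is zero while the RMSE is not. Reporting both statistics guards against that kind of cancellation.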
Serving GODAE Data and Products to the Ocean Community
The Global Ocean Data Assimilation Experiment (GODAE [http://www.godae.org]) has spanned a decade of rapid technological development. The ever-increasing volume and diversity of oceanographic data produced by in situ instruments, remote-sensing platforms, and computer simulations have driven the development of a number of innovative technologies that are essential for connecting scientists with the data that they need. This paper gives an overview of the technologies that have been developed and applied in the course of GODAE, which now provide users of oceanographic data with the capability to discover, evaluate, visualize, download, and analyze data from all over the world. The key to this capability is the ability to reduce the inherent complexity of oceanographic data by providing a consistent, harmonized view of the various data products. The challenges of data serving have been addressed over the last 10 years through the cooperative skills and energies of many individuals.
The Design and Operation of The Keck Observatory Archive
The Infrared Processing and Analysis Center (IPAC) and the W. M. Keck
Observatory (WMKO) operate an archive for the Keck Observatory. At the end of
2013, KOA completed the ingestion of data from all eight active observatory
instruments. KOA will continue to ingest all newly obtained observations, at an
anticipated volume of 4 TB per year. The data are transmitted electronically
from WMKO to IPAC for storage and curation. Access to data is governed by a
data use policy, and approximately two-thirds of the data in the archive are
public.
Comment: 12 pages, 4 figs, 4 tables. Presented at Software and
Cyberinfrastructure for Astronomy III, SPIE Astronomical Telescopes +
Instrumentation 2014. June 2014, Montreal, Canada
The LIGO Open Science Center
The LIGO Open Science Center (LOSC) fulfills LIGO's commitment to release,
archive, and serve LIGO data in a broadly accessible way to the scientific
community and to the public, and to provide the information and tools necessary
to understand and use the data. In August 2014, the LOSC published the full
dataset from Initial LIGO's "S5" run at design sensitivity, the first such
large-scale release and a valuable testbed to explore the use of LIGO data by
non-LIGO researchers and by the public, and to help teach gravitational-wave
data analysis to students across the world. In addition to serving the S5 data,
the LOSC web portal (losc.ligo.org) now offers documentation, data-location and
data-quality queries, tutorials and example code, and more. We review the
mission and plans of the LOSC, focusing on the S5 data release.
Comment: 8 pages, 1 figure, proceedings of the 10th LISA Symposium, University
of Florida, Gainesville, May 18-23, 2014; final published version; see
losc.ligo.org for the S5 data release and more information about the LIGO
Open Science Center
Web-Based Visualization of Very Large Scientific Astronomy Imagery
Visualizing and navigating through large astronomy images from a remote
location with current astronomy display tools can be a frustrating experience
in terms of speed and ergonomics, especially on mobile devices. In this paper,
we present a high performance, versatile and robust client-server system for
remote visualization and analysis of extremely large scientific images.
Applications of this work include survey image quality control, interactive
data query and exploration, citizen science, as well as public outreach. The
proposed software is entirely open source and is designed to be generic and
applicable to a variety of datasets. It provides access to floating point data
at terabyte scales, with the ability to precisely adjust image settings in
real-time. The proposed clients are light-weight, platform-independent web
applications built on standard HTML5 web technologies and compatible with both
touch and mouse-based devices. We put the system to the test, assess its
performance, and show that a single server can comfortably handle
more than a hundred simultaneous users accessing full precision 32 bit
astronomy data.
Comment: Published in Astronomy & Computing. IIPImage server available from
http://iipimage.sourceforge.net . Visiomatic code and demos available from
http://www.visiomatic.org
SciTokens: Capability-Based Secure Access to Remote Scientific Data
The management of security credentials (e.g., passwords, secret keys) for
computational science workflows is a burden for scientists and information
security officers. Problems with credentials (e.g., expiration, privilege
mismatch) cause workflows to fail to fetch needed input data or store valuable
scientific results, distracting scientists from their research by requiring
them to diagnose the problems, re-run their computations, and wait longer for
their results. In this paper, we introduce SciTokens, open source software to
help scientists manage their security credentials more reliably and securely.
We describe the SciTokens system architecture, design, and implementation
addressing use cases from the Laser Interferometer Gravitational-Wave
Observatory (LIGO) Scientific Collaboration and the Large Synoptic Survey
Telescope (LSST) projects. We also present our integration with widely-used
software that supports distributed scientific computing, including HTCondor,
CVMFS, and XrootD. SciTokens uses IETF-standard OAuth tokens for
capability-based secure access to remote scientific data. The access tokens
convey the specific authorizations needed by the workflows, rather than
general-purpose authentication impersonation credentials, to address the risks
of scientific workflows running on distributed infrastructure including NSF
resources (e.g., LIGO Data Grid, Open Science Grid, XSEDE) and public clouds
(e.g., Amazon Web Services, Google Cloud, Microsoft Azure). By improving the
interoperability and security of scientific workflows, SciTokens 1) enables use
of distributed computing for scientific domains that require greater data
protection and 2) enables use of more widely distributed computing resources by
reducing the risk of credential abuse on remote systems.
Comment: 8 pages, 6 figures, PEARC '18: Practice and Experience in Advanced
Research Computing, July 22--26, 2018, Pittsburgh, PA, US
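The difference between a capability token and an impersonation credential can be sketched in a few lines. The example below is not the SciTokens library or its API. It assumes a space-separated scope string with hypothetical entries of the form "action:/path-prefix", and shows only the authorization check a storage service might apply once a token's signature has been verified by other means.

```python
def scope_allows(scopes, action, path):
    """Return True if any 'action:/prefix' entry in scopes covers path.

    The scope syntax here is an assumption made for illustration.
    """
    for scope in scopes.split():
        scope_action, _, prefix = scope.partition(":")
        if scope_action != action or not prefix.startswith("/"):
            continue
        prefix = prefix.rstrip("/")
        # Match the prefix itself or anything below it, but not sibling
        # paths that merely share a leading string.
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False


# Hypothetical token: may read input frames and write results, nothing else.
token_scopes = "read:/ligo/frames write:/ligo/results"
print(scope_allows(token_scopes, "read", "/ligo/frames/run1/data.gwf"))
print(scope_allows(token_scopes, "write", "/ligo/frames/run1/data.gwf"))
```

A workflow holding this token can fetch its inputs and store its outputs, but a stolen copy cannot be used to overwrite the input data or act as the user elsewhere.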
Cache policies for cloud-based systems: To keep or not to keep
In this paper, we study cache policies for cloud-based caching. Cloud-based
caching uses cloud storage services such as Amazon S3 as a cache for data items
that would have been recomputed otherwise. Cloud-based caching departs from
classical caching: cloud resources are potentially infinite and only paid when
used, while classical caching relies on a fixed storage capacity and its main
monetary cost comes from the initial investment. To deal with this new context,
we design and evaluate a new caching policy that minimizes the overall cost of
a cloud-based system. The policy takes into account the frequency of
consumption of an item and the cloud cost model. We show that this policy is
easier to operate, that it scales with the demand and that it outperforms
classical policies managing a fixed capacity.
Comment: Proceedings of IEEE International Conference on Cloud Computing 2014
(CLOUD 14)
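The abstract does not give the policy's formula. As a rough sketch of the underlying trade-off, assuming a simple pay-per-use model with a monthly storage price and a fixed cost per recomputation, the "keep or not to keep" decision can be written as a cost comparison. All names and prices below are hypothetical and this is not the policy evaluated in the paper.

```python
def should_keep(size_gb, storage_price_per_gb_month, recompute_cost, accesses_per_month):
    """Keep an item cached if a month of storage costs less than the
    recomputations that month's expected accesses would otherwise trigger."""
    storage_cost = size_gb * storage_price_per_gb_month
    expected_recompute_cost = recompute_cost * accesses_per_month
    return expected_recompute_cost > storage_cost


# Hypothetical prices: 0.03 per GB-month of storage, 0.50 per recomputation.
small_item = should_keep(size_gb=10, storage_price_per_gb_month=0.03,
                         recompute_cost=0.50, accesses_per_month=2)
large_item = should_keep(size_gb=100, storage_price_per_gb_month=0.03,
                         recompute_cost=0.50, accesses_per_month=2)
print(small_item, large_item)
```

Unlike a fixed-capacity policy such as LRU, nothing here forces an eviction when the cache fills. Each item is judged on its own cost, which is one way to read the abstract's point that cloud storage is potentially unbounded and billed only when used.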
funcX: A Federated Function Serving Fabric for Science
Exploding data volumes and velocities, new computational methods and
platforms, and ubiquitous connectivity demand new approaches to computation in
the sciences. These new approaches must enable computation to be mobile, so
that, for example, it can occur near data, be triggered by events (e.g.,
arrival of new data), be offloaded to specialized accelerators, or run remotely
where resources are available. They also require new design approaches in which
monolithic applications can be decomposed into smaller components, that may in
turn be executed separately and on the most suitable resources. To address
these needs we present funcX---a distributed function as a service (FaaS)
platform that enables flexible, scalable, and high performance remote function
execution. funcX's endpoint software can transform existing clouds, clusters,
and supercomputers into function serving systems, while funcX's cloud-hosted
service provides transparent, secure, and reliable function execution across a
federated ecosystem of endpoints. We motivate the need for funcX with several
scientific case studies, present our prototype design and implementation, show
optimizations that deliver throughput in excess of 1 million functions per
second, and demonstrate, via experiments on two supercomputers, that funcX can
scale to more than 130000 concurrent workers.
Comment: Accepted to ACM Symposium on High-Performance Parallel and
Distributed Computing (HPDC 2020). arXiv admin note: substantial text overlap
with arXiv:1908.0490