Deploying Jupyter Notebooks at scale on XSEDE resources for Science Gateways and workshops
Jupyter Notebooks have become a mainstream tool for interactive computing in
every field of science. Jupyter Notebooks are suitable as companion
applications for Science Gateways, providing more flexibility and
post-processing capability to the users. Moreover, they are often used in
training events and workshops to provide immediate access to a pre-configured
interactive computing environment. The Jupyter team released the JupyterHub web
application to provide a platform where multiple users can log in and access a
Jupyter Notebook environment. When the number of users and memory requirements
are low, it is easy to set up JupyterHub on a single server. However, setup
becomes more complicated when we need to serve Jupyter Notebooks at scale to
tens or hundreds of users. In this paper we will present three strategies for
deploying JupyterHub at scale on XSEDE resources. All options share the
deployment of JupyterHub on a Virtual Machine on XSEDE Jetstream. In the first
scenario, JupyterHub connects to a supercomputer and launches a single node job
on behalf of each user and proxies the Notebook from the computing node
back to the user's browser. In the second scenario, implemented in the context
of an XSEDE consultation for the IRIS consortium for Seismology, we deploy
Docker in Swarm mode to coordinate many XSEDE Jetstream virtual machines to
provide Notebooks with persistent storage and quota. In the last scenario we
install the Kubernetes container orchestration framework on Jetstream to
provide a fault-tolerant JupyterHub deployment with a distributed filesystem
and capability to scale to thousands of users. In the conclusion section we
provide a link to step-by-step tutorials complete with all the necessary
commands and configuration files to replicate these deployments.
Comment: 7 pages, 3 figures, PEARC '18: Practice and Experience in Advanced
Research Computing, July 22--26, 2018, Pittsburgh, PA, US
Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications
Many scientific problems require multiple distinct computational tasks to be
executed in order to achieve a desired solution. We introduce the Ensemble
Toolkit (EnTK) to address the challenges of scale, diversity and reliability
they pose. We describe the design and implementation of EnTK, characterize its
performance and integrate it with two distinct exemplar use cases: seismic
inversion and adaptive analog ensembles. We perform nine experiments,
characterizing EnTK overheads, strong and weak scalability, and the performance
of two use case implementations, at scale and on production infrastructures. We
show how EnTK meets the following general requirements: (i) implementing
dedicated abstractions to support the description and execution of ensemble
applications; (ii) support for execution on heterogeneous computing
infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv)
fault tolerance. We discuss novel computational capabilities that EnTK enables
and the scientific advantages arising thereof. We propose EnTK as an important
addition to the suite of tools in support of production scientific computing.
Data Access for LIGO on the OSG
During 2015 and 2016, the Laser Interferometer Gravitational-Wave Observatory
(LIGO) conducted a three-month observing campaign. These observations delivered
the first direct detection of gravitational waves from binary black hole
mergers. To search for these signals, the LIGO Scientific Collaboration uses
the PyCBC search pipeline. To deliver science results in a timely manner, LIGO
collaborated with the Open Science Grid (OSG) to distribute the required
computation across a series of dedicated, opportunistic, and allocated
resources. To deliver the petabytes necessary for such a large-scale
computation, our team deployed a distributed data access infrastructure based
on the XRootD server suite and the CernVM File System (CVMFS). This data access
strategy grew from simply accessing remote storage to a POSIX-based interface
underpinned by distributed, secure caches across the OSG.
Comment: 6 pages, 3 figures, submitted to PEARC1
Educational Technology as Seen Through the Eyes of the Readers
In this paper, I present the evaluation of a novel knowledge domain
visualization of educational technology. The interactive visualization is based
on readership patterns in the online reference management system Mendeley. It
comprises 13 topic areas, spanning psychological, pedagogical, and
methodological foundations, learning methods and technologies, and social and
technological developments. The visualization was evaluated with (1) a
qualitative comparison to knowledge domain visualizations based on citations,
and (2) expert interviews. The results show that the co-readership
visualization is a recent representation of pedagogical and psychological
research in educational technology. Furthermore, the co-readership analysis
covers more areas than comparable visualizations based on co-citation patterns.
Areas related to computer science, however, are missing from the co-readership
visualization and more research is needed to explore the interpretations of
size and placement of research areas on the map.
Comment: Forthcoming article in the International Journal of Technology
Enhanced Learning
A Taxonomy for Management and Optimization of Multiple Resources in Edge Computing
Edge computing is promoted to meet increasing performance needs of
data-driven services using computational and storage resources close to the end
devices, at the edge of the current network. To achieve higher performance in
this new paradigm one has to consider how to combine the efficiency of resource
usage at all three layers of architecture: end devices, edge devices, and the
cloud. While cloud capacity is elastically extendable, end devices and edge
devices are to various degrees resource-constrained. Hence, efficient
resource management is essential to make edge computing a reality. In this
work, we first present terminology and architectures to characterize current
works within the field of edge computing. Then, we review a wide range of
recent articles and categorize relevant aspects in terms of 4 perspectives:
resource type, resource management objective, resource location, and resource
use. This taxonomy and the ensuing analysis are used to identify some gaps in
the existing research. Among several research gaps, we found that research is
less prevalent on data, storage, and energy as a resource, and less extensive
towards the estimation, discovery and sharing objectives. As for resource
types, the most well-studied resources are computation and communication
resources. Our analysis shows that resource management at the edge requires a
deeper understanding of how methods applied at different levels and geared
towards different resource types interact. Specifically, the impact of mobility
and collaboration schemes requiring incentives is expected to be different in
edge architectures compared to the classic cloud solutions. Finally, we find
that fewer works are dedicated to the study of non-functional properties or to
quantifying the footprint of resource management techniques, including
edge-specific means of migrating data and services.
Comment: Accepted in the Special Issue Mobile Edge Computing of the Wireless
Communications and Mobile Computing journal
Jetstream: A self-provisioned, scalable science and engineering cloud environment
The paper describes the motivation behind Jetstream, its functions, hardware configuration, software environment, user interface, design, use cases, relationships with other projects such as Wrangler and iPlant, and challenges in implementation.
Funded by the National Science Foundation Award #ACI - 144560