4,840 research outputs found
Digital curation and the cloud
Digital curation involves a wide range of activities, many of which could benefit from cloud
deployment to a greater or lesser extent. These range from infrequent, resource-intensive tasks
which benefit from the ability to rapidly provision resources to day-to-day collaborative activities
which can be facilitated by networked cloud services. Associated benefits are offset by risks
such as loss of data or service level, legal and governance incompatibilities and transfer
bottlenecks. There is considerable variability across both risks and benefits according to the
service and deployment models being adopted and the context in which activities are
performed. Some risks, such as legal liabilities, are mitigated by the use of alternative, e.g.,
private cloud models, but this is typically at the expense of benefits such as resource elasticity
and economies of scale. Infrastructure as a Service model may provide a basis on which more
specialised software services may be provided.
There is considerable work to be done in helping institutions understand the cloud and its
associated costs, risks and benefits, and how these compare to their current working methods,
in order that the most beneficial uses of cloud technologies may be identified. Specific
proposals, echoing recent work coordinated by EPSRC and JISC are the development of
advisory, costing and brokering services to facilitate appropriate cloud deployments, the
exploration of opportunities for certifying or accrediting cloud preservation providers, and
the targeted publicity of outputs from pilot studies to the full range of stakeholders within the
curation lifecycle, including data creators and owners, repositories, institutional IT support
professionals and senior manager
On Evaluating Commercial Cloud Services: A Systematic Review
Background: Cloud Computing is increasingly booming in industry with many
competing providers and services. Accordingly, evaluation of commercial Cloud
services is necessary. However, the existing evaluation studies are relatively
chaotic. There exists tremendous confusion and gap between practices and theory
about Cloud services evaluation. Aim: To facilitate relieving the
aforementioned chaos, this work aims to synthesize the existing evaluation
implementations to outline the state-of-the-practice and also identify research
opportunities in Cloud services evaluation. Method: Based on a conceptual
evaluation model comprising six steps, the Systematic Literature Review (SLR)
method was employed to collect relevant evidence to investigate the Cloud
services evaluation step by step. Results: This SLR identified 82 relevant
evaluation studies. The overall data collected from these studies essentially
represent the current practical landscape of implementing Cloud services
evaluation, and in turn can be reused to facilitate future evaluation work.
Conclusions: Evaluation of commercial Cloud services has become a world-wide
research topic. Some of the findings of this SLR identify several research gaps
in the area of Cloud services evaluation (e.g., the Elasticity and Security
evaluation of commercial Cloud services could be a long-term challenge), while
some other findings suggest the trend of applying commercial Cloud services
(e.g., compared with PaaS, IaaS seems more suitable for customers and is
particularly important in industry). This SLR study itself also confirms some
previous experiences and reveals new Evidence-Based Software Engineering (EBSE)
lessons
Enabling Interactive Analytics of Secure Data using Cloud Kotta
Research, especially in the social sciences and humanities, is increasingly
reliant on the application of data science methods to analyze large amounts of
(often private) data. Secure data enclaves provide a solution for managing and
analyzing private data. However, such enclaves do not readily support discovery
science---a form of exploratory or interactive analysis by which researchers
execute a range of (sometimes large) analyses in an iterative and collaborative
manner. The batch computing model offered by many data enclaves is well suited
to executing large compute tasks; however it is far from ideal for day-to-day
discovery science. As researchers must submit jobs to queues and wait for
results, the high latencies inherent in queue-based, batch computing systems
hinder interactive analysis. In this paper we describe how we have augmented
the Cloud Kotta secure data enclave to support collaborative and interactive
analysis of sensitive data. Our model uses Jupyter notebooks as a flexible
analysis environment and Python language constructs to support the execution of
arbitrary functions on private data within this secure framework.Comment: To appear in Proceedings of Workshop on Scientific Cloud Computing,
Washington, DC USA, June 2017 (ScienceCloud 2017), 7 page
Performance Measurements of Supercomputing and Cloud Storage Solutions
Increasing amounts of data from varied sources, particularly in the fields of
machine learning and graph analytics, are causing storage requirements to grow
rapidly. A variety of technologies exist for storing and sharing these data,
ranging from parallel file systems used by supercomputers to distributed block
storage systems found in clouds. Relatively few comparative measurements exist
to inform decisions about which storage systems are best suited for particular
tasks. This work provides these measurements for two of the most popular
storage technologies: Lustre and Amazon S3. Lustre is an open-source, high
performance, parallel file system used by many of the largest supercomputers in
the world. Amazon's Simple Storage Service, or S3, is part of the Amazon Web
Services offering, and offers a scalable, distributed option to store and
retrieve data from anywhere on the Internet. Parallel processing is essential
for achieving high performance on modern storage systems. The performance tests
used span the gamut of parallel I/O scenarios, ranging from single-client,
single-node Amazon S3 and Lustre performance to a large-scale, multi-client
test designed to demonstrate the capabilities of a modern storage appliance
under heavy load. These results show that, when parallel I/O is used correctly
(i.e., many simultaneous read or write processes), full network bandwidth
performance is achievable and ranged from 10 gigabits/s over a 10 GigE S3
connection to 0.35 terabits/s using Lustre on a 1200 port 10 GigE switch. These
results demonstrate that S3 is well-suited to sharing vast quantities of data
over the Internet, while Lustre is well-suited to processing large quantities
of data locally.Comment: 5 pages, 4 figures, to appear in IEEE HPEC 201
- …