Cold Storage Data Archives: More Than Just a Bunch of Tapes
The abundance of available sensor and derived data from large scientific
experiments, such as earth observation programs, radio astronomy sky surveys,
and high-energy physics already exceeds the storage hardware globally
fabricated per year. To that end, cold storage data archives are the often
overlooked spearheads of modern big data analytics in scientific,
data-intensive application domains. While high-performance data analytics has
received much attention from the research community, the growing number of
problems in designing and deploying cold storage archives has received
very little attention.
In this paper, we take the first step towards bridging this gap in knowledge
by presenting an analysis of four real-world cold storage archives from three
different application domains. In doing so, we highlight (i) workload
characteristics that differentiate these archives from traditional,
performance-sensitive data analytics, (ii) design trade-offs involved in
building cold storage systems for these archives, and (iii) deployment
trade-offs with respect to migration to the public cloud. Based on our
analysis, we discuss several other important research challenges that need to
be addressed by the data management community.
Managing scientific data with named data networking
Many scientific domains, such as climate science and High Energy Physics (HEP), have data management requirements that are not well supported by the IP network architecture. Named Data Networking (NDN) is a new network architecture whose service model is better aligned with the needs of data-oriented applications. NDN provides features such as best-location retrieval, caching, load sharing, and transparent failover that would otherwise be painstakingly (re-)implemented by each application using point-to-point semantics in an IP network.
We present the first scientific data management application designed and implemented on top of NDN. We use this application to manage climate and HEP data over a dedicated, high-performance testbed. Our application has two main components: a UI for dataset discovery queries and a federation of synchronized name catalogs. We show how NDN primitives can be used to implement common data management operations such as publishing, search, efficient retrieval, and publication access control.
Resource provisioning in Science Clouds: Requirements and challenges
Cloud computing has permeated the information technology industry in the
last few years, and it is now emerging in scientific environments. Scientific
user communities demand a broad range of computing resources, such as local
clusters, high-performance computing systems, and computing grids, to satisfy
the needs of high-performance applications. Different workloads
are needed from different computational models, and the cloud is already
considered a promising paradigm. The scheduling and allocation of resources
is a challenging matter in any form of computation, and clouds are no
exception. Science applications have unique features that differentiate their
workloads, so their requirements must be taken into account when building a
Science Cloud. This paper discusses the
main scheduling and resource allocation challenges for any Infrastructure as a
Service provider supporting scientific applications.
Polish grid infrastructure for science and research
The structure, functionality, parameters and organization of the computing Grid
in Poland are described, mainly from the perspective of the high-energy particle
physics community, currently its largest consumer and developer. It represents
a distributed Tier-2 in the worldwide Grid infrastructure. It also provides
services and resources for data-intensive applications in other sciences.
Comment: Proceedings of IEEE Eurocon 2007, Warsaw, Poland, 9-12 Sep. 2007,
p.44