44,864 research outputs found
Lifecycle information for e-literature: full report from the LIFE project
This Report is a record of the LIFE Project. The Project has been run for one year and its aim is to deliver crucial information about the cost and management of digital material. This information should then in turn be able to be applied to any institution that has an interest in preserving and providing access to electronic collections.
The Project is a joint venture between The British Library and UCL Library Services. The Project is funded by JISC under programme area (i) as listed in paragraph 16 of the JISC 4/04 circular- Institutional Management Support and
Collaboration and as such has set requirements and outcomes which must be met and the Project has done its best to do so. Where the Project has been unable to answer specific questions, strong recommendations have been made for future Project work to do so.
The outcomes of this Project are expected to be a practical set of guidelines and a framework within which costs can be applied to digital collections in order to answer the following questions:
• What is the long term cost of preserving digital material;
• Who is going to do it;
• What are the long term costs for a library in HE/FE to partner with another institution to carry out long term archiving;
• What are the comparative long-term costs of a paper and digital copy of
the same publication;
• At what point will there be sufficient confidence in the stability and
maturity of digital preservation to switch from paper for publications
available in parallel formats;
• What are the relative risks of digital versus paper archiving.
The Project has attempted to answer these questions by using a developing
lifecycle methodology and three diverse collections of digital content. The LIFE Project team chose UCL e-journals, BL Web Archiving and the BL VDEP digital collections to provide a strong challenge to the methodology as well as to help reach the key Project aim of attributing long term cost to digital collections. The results from the Case Studies and the Project findings are both surprising
and illuminating
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also helps to provide an easy way for new practitioners to understand
this complex area of research.Comment: 46 pages, 16 figures, Technical Repor
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still need
to be annotated manually in an ad hoc manner by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions
ElasTraS: An Elastic Transactional Data Store in the Cloud
Over the last couple of years, "Cloud Computing" or "Elastic Computing" has
emerged as a compelling and successful paradigm for internet scale computing.
One of the major contributing factors to this success is the elasticity of
resources. In spite of the elasticity provided by the infrastructure and the
scalable design of the applications, the elephant (or the underlying database),
which drives most of these web-based applications, is not very elastic and
scalable, and hence limits scalability. In this paper, we propose ElasTraS
which addresses this issue of scalability and elasticity of the data store in a
cloud computing environment to leverage from the elastic nature of the
underlying infrastructure, while providing scalable transactional data access.
This paper aims at providing the design of a system in progress, highlighting
the major design choices, analyzing the different guarantees provided by the
system, and identifying several important challenges for the research community
striving for computing in the cloud.Comment: 5 Pages, In Proc. of USENIX HotCloud 200
Distributed Computing Grid Experiences in CMS
The CMS experiment is currently developing a computing system capable of serving, processing and archiving the large number of events that will be generated when the CMS detector starts taking data. During 2004 CMS undertook a large scale data challenge to demonstrate the ability of the CMS computing system to cope with a sustained data-taking rate equivalent to 25% of startup rate. Its goals were: to run CMS event reconstruction at CERN for a sustained period at 25 Hz input rate; to distribute the data to several regional centers; and enable data access at those centers for analysis. Grid middleware was utilized to help complete all aspects of the challenge. To continue to provide scalable access from anywhere in the world to the data, CMS is developing a layer of software that uses Grid tools to gain access to data and resources, and that aims to provide physicists with a user friendly interface for submitting their analysis jobs. This paper describes the data challenge experience with Grid infrastructure and the current development of the CMS analysis system
- …