44,005 research outputs found
Querying Large Physics Data Sets Over an Information Grid
Optimising use of the Web (WWW) for LHC data analysis is a complex problem
and illustrates the challenges arising from the integration of and computation
across massive amounts of information distributed worldwide. Finding the right
piece of information can, at times, be extremely time-consuming, if not
impossible. So-called Grids have been proposed to facilitate LHC computing and
many groups have embarked on studies of data replication, data migration and
networking philosophies. Other aspects such as the role of 'middleware' for
Grids are emerging as requiring research. This paper positions the need for
appropriate middleware that enables users to resolve physics queries across
massive data sets. It identifies the role of meta-data for query resolution and
the importance of Information Grids for high-energy physics analysis rather
than just Computational or Data Grids. This paper identifies software that is
being implemented at CERN to enable the querying of very large collaborating
HEP data-sets, initially being employed for the construction of CMS detectors.Comment: 4 pages, 3 figure
Developing High Performance Computing Resources for Teaching Cluster and Grid Computing courses
High-Performance Computing (HPC) and the ability to process large amounts of data are of
paramount importance for UK business and economy as outlined by Rt Hon David Willetts
MP at the HPC and Big Data conference in February 2014. However there is a shortage of
skills and available training in HPC to prepare and expand the workforce for the HPC and
Big Data research and development. Currently, HPC skills are acquired mainly by students
and staff taking part in HPC-related research projects, MSc courses, and at the dedicated
training centres such as Edinburgh University’s EPCC. There are few UK universities teaching
the HPC, Clusters and Grid Computing courses at the undergraduate level. To address the
issue of skills shortages in the HPC it is essential to provide teaching and training as part of
both postgraduate and undergraduate courses. The design and development of such courses is
challenging since the technologies and software in the fields of large scale distributed systems
such as Cluster, Cloud and Grid computing are undergoing continuous change. The students
completing the HPC courses should be proficient in these evolving technologies and equipped
with practical and theoretical skills for future jobs in this fast developing area.
In this paper we present our experience in developing the HPC, Cluster and Grid modules
including a review of existing HPC courses offered at the UK universities. The topics covered in
the modules are described, as well as the coursework projects based on practical laboratory work.
We conclude with an evaluation based on our experience over the last ten years in developing
and delivering the HPC modules on the undergraduate courses, with suggestions for future work
CERN openlab Whitepaper on Future IT Challenges in Scientific Research
This whitepaper describes the major IT challenges in scientific research at CERN and several other European and international research laboratories and projects. Each challenge is exemplified through a set of concrete use cases drawn from the requirements of large-scale scientific programs. The paper is based on contributions from many researchers and IT experts of the participating laboratories and also input from the existing CERN openlab industrial sponsors. The views expressed in this document are those of the individual contributors and do not necessarily reflect the view of their organisations and/or affiliates
e-Social Science and Evidence-Based Policy Assessment : Challenges and Solutions
Peer reviewedPreprin
Data as a Service (DaaS) for sharing and processing of large data collections in the cloud
Data as a Service (DaaS) is among the latest kind of services being investigated in the Cloud computing community. The main aim of DaaS is to overcome limitations of state-of-the-art approaches in data technologies, according to which data is stored and accessed from repositories whose location is known and is relevant for sharing and processing. Besides limitations for the data sharing, current approaches also do not achieve to fully separate/decouple software services from data and thus impose limitations in inter-operability. In this paper we propose a DaaS approach for intelligent sharing and processing of large data collections with the aim of abstracting the data location (by making it relevant to the needs of sharing and accessing) and to fully decouple the data and its processing. The aim of our approach is to build a Cloud computing platform, offering DaaS to support large communities of users that need to share, access, and process the data for collectively building knowledge from data. We exemplify the approach from large data collections from health and biology domains.Peer ReviewedPostprint (author's final draft
- …