99,481 research outputs found
Data-Intensive architecture for scientific knowledge discovery
This paper presents a data-intensive architecture that demonstrates the ability to support applications from a wide range of application domains, and support the different types of users involved in defining, designing and executing data-intensive processing tasks. The prototype architecture is introduced, and the pivotal role of DISPEL as a canonical language is explained. The architecture promotes the exploration and exploitation of distributed and heterogeneous data and spans the complete knowledge discovery process, from data preparation, to analysis, to evaluation and reiteration. The architecture evaluation included large-scale applications from astronomy, cosmology, hydrology, functional genetics, imaging processing and seismology
Computing environments for reproducibility: Capturing the 'Whole Tale'
The act of sharing scientific knowledge is rapidly evolving away from traditional articles and presentations to the delivery of executable objects that integrate the data and computational details (e.g., scripts and workflows) upon which the findings rely. This envisioned coupling of data and process is essential to advancing science but faces technical and institutional barriers. The Whole Tale project aims to address these barriers by connecting computational, data-intensive research efforts with the larger research process—transforming the knowledge discovery and dissemination process into one where data products are united with research articles to create “living publications” or tales. The Whole Tale focuses on the full spectrum of science, empowering users in the long tail of science, and power users with demands for access to big data and compute resources. We report here on the design, architecture, and implementation of the Whole Tale environment
Collaborative e-science architecture for Reaction Kinetics research community
This paper presents a novel collaborative e-science architecture (CeSA) to address two challenging issues in e-science that arise from the management of heterogeneous distributed environments: (i) how to provide individual scientists an integrated environment to collaborate with each other in distributed, loosely coupled research communities where each member might be using a disparate range of tools; and (ii) how to provide easy access to a range of computationally intensive resources from a desktop. The Reaction Kinetics research community was used to capture the requirements and in the evaluation of the proposed architecture. The result demonstrated the feasibility of the approach and the potential benefits of the CeSA
Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure
Big data research has attracted great attention in science, technology,
industry and society. It is developing with the evolving scientific paradigm,
the fourth industrial revolution, and the transformational innovation of
technologies. However, its nature and fundamental challenge have not been
recognized, and its own methodology has not been formed. This paper explores
and answers the following questions: What is big data? What are the basic
methods for representing, managing and analyzing big data? What is the
relationship between big data and knowledge? Can we find a mapping from big
data into knowledge space? What kind of infrastructure is required to support
not only big data management and analysis but also knowledge discovery, sharing
and management? What is the relationship between big data and science paradigm?
What is the nature and fundamental challenge of big data computing? A
multi-dimensional perspective is presented toward a methodology of big data
computing.Comment: 59 page
Enabling e-Research in combustion research community
Abstract
This paper proposes an application of the Collaborative e-Science Architecture (CeSA) to enable e-Research in combustion research community. A major problem of the community is that data required for constructing modelling might already exist but scattered and improperly evaluated. That makes the collection of data for constructing models difficult and time-consuming. The decentralised P2P collaborative environment of the CeSA is well suited to solve this distributed problem. It opens up access to scattered data and turns them to valuable resources. Other issues of the community addressed here are the needs for computational resources, storages and interoperability amongst different data formats can also be addressed by the use of Grid environment in the CeSA
Data as a Service (DaaS) for sharing and processing of large data collections in the cloud
Data as a Service (DaaS) is among the latest kind of services being investigated in the Cloud computing community. The main aim of DaaS is to overcome limitations of state-of-the-art approaches in data technologies, according to which data is stored and accessed from repositories whose location is known and is relevant for sharing and processing. Besides limitations for the data sharing, current approaches also do not achieve to fully separate/decouple software services from data and thus impose limitations in inter-operability. In this paper we propose a DaaS approach for intelligent sharing and processing of large data collections with the aim of abstracting the data location (by making it relevant to the needs of sharing and accessing) and to fully decouple the data and its processing. The aim of our approach is to build a Cloud computing platform, offering DaaS to support large communities of users that need to share, access, and process the data for collectively building knowledge from data. We exemplify the approach from large data collections from health and biology domains.Peer ReviewedPostprint (author's final draft
Data Driven Discovery in Astrophysics
We review some aspects of the current state of data-intensive astronomy, its
methods, and some outstanding data analysis challenges. Astronomy is at the
forefront of "big data" science, with exponentially growing data volumes and
data rates, and an ever-increasing complexity, now entering the Petascale
regime. Telescopes and observatories from both ground and space, covering a
full range of wavelengths, feed the data via processing pipelines into
dedicated archives, where they can be accessed for scientific analysis. Most of
the large archives are connected through the Virtual Observatory framework,
that provides interoperability standards and services, and effectively
constitutes a global data grid of astronomy. Making discoveries in this
overabundance of data requires applications of novel, machine learning tools.
We describe some of the recent examples of such applications.Comment: Keynote talk in the proceedings of ESA-ESRIN Conference: Big Data
from Space 2014, Frascati, Italy, November 12-14, 2014, 8 pages, 2 figure
A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences of state-of-the-art in Grid
workflow systems, but also identifies the areas that need further research.Comment: 29 pages, 15 figure
- …