Dossier: Distributed operating system and infrastructure for scientific data management
As scientific advancement and discovery have become increasingly data-driven and interdisciplinary, there is an urgent need for advanced cyberinfrastructure to support managing and processing scientific data generated from day-to-day research. However, the development of data-driven cyberinfrastructure for scientific research areas has often lagged behind the development of such tools in other engineering and IT-related fields. This development gap is due to several diversity challenges in scientific data management and processing. The first is the diversity of scientific data and data processing tasks: the cyberinfrastructure should be able to support managing and processing heterogeneous types of scientific data captured from scientific instruments. The second is the diversity of users and scientific workloads, which the infrastructure must handle while still shortening the time from digital capture of data to interpretation and insights. The third is the diversity of scientific instruments. Since a significant number of scientific instruments still run their software tools on old operating systems (e.g., Windows XP, Windows NT, Windows 2000), the cyberinfrastructure must help to bridge the performance and security gap between old scientific instruments and its advanced cloud-based infrastructure.
In this thesis, we aim to address the above diversity challenges by taking a holistic approach to designing a distributed operating system and infrastructure for scientific data management, named DOSSIER. At the core of DOSSIER is an adaptive control microservice infrastructure that is designed to tackle the aforementioned challenges of data cyberinfrastructure for distributed scientific data management. In particular, to handle heterogeneous scientific data processing and analysis, we start by redesigning the execution environment for scientific workflows, which traditionally follows a monolithic approach, using a novel microservice architecture and the latest virtualization technology (i.e., container technology). The microservice design enables dynamic composition of workflows and is therefore efficient in dealing with heterogeneous workflows. The new microservice architecture also allows us to express system resources in a simpler way, which enables the design of a new adaptive resource management mechanism to handle large-scale and dynamic scientific workloads. We are the first to apply feedback control theory to design a self-adaptation mechanism for a scientific workflow management system to help shorten the time from data acquisition to insights. To address the security and performance gap when connecting old scientific instruments to cloud-based cyberinfrastructure, we design an edge-cloud architecture that places cloudlet servers in direct connection with the scientific instruments, where they act as a security shield for the aging instruments. Cloudlets also coordinate with the cloud-based backend system to tackle the performance issue: data transfers are scheduled and processing tasks are offloaded to cloudlets to avoid traffic congestion and guarantee the performance of data processing jobs across the edge-cloud architecture.
By designing, developing, and testing DOSSIER in real scientific environments, we demonstrate that an edge-cloud microservice architecture with learning-based adaptive control resource management is needed for timely distributed scientific data management.
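The abstract above mentions applying feedback control theory to adapt resources in a workflow management system. The thesis abstract does not give the controller design, so the following is only a minimal sketch of the general idea, assuming a proportional-integral (PI) controller that resizes a worker pool to keep the capture-to-result latency near a target. The class name, gains, and worker limits are all hypothetical and are not taken from DOSSIER.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Illustrative PI controller; all names and gains are assumptions."""

    target_latency: float      # desired seconds from data capture to result
    kp: float = 0.5            # proportional gain (assumed value)
    ki: float = 0.1            # integral gain (assumed value)
    min_workers: int = 1       # lower bound on the worker pool size
    max_workers: int = 64      # upper bound on the worker pool size
    _integral: float = 0.0     # accumulated latency error

    def update(self, measured_latency: float, current_workers: int) -> int:
        """Return a new worker count from the latest latency measurement."""
        # Positive error means jobs are slower than the target.
        error = measured_latency - self.target_latency
        self._integral += error
        # Combine the instantaneous and accumulated error into one adjustment.
        adjustment = self.kp * error + self.ki * self._integral
        proposed = current_workers + round(adjustment)
        # Clamp to the allowed pool size.
        return max(self.min_workers, min(self.max_workers, proposed))
```

A caller would invoke `update` once per monitoring interval and scale the pool to the returned size. This sketch omits anti-windup and gain tuning, both of which a production controller would need.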
Global Grids and Software Toolkits: A Study of Four Grid Middleware Technologies
Grid is an infrastructure that involves the integrated and collaborative use
of computers, networks, databases and scientific instruments owned and managed
by multiple organizations. Grid applications often involve large amounts of
data and/or computing resources that require secure resource sharing across
organizational boundaries. This makes Grid application management and
deployment a complex undertaking. Grid middlewares provide users with seamless
computing ability and uniform access to resources in the heterogeneous Grid
environment. Several software toolkits and systems have been developed, most of
which are results of academic research projects, all over the world. This
chapter will focus on four of these middlewares--UNICORE, Globus, Legion and
Gridbus. It also presents our implementation of a resource broker for UNICORE
as this functionality was not supported in it. A comparison of these systems on
the basis of the architecture, implementation model and several other features
is included. Comment: 19 pages, 10 figures
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also help to provide an easy way for new practitioners to understand
this complex area of research. Comment: 46 pages, 16 figures, Technical Report
Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure
Big data research has attracted great attention in science, technology,
industry and society. It is developing with the evolving scientific paradigm,
the fourth industrial revolution, and the transformational innovation of
technologies. However, its nature and fundamental challenge have not been
recognized, and its own methodology has not been formed. This paper explores
and answers the following questions: What is big data? What are the basic
methods for representing, managing and analyzing big data? What is the
relationship between big data and knowledge? Can we find a mapping from big
data into knowledge space? What kind of infrastructure is required to support
not only big data management and analysis but also knowledge discovery, sharing
and management? What is the relationship between big data and science paradigm?
What is the nature and fundamental challenge of big data computing? A
multi-dimensional perspective is presented toward a methodology of big data
computing. Comment: 59 pages
A network approach for managing and processing big cancer data in clouds
Translational cancer research requires integrative analysis of multiple levels of big cancer data to identify and treat cancer. Such data is decentralised, growing and continually updated, and content stored or archived in different information sources partially overlaps, creating redundancies as well as contradictions and inconsistencies. To address these issues, we develop a data network model and technology for constructing and managing big cancer data. To support our data network approach to data processing and analysis, we employ a semantic content network approach and adopt the CELAR cloud platform. The prototype implementation shows that the CELAR cloud can satisfy the on-demand needs of various data resources for the management and processing of big cancer data.
Architecture of Environmental Risk Modelling: for a faster and more robust response to natural disasters
Demands on the disaster response capacity of the European Union are likely to
increase, as the impacts of disasters continue to grow both in size and
frequency. This has resulted in intensive research on issues concerning
spatially-explicit information and modelling and their multiple sources of
uncertainty. Geospatial support is one of the forms of assistance frequently
required by emergency response centres along with hazard forecast and event
management assessment. Robust modelling of natural hazards requires dynamic
simulations under an array of multiple inputs from different sources.
Uncertainty is associated with meteorological forecast and calibration of the
model parameters. Software uncertainty also derives from the data
transformation models (D-TM) needed for predicting hazard behaviour and its
consequences. On the other hand, social contributions have recently been
recognized as valuable in raw-data collection and mapping efforts traditionally
dominated by professional organizations. Here an architecture overview is
proposed for adaptive and robust modelling of natural hazards, following the
Semantic Array Programming paradigm to also include the distributed array of
social contributors called Citizen Sensor in a semantically-enhanced strategy
for D-TM modelling. The modelling architecture proposes a multicriteria
approach for assessing the array of potential impacts with qualitative rapid
assessment methods based on a Partial Open Loop Feedback Control (POLFC) schema
and complementing more traditional and accurate a-posteriori assessment. We
discuss the computational aspect of environmental risk modelling using
array-based parallel paradigms on High Performance Computing (HPC) platforms,
in order for the implications of urgency to be introduced into the systems
(Urgent-HPC). Comment: 12 pages, 1 figure, 1 text box, presented at the 3rd Conference of
Computational Interdisciplinary Sciences (CCIS 2014), Asuncion, Paraguay