
    Dossier: Distributed operating system and infrastructure for scientific data management

    As scientific advancement and discovery have become increasingly data-driven and interdisciplinary, there is an urgent need for advanced cyberinfrastructure to support managing and processing the scientific data generated by day-to-day research. However, the development of data-driven cyberinfrastructure for scientific research has often lagged behind that of comparable tools in other engineering and IT-related fields. This development gap stems from several diversity challenges in scientific data management and processing. First, there is the diversity of scientific data and data processing tasks: the cyberinfrastructure must support managing and processing heterogeneous types of scientific data captured from scientific instruments. Second, because the cyberinfrastructure must help shorten the time from digital capture of data to interpretation and insight, it must cope with the diversity of users and scientific workloads. Third, there is the diversity of scientific instruments themselves. Since a significant number of scientific instruments still run their scientific software on old operating systems (e.g., Windows XP, Windows NT, Windows 2000), the cyberinfrastructure must bridge the performance and security gap between these aging instruments and its advanced cloud-based infrastructure. In this thesis, we aim to address these diversity challenges by taking a holistic approach to designing a distributed operating system and infrastructure for scientific data management, named DOSSIER. At the core of DOSSIER is an adaptive control microservice infrastructure designed to tackle the aforementioned challenges of data cyberinfrastructure for distributed scientific data management.
In particular, to handle heterogeneous scientific data processing and analysis, we start by redesigning the execution environment for scientific workflows, which traditionally follows a monolithic approach, using a novel microservice architecture and the latest virtualization technology (i.e., container technology). The microservice design enables dynamic composition of workflows and is therefore efficient at dealing with heterogeneous workflows. The new microservice architecture also allows us to express system resources more simply, which in turn enables the design of a new adaptive resource management mechanism to handle large-scale and dynamic scientific workloads. We are the first to apply feedback control theory to design a self-adaptation mechanism for a scientific workflow management system, helping to shorten the time from data acquisition to insight. To address the security and performance gaps that arise when connecting old scientific instruments to cloud-based cyberinfrastructure, we design an edge-cloud architecture that places cloudlet servers directly adjacent to the scientific instruments, where they act as a security shield for the aging instruments. The cloudlets also coordinate with the cloud-based backend to tackle the performance issue by scheduling data transfers and offloading processing tasks to the cloudlets, avoiding traffic congestion and guaranteeing the performance of data processing jobs across the edge-cloud architecture. By designing, developing, and testing DOSSIER in real scientific environments, we demonstrate that an edge-cloud microservice architecture with learning-based adaptive-control resource management is needed for timely distributed scientific data management.
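As a rough illustration of the feedback-control idea in the abstract above (a sketch only; the class name, gain, and scaling bounds are hypothetical, not DOSSIER's actual implementation), a proportional controller can adjust the number of container replicas for a workflow stage based on the observed task-queue length:

```python
# Illustrative sketch only: a simple proportional feedback controller that
# scales container replicas for a workflow stage toward a queue-length setpoint.
# All names and parameter values here are hypothetical.

class ReplicaController:
    """Drives the task-queue length toward a setpoint by scaling replicas."""

    def __init__(self, setpoint, kp=0.1, min_replicas=1, max_replicas=32):
        self.setpoint = setpoint        # desired queue length (tasks)
        self.kp = kp                    # proportional gain
        self.min_replicas = min_replicas
        self.max_replicas = max_replicas
        self.replicas = min_replicas    # current actuator state

    def update(self, queue_length):
        """One control interval: observe the queue, adjust the replica count."""
        error = queue_length - self.setpoint
        target = self.replicas + self.kp * error
        # Clamp to the allowed scaling range and round to whole replicas.
        self.replicas = max(self.min_replicas,
                            min(self.max_replicas, round(target)))
        return self.replicas

controller = ReplicaController(setpoint=10)
for queue_length in [50, 30, 12, 10]:   # observed queue lengths per interval
    replicas = controller.update(queue_length)
```

In a real deployment the gain would be tuned (or learned) against measured workload dynamics, and the actuation step would call the container orchestrator's scaling API rather than just updating a counter.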

    Global Grids and Software Toolkits: A Study of Four Grid Middleware Technologies

    Grid is an infrastructure that involves the integrated and collaborative use of computers, networks, databases and scientific instruments owned and managed by multiple organizations. Grid applications often involve large amounts of data and/or computing resources that require secure resource sharing across organizational boundaries. This makes Grid application management and deployment a complex undertaking. Grid middlewares provide users with seamless computing ability and uniform access to resources in the heterogeneous Grid environment. Several software toolkits and systems, most of them the results of academic research projects, have been developed all over the world. This chapter focuses on four of these middlewares: UNICORE, Globus, Legion and Gridbus. It also presents our implementation of a resource broker for UNICORE, as this functionality was not previously supported. A comparison of these systems on the basis of architecture, implementation model and several other features is included. Comment: 19 pages, 10 figures

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report

    SIMDAT


    Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure

    Big data research has attracted great attention in science, technology, industry and society. It is developing alongside the evolving scientific paradigm, the fourth industrial revolution, and the transformational innovation of technologies. However, its nature and fundamental challenge have not been recognized, and its own methodology has not been formed. This paper explores and answers the following questions: What is big data? What are the basic methods for representing, managing and analyzing big data? What is the relationship between big data and knowledge? Can we find a mapping from big data into knowledge space? What kind of infrastructure is required to support not only big data management and analysis but also knowledge discovery, sharing and management? What is the relationship between big data and science paradigm? What is the nature and fundamental challenge of big data computing? A multi-dimensional perspective is presented toward a methodology of big data computing. Comment: 59 pages

    A network approach for managing and processing big cancer data in clouds

    Translational cancer research requires integrative analysis of multiple levels of big cancer data to identify and treat cancer. The data is decentralised, growing and continually being updated, and the content held or archived on different information sources partially overlaps, creating redundancies as well as contradictions and inconsistencies. To address these issues, we develop a data network model and technology for constructing and managing big cancer data. To support our data network approach for data processing and analysis, we employ a semantic content network approach and adopt the CELAR cloud platform. The prototype implementation shows that the CELAR cloud can satisfy the on-demand needs of various data resources for the management and processing of big cancer data.

    Architecture of Environmental Risk Modelling: for a faster and more robust response to natural disasters

    Demands on the disaster response capacity of the European Union are likely to increase, as the impacts of disasters continue to grow both in size and frequency. This has resulted in intensive research on issues concerning spatially-explicit information and modelling and their multiple sources of uncertainty. Geospatial support is one of the forms of assistance frequently required by emergency response centres, along with hazard forecasting and event management assessment. Robust modelling of natural hazards requires dynamic simulations under an array of multiple inputs from different sources. Uncertainty is associated with the meteorological forecast and the calibration of model parameters. Software uncertainty also derives from the data transformation models (D-TM) needed for predicting hazard behaviour and its consequences. On the other hand, social contributions have recently been recognized as valuable in raw-data collection and mapping efforts traditionally dominated by professional organizations. Here an architecture overview is proposed for adaptive and robust modelling of natural hazards, following the Semantic Array Programming paradigm to also include the distributed array of social contributors, called Citizen Sensor, in a semantically-enhanced strategy for D-TM modelling. The modelling architecture proposes a multicriteria approach for assessing the array of potential impacts with qualitative rapid assessment methods based on a Partial Open Loop Feedback Control (POLFC) schema, complementing more traditional and accurate a posteriori assessment. We discuss the computational aspect of environmental risk modelling using array-based parallel paradigms on High Performance Computing (HPC) platforms, in order to introduce the implications of urgency into the systems (Urgent-HPC). Comment: 12 pages, 1 figure, 1 text box, presented at the 3rd Conference of Computational Interdisciplinary Sciences (CCIS 2014), Asuncion, Paraguay