140 research outputs found

    Virtual Cluster Management for Analysis of Geographically Distributed and Immovable Data

    Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2015
    Scenarios exist in the era of Big Data where computational analysis needs to utilize widely distributed and remote compute clusters, especially when the data sources are sensitive or extremely large, and thus unable to move. A large dataset in Malaysia could be ecologically sensitive, for instance, and could not be moved outside the country's boundaries. Controlling an analysis experiment in this virtual cluster setting can be difficult on multiple levels: with setup and control, with managing the behavior of the virtual cluster, and with interoperability issues across the compute clusters. Further, datasets can be distributed among clusters, or even across data centers, so that it becomes critical to utilize data locality information to optimize the performance of data-intensive jobs. Finally, datasets are increasingly sensitive and tied to certain administrative boundaries, though once the data has been processed, the aggregated or statistical result can be shared across the boundaries. This dissertation addresses management and control of a widely distributed virtual cluster having sensitive or otherwise immovable data sets through a controller. The Virtual Cluster Controller (VCC) gives control back to the researcher. It creates virtual clusters across multiple cloud platforms. In recognition of sensitive data, it can establish a single network overlay over widely distributed clusters. We define a novel class of data, notably immovable data that we call "pinned data", where the data is treated as a first-class citizen instead of being moved to where it is needed. We draw from our earlier work with a hierarchical data processing model, Hierarchical MapReduce (HMR), to process geographically distributed data, some of which is pinned. The applications implemented in HMR use an extended MapReduce model where computations are expressed as three functions: Map, Reduce, and GlobalReduce. Further, by facilitating information sharing among resources, applications, and data, the overall performance is improved. Experimental results show that the overhead of VCC is minimal. HMR outperforms the traditional MapReduce model when processing a particular class of applications. The evaluations also show that information sharing between resources and applications through the VCC shortens the hierarchical data processing time while also satisfying the constraints on the pinned data.
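
The three-function model named in this abstract (Map, Reduce, GlobalReduce) can be illustrated with a minimal Python sketch. It is a toy word count run in a single process, not the thesis implementation: the function names, the cluster names, and the sample records are all hypothetical, and the sketch assumes each cluster runs Map and Reduce locally so that only aggregated partial results cross a cluster boundary.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_fn(record: str) -> Iterable[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1


def reduce_fn(key: str, values: List[int]) -> int:
    """Reduce: combine the values for one key inside a single cluster."""
    return sum(values)


def global_reduce_fn(key: str, partials: List[int]) -> int:
    """GlobalReduce: combine the per-cluster results for one key."""
    return sum(partials)


def run_local_mapreduce(records: List[str]) -> Dict[str, int]:
    """Run Map and Reduce where the data lives (hypothetical helper)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def run_hierarchical(clusters: Dict[str, List[str]]) -> Dict[str, int]:
    """Collect per-cluster aggregates only, then apply GlobalReduce."""
    partials: Dict[str, List[int]] = defaultdict(list)
    for _name, records in clusters.items():
        # Raw records never leave this loop iteration; only counts do.
        for key, value in run_local_mapreduce(records).items():
            partials[key].append(value)
    return {key: global_reduce_fn(key, vals) for key, vals in partials.items()}


# Hypothetical pinned datasets, one list of records per cluster.
pinned_clusters = {
    "cluster_a": ["big data big clusters"],
    "cluster_b": ["pinned data stays put"],
}
result = run_hierarchical(pinned_clusters)
print(result)
```

Keeping GlobalReduce separate from Reduce is what lets the raw records stay in place in this sketch: the top level only ever sees per-cluster aggregates, which matches the abstract's remark that aggregated or statistical results can be shared across boundaries.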

    Towards an Energy-Aware Framework for Application Development and Execution in Heterogeneous Parallel Architectures

    The Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation (TANGO) project's goal is to characterise factors which affect power consumption in software development and operation for Heterogeneous Parallel Architecture (HPA) environments. Its main contribution is the combination of requirements engineering and design modelling for self-adaptive software systems, with power consumption awareness in relation to these environments. The energy efficiency and application quality factors are integrated into the application lifecycle (design, implementation and operation). To support this, the key novelty of the project is a reference architecture and its implementation. Moreover, a programming model with built-in support for various hardware architectures, including heterogeneous clusters, heterogeneous chips and programmable logic devices, is provided. This leads to a new cross-layer programming approach for heterogeneous parallel hardware architectures featuring software and hardware modelling. The architecture supports application power consumption and performance, data location and time-criticality optimization, as well as security and dependability requirements on the target hardware architecture.

    Grid Information Technology as a New Technological Tool for e-Science, Healthcare and Life Science

    Nowadays, scientific projects require collaborative environments and powerful computing resources capable of handling huge quantities of data, which gives rise to e-Science. These requirements are evident in the need to optimise time and effort in health-related activities. When e-Science focuses on the collaborative handling of all the information generated in clinical medicine and health, e-Health is the result. Scientists are taking increasing interest in an emerging technology, Grid Information Technology, that may offer a solution to their current needs. The current work aims to survey how e-Science is using this technology all around the world. We also argue that the technology may provide an ideal solution for the new challenges facing e-Health and Life Science.

    Performance portability of Earth system models with user-controlled GGDML code translation

    The increasing need for performance in earth system modeling and other scientific domains pushes computing technologies in diverse architectural directions. The development of models requires technical expertise and skill in using tools that are able to exploit the hardware capabilities. The heterogeneity of architectures complicates the development and the maintainability of the models. To improve the software development process of earth system models, we provide an approach that simplifies code maintenance by fostering separation of concerns while providing performance portability. We propose the use of high-level language extensions that reflect scientific concepts. Scientists can develop models in the programming language of their choice and use the language extensions optionally, wherever they need them. The code translation is driven by configurations that are separated from the model source code. These configurations are prepared by scientific programmers to optimally use the machine's features. The main contribution of this paper is the demonstration of a user-controlled source-to-source translation technique for earth system models that are written with higher-level semantics. We discuss a flexible code translation technique that is driven by the users through a configuration input prepared specifically to transform the code, and we use this technique to produce OpenMP- or OpenACC-enabled code, in addition to MPI to support multi-node configurations.
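
The idea of a translation driven by a configuration kept apart from the model source can be illustrated with a toy Python sketch. It is not GGDML and does not use GGDML syntax: the `// @parallel_loop` marker, the `backend` configuration key, and the sample loop are hypothetical, chosen only to show one unchanged source yielding either OpenMP or OpenACC directives.

```python
# Directive emitted for each target; both are standard loop directives.
PRAGMAS = {
    "openmp": "#pragma omp parallel for",
    "openacc": "#pragma acc parallel loop",
}

# Hypothetical annotation standing in for a high-level language extension.
MARKER = "// @parallel_loop"


def translate(source: str, config: dict) -> str:
    """Replace each marker line with the directive chosen by the config."""
    pragma = PRAGMAS[config["backend"]]
    out = []
    for line in source.splitlines():
        if line.strip() == MARKER:
            # Preserve the marker's indentation in the generated code.
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + pragma)
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical model kernel; it contains no machine-specific directive.
model_source = """\
// @parallel_loop
for (int i = 0; i < n; i++) {
    t[i] = t[i] + dt * flux[i];
}"""

openmp_code = translate(model_source, {"backend": "openmp"})
openacc_code = translate(model_source, {"backend": "openacc"})
print(openmp_code)
print(openacc_code)
```

Because the target is chosen in the configuration rather than in the model source, the same kernel text serves both targets in this sketch, which is the separation of concerns the abstract describes.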