    An Extensible Timing Infrastructure for Adaptive Large-scale Applications

    Real-time access to accurate and reliable timing information is necessary to profile scientific applications, and crucial as simulations become increasingly complex, adaptive, and large-scale. The Cactus Framework provides flexible and extensible capabilities for timing information through a well-designed infrastructure and timing API. Applications built with Cactus automatically gain access to built-in timers, such as gettimeofday and getrusage, system-specific hardware clocks, and high-level interfaces such as PAPI. We describe the Cactus timer interface, its motivation, and its implementation. We then demonstrate how an example scientific application can use this timing information to profile itself and to adapt dynamically to a changing environment at run time.
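
To make the idea of an extensible timer interface concrete, here is a minimal Python sketch under stated assumptions: clocks are plain callables registered by name, and each timer reads every clock registered at the moment the timer is created. The names `ClockRegistry`, `create_timer`, and `user_cpu_seconds` are hypothetical and do not reproduce the Cactus C API. `time.time` and `resource.getrusage` (Unix only) merely stand in for the gettimeofday and getrusage timers mentioned above.

```python
import resource
import time
from typing import Callable, Dict

Clock = Callable[[], float]


class ClockRegistry:
    """Holds named clock sources; new sources can be registered at run time."""

    def __init__(self) -> None:
        self._clocks: Dict[str, Clock] = {}

    def register(self, name: str, read: Clock) -> None:
        self._clocks[name] = read

    def create_timer(self, name: str) -> "Timer":
        # A timer samples every clock registered at the moment it is created.
        return Timer(name, dict(self._clocks))


class Timer:
    """Named timer that accumulates elapsed time on each of its clocks."""

    def __init__(self, name: str, clocks: Dict[str, Clock]) -> None:
        self.name = name
        self._clocks = clocks
        self._started: Dict[str, float] = {}
        self.totals: Dict[str, float] = {c: 0.0 for c in clocks}

    def start(self) -> None:
        self._started = {c: read() for c, read in self._clocks.items()}

    def stop(self) -> None:
        # Add the interval measured by each clock to that clock's running total.
        for c, read in self._clocks.items():
            self.totals[c] += read() - self._started[c]
        self._started = {}


def user_cpu_seconds() -> float:
    # getrusage-based clock (Unix only): user CPU time consumed by this process.
    return resource.getrusage(resource.RUSAGE_SELF).ru_utime


registry = ClockRegistry()
registry.register("walltime", time.time)              # gettimeofday-style wall clock
registry.register("getrusage", user_cpu_seconds)      # CPU-time clock
registry.register("perf_counter", time.perf_counter)  # high-resolution monotonic clock

if __name__ == "__main__":
    timer = registry.create_timer("evolve")
    timer.start()
    sum(i * i for i in range(100_000))  # stand-in for a simulation step
    timer.stop()
    for clock, seconds in timer.totals.items():
        print(f"{timer.name} {clock}: {seconds:.6f} s")
```

Registering a new clock (for example, a wrapper around a hardware-counter library) makes it available to timers created afterwards without changing the code that starts and stops them.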

    Realtime reservoir characterization and beyond: cyber-infrastructure tools and technologies

    The advent of the digital oil field and the rapidly decreasing cost of computing create opportunities as well as challenges in simulation-based reservoir studies, in particular real-time reservoir characterization and optimization. One challenge our efforts are directed toward is the use of real-time production data to perform live reservoir characterization using high-throughput, high-performance computing environments. To that end, we developed the required tools: a parallel reservoir simulator, a parallel ensemble Kalman filter, and a scalable workflow manager. Using this collection of tools, a reservoir modeler can perform large-scale reservoir management studies in short periods of time. This includes studies with thousands of models that are individually complex and large, involving millions of degrees of freedom. Using parallel processing, we can solve these models much faster than we could on a single serial machine; this motivated the development of a fast parallel reservoir simulator. Furthermore, distributing those simulations across resources reduces the total time to completion; a scalable high-throughput workflow manager was developed for this purpose. Finally, thousands of models, each with millions of degrees of freedom, produce a superfluity of model parameters, which translates directly to billions of degrees of freedom in the reservoir study. To apply the ensemble Kalman filter to these models, we needed to develop a parallel implementation of it. This thesis discusses the enabling tools and technologies developed to address a specific problem: how to accurately characterize reservoirs using large numbers of complex, detailed models. For these characterization studies to help in making production decisions, the time to solution must be feasible.
To that end, our work is focused on developing and extending these tools, and on optimizing their performance.
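
The ensemble Kalman filter updates every ensemble member with a common gain estimated from the ensemble itself. Below is a minimal serial sketch of one stochastic (perturbed-observation) analysis step, assuming a single scalar observation of one state entry and at least two members. It is a textbook form of the update, not the parallel implementation described in the thesis, and the function name `enkf_update` is hypothetical.

```python
import random
from statistics import mean
from typing import List

Ensemble = List[List[float]]


def enkf_update(ensemble: Ensemble, obs_index: int, obs_value: float,
                obs_var: float, rng: random.Random) -> Ensemble:
    """Stochastic EnKF analysis for a single scalar observation of state[obs_index].

    Assumes a linear observation operator that selects one state entry and an
    ensemble of at least two members.
    """
    n_members = len(ensemble)
    n_state = len(ensemble[0])
    # Forecast of the observed quantity for every member (H x_i).
    predicted = [member[obs_index] for member in ensemble]
    pred_mean = mean(predicted)
    state_mean = [mean(m[k] for m in ensemble) for k in range(n_state)]
    # Sample cross-covariance between each state entry and the predicted observation.
    cross_cov = [
        sum((m[k] - state_mean[k]) * (p - pred_mean) for m, p in zip(ensemble, predicted))
        / (n_members - 1)
        for k in range(n_state)
    ]
    pred_var = sum((p - pred_mean) ** 2 for p in predicted) / (n_members - 1)
    # Kalman gain K = C_xy / (C_yy + R) for a scalar observation.
    gain = [c / (pred_var + obs_var) for c in cross_cov]
    updated = []
    for member, p in zip(ensemble, predicted):
        # Each member assimilates its own perturbed copy of the observation.
        perturbed = obs_value + rng.gauss(0.0, obs_var ** 0.5)
        innovation = perturbed - p
        updated.append([x + g * innovation for x, g in zip(member, gain)])
    return updated


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy two-parameter "models"; in a reservoir study each member is a full simulation state.
    prior = [[rng.gauss(2.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]
    posterior = enkf_update(prior, obs_index=0, obs_value=3.0, obs_var=0.25, rng=rng)
    print("prior mean of observed entry:", mean(m[0] for m in prior))
    print("posterior mean of observed entry:", mean(m[0] for m in posterior))
```

The loops over state entries and over members are what become expensive when each model has millions of degrees of freedom, which is where a parallel implementation would distribute the work.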

    Application Level Interoperability between Clouds and Grids

    SAGA is a high-level programming interface that provides the ability to develop distributed applications in an infrastructure-independent way. In an earlier paper, we discussed how SAGA was used to develop a version of MapReduce that gave the user control over the relative placement of compute and data while utilizing different distributed infrastructures. In this paper, we use the SAGA-based implementation of MapReduce and demonstrate its interoperability across Clouds and Grids. We discuss how a range of cloud adaptors have been developed for SAGA. The major contribution of this paper is the demonstration, possibly the first ever, of interoperability between different Clouds and Grids without any changes to the application. We analyse the performance of SAGA-MapReduce when using multiple, different, heterogeneous infrastructures concurrently for the same problem instance. However, we do not strive to provide a rigorous performance model, but rather a proof-of-concept of application-level interoperability and an illustration of its importance.
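
The adaptor idea behind this interoperability can be illustrated with a small, self-contained Python sketch: the application (a word-count MapReduce) is written against one job interface, and a URL scheme selects a backend adaptor at run time. Everything here is hypothetical and runs locally. `JobService`, `Adaptor`, and the `local://`, `cloud://`, and `grid://` schemes are illustrative names, not the SAGA API, and the placeholder adaptors simply run tasks in-process.

```python
from collections import Counter
from typing import Callable, Dict, List
from urllib.parse import urlparse

Task = Callable[[], Counter]


class Adaptor:
    """Hypothetical backend plug-in; a real adaptor would talk to cloud or grid middleware."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.executed = 0

    def run(self, task: Task) -> Counter:
        # Placeholder: runs the task in-process and records that this backend was used.
        self.executed += 1
        return task()


# Adaptor table keyed by URL scheme; supporting a new infrastructure means adding an entry.
ADAPTORS: Dict[str, Adaptor] = {
    "local": Adaptor("local"),
    "cloud": Adaptor("cloud"),
    "grid": Adaptor("grid"),
}


class JobService:
    """Application-facing interface: the URL scheme selects the adaptor at run time."""

    def __init__(self, url: str) -> None:
        self._adaptor = ADAPTORS[urlparse(url).scheme]

    def submit(self, task: Task) -> Counter:
        return self._adaptor.run(task)


def map_reduce_word_count(chunks: List[str], services: List[JobService]) -> Counter:
    partials = []
    for i, chunk in enumerate(chunks):
        service = services[i % len(services)]  # spread map tasks across infrastructures
        partials.append(service.submit(lambda text=chunk: Counter(text.split())))
    total: Counter = Counter()
    for part in partials:  # reduce phase: merge partial counts
        total.update(part)
    return total


if __name__ == "__main__":
    services = [JobService("local://localhost"), JobService("cloud://example-cloud"),
                JobService("grid://example-grid")]
    print(map_reduce_word_count(["a b a", "b c", "a c c"], services))
```

Moving a map task from a Grid to a Cloud changes only the URL passed to `JobService`, not the map or reduce code. A real implementation would also submit tasks asynchronously so that the heterogeneous backends run concurrently.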

    Project Final Report: Ubiquitous Computing and Monitoring System (UCoMS) for Discovery and Management of Energy Resources

    Continuous reservoir model updating by ensemble Kalman filter on Grid computing architectures

    A reservoir engineering Grid computing toolkit, ResGrid, and its extensions were developed and applied to designed reservoir simulation studies and continuous reservoir model updating. The toolkit provides reservoir engineers with high-performance computing capacity to complete their projects without requiring them to delve into Grid resource heterogeneity, security certification, or network protocols. Continuous and real-time reservoir model updating is an important component of closed-loop model-based reservoir management. The method must rapidly and continuously update reservoir models by assimilating production data, so that the performance predictions and the associated uncertainty are up-to-date for optimization. The ensemble Kalman filter (EnKF), a Bayesian approach for model updating, uses Monte Carlo statistics for fusing observation data with forecasts from simulations to estimate a range of plausible models. The ensemble of updated models can be used for uncertainty forecasting or optimization. Grid environments aggregate geographically distributed, heterogeneous resources. Their virtual architecture can handle many large parallel simulation runs, and is thus well suited to solving model-based reservoir management problems. In this study, the ResGrid workflow for Grid-based designed reservoir simulation and an adapted workflow provide tools for building prior model ensembles, task farming and execution, extracting simulator output results, implementing the EnKF, and using a web portal for invoking those scripts. The ResGrid workflow is demonstrated for a geostatistical study of 3-D displacements in heterogeneous reservoirs. A suite of 1920 simulations assesses the effects of geostatistical methods and model parameters. Multiple runs are executed simultaneously using parallel Grid computing.
Flow response analyses indicate that efficient, widely used sequential geostatistical simulation methods may overestimate flow response variability when compared to more rigorous but computationally costly direct methods. Although the EnKF has attracted great interest in reservoir engineering, some aspects of it remain poorly understood; these are explored in the dissertation. First, guidelines are offered for selecting data assimilation intervals. Second, an adaptive covariance inflation method is shown to be effective in stabilizing the EnKF. Third, we show that simple truncation can correct the negative effects of nonlinearity and non-Gaussianity as effectively as more complex and expensive reparameterization methods.
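
The abstract does not specify the inflation scheme, so the sketch below assumes a simple moment-matching estimate for one scalar observation: if the squared innovation exceeds what the forecast spread plus observation noise can explain, ensemble deviations are scaled up. Truncation is shown as clipping updated parameters back into physical bounds (for example, porosity in [0, 1]). The function names are hypothetical; this illustrates the two ideas and is not the dissertation's method.

```python
from statistics import mean
from typing import List, Tuple


def adaptive_inflation_factor(predicted: List[float], obs_value: float, obs_var: float) -> float:
    """Assumed moment-matching estimate for one scalar observation.

    The expected squared innovation is factor * pred_var + obs_var; solve for the
    factor from the observed innovation and never deflate (factor >= 1).
    """
    n = len(predicted)
    pred_mean = mean(predicted)
    pred_var = sum((p - pred_mean) ** 2 for p in predicted) / (n - 1)
    innovation_sq = (obs_value - pred_mean) ** 2
    return max(1.0, (innovation_sq - obs_var) / pred_var)


def inflate(values: List[float], factor: float) -> List[float]:
    """Scale deviations from the ensemble mean so the sample variance grows by `factor`."""
    m = mean(values)
    scale = factor ** 0.5
    return [m + scale * (v - m) for v in values]


def truncate(values: List[float], bounds: Tuple[float, float]) -> List[float]:
    """Clip updated parameters back into their physical range."""
    lo, hi = bounds
    return [min(max(v, lo), hi) for v in values]
```

In an assimilation cycle, the inflation factor would be applied to the forecast ensemble before the analysis step, and truncation to the analyzed parameters afterwards.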

    How To Touch a Running System

    The increasing importance of distributed and decentralized software architectures draws more and more attention to adaptive software. Obtaining adaptiveness, however, is a difficult task, as the software design needs to foresee and cope with a variety of situations. Reconfiguring components facilitates this task, as adaptivity is then handled at the architecture level instead of directly in the code. This results in a separation of concerns: the appropriate reconfiguration can be devised at a coarse level, while the implementation of the components can remain largely unaware of reconfiguration scenarios. We study reconfiguration in component frameworks based on formal theory. We first discuss programming with components, exemplified by the development of the cmc model checker. This highly efficient model checker is built from C++ components and serves as an example of component-based software development practice in general; it also provides insights into the principles of adaptivity. However, its component model focuses on high performance and is not geared towards using the structuring principle of components for controlled reconfiguration. We therefore complement this highly optimized model with a message-passing-based component model that takes reconfigurability as its central principle. Supporting reconfiguration in a framework means relieving the programmer of its peculiarities as much as possible. We use the formal description of the component model to provide a reconfiguration algorithm that retains as much flexibility as possible while avoiding most problems that arise from concurrency. This algorithm is embedded in a general four-stage adaptivity model inspired by physical control loops. The reconfiguration is designed to work with stateful components, retaining their data and unprocessed messages.
Reconfiguration plans, which are given a formal semantics, form the input of the reconfiguration algorithm. We show that the algorithm achieves perceived atomicity of the reconfiguration process for an important class of plans, i.e., the whole process of reconfiguration is perceived as one atomic step, while minimizing the blocking of components. We illustrate the applicability of our approach with several examples, such as fault tolerance and automated resource control.
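
To make reconfiguration of stateful, message-passing components concrete, the following single-threaded Python sketch swaps a component's implementation while carrying over its data and its queue of unprocessed messages. The names `Component`, `System`, `reconfigure`, and the `accumulate_*` handlers are hypothetical. Because everything runs in one thread, the rebinding here is trivially atomic, whereas the algorithm described above is designed to achieve perceived atomicity under concurrency with minimal blocking.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict

Handler = Callable[[Dict[str, Any], Any], None]


class Component:
    """Stateful component that processes messages from its mailbox."""

    def __init__(self, name: str, handler: Handler) -> None:
        self.name = name
        self.handler = handler
        self.state: Dict[str, Any] = {}
        self.mailbox: Deque[Any] = deque()
        self.blocked = False

    def step(self) -> None:
        # A blocked component keeps its messages queued instead of processing them.
        while not self.blocked and self.mailbox:
            self.handler(self.state, self.mailbox.popleft())


class System:
    """Routes messages by component name so a component can be swapped in place."""

    def __init__(self) -> None:
        self.components: Dict[str, Component] = {}

    def add(self, component: Component) -> None:
        self.components[component.name] = component

    def send(self, name: str, msg: Any) -> None:
        self.components[name].mailbox.append(msg)

    def run(self) -> None:
        for component in list(self.components.values()):
            component.step()

    def reconfigure(self, name: str, new_handler: Handler) -> None:
        old = self.components[name]
        old.blocked = True              # anyone still holding `old` can no longer process
        new = Component(name, new_handler)
        new.state = old.state           # retain the component's data
        new.mailbox = old.mailbox       # retain unprocessed messages, in order
        self.components[name] = new     # rebind the name in a single step


def accumulate_v1(state: Dict[str, Any], msg: int) -> None:
    state["total"] = state.get("total", 0) + msg


def accumulate_v2(state: Dict[str, Any], msg: int) -> None:
    # Upgraded behavior: also counts the messages it has processed.
    state["total"] = state.get("total", 0) + msg
    state["count"] = state.get("count", 0) + 1


if __name__ == "__main__":
    system = System()
    system.add(Component("acc", accumulate_v1))
    system.send("acc", 1)
    system.run()
    system.send("acc", 10)                   # still pending when the swap happens
    system.reconfigure("acc", accumulate_v2)
    system.run()
    print(system.components["acc"].state)
```

A message sent before the swap but not yet processed is handled by the new implementation, and no message is lost or reordered.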

    GRID superscalar: a programming model for the Grid

    In recent years, the Grid has emerged as a new platform for distributed computing. Grid technology allows joining resources from different administrative domains to form a virtual supercomputer. Many research groups have dedicated their efforts to developing a set of basic services that make up a Grid middleware: a layer that enables the use of the Grid. However, using these services is not an easy task for many end users, especially if their expertise is not related to computer science. This has a negative influence on the adoption of Grid technology by the scientific community, which sees it as a powerful technology that is very difficult to exploit. To ease the use of the Grid, an extra layer is needed that hides the complexity of the Grid and allows users to program or port their applications easily. There have been many proposals for programming tools for the Grid. In this thesis we give an overview of some of them, and we can see that there exist both Grid-aware and Grid-unaware environments (programmed with or without specifying details of the Grid, respectively). Moreover, very few existing tools can exploit the implicit parallelism of the application; in the majority of them, the user must define the parallelism explicitly.
Another important feature we consider is whether they are based on widely used programming languages (such as C++ or Java), which makes adoption easier for end users. In this thesis, our main objective has been to create a programming model for the Grid based on sequential programming and well-known imperative programming languages, able to exploit the implicit parallelism of applications and to speed them up by using Grid resources concurrently. Moreover, because the Grid is distributed, heterogeneous, and dynamic, and because the number of resources that form a Grid can be very large, the probability that an error arises during an application's execution is high. Thus, another of our objectives has been to deal automatically with any type of error that may arise during the execution of the application (whether application-related or Grid-related). GRID superscalar (GRIDSs), the main contribution of this thesis, is a programming model that achieves these objectives by providing a very small and simple interface and a runtime that is able to execute the provided code in parallel using the Grid. Our programming interface allows a user to program a Grid-unaware application with well-known and popular imperative languages (such as C/C++, Java, Perl, or Shell script) in a sequential fashion, thereby taking an important step toward helping end users adopt Grid technology. We have applied our knowledge of computer architecture and microprocessor design to the GRIDSs runtime. As in a superscalar processor, the GRIDSs runtime system is able to perform a data dependence analysis between the tasks that form an application and to apply renaming techniques to increase its parallelism.
GRIDSs automatically generates, from the user's main code, a graph describing the data dependencies in the application. We present real use cases of the programming model in the fields of computational chemistry and bioinformatics, which demonstrate that our objectives have been achieved. Finally, we have studied the application of several fault detection and treatment techniques: checkpointing, task retry, and task replication. Our proposal is to provide an environment able to deal with all types of failures, transparently for the user whenever possible. The main advantage of implementing these mechanisms at the programming model level is that application-level knowledge can be exploited to dynamically create a fault tolerance strategy for each application and to avoid introducing overhead in error-free environments.
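
The dependence analysis and renaming described above can be illustrated with a short Python sketch in which tasks are submitted in program order together with their input and output files. Each write creates a fresh version of the file, as register renaming does in a superscalar processor, so only read-after-write dependencies remain as graph edges. The `DependencyGraph` class and its `submit` and `levels` methods are hypothetical and are not the GRIDSs interface.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class DependencyGraph:
    """Task graph built from sequential calls, with renaming of output files."""

    def __init__(self) -> None:
        self.version: Dict[str, int] = defaultdict(int)  # latest version of each file
        self.producer: Dict[str, int] = {}               # renamed file -> writing task
        self.edges: Set[Tuple[int, int]] = set()         # (producer, consumer) pairs
        self.tasks: List[Tuple[str, List[str], List[str]]] = []

    def _current(self, filename: str) -> str:
        return f"{filename}#v{self.version[filename]}"

    def submit(self, name: str, inputs: List[str], outputs: List[str]) -> int:
        task_id = len(self.tasks)
        reads = [self._current(f) for f in inputs]
        for r in reads:
            if r in self.producer:  # read-after-write: a true data dependence
                self.edges.add((self.producer[r], task_id))
        writes = []
        for f in outputs:
            # Renaming: each write gets a new version, removing WAR and WAW hazards.
            self.version[f] += 1
            renamed = self._current(f)
            self.producer[renamed] = task_id
            writes.append(renamed)
        self.tasks.append((name, reads, writes))
        return task_id

    def levels(self) -> List[List[int]]:
        """Group tasks into waves whose members may run concurrently."""
        level: Dict[int, int] = {}
        for t in range(len(self.tasks)):  # submission order is a topological order
            preds = [p for (p, c) in self.edges if c == t]
            level[t] = 1 + max((level[p] for p in preds), default=-1)
        waves: Dict[int, List[int]] = defaultdict(list)
        for t, lvl in level.items():
            waves[lvl].append(t)
        return [waves[lvl] for lvl in sorted(waves)]


if __name__ == "__main__":
    g = DependencyGraph()
    g.submit("filter", ["in1.dat"], ["tmp.dat"])
    g.submit("analyze", ["tmp.dat"], ["res1.dat"])
    g.submit("filter", ["in2.dat"], ["tmp.dat"])  # reuses tmp.dat; renaming avoids a WAR wait
    g.submit("analyze", ["tmp.dat"], ["res2.dat"])
    print(g.levels())
```

The third task overwrites `tmp.dat` while the second still needs the earlier contents. Without renaming it would have to wait for the second task; with renaming, the two filter tasks fall into the same wave.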

    Autonomous vehicles that care for houseplants

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2004. Includes bibliographical references (p. 91-95). Robotany is a system of autonomous robots that act on behalf of houseplants resting on top of their chassis. Their duty is to do what plants would do if they had the gift of mobility: namely, to seek out sunlight or water when there is an insufficient amount of either at their current location. Despite the specialized application, the underlying framework of the robots is rather general and can be used in a variety of situations. The robots are designed to be easily modifiable for a given application. They are constructed using rapid-prototyping techniques that allow them to be built quickly and inexpensively. A novel design is used for the vehicle's suspension; it is far simpler, cheaper, and more easily customized than traditional systems that perform the same task. The software controlling Robotany uses a behavior-based approach, one that takes its cue from nature's solutions to problems facing any mobile being. It follows Braitenberg's model for seeking out light in an implicit manner. A new approach to obstacle avoidance is used, based on reactance to in situ sensor readings and a simplified internal map of the local environment. Robotany also incorporates a simple homeostatic system to regulate the quality of its behaviors and to determine when one behavior should take precedence over another.
Experimental results presented in this thesis show that the robots are successful in finding sources of light while avoiding obstacles in their path. By Sara Elizabeth Cinnamon. S.M.
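
Braitenberg-style light seeking needs no explicit model of where the light is. Below is a minimal Python sketch, assuming crossed excitatory sensor-to-motor connections (the arrangement often called Braitenberg's vehicle 2b), a simple distance-squared sensor model, and differential-drive kinematics. The geometry, gains, and function names are hypothetical and are not taken from Robotany.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def intensity(sensor: Point, light: Point) -> float:
    # Assumed sensor model: brightness falls off with squared distance.
    dx, dy = light[0] - sensor[0], light[1] - sensor[1]
    return 1.0 / (1.0 + dx * dx + dy * dy)


def wheel_speeds(left_reading: float, right_reading: float, gain: float) -> Tuple[float, float]:
    # Crossed excitatory wiring: each sensor drives the opposite wheel, so the wheel
    # farther from the light turns faster and the vehicle steers toward the light.
    return gain * right_reading, gain * left_reading  # (left wheel, right wheel)


def step(x: float, y: float, heading: float, light: Point, dt: float = 0.1,
         sensor_offset: float = 0.2, sensor_angle: float = 0.5,
         wheelbase: float = 0.2, gain: float = 5.0) -> Tuple[float, float, float]:
    # Two sensors sit at the front of the chassis, angled left and right of the heading.
    left = (x + sensor_offset * math.cos(heading + sensor_angle),
            y + sensor_offset * math.sin(heading + sensor_angle))
    right = (x + sensor_offset * math.cos(heading - sensor_angle),
             y + sensor_offset * math.sin(heading - sensor_angle))
    v_left, v_right = wheel_speeds(intensity(left, light), intensity(right, light), gain)
    # Differential-drive kinematics: mean wheel speed moves forward, the difference turns.
    v = (v_left + v_right) / 2.0
    heading += (v_right - v_left) / wheelbase * dt
    return x + v * math.cos(heading) * dt, y + v * math.sin(heading) * dt, heading


if __name__ == "__main__":
    x, y, heading, light = 0.0, 0.0, 0.0, (1.0, 3.0)
    for _ in range(30):
        x, y, heading = step(x, y, heading, light)
    print(f"position ({x:.2f}, {y:.2f}), distance {math.dist((x, y), light):.2f}")
```

Because the wiring is crossed, a light ahead and to the left excites the left sensor more, which speeds up the right wheel and turns the vehicle left. The sketch omits the obstacle avoidance and homeostatic arbitration that the abstract also describes.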

    XSEDE: eXtreme Science and Engineering Discovery Environment Third Quarter 2012 Report

    The Extreme Science and Engineering Discovery Environment (XSEDE) is the most advanced, powerful, and robust collection of integrated digital resources and services in the world. It is an integrated cyberinfrastructure ecosystem with singular interfaces for allocations, support, and other key services that researchers can use to interactively share computing resources, data, and expertise. This is a report of project activities and highlights from the third quarter of 2012. National Science Foundation, OCI-105357


