
    Software Approaches to Manage Resource Tradeoffs of Power and Energy Constrained Applications

    Power and energy efficiency have become increasingly important design metrics for a wide spectrum of computing devices. Battery efficiency, which requires a mixture of energy and power efficiency, is especially important since there have been no groundbreaking advances in battery capacity recently. The need for energy and power efficiency stretches from small embedded devices to portable computers to large-scale data centers. The projected future of computing demand, referred to as exascale computing, requires that researchers find ways to perform exaFLOPs of computation within a power bound much lower than simply scaling today's standards would require. There is a large body of work on power and energy efficiency for a wide range of applications and at different levels of abstraction. However, there is a lack of work studying the nuances of the different tradeoffs that arise when operating under a power/energy budget. Moreover, there is no work on constructing a generalized model of applications running under power/energy constraints that allows the designer to optimize their resource consumption, be it power, energy, time, bandwidth, or space. There is a need for an efficient model that can provide bounds on the optimality of an application's resource consumption, serving as a baseline against which online resource-management heuristics can be measured. In this thesis, we tackle the problem of managing resource tradeoffs of power/energy-constrained applications. We begin by studying the nuances of power/energy tradeoffs with the response time and throughput of stream processing applications. We then study the power/performance tradeoff of batch processing applications to identify a power configuration that maximizes performance under a power bound. Next, we study the tradeoff of power/energy with network bandwidth and precision. Finally, we study how to combine these tradeoffs into a generalized model of applications running under resource constraints. The work in this thesis presents detailed studies of the power/energy tradeoff with the response time, throughput, performance, network bandwidth, and precision of stream and batch processing applications. To that end, we present an adaptive algorithm that manages stream processing tradeoffs of response time and throughput at the CPU level. At the task level, we present an online heuristic that adaptively distributes a bounded power budget across a cluster to improve performance, as well as an offline approach that optimally bounds performance. We demonstrate how power can be used to reduce bandwidth bottlenecks and extend our offline approach to model bandwidth tradeoffs. Moreover, we present a tool that identifies parts of a program that can be downgraded in precision with minimal impact on accuracy and maximal savings in energy consumption. Finally, we combine all of the above tradeoffs into a flexible model that is efficient to solve and allows for bounding and/or optimizing the consumption of different resources.
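
    The thesis's own online heuristic is not spelled out in the abstract, but the general idea of adaptively redistributing a bounded power budget across a cluster can be sketched as below. This is a minimal illustration under assumed names and an assumed performance model (per-node performance-per-watt slopes), not the algorithm from the thesis.

    ```python
    # Minimal sketch (not the thesis's algorithm): each adaptation round moves a
    # slice of the global power cap from the node with the lowest marginal
    # performance gain to the node with the highest, keeping the total budget fixed.

    def marginal_gain(slopes, node):
        """Estimated performance gain per extra watt for a node (assumed model)."""
        return slopes[node]

    def redistribute(caps, slopes, total_budget, step=5.0):
        """One adaptation round over per-node power caps (in watts)."""
        assert abs(sum(caps.values()) - total_budget) < 1e-6  # budget is conserved
        best = max(caps, key=lambda n: marginal_gain(slopes, n))
        worst = min(caps, key=lambda n: marginal_gain(slopes, n))
        if best != worst and caps[worst] - step >= 0:
            caps[worst] -= step
            caps[best] += step
        return caps

    caps = {"node0": 100.0, "node1": 100.0, "node2": 100.0}   # watts per node
    slopes = {"node0": 0.8, "node1": 0.2, "node2": 0.5}        # perf per watt (assumed)
    print(redistribute(caps, slopes, total_budget=300.0))
    ```

    In a real system the slopes would be re-estimated online from observed throughput, which is what makes the heuristic adaptive.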

    Decentralising resource management in operating systems

    This dissertation explores operating system mechanisms that allow resource-aware applications to be involved in the process of managing resources, under the premise that these applications (1) potentially have some (implicit) notion of their future resource demands and (2) can adapt their resource demands. The general idea is to provide feedback to resource-aware applications so that they can proactively participate in the management of resources. This approach has the benefit that resource management policies can be removed from central entities, leaving the operating system to provide only mechanisms. Furthermore, in contrast to centralised approaches, application-specific features can be more easily exploited. To achieve this aim, I propose to deploy a microeconomic theory, namely congestion or shadow pricing, which has recently received attention for managing congestion in communication networks. Applications are charged based on the potential "damage" they cause to other consumers by using resources. Consumers interpret these congestion charges as feedback signals, which they use to adjust their resource consumption. It can be shown theoretically that such a system, with consumers merely acting in their own self-interest, converges to a social optimum. This dissertation focuses on the operating system mechanisms required to decentralise resource management in this way. In particular, it identifies four mechanisms: pricing & charging, credit accounting, resource usage accounting, and multiplexing. While the latter two are generally required for the accurate management of resources, pricing & charging and credit accounting are novel mechanisms. It is argued that congestion prices are the correct economic model in this context and provide appropriate feedback to applications. The credit accounting mechanism is necessary to ensure the overall stability of the system by assigning value to credits.
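
    The feedback loop the dissertation describes can be sketched in a few lines. This is an illustrative toy, not the dissertation's implementation: the price curve (an M/M/1-style congestion penalty), the per-consumer utilities, and the 1.1/0.9 adjustment factors are all assumptions.

    ```python
    # Toy congestion-pricing loop: the resource posts a shadow price that grows
    # with utilization; each self-interested consumer raises demand while its
    # private per-unit utility beats the price and backs off otherwise.

    def shadow_price(load, capacity, base=1.0):
        """Charge rises sharply as utilization approaches capacity (assumed curve)."""
        util = load / capacity
        return base * util / max(1e-9, 1.0 - util)

    def step(demands, utilities, capacity):
        """One feedback round: price is published, consumers adjust in self-interest."""
        price = shadow_price(sum(demands.values()), capacity)
        for c in demands:
            demands[c] *= 1.1 if utilities[c] > price else 0.9
        return price, demands

    demands = {"app_a": 30.0, "app_b": 20.0}
    utilities = {"app_a": 3.0, "app_b": 1.0}   # private value per unit (assumed)
    for _ in range(20):
        price, demands = step(demands, utilities, 100.0)
    print(round(price, 2), {k: round(v, 1) for k, v in demands.items()})
    ```

    After a few rounds the high-utility consumer retains most of its demand while the low-utility one backs off, and total load settles where price and marginal utility meet, which is the decentralised convergence the dissertation appeals to.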

    A Policy-Based Resource Brokering Environment for Computational Grids

    With the advances in networking infrastructure in general, and the Internet in particular, we can build grid environments that allow users to utilize a diverse set of distributed and heterogeneous resources. Since the focus of such environments is the efficient usage of the underlying resources, a critical component is the resource brokering environment that mediates the discovery, access and usage of these resources. Given the consumer's constraints, the provider's rules, distributed heterogeneous resources and the large number of scheduling choices, the resource brokering environment needs to decide where to place the user's jobs and when to start their execution in a way that yields the best performance for the user and the best utilization for the resource provider. As brokering and scheduling are very complicated tasks, most current resource brokering environments are either specific to a particular grid environment or have limited features. This makes them unsuitable for large applications with heterogeneous requirements. In addition, most of these resource brokering environments lack flexibility: policies at the resource, application, and system levels cannot be specified and enforced to provide commitment to a guaranteed level of allocation, which could help attract grid users and contribute to establishing credibility for existing grid environments. In this thesis, we propose and prototype a flexible and extensible Policy-based Resource Brokering Environment (PROBE) that can be utilized by various grid systems. In designing PROBE, we follow a policy-based approach that gives PROBE the intelligence not only to match the user's request with the right set of resources, but also to assure the guaranteed level of allocation. PROBE treats task allocation as a Service Level Agreement (SLA) that needs to be enforced between the resource provider and the resource consumer. The policy-based framework is useful in a typical grid environment where resources, most of the time, are not dedicated. In implementing PROBE, we have utilized a layered architecture and façade design patterns. These, along with a well-defined API, make the framework independent of any architecture and allow for the incorporation of different types of scheduling algorithms, applications and platform adaptors as the underlying environment requires. We have utilized XML as a base for all specification needs. This provides a flexible mechanism to specify heterogeneous resources and users' requests along with their allocation constraints. We have developed XML-based specifications by which the high-level internal structures of resources, jobs and policies can be specified. This provides interoperability, in which one grid system can utilize PROBE to discover and use resources controlled by other grid systems. We have implemented a prototype of PROBE to demonstrate its feasibility. We also describe a test bed environment and the evaluation experiments that we have conducted to demonstrate the usefulness and effectiveness of our approach.
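
    PROBE's actual XML schemas are not given in the abstract, so the following sketch invents a tiny request/resource format purely to illustrate the match-request-to-resources step that a broker like PROBE performs; every element name and attribute here is an assumption.

    ```python
    # Illustrative only: parse a hypothetical XML job request and return the
    # resources that satisfy every stated requirement.
    import xml.etree.ElementTree as ET

    REQUEST = """<job name="render">
      <requirement attr="cpus" min="8"/>
      <requirement attr="mem_gb" min="16"/>
    </job>"""

    RESOURCES = [
        {"name": "clusterA", "cpus": 16, "mem_gb": 64},
        {"name": "clusterB", "cpus": 4,  "mem_gb": 8},
    ]

    def match(request_xml, resources):
        """Return the names of resources meeting every <requirement>."""
        job = ET.fromstring(request_xml)
        reqs = [(r.get("attr"), float(r.get("min"))) for r in job.iter("requirement")]
        return [res["name"] for res in resources
                if all(res.get(attr, 0) >= lo for attr, lo in reqs)]

    print(match(REQUEST, RESOURCES))  # -> ['clusterA']
    ```

    In PROBE the matched allocation would then be wrapped in an SLA and enforced against provider- and system-level policies; this sketch stops at the matching step.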

    Automating telemetry- and trace-based analytics on large-scale distributed systems

    Large-scale distributed systems---such as supercomputers, cloud computing platforms, and distributed applications---routinely suffer from slowdowns and crashes due to software and hardware problems, resulting in reduced efficiency and wasted resources. These large-scale systems typically deploy monitoring or tracing systems that gather a variety of statistics about the state of the hardware and the software. State-of-the-art methods either analyze this data manually or design unique automated methods for each specific problem. This thesis builds on the vision that generalized automated analytics methods applied to the data sets collected from these complex computing systems can provide critical information about the causes of problems, and that this analysis can then enable proactive management to improve performance, resilience, efficiency, or security significantly beyond current limits. This thesis seeks to design scalable, automated analytics methods and frameworks for large-scale distributed systems that minimize dependency on expert knowledge, automate parts of the solution process, and help make systems more resilient. In addition to analyzing data that is already collected from systems, our frameworks also identify what to collect from where in the system, such that the collected data is concise and useful for manual analytics. We focus on two data sources for conducting analytics: numeric telemetry data, which is typically collected from operating system or hardware counters, and end-to-end traces collected from distributed applications. This thesis makes the following contributions in large-scale distributed systems: (1) designing a framework for accurately diagnosing previously encountered performance variations, (2) designing a technique for detecting (unwanted) applications running on the systems, (3) developing a suite for reproducing performance variations that can be used to systematically develop analytics methods, (4) designing a method to explain the predictions of black-box machine learning frameworks, and (5) constructing an end-to-end tracing framework that can dynamically adjust instrumentation for effective diagnosis of performance problems.
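
    As a concrete flavour of telemetry-based diagnosis, here is a deliberately generic sketch (not the thesis's framework): fit per-counter statistics on healthy runs, then flag counters whose current window deviates strongly. The counter names, window shape, and z-score threshold are illustrative assumptions.

    ```python
    # Toy telemetry anomaly detector: baseline = per-counter mean/stdev from
    # healthy runs; a counter is flagged when its windowed mean is a large outlier.
    import statistics

    def fit_baseline(history):
        """history: {counter: [samples from healthy runs]} -> {counter: (mean, stdev)}."""
        return {c: (statistics.mean(v), statistics.stdev(v)) for c, v in history.items()}

    def flag_anomalies(baseline, window, z_threshold=3.0):
        """Return counters whose current windowed mean exceeds the z-score threshold."""
        flags = []
        for counter, (mu, sigma) in baseline.items():
            cur = statistics.mean(window[counter])
            if sigma > 0 and abs(cur - mu) / sigma > z_threshold:
                flags.append(counter)
        return flags

    healthy = {"cpu_util": [40, 42, 41, 39], "mem_bw": [10, 11, 10, 12]}
    current = {"cpu_util": [95, 97, 96, 98], "mem_bw": [10, 11, 11, 10]}
    print(flag_anomalies(fit_baseline(healthy), current))  # -> ['cpu_util']
    ```

    Diagnosing *previously encountered* variations, as the thesis does, additionally requires labeling such signatures and matching new anomalies against them; this sketch covers only detection.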

    Improving Performance of Feedback-Based Real-Time Networks using Model Checking and Reinforcement Learning

    Traditionally, automatic control techniques arose from the need for automation in mechanical systems. These techniques rely on robust mathematical modelling of physical systems, with the goal of driving their behaviour to desired set-points. Decades of research have successfully automated, optimized, and ensured the safety of a wide variety of mechanical systems. Recent advances in digital technology have made computers pervasive in every facet of life, and there have accordingly been many recent attempts to incorporate control techniques into digital technology. This thesis investigates the intersection and co-application of control theory and computer science to evaluate and improve the performance of time-critical systems. The thesis applies two research areas, model checking and reinforcement learning, to design and evaluate two real-time networks in conjunction with control technologies. The first is a camera surveillance system with the goal of constrained resource allocation to self-adaptive cameras. The second is a dual-delay real-time communication network with the goal of safe packet routing with minimal delays.

    The camera surveillance system consists of self-adaptive cameras and a centralized manager, in which the cameras capture a stream of images and transmit them to the central manager over a shared, constrained communication channel. The event-based manager allocates fractions of the shared bandwidth to all cameras in the network. The thesis provides guarantees on the behaviour of the camera surveillance network through model checking. Disturbances that arise during image capture due to variations in capture scenes are modelled using probabilistic and non-deterministic Markov Decision Processes (MDPs). Different properties of the camera network, such as the number of frame drops and bandwidth reallocations, are evaluated through formal verification.

    The second part of the thesis explores packet routing for real-time networks constructed from nodes and directed edges. Each edge in the network carries two different delays: a worst-case delay that captures high-load characteristics, and a typical delay that captures the current network load. Each node in the network takes safe routing decisions by considering the delays already encountered and the amount of remaining time. The thesis applies reinforcement learning to route packets through the network with minimal delays while ensuring that the total path delay from source to destination does not exceed the pre-determined deadline of the packet. The reinforcement learning algorithm explores new edges to find optimal routing paths while ensuring safety through a simple pre-processing algorithm. The thesis shows that it is possible to apply powerful reinforcement learning techniques to time-critical systems with expert knowledge about the system.
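
    The safe-routing idea lends itself to a small sketch. This is illustrative, not the thesis's algorithm: a pre-processing pass computes each node's worst-case delay to the destination, and at run time the learner may only pick edges whose worst-case edge delay plus the successor's worst-case bound still fits in the remaining time; random choice stands in for the Q-learning policy, and the topology and delays are invented.

    ```python
    # Safe exploration over a dual-delay network: every chosen edge provably
    # still allows meeting the deadline even under worst-case delays.
    import random

    # node -> list of (next_node, typical_delay, worst_case_delay); assumed topology
    EDGES = {
        "S": [("A", 2, 5), ("B", 1, 10)],
        "A": [("D", 2, 4)],
        "B": [("D", 1, 2)],
        "D": [],
    }

    def worst_case_to_dest(dest):
        """Bellman-Ford-style pass: minimal worst-case delay from each node to dest."""
        wc = {n: float("inf") for n in EDGES}
        wc[dest] = 0.0
        for _ in range(len(EDGES)):
            for n, outs in EDGES.items():
                for m, _, w in outs:
                    wc[n] = min(wc[n], w + wc[m])
        return wc

    def route(src, dest, deadline):
        wc = worst_case_to_dest(dest)
        node, elapsed, path = src, 0.0, [src]
        while node != dest:
            safe = [(m, t) for m, t, w in EDGES[node] if w + wc[m] <= deadline - elapsed]
            if not safe:
                raise RuntimeError("no safe edge from %s" % node)
            node, typical = random.choice(safe)   # exploration stands in for Q-learning
            elapsed += typical                    # typical delay actually incurred
            path.append(node)
        return path, elapsed

    print(route("S", "D", deadline=12.0))
    ```

    The pre-processing guarantee is what makes the exploration safe: no matter which safe edge the learner tries, a deadline-respecting completion of the path always exists.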

    Enabling technology for non-rigid registration during image-guided neurosurgery

    In the context of image processing, non-rigid registration is an operation that attempts to align two or more images using spatially varying transformations. Non-rigid registration finds application in medical image processing, where it accounts for the deformations in the soft tissues of the imaged organs. During image-guided neurosurgery, non-rigid registration has the potential to assist in locating critical brain structures and to improve identification of the tumor boundary. Robust non-rigid registration methods combine estimation of tissue displacement based on image intensities with spatial regularization using biomechanical models of brain deformation. In practice, the use of such registration methods during neurosurgery is complicated by a number of issues: construction of the biomechanical model used in the registration from the image data, the high computational demands of the application, and difficulties in assessing the registration results. In this dissertation we develop methods and tools that address some of these challenges, and provide components essential for the intra-operative application of a previously validated physics-based non-rigid registration method.

    First, we study the problem of image-to-mesh conversion, which is required for constructing the biomechanical model of the brain used during registration. We develop and analyze a number of methods suitable for solving this problem, and evaluate them using application-specific quantitative metrics. Second, we develop a high-performance implementation of the non-rigid registration algorithm and study the use of geographically distributed Grid resources for speculative registration computations. Using the high-performance implementation running on remote computing resources, we are able to deliver the results of registration within the time constraints of the neurosurgery. Finally, we present a method that estimates the local alignment error between two images of the same subject. We assess the utility of this method using multiple sources of ground truth to evaluate its potential to support speculative computations of non-rigid registration.
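
    To make the "intensity-driven displacement plus regularization" structure concrete, here is a highly simplified 1-D sketch. It is illustrative only: the dissertation's method couples intensity matching with a biomechanical brain model, whereas this toy uses a demons-style update and a crude moving-average smoother as a stand-in for physical regularization.

    ```python
    # Toy 1-D non-rigid registration: iterate an intensity-driven force on a
    # displacement field, smoothing the field after each step to regularize it.
    import numpy as np

    def register_1d(fixed, moving, iters=200, step=0.5, smooth=5):
        x = np.arange(len(fixed), dtype=float)
        u = np.zeros_like(x)                        # displacement field
        kernel = np.ones(smooth) / smooth           # crude smoothing regularizer
        for _ in range(iters):
            warped = np.interp(x + u, x, moving)    # moving image under current field
            grad = np.gradient(warped)
            diff = warped - fixed
            u -= step * diff * grad / (grad**2 + 1e-3)   # demons-style force
            u = np.convolve(u, kernel, mode="same")      # regularize (assumed model)
        return u

    x = np.arange(100, dtype=float)
    fixed = np.exp(-(x - 50)**2 / 50.0)
    moving = np.exp(-(x - 55)**2 / 50.0)            # same bump, shifted by 5
    print(round(register_1d(fixed, moving)[50], 1)) # roughly +5 near the bump
    ```

    Replacing the moving-average smoother with a finite-element solve of a biomechanical model is, loosely, the step that separates this toy from a physics-based method.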

    Advances in Grid Computing

    This book approaches grid computing with a perspective on the latest achievements in the field, providing an insight into current research trends and advances, and presenting a broad range of innovative research papers. The topics covered in this book include resource and data management, grid architectures and development, and grid-enabled applications. New ideas employing heuristic methods from swarm intelligence, genetic algorithms, and quantum encryption are considered in order to address two main aspects of grid computing: resource management and data management. The book also addresses aspects of grid computing concerning architecture and development, and includes a diverse range of grid computing applications, such as a possible human grid computing system, simulation of the fusion reaction, ubiquitous healthcare service provisioning, and complex water systems.

    Supercomputing futures: the next sharing paradigm for HPC resources: economic model, market analysis and consequences for the Grid

    At the crossroads of computer engineering, finance, and econometrics, this thesis is fundamentally an exercise in economic engineering whose objective is to contribute a novel, sustainable, and adaptive system for sharing high-performance computing resources. Borrowing from fundamental finance and technical analysis, the proposed model builds ratios and market indices from transactional statistics. This approach, which encourages strategic behaviour, paves the way to a more efficient sharing metaphor for the Grid, in which resource exchange is now weighted. The concept of a Grid currency, an instrument far more liquid and usable than bartering the resources themselves, is proposed: Grid Credits. Although the proposed indices should not be considered absolute, binding indicators, they nevertheless allow traders to form an idea of the market value of the various resources before taking a position. Similar in many respects to commodity exchanges, the Grid Exchange, as presented, enables resources to be traded through a double-auction mechanism. However, since supercomputing resources are in no way standardized, the platform supports the exchange of commodity bundles, called requirement sets for clients and component sets for providers. Formally, this economic model is simply another instance of non-cooperative game theory, which eventually reaches its equilibrium points. Following free-market rules, users are encouraged to speculate, buying or selling the use of the various supercomputer components as they see fit. Ultimately, this new resource-sharing paradigm for the Grid sets the stage for a new economy and a host of possibilities: investment and strategic positioning, brokers, speculators, and even the hedging of technological risk are all avenues opening up on the research horizon in this field.
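
    A double-auction clearing step of the kind the Grid Exchange uses can be sketched in a few lines. This toy matcher is an assumption-laden illustration, not the thesis's market: the midpoint price rule and the single-bundle order format are invented for clarity.

    ```python
    # Toy double auction in Grid Credits: buyers post bids, sellers post asks
    # for one bundle (e.g., a "requirement set" of CPU-hours plus storage);
    # crossing orders clear at the midpoint price.

    def clear(bids, asks):
        """Match highest bids with lowest asks while the market crosses."""
        bids = sorted(bids, reverse=True)          # bid prices, best first
        asks = sorted(asks)                        # ask prices, best first
        trades = []
        while bids and asks and bids[0] >= asks[0]:
            bid, ask = bids.pop(0), asks.pop(0)
            trades.append((bid + ask) / 2.0)       # midpoint clearing price (assumed rule)
        return trades

    print(clear(bids=[12.0, 10.0, 7.0], asks=[8.0, 9.0, 15.0]))  # -> [10.0, 9.5]
    ```

    The transactional statistics left behind by such clearings are exactly the raw material from which the thesis's ratios and market indices would be built.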

    Proceedings of the 5th bwHPC Symposium

    In modern science, the demand for more powerful and integrated research infrastructures is constantly growing to address computational challenges in data analysis, modeling and simulation. The bwHPC initiative, founded by the Ministry of Science, Research and the Arts and the universities of Baden-Württemberg, is a state-wide federated approach aimed at assisting scientists in mastering these challenges. At the 5th bwHPC Symposium in September 2018, scientific users, technical operators and government representatives came together for two days at the University of Freiburg. The symposium provided an opportunity to present scientific results that were obtained with the help of bwHPC resources. Additionally, it served as a platform for discussing and exchanging ideas concerning the use of these large scientific infrastructures as well as their further development.

    A flight software development and simulation framework for advanced space systems

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2002. Includes bibliographical references (p. 293-302).

    Distributed terrestrial computer systems employ middleware software to provide communications abstractions and reduce software interface complexity. Embedded applications are adopting the same approaches, but must make provisions to ensure that hard real-time temporal performance can be maintained. This thesis presents the development and validation of a middleware system tailored to spacecraft flight software development. Our middleware runs on the Generalized Flight Operations Processing Simulator (GFLOPS) and is called the GFLOPS Rapid Real-time Development Environment (GRRDE). GRRDE provides publish-subscribe communication services between software components. These services help to reduce the complexity of managing software interfaces. The hard real-time performance of these services has been verified with General Timed Automata modelling and extensive run-time testing. Several example applications illustrate the use of GRRDE to support advanced flight software development. Two technology-focused studies examine automatic code generation and autonomous fault protection within the GRRDE framework. A complex simulation of the TechSat 21 distributed space-based radar mission highlights the utility of the approach for large-scale applications.

    by John Patrick Enright, Ph.D.
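
    The decoupling benefit of publish-subscribe communication can be shown with a minimal broker. This sketch is in the spirit of GRRDE's services but makes no attempt at GRRDE's hard real-time guarantees; all class and topic names are invented.

    ```python
    # Toy publish-subscribe broker: publishers and subscribers know only topic
    # names, never each other, which is what reduces interface complexity.
    from collections import defaultdict
    from typing import Any, Callable

    class Broker:
        def __init__(self) -> None:
            self._subs: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

        def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
            """Register a component's handler for a topic."""
            self._subs[topic].append(handler)

        def publish(self, topic: str, msg: Any) -> None:
            """Deliver the message to every subscriber of the topic."""
            for handler in self._subs[topic]:
                handler(msg)

    bus = Broker()
    bus.subscribe("adcs/attitude", lambda q: print("guidance got", q))
    bus.subscribe("adcs/attitude", lambda q: print("telemetry got", q))
    bus.publish("adcs/attitude", (0.0, 0.0, 0.0, 1.0))   # quaternion (example data)
    ```

    In a hard real-time setting, the interesting work is bounding the delivery latency of `publish`, which is what GRRDE's verification with General Timed Automata addresses.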