
    A Pattern Language for High-Performance Computing Resilience

    High-performance computing (HPC) systems provide powerful capabilities for modeling, simulation, and data analytics for a broad class of computational problems. They enable extreme performance on the order of quadrillions of floating-point arithmetic calculations per second by aggregating the power of millions of compute, memory, networking and storage components. With the rapidly growing scale and complexity of HPC systems for achieving even greater performance, ensuring their reliable operation in the face of system degradations and failures is a critical challenge. System fault events often lead scientific applications to produce incorrect results, or may even cause their untimely termination. The sheer number of components in modern extreme-scale HPC systems and the complex interactions and dependencies among the hardware and software components, the applications, and the physical environment make the design of practical solutions that support fault resilience a complex undertaking. To manage this complexity, we developed a methodology for designing HPC resilience solutions using design patterns. We codified the well-known techniques for handling faults, errors and failures that have been devised, applied and improved upon over the past three decades in the form of design patterns. In this paper, we present a pattern language to enable a structured approach to the development of HPC resilience solutions. The pattern language reveals the relations among the resilience patterns and provides the means to explore alternative techniques for handling a specific fault model that may have different efficiency and complexity characteristics. Using the pattern language enables the design and implementation of comprehensive resilience solutions as a set of interconnected resilience patterns that can be instantiated across layers of the system stack.
    Comment: Proceedings of the 22nd European Conference on Pattern Languages of Programs

    Co-management: A Synthesis of the Lessons Learned from the DFID Fisheries Management Science Programme

    For the last eleven years, the UK Department for International Development (DfID) has been funding research projects to support the sustainable management of fisheries resources (both inland and marine) in developing countries through the Fisheries Management Science Programme (FMSP). A number of the projects commissioned in this time have examined fisheries co-management. While these projects have, for the most part, been implemented separately, the FMSP has provided an opportunity to synthesise and draw together some of the information generated by these projects. We feel that there is value in distilling some of the important lessons and describing some of the useful tools and examples and making these available through a single, accessible resource. The wealth of information generated means that it is impossible to cover everything in detail, but it is hoped that this synthesis will at least provide an overview of the co-management process together with some useful information relating to implementing co-management in a developing country context and links to the more detailed resources available, particularly on information systems for co-managed fisheries, participatory fish stock assessment (ParFish) and adaptive learning, which have been drawn upon for this synthesis. This synthesis is aimed at anyone interested in fisheries management in a developing country context.

    The complexity of scaling up an mHealth intervention: the case of SMS for Life in Tanzania from a health systems integration perspective

    BACKGROUND: SMS for Life was one of the earliest large-scale implementations of mHealth innovations worldwide. Its goal was to increase the visibility of antimalarial stock-outs through the use of SMS technology. The objective of this case study was to show the multiple innovations that SMS for Life brought to the Tanzanian public health sector and to discuss the challenges of scaling up that led to its discontinuation from a health systems perspective. METHODS: A qualitative case-study approach was used. This included a literature review, a document review of 61 project documents, a timeline of key events and the collection and analysis of 28 interviews with key stakeholders involved in or affected by the SMS for Life programme. Data collection was informed by the health system building blocks. We then carried out a thematic analysis using the WHO mHealth Assessment and Planning for Scale (MAPS) Toolkit as a framework. This served to identify the key reasons for the discontinuation of the programme. RESULTS: SMS for Life was reliable at scale and raised awareness of stock-outs with real-time monitoring. However, it was discontinued in 2015 after 4 years of a national rollout. The main reasons identified for the discontinuation were the programme's failure to adapt to the continuous changes in Tanzania's health system, the focus on stock-outs rather than ensuring appropriate stock management, and that it was perceived as costly by policy-makers. Despite its discontinuation, SMS for Life, together with co-existing technologies, triggered the development of the capacity to accommodate and integrate future technologies in the health system. CONCLUSION: This study shows the importance of engaging appropriate stakeholders from the outset, understanding and designing system-responsive interventions appropriately when scaling up and ensuring value to a broad range of health system actors. These shortcomings are common among digital health solutions and need to be better addressed in future implementations.

    Low-Cost Air Quality Monitoring Tools: From Research to Practice (A Workshop Summary).

    In May 2017, a two-day workshop was held in Los Angeles (California, U.S.A.) to gather practitioners who work with low-cost sensors used to make air quality measurements. The community of practice included individuals from academia, industry, non-profit groups, community-based organizations, and regulatory agencies. The group gathered to share knowledge developed from a variety of pilot projects in hopes of advancing the collective knowledge about how best to use low-cost air quality sensors. Panel discussion topics included: (1) best practices for deployment and calibration of low-cost sensor systems, (2) data standardization efforts and database design, (3) advances in sensor calibration, data management, and data analysis and visualization, and (4) lessons learned from research/community partnerships to encourage purposeful use of sensors and create change/action. Panel discussions summarized knowledge advances and project successes while also highlighting the questions, unresolved issues, and technological limitations that still remain within the low-cost air quality sensor arena.

    Large-scale Complex IT Systems

    This paper explores the issues around the construction of large-scale complex systems which are built as 'systems of systems' and suggests that there are fundamental reasons, derived from the inherent complexity in these systems, why our current software engineering methods and techniques cannot be scaled up to cope with the engineering challenges of constructing such systems. It then goes on to propose a research and education agenda for software engineering that identifies the major challenges and issues in the development of large-scale complex, software-intensive systems. Central to this is the notion that we cannot separate software from the socio-technical environment in which it is used.
    Comment: 12 pages, 2 figures