A Pattern Language for High-Performance Computing Resilience
High-performance computing (HPC) systems provide powerful capabilities for
modeling, simulation, and data analytics for a broad class of computational
problems. They enable extreme performance on the order of quadrillions of
floating-point arithmetic calculations per second by aggregating the power of
millions of compute, memory, networking and storage components. With the
rapidly growing scale and complexity of HPC systems for achieving even greater
performance, ensuring their reliable operation in the face of system
degradations and failures is a critical challenge. System fault events often
lead scientific applications to produce incorrect results, or may even
cause their untimely termination. The sheer number of components in modern
extreme-scale HPC systems and the complex interactions and dependencies among
the hardware and software components, the applications, and the physical
environment make the design of practical solutions that support fault
resilience a complex undertaking. To manage this complexity, we developed a
methodology for designing HPC resilience solutions using design patterns. We
codified the well-known techniques for handling faults, errors and failures
that have been devised, applied and improved upon over the past three decades
in the form of design patterns. In this paper, we present a pattern language to
enable a structured approach to the development of HPC resilience solutions.
The pattern language reveals the relations among the resilience patterns and
provides the means to explore alternative techniques for handling a specific
fault model that may have different efficiency and complexity characteristics.
Using the pattern language enables the design and implementation of
comprehensive resilience solutions as a set of interconnected resilience
patterns that can be instantiated across layers of the system stack.
Comment: Proceedings of the 22nd European Conference on Pattern Languages of
Programs
Co-management: A Synthesis of the Lessons Learned from the DFID Fisheries Management Science Programme
For the last eleven years, the UK Department for International Development (DfID) has been funding research projects to support the sustainable management of fisheries resources (both inland and marine) in developing countries through the Fisheries Management Science Programme (FMSP). A number of the projects commissioned in this time have examined fisheries co-management. While these projects have, for the most part, been implemented separately, the FMSP has provided an opportunity to synthesise and draw together some of the information generated by these projects. We feel that there is value in distilling some of the important lessons, describing some of the useful tools and examples, and making these available through a single, accessible resource. The wealth of information generated means that it is impossible to cover everything in detail. It is hoped, however, that this synthesis will at least provide an overview of the co-management process, together with some useful information on implementing co-management in a developing country context and links to the more detailed resources available. The synthesis draws in particular on resources covering information systems for co-managed fisheries, participatory fish stock assessment (ParFish) and adaptive learning. This synthesis is aimed at anyone interested in fisheries management in a developing country context.
The complexity of scaling up an mHealth intervention: the case of SMS for Life in Tanzania from a health systems integration perspective
BACKGROUND: SMS for Life was one of the earliest large-scale implementations of mHealth innovations worldwide. Its goal was to increase visibility of antimalarial stock-outs through the use of SMS technology. The objective of this case study was to show the multiple innovations that SMS for Life brought to the Tanzanian public health sector and to discuss the challenges of scaling up that led to its discontinuation from a health systems perspective. METHODS: A qualitative case-study approach was used. This included a literature review, a document review of 61 project documents, a timeline of key events and the collection and analysis of 28 interviews with key stakeholders involved in or affected by the SMS for Life programme. Data collection was informed by the health system building blocks. We then carried out a thematic analysis using the WHO mHealth Assessment and Planning for Scale (MAPS) Toolkit as a framework. This served to identify the key reasons for the discontinuation of the programme. RESULTS: SMS for Life was reliable at scale and raised awareness of stock-outs with real-time monitoring. However, it was discontinued in 2015 after 4 years of a national rollout. The main reasons identified for the discontinuation were the programme's failure to adapt to the continuous changes in Tanzania's health system, the focus on stock-outs rather than ensuring appropriate stock management, and that it was perceived as costly by policy-makers. Despite its discontinuation, SMS for Life, together with co-existing technologies, triggered the development of the capacity to accommodate and integrate future technologies in the health system. CONCLUSION: This study shows the importance of engaging appropriate stakeholders from the outset, understanding and designing system-responsive interventions appropriately when scaling up and ensuring value to a broad range of health system actors. These shortcomings are common among digital health solutions and need to be better addressed in future implementations.
Low-Cost Air Quality Monitoring Tools: From Research to Practice (A Workshop Summary).
In May 2017, a two-day workshop was held in Los Angeles (California, U.S.A.) to gather practitioners who work with low-cost sensors used to make air quality measurements. The community of practice included individuals from academia, industry, non-profit groups, community-based organizations, and regulatory agencies. The group gathered to share knowledge developed from a variety of pilot projects in hopes of advancing the collective knowledge about how best to use low-cost air quality sensors. Panel discussion topics included: (1) best practices for deployment and calibration of low-cost sensor systems, (2) data standardization efforts and database design, (3) advances in sensor calibration, data management, and data analysis and visualization, and (4) lessons learned from research/community partnerships to encourage purposeful use of sensors and create change/action. Panel discussions summarized knowledge advances and project successes while also highlighting the questions, unresolved issues, and technological limitations that still remain within the low-cost air quality sensor arena.
Large-scale Complex IT Systems
This paper explores the issues around the construction of large-scale complex
systems which are built as 'systems of systems' and suggests that there are
fundamental reasons, derived from the inherent complexity in these systems, why
our current software engineering methods and techniques cannot be scaled up to
cope with the engineering challenges of constructing such systems. It then goes
on to propose a research and education agenda for software engineering that
identifies the major challenges and issues in the development of large-scale
complex, software-intensive systems. Central to this is the notion that we
cannot separate software from the socio-technical environment in which it is
used.
Comment: 12 pages, 2 figures