
    Survivable algorithms and redundancy management in NASA's distributed computing systems

    The design of survivable algorithms requires a solid foundation for executing them. While hardware techniques for fault-tolerant computing are relatively well understood, fault-tolerant operating systems and fault-tolerant applications (survivable algorithms) are, by contrast, little understood, and much more work in this field is required. We outline some of our work that contributes to the foundation of ultrareliable operating systems and fault-tolerant algorithm design. We introduce our consensus-based framework for fault-tolerant system design, followed by a description of a hierarchical partitioning method for efficient consensus. A scheduler for redundancy management is introduced, and application-specific fault tolerance is described. We give an overview of our hybrid algorithm technique, which is an alternative to the formal approach.
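    As a minimal illustration of the redundancy-management idea behind such a framework, the Python sketch below implements plain majority voting over replicated task outputs. It is a toy, not the paper's consensus protocol, and the replica values are hypothetical.

```python
from collections import Counter

def majority_vote(replica_outputs):
    """Return the value reported by a strict majority of replicas,
    or None if no majority exists (an unresolvable disagreement)."""
    counts = Counter(replica_outputs)
    value, votes = counts.most_common(1)[0]
    return value if votes > len(replica_outputs) // 2 else None

# Triple-modular redundancy: a single faulty replica is outvoted.
print(majority_vote([42, 42, 41]))  # -> 42
print(majority_vote([1, 2, 3]))     # -> None (no consensus)
```

    In an N-modular-redundant system the voter itself becomes the critical component, which is one reason the abstract's hierarchical partitioning and distributed consensus matter in practice.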

    Tiling Optimization for Nested Loops on GPUs

    Optimizing nested loops has long been an important and widely studied topic in parallel programming. With the development of GPU architectures, the performance of these computations can be boosted significantly by massively parallel hardware. General matrix-matrix multiplication is a typical example, where executing the algorithm on GPUs outperforms multicore CPUs. However, achieving ideal performance on GPUs usually requires considerable human effort to manage the massively parallel computing resources, so the efficient optimization of nested loops on GPUs has become a popular topic in recent years. In this dissertation we present work based on the tiling strategy that addresses three kinds of popular problems. Different kinds of computations raise different latency issues: dependencies in a computation may leave insufficient parallelism, while computations without dependencies may be degraded by intensive memory accesses. We tackle the challenges of each kind of problem and believe that other computations performed in nested loops can also benefit from the presented techniques.

    First, we improve a parallel approximation algorithm for the problem of scheduling jobs on parallel identical machines to minimize makespan, using a high-dimensional tiling method. The algorithm is designed and optimized for solving this kind of problem efficiently on GPUs. Because the algorithm is based on a higher-dimensional dynamic programming approach, where dimensionality refers to the number of variables in the dynamic programming equation characterizing the problem, the existing implementation suffers from the curse of dimensionality and cannot fully utilize GPU resources. We design a novel data-partitioning technique to accelerate the higher-dimensional dynamic programming component of the algorithm. Our GPU solution addresses both load imbalance and the problem of exceeding memory capacity. We present performance results demonstrating how the proposed design improves GPU utilization and makes it possible to solve large higher-dimensional dynamic programming problems within limited GPU memory. Experimental results show that the GPU implementation achieves up to 25X speedup over the best existing OpenMP implementation.

    Second, we optimize wavefront parallelism on GPUs. Wavefront parallelism is a well-known technique for exploiting the concurrency of applications that execute nested loops with uniform data dependencies. Recent research on such applications, which range from sequence alignment tools to partial differential equation solvers, has used GPUs to benefit from the massively parallel computing resources. Wavefront parallelism suffers from load imbalance because the amount of available parallelism varies as computation sweeps along the diagonals. Tiling is a popular remedy, but the use of hyperplane tiles increases synchronization cost and leads to poor data locality. We present a highly optimized implementation of the wavefront parallelism technique that harnesses the GPU architecture, achieving a balanced workload and maximum resource utilization with extremely low synchronization overhead. We design the kernel configuration to significantly reduce the minimum number of synchronizations required and also introduce an inter-block lock to minimize the overhead of each synchronization.
    We evaluate the proposed technique on four applications: Sequence Alignment, Edit Distance, Summed-Area Table, and 2DSOR. The performance results demonstrate that our method achieves speedups of up to 6X over the previous best-known hyperplane tiling-based GPU implementation.

    Finally, we extend hyperplane tiling to high-order 2D stencil computations. Unlike wavefront parallelism, which has dependences in the spatial dimensions, stencil computations carry dependences only between adjacent time steps along the temporal dimension. Although this absence of spatial dependences greatly increases the available parallelism, exploiting full parallelism may not be efficient on GPUs: because each streaming multiprocessor has limited cache capacity, full parallelism can be obtained only through high-latency global memory. Tiling can therefore improve memory efficiency by caching small tiled blocks. Because widely studied tiling methods such as overlapped tiling and split tiling incur considerable computation overhead from load imbalance or extra operations, we propose a time-skewed tiling method designed for the GPU architecture. We work around the serialized-computation issue and coordinate intra-tile and inter-tile parallelism to minimize the load imbalance caused by pipelined processing. Moreover, our development addresses high-order stencil computations, which have not been comprehensively studied. The proposed method achieves up to 3.5X performance improvement when the stencil computation is performed on a Moore neighborhood pattern.
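    To make the dependence structure concrete, here is a small sequential Python sketch of hyperplane (anti-diagonal) tiling for the Edit Distance application mentioned above: every tile on one anti-diagonal is independent of the others, which is exactly the property a GPU implementation exploits by mapping tiles to thread blocks. This is an illustrative sketch, not the dissertation's CUDA kernel; the tile size and function names are arbitrary.

```python
import numpy as np

def edit_distance_wavefront(a, b, tile=64):
    """Fill the edit-distance DP table by sweeping anti-diagonals of tiles.
    Tiles on the same anti-diagonal only depend on earlier diagonals, so a
    GPU could process them concurrently, one thread block per tile."""
    m, n = len(a), len(b)
    D = np.zeros((m + 1, n + 1), dtype=np.int32)
    D[:, 0] = np.arange(m + 1)          # base cases: delete/insert prefixes
    D[0, :] = np.arange(n + 1)
    nt_i = (m + tile - 1) // tile       # number of tile rows
    nt_j = (n + tile - 1) // tile       # number of tile columns
    for wave in range(nt_i + nt_j - 1):                       # anti-diagonals
        for ti in range(max(0, wave - nt_j + 1), min(nt_i, wave + 1)):
            tj = wave - ti              # all (ti, tj) here are independent
            for i in range(ti * tile + 1, min((ti + 1) * tile, m) + 1):
                for j in range(tj * tile + 1, min((tj + 1) * tile, n) + 1):
                    cost = 0 if a[i - 1] == b[j - 1] else 1
                    D[i, j] = min(D[i - 1, j] + 1,            # deletion
                                  D[i, j - 1] + 1,            # insertion
                                  D[i - 1, j - 1] + cost)     # substitution
    return int(D[m, n])

print(edit_distance_wavefront("kitten", "sitting"))  # -> 3
```

    The load-imbalance problem the abstract targets is visible even here: early and late waves contain few tiles, so a naive one-block-per-tile mapping leaves most of the GPU idle at the start and end of the sweep.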

    Exploring instance generation for automated planning

    Funding: This work is supported by EPSRC grant EP/P015638/1. Nguyen Dang is a Leverhulme Early Career Fellow. Many of the core disciplines of artificial intelligence have sets of standard benchmark problems well known and widely used by the community when developing new algorithms. Constraint programming and automated planning are examples of such areas, where the behaviour of a new algorithm is measured by how it performs on these instances. Typically the efficiency of each solving method varies not only between problems but also between instances of the same problem, so having a diverse set of instances is crucial for effectively evaluating a new solving method. Current methods for the automatic generation of instances for constraint programming problems start with a declarative model and search for instances with some desired attributes, such as hardness or size. We first explore the difficulties of adapting this approach to generate instances starting from problem specifications written in PDDL, the de facto standard language of the automated planning community. We then propose a new approach where the whole planning problem description is modelled using Essence, an abstract modelling language that allows expressing high-level structures without committing to a particular low-level representation in PDDL.
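    As a rough illustration of what instance generation means here, the hypothetical Python snippet below templates random Blocksworld problems in PDDL text. The paper's actual contribution is to model the whole problem in Essence rather than templating PDDL directly, so this shows only the baseline idea being improved upon.

```python
import random

def random_blocksworld(n_blocks, seed=0):
    """Emit a random, internally consistent Blocksworld instance in PDDL.
    A toy parameterized generator: n_blocks and seed control instance size
    and diversity, the two attributes the abstract says a benchmark needs."""
    rng = random.Random(seed)
    blocks = [f"b{i}" for i in range(n_blocks)]

    def random_state():
        towers, facts = [], []
        for blk in rng.sample(blocks, n_blocks):   # shuffled placement order
            if towers and rng.random() < 0.5:      # stack on an existing tower
                tower = rng.choice(towers)
                facts.append(f"(on {blk} {tower[-1]})")
                tower.append(blk)
            else:                                  # start a new tower on the table
                facts.append(f"(ontable {blk})")
                towers.append([blk])
        facts += [f"(clear {t[-1]})" for t in towers]
        return facts

    init = random_state() + ["(handempty)"]
    goal = random_state()
    return (f"(define (problem bw-{n_blocks}-{seed}) (:domain blocksworld)\n"
            f"  (:objects {' '.join(blocks)})\n"
            f"  (:init {' '.join(init)})\n"
            f"  (:goal (and {' '.join(goal)})))")

print(random_blocksworld(4))
```

    The weakness of such text-level generators, and a motivation for the Essence approach, is that each domain needs its own hand-written generator with its own consistency logic, like the tower bookkeeping above.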

    Anytime synthetic projection: Maximizing the probability of goal satisfaction

    A projection algorithm is presented for incremental control-rule synthesis. The algorithm synthesizes an initial set of goal-achieving control rules using a combination of situation probability and estimated remaining work as a search heuristic. This set of control rules has a certain probability of satisfying the given goal. The probability is incrementally increased by synthesizing additional control rules to handle 'error' situations the execution system is likely to encounter when following the initial control rules. By using situation probabilities, the algorithm achieves a computationally effective balance between the limited robustness of triangle tables and the absolute robustness of universal plans.
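    The following Python sketch outlines the anytime loop described above under simplifying assumptions: situations are handled in order of probability, coverage grows monotonically until the time budget expires, and `synthesize_rule` and `successors` are hypothetical stand-ins for the paper's planner and error-situation projection.

```python
import heapq
import time

def anytime_synthesis(initial_situations, synthesize_rule, successors,
                      budget_s=1.0):
    """Anytime control-rule synthesis: always expand the most probable
    uncovered situation next. `initial_situations` is a list of
    (probability, situation) pairs; the callbacks are domain-specific.
    Coverage accounting assumes situation probabilities are disjoint,
    a simplification of the paper's probability model."""
    rules, covered_prob = [], 0.0
    # heapq is a min-heap, so negate probabilities to pop the largest first.
    frontier = [(-p, s) for p, s in initial_situations]
    heapq.heapify(frontier)
    deadline = time.monotonic() + budget_s
    while frontier and time.monotonic() < deadline:
        neg_p, situation = heapq.heappop(frontier)
        rule = synthesize_rule(situation)       # plan for this situation
        rules.append(rule)
        covered_prob += -neg_p
        # Likely 'error' situations reached while executing this rule
        # become new synthesis targets for later iterations.
        for p, err in successors(situation, rule):
            heapq.heappush(frontier, (-p, err))
    return rules, covered_prob
```

    Interrupting the loop at any point still yields a usable rule set, which is what makes the synthesis anytime: more time buys higher goal-satisfaction probability rather than a different kind of answer.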

    The socio-economic impact of technological innovation. Models and analysis of the digital technologies for cultural and creative industries.

    The research activity synthesized in this book starts from the consideration that there is a growing need to verify how public investment in innovation can guarantee the best value for money and maximise its impact on the economy and society. The cultural heritage sector is a strategic target for R&D investment in Europe, and it strongly needs a set of tools able to assess the socio-economic impact of project activities. With the aim of maximising the effectiveness and efficiency of research outputs, and thanks to the MAXICULTURE project (FP7-ICT-2011-9-601070), our research team analysed projects' outputs both in terms of innovation and improvement over the state of the art of ICTs for the creative and cultural sector, and in terms of transferability of results to wider society in general and to the supply industry in particular. During the research activities we:
    • performed an analysis of the DigiCult domain through a literature review and analysis of EC FP7 Call 1, Call 3, Call 6, Call 9 and Europeana projects;
    • developed the assessment methodology for DigiCult projects;
    • gathered feedback from experts and projects on the methodology through webinars and online questionnaires;
    • developed the Self-Assessment Toolkit (SAT);
    • performed the assessment of 19 projects in the DigiCult domain using the data gathered through the Self-Assessment Toolkit.
    The analysis produced interesting results such as:
    • the design of a specific Hype Cycle for DigiCult projects;
    • a better understanding of the innovation dynamics in the sector;
    • information on how to improve the diffusion of the knowledge generated by DigiCult projects;
    • information on how to improve the socio-economic impact of DigiCult projects.

    Analog layout design automation: ILP-based analog routers

    The shrinking design window and high parasitic sensitivity in advanced technologies have imposed special challenges on analog and radio-frequency (RF) integrated circuit design. In this thesis, we propose a new methodology to address this deficiency, based on integer linear programming (ILP) but without compromising the capability of handling any special constraints of analog routing problems. Distinct from conventional methods, our algorithm uses adaptive resolutions for different routing regions: a more congested region is given a routing grid of higher resolution, whereas a less crowded region is assigned a lower-resolution grid. Moreover, we strengthen the router's interconnect width control so that electrical nets are routed according to analog constraints with proper interconnect widths, simultaneously addressing acute interconnect parasitics, mismatch minimization, and electromigration effects. In addition, to tackle the performance degradation due to layout-dependent effects (LDEs) and to take advantage of optical proximity correction (OPC) for resolution enhancement in subwavelength lithography, we also propose an innovative LDE-aware analog layout migration scheme equipped with our routing methodology. The LDE constraints are first identified with the aid of a special sensitivity analysis and then satisfied during the layout migration process. Afterwards the electrical nets are routed by an extended OPC-inclusive ILP-based analog router to improve final layout image fidelity while respecting routability and analog constraints. Experimental results demonstrate the effectiveness and efficiency of the proposed methods in terms of both circuit performance and image quality compared to previous works.
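    To give a flavour of ILP-based routing, here is a minimal formulation in Python (assuming the open-source PuLP package and its bundled CBC solver) that routes a single two-pin net on a uniform grid as a unit network flow, minimizing wirelength. The thesis's router additionally handles adaptive grid resolutions, interconnect widths, and analog constraints, none of which appear in this toy.

```python
import pulp

def route_net_ilp(width, height, source, sink):
    """Route one two-pin net on a width x height grid: a binary flow
    variable per directed arc, flow conservation at every node, and
    wirelength (number of used arcs) as the objective."""
    nodes = [(x, y) for x in range(width) for y in range(height)]

    def neighbors(v):
        x, y = v
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if 0 <= x + dx < width and 0 <= y + dy < height:
                yield (x + dx, y + dy)

    arcs = [(u, v) for u in nodes for v in neighbors(u)]
    prob = pulp.LpProblem("single_net_routing", pulp.LpMinimize)
    f = {a: pulp.LpVariable(f"f_{i}", cat="Binary")
         for i, a in enumerate(arcs)}               # unit flow on arc u->v
    prob += pulp.lpSum(f.values())                  # minimize wirelength
    for v in nodes:
        out_f = pulp.lpSum(f[(v, w)] for w in neighbors(v))
        in_f = pulp.lpSum(f[(w, v)] for w in neighbors(v))
        rhs = 1 if v == source else -1 if v == sink else 0
        prob += out_f - in_f == rhs                 # flow conservation
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [a for a in arcs if f[a].value() > 0.5]

path = route_net_ilp(4, 4, (0, 0), (3, 3))
print(len(path), "segments")                        # 6 on a 4x4 grid
```

    Analog-specific requirements such as net symmetry or matched lengths can be bolted onto this skeleton as extra linear constraints, which is the flexibility the abstract credits to the ILP formulation.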

    Modeling nitrogen loadings from agricultural soils in southwest China with modified DNDC

    Degradation of water quality has been widely observed in China, and loadings of nitrogen (N) and other nutrients from agricultural systems play a key role in the contamination. Process-based biogeochemical models have been applied to quantify nutrient loading from nonpoint sources at the watershed scale, but this effort is often hindered by the fact that few existing biogeochemical models of nutrient cycling can simulate two-dimensional soil hydrology. To overcome this challenge, we incorporated two fundamental hydrologic features, the Soil Conservation Service (SCS) curve number method and the Modified Universal Soil Loss Equation (MUSLE), into a biogeochemistry model, Denitrification-Decomposition (DNDC). These two features have been widely used to quantify surface runoff and soil erosion in a suite of hydrologic models; we incorporated them into DNDC so that the biogeochemical and hydrologic processes exchange data at a daily time step. With the new features, DNDC gained the additional ability to simulate both horizontal and vertical movements of water and nutrients. The revised DNDC was tested against data sets observed in a small watershed dominated by farmland in a mountainous area of southwest China; the modeled surface runoff flow, subsurface drainage flow, sediment yield, and N loading were in agreement with observations. To further examine the behavior of the new model, we conducted a sensitivity test with varied climate, soil, and management conditions. The results indicated that precipitation was the most sensitive factor determining the rate of N loading from the tested site. A Monte Carlo test was conducted to quantify the potential uncertainty arising from variations in four selected input parameters. This study demonstrates that it is feasible and effective to enhance biogeochemical models such as DNDC for quantifying N loadings by incorporating basic hydrological features into the model framework.
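    Both hydrologic features named above have standard closed forms, sketched below in Python with hypothetical parameter values; the revised DNDC couples these calculations with its biogeochemistry at a daily time step, which this fragment does not attempt.

```python
def scs_runoff_mm(precip_mm, cn):
    """Daily surface runoff via the SCS curve-number method (metric form).
    S is the potential maximum retention; runoff begins once precipitation
    exceeds the initial abstraction Ia = 0.2 * S."""
    s = 25400.0 / cn - 254.0
    ia = 0.2 * s
    if precip_mm <= ia:
        return 0.0
    return (precip_mm - ia) ** 2 / (precip_mm - ia + s)

def musle_sediment_t(runoff_mm, peak_flow_m3s, area_ha, k, ls, c, p):
    """Event sediment yield (metric tons) from the Modified Universal Soil
    Loss Equation (Williams 1975): sed = 11.8 (Q qpeak)^0.56 K LS C P,
    with Q the runoff volume in cubic meters."""
    q_m3 = runoff_mm * area_ha * 10.0   # 1 mm over 1 ha = 10 m^3
    return 11.8 * (q_m3 * peak_flow_m3s) ** 0.56 * k * ls * c * p

# Hypothetical event: a 40 mm storm on a 12 ha cropped field with CN = 80.
q = scs_runoff_mm(40.0, 80)
print(round(q, 1), "mm runoff")
print(round(musle_sediment_t(q, 0.8, 12.0, 0.3, 1.2, 0.2, 1.0), 2),
      "t sediment")
```

    The K, LS, C, and P factors are the usual USLE soil-erodibility, slope, cover, and practice factors; in a watershed model they would come from soil surveys and land-use maps rather than the illustrative constants used here.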