209 research outputs found

    Application of Supercomputer Technologies for Simulation of Socio-Economic Systems

    To date, extensive experience has been accumulated in the study of quality, the assessment of management systems, and the modelling of economic system sustainability. This work has laid the foundation for a new research area, the Economics of Quality, whose tools use simulation to construct mathematical models that adequately reflect the role of quality in the natural, technical and social regularities governing complex socio-economic systems. In our view, the broad application and development of such models, together with system modelling based on supercomputer technologies, will raise research on social and economic systems to an essentially new level. Moreover, the present research contributes significantly to the simulation of multi-agent social systems and, no less importantly, falls within the priority areas of science and technology development in our country. This article addresses the application of supercomputer technologies in the social sciences, above all the technical realization of large-scale agent-focused models (AFM). The essence of this tool is that growing computing power now makes it possible to describe the behaviour of the many individual components that make up a complex system such as a socio-economic system. The article reviews the experience of foreign researchers and practitioners in running AFM on supercomputers, presents an AFM developed at CEMI RAS, and analyses the stages and methods of efficiently mapping the computational kernel of a multi-agent system onto the architecture of a modern supercomputer. It concludes with simulation experiments forecasting the population of St. Petersburg under three scenarios, population being one of the major factors shaping the development of the socio-economic system and the quality of life.

    Porting Decision Tree Algorithms to Multicore using FastFlow

    The whole computer hardware industry has embraced multicores. On these machines, extreme optimisation of sequential algorithms is no longer enough to exploit the real machine power, which can only be reached through thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them well suited to parallelisation. This paper presents an approach for easy yet efficient porting of an implementation of the C4.5 algorithm to multicores. The parallel port requires minimal changes to the original sequential code and achieves up to a 7x speedup on an Intel dual quad-core machine.
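
    The "natural concurrency" mentioned above comes largely from the fact that, at each tree node, the candidate attribute splits can be scored independently. The sketch below illustrates that idea with standard C++ futures; it is a generic illustration, not the paper's FastFlow code, and `Dataset` and `split_score` are hypothetical placeholders for the training data and the C4.5 gain-ratio computation.

```cpp
// Generic illustration of node-level concurrency in C4.5-style tree induction:
// candidate attribute splits are scored independently, then the best is chosen.
// Uses standard C++ futures, not the paper's FastFlow code.
#include <functional>
#include <future>
#include <vector>

struct Dataset {                         // minimal stand-in for the examples at a node
    std::vector<std::vector<int>> rows;
    std::vector<int> labels;
};

// Hypothetical scoring stub; a real C4.5 implementation computes the gain ratio here.
double split_score(const Dataset& data, int attribute) {
    (void)data; (void)attribute;
    return 0.0;
}

int best_attribute(const Dataset& data, int n_attributes) {
    if (n_attributes <= 0) return -1;
    std::vector<std::future<double>> scores;
    scores.reserve(n_attributes);
    for (int a = 0; a < n_attributes; ++a)   // score each candidate split concurrently
        scores.push_back(std::async(std::launch::async, split_score, std::cref(data), a));

    int best = 0;
    double best_score = scores[0].get();
    for (int a = 1; a < n_attributes; ++a) {
        double s = scores[a].get();
        if (s > best_score) { best_score = s; best = a; }
    }
    return best;
}
```
    A framework such as FastFlow packages this kind of task parallelism behind higher-level patterns, which is what allows a port to stay close to the original sequential code.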

    Parallel Optimisations of Perceived Quality Simulations

    Processor architectures have changed significantly, with fast single-core processors replaced by a diverse range of multicore processors. These new architectures require code to be executed in parallel to realise their performance gains. This is straightforward for small applications, where analysis and refactoring are simple or existing tools can parallelise automatically; however, the organic growth and large, complicated data structures common in mature industrial applications can make parallelisation much harder. One such application is studied here: a mature Windows C++ application used for the visualisation of Perceived Quality (PQ). PQ simulations show how manufacturing variations affect the look of the final product. The application is widely used but suffers from performance problems, and previous parallelisation attempts have failed. The issues involved in parallelising a mature industrial application are investigated, and a methodology is produced to investigate, analyse and evaluate the available methods and tools. The shortfalls of these methods and tools are identified, and the ways they were overcome are explained. The parallel version of the software is evaluated for performance, with case studies centred on the application's significant use cases to understand the impact on users. Automated compilers provided no parallelism, while manual parallelisation using OpenMP required significant refactoring; a number of data dependency issues left some code serialised. Performance scaled with the number of physical cores for certain problems, but the unresolved bottlenecks gave mixed results for users: verification use cases benefited, while early design stages did not. Without tools to aid the analysis of complex data structures, parallelism could remain out of reach for industrial applications. Methods used successfully here, such as code isolation and serialisation, could be applied effectively by such tools.
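
    As a concrete illustration of the loop-level OpenMP work described above, the sketch below shows the basic pattern: an independent per-point computation parallelises cleanly, while shared state (here a running "worst deviation") is a data dependency that has to be expressed as a reduction. The names and the per-point model are hypothetical, not code from the PQ application; dependencies that cannot be recast this way are the kind that remain serialised.

```cpp
// Hedged sketch of an OpenMP parallel loop with a reduced shared value.
// Function and variable names are hypothetical, not the PQ application's code.
#include <cmath>
#include <vector>

// Hypothetical per-point model: deviation of a manufactured point from its nominal position.
static double point_deviation(double nominal, double manufactured) {
    return std::fabs(manufactured - nominal);
}

double worst_gap_deviation(const std::vector<double>& nominal,
                           const std::vector<double>& manufactured) {
    double worst = 0.0;
    // Each iteration is independent; the shared maximum is handled by the reduction
    // clause rather than by serialising the loop.
    #pragma omp parallel for reduction(max : worst)
    for (long i = 0; i < static_cast<long>(nominal.size()); ++i) {
        double d = point_deviation(nominal[i], manufactured[i]);
        if (d > worst) worst = d;
    }
    return worst;
}
```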

    Mixing multi-core CPUs and GPUs for scientific simulation software

    Recent technological and economic developments have led to widespread availability of multi-core CPUs and specialist accelerator processors such as graphical processing units (GPUs). The accelerated computational performance possible from these devices can be very high for some application paradigms. Software languages and systems such as NVIDIA's CUDA and the Khronos consortium's open compute language (OpenCL) support a number of individual parallel application programming paradigms. To scale up the performance of some complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and very many core GPUs for data parallelism is necessary. We describe our use of hybrid applications using threading approaches and multi-core CPUs to control independent GPU devices. We present speed-up data, discuss multi-threading software issues for the applications-level programmer, and offer some suggested areas for language development and integration between coarse-grained and fine-grained multi-thread systems. We discuss results from three common simulation algorithmic areas: partial differential equations; graph cluster metric calculations; and random number generation. We report on programming experiences and selected performance for these algorithms on single and multiple GPUs, multi-core CPUs, a CellBE, and using OpenCL. We discuss programmer usability issues and the outlook and trends in multi-core programming for scientific applications developers.
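
    The coarse-grained side of the hybrid approach described above can be sketched as one host thread per accelerator, each thread owning its device and a slice of the domain. In the sketch below, `run_on_device` is a hypothetical stand-in for the real CUDA or OpenCL upload/launch/download sequence; it does the work on the CPU only so that the example is self-contained.

```cpp
// Sketch of coarse-grained CPU threading used to control independent GPU devices.
// run_on_device() is a placeholder for a per-device CUDA/OpenCL kernel launch.
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

void run_on_device(int device_id, std::vector<double>& field,
                   std::size_t first, std::size_t last) {
    (void)device_id;                 // a real version would bind this thread to the device
    for (std::size_t i = first; i < last; ++i)
        field[i] *= 0.5;             // stand-in for the device-side data-parallel kernel
}

void hybrid_step(std::vector<double>& field, int n_devices) {
    if (n_devices <= 0) return;
    const std::size_t chunk = (field.size() + n_devices - 1) / n_devices;
    std::vector<std::thread> hosts;
    for (int d = 0; d < n_devices; ++d) {
        const std::size_t first = d * chunk;
        const std::size_t last  = std::min(field.size(), first + chunk);
        if (first >= last) break;
        hosts.emplace_back(run_on_device, d, std::ref(field), first, last);  // one thread per device
    }
    for (auto& h : hosts) h.join();  // join the coarse-grained host threads
}
```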

    A new parallelisation technique for heterogeneous CPUs

    Parallelisation has moved into mainstream compilers in recent years, and the demand for parallelising tools that can do a better job of automatic parallelisation is higher than ever. During the last decade, considerable attention has focused on developing programming tools that support both explicit and implicit parallelism in order to keep up with the power of new multi-core technology. Yet success in developing automatically parallelising compilers has been limited, mainly because of the complexity of the analysis required to exploit available parallelism and to manage other parallelisation concerns such as data partitioning, alignment and synchronisation. This dissertation investigates the development of a programming tool that automatically parallelises operations on large data structures on a heterogeneous architecture, and asks whether a high-level programming language compiler can use this tool to exploit implicit parallelism and realise the performance potential of modern multicore technology. The work involved the development of a fully automatic parallelisation tool, called VSM, that completely hides the underlying details of general-purpose heterogeneous architectures. The VSM implementation gives users direct and simple access to parallelised array operations on the Cell's accelerators without any annotations or process directives. The work also extended the Glasgow Vector Pascal compiler to work with VSM as a single compiler system. The resulting compiler system, called VP-Cell, takes a single source program and parallelises array expressions automatically. Several experiments using Vector Pascal benchmarks demonstrate the validity of the VSM approach. The VP-Cell system achieved significant runtime performance on a single accelerator compared with the master processor and near-linear speedups when code was run across the Cell's accelerators. Although VSM was designed mainly for building parallelising compilers, it also showed considerable performance when running C code on the Cell's accelerators.
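
    The programming model VSM aims at, whole-array expressions whose element-wise work is spread across the available processing elements without annotations, can be illustrated with C++17 parallel algorithms as below. This is a generic analogy only, not VSM/VP-Cell code: VSM performs the equivalent partitioning and dispatch to the Cell's accelerators automatically from Vector Pascal source.

```cpp
// Generic analogy (not VSM/VP-Cell code): a single whole-array expression, c := a + k*b,
// whose element-wise work is distributed by the runtime. Requires C++17 parallel algorithms.
#include <algorithm>
#include <execution>
#include <vector>

std::vector<double> array_expr(const std::vector<double>& a,
                               const std::vector<double>& b, double k) {
    std::vector<double> c(a.size());   // assumes a and b have the same length
    // One data-parallel operation; the execution policy lets the runtime split the work.
    std::transform(std::execution::par, a.begin(), a.end(), b.begin(), c.begin(),
                   [k](double x, double y) { return x + k * y; });
    return c;
}
```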

    Concurrent cell rate simulation of ATM telecommunications network.

    PhD thesis. Abstract not available.

    Opportunistic acceleration of array-centric Python computation in heterogeneous environments

    Dynamic scripting languages, like Python, are growing in popularity and are increasingly used by non-expert programmers. These languages provide high-level abstractions such as safe memory management, dynamic type handling and array bounds checking. The reduction in boilerplate code enables computation to be expressed more concisely than in statically typed and compiled languages, which improves programmer productivity. Increasingly, scripting languages are used by domain experts to write numerically intensive code in a variety of domains (e.g. Economics, Zoology, Archaeology and Physics), and these programs are often used not just for prototyping but also in deployment. However, such managed program execution comes with a significant performance penalty, arising from the interpreter having to decode and dispatch based on dynamic type checking. Modern computer systems are increasingly equipped with accelerators such as GPUs, but the massive speedups achievable on GPU accelerators come at the cost of program complexity. Directly programming a GPU requires a deep understanding of the computational model of the underlying hardware architecture. While the complexity of such devices is abstracted by programming languages specialised for heterogeneous computing, such as CUDA and OpenCL, these are dialects of the low-level C systems programming language used primarily by expert programmers. This thesis presents the design and implementation of ALPyNA, a loop parallelisation and GPU code generation framework. A novel staged parallelisation approach aggressively parallelises each execution instance of a loop nest: loop dependence relationships that cannot be inferred statically are deferred to runtime analysis, where they are augmented with information obtained by introspection, and the loop nest is then parallelised. Parallel GPU kernels are customised to the runtime dependence graph, JIT compiled and executed. A systematic analysis of the execution speed of loop nests is performed using 12 standard loop-intensive benchmarks on two CPU–GPU machines, one a server-grade machine and the other a typical desktop. ALPyNA's GPU kernels achieve orders-of-magnitude speedups over the baseline interpreter execution time (up to 16435x) and large speedups (up to 179.55x) over JIT-compiled CPU code. The varied performance of JIT-compiled GPU code motivates the need for a sophisticated cost model to select, at runtime and for varying domain sizes, the device that provides the best speedup. The thesis therefore also describes a novel lightweight analytical cost model for determining the fastest device on which to execute a loop nest at runtime. The ALPyNA Cost Model (ACM) adapts to runtime dependence analysis and is parameterised on the hardware characteristics of the underlying target CPU or GPU; it also takes into account the relative rate at which the interpreter can supply the GPU with computational work. ACM is re-targetable to other accelerator devices and requires only minimal install-time profiling.
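
    The device-selection decision that a cost model such as ACM makes at runtime can be sketched as follows. The linear cost form, the parameter names and the way the interpreter's work-supply rate is folded in are all illustrative assumptions, not ACM itself; the abstract only states that the model is parameterised on hardware characteristics and accounts for how fast the interpreter can feed the GPU.

```cpp
// Illustrative sketch of runtime device selection for a loop nest. The cost form and
// parameters are assumptions for illustration, not the ALPyNA Cost Model (ACM) itself.
#include <cstdint>

enum class Device { Cpu, Gpu };

struct CostParams {
    double cpu_time_per_iter;    // seconds per iteration on the JIT-compiled CPU path
    double gpu_time_per_iter;    // seconds per iteration of device-side work
    double gpu_launch_overhead;  // fixed JIT + kernel-launch cost, in seconds
    double bytes_per_iter;       // data that must reach the device per iteration
    double host_supply_rate;     // bytes per second the interpreter can feed the device (> 0)
};

Device choose_device(std::uint64_t iterations, const CostParams& p) {
    const double cpu_cost    = p.cpu_time_per_iter * iterations;
    const double gpu_compute = p.gpu_time_per_iter * iterations;
    const double gpu_supply  = (p.bytes_per_iter * iterations) / p.host_supply_rate;
    // The device can run no faster than the host can supply it with work.
    const double gpu_cost = p.gpu_launch_overhead +
                            (gpu_compute > gpu_supply ? gpu_compute : gpu_supply);
    return gpu_cost < cpu_cost ? Device::Gpu : Device::Cpu;
}
```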